Article Open Access

An optimal transport-based characterization of convex order

  • Johannes Wiesel and Erica Zhang
Published/Copyright: October 18, 2023

Abstract

For probability measures $\mu$, $\nu$, and $\rho$, define the cost functionals

$$C(\mu,\rho) \coloneqq \sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy) \quad\text{and}\quad C(\nu,\rho) \coloneqq \sup_{\pi\in\Pi(\nu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy),$$

where $\langle\cdot,\cdot\rangle$ denotes the scalar product and $\Pi(\cdot,\cdot)$ is the set of couplings. We show that two probability measures $\mu$ and $\nu$ on $\mathbb{R}^d$ with finite first moments are in convex order (i.e., $\mu\preceq_c\nu$) iff $C(\mu,\rho)\le C(\nu,\rho)$ holds for all probability measures $\rho$ on $\mathbb{R}^d$ with bounded support. This generalizes a result by Carlier. Our proof relies on a quantitative bound for the infimum of $\int f\,d\nu - \int f\,d\mu$ over all 1-Lipschitz functions $f$, which is obtained through optimal transport (OT) duality and the characterization of optimal couplings by Rüschendorf, Rachev, and Brenier. Building on this result, we derive new proofs of well known one-dimensional characterizations of convex order. We also describe new computational methods for investigating convex order and applications to model-independent arbitrage strategies in mathematical finance.

MSC 2010: 60G46; 60G42; 91G80

1 Introduction and main result

Fix two probability measures $\mu,\nu\in\mathcal{P}(\mathbb{R}^d)$ with

$$\int\|x\|\,\mu(dx) < \infty \quad\text{and}\quad \int\|y\|\,\nu(dy) < \infty.$$

Recall that $\mu$ and $\nu$ are in convex order (denoted by $\mu\preceq_c\nu$) iff

$$\int f\,d\mu \le \int f\,d\nu \quad\text{for all convex functions } f:\mathbb{R}^d\to\mathbb{R}.$$

As any convex function is bounded from below by an affine function, the aforementioned integrals take values in $(-\infty,\infty]$. The notion of convex order is very well studied (see, e.g., [5,26,32,35] and the references therein for an overview). It plays a pivotal role in mathematical finance since [36] established that $\mu\preceq_c\nu$ if and only if $\mathcal{M}(\mu,\nu)$ – the set of martingale laws on $\mathbb{R}^d\times\mathbb{R}^d$ with marginals $\mu$ and $\nu$ – is non-empty. This result is also the reason why convex order has taken center stage in the field of martingale optimal transport (OT) (see, e.g., [24,6,7,12,18,19,21,23,28] and references therein). Furthermore, convex order plays an important role in dependence modeling and risk aggregation (see, e.g., [8,17,34,37,40]). Over the past 50 years or so, several other properties of convex order and extensions of [36] have been found. To mention just a few, [15,16] give an equivalent condition for convex order of $\mu$ and $\nu$; this condition is based on so-called fusions of probability measures and is most instructive in the case of finitely supported measures $\mu$ and $\nu$. Furthermore, [25, Section 4] gives a constructive proof of Strassen’s theorem in the univariate case.

While there is an abundance of explicit characterizations of convex order available in one dimension (i.e., $d=1$; see, e.g., [35, Chapter 3]), the case $d>1$ seems to be less studied, to the best of our knowledge. The main goal of this article is to fill this gap: we discuss a characterization of convex order that holds in general dimensions and is based on the theory of OT. OT goes back to the seminal works of [24] and [22]. It is concerned with the problem of transporting probability distributions in a cost-optimal way. We refer to [31] and [38,39] for an overview. For this study, we only need a few basic concepts from OT. Most importantly, we will need the cost functionals

$$C(\mu,\rho) \coloneqq \sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy) \quad\text{and}\quad C(\nu,\rho) \coloneqq \sup_{\pi\in\Pi(\nu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy).$$

Here, $\Pi(\mu,\nu)$ denotes the set of probability measures on $\mathbb{R}^d\times\mathbb{R}^d$ with marginals $\mu$ and $\nu$. Our main result is the following:

Theorem 1.1

Assume that $\mu,\nu\in\mathcal{P}(\mathbb{R}^d)$ have finite first moments. Then,

$$\inf_{f\in\mathcal{C}_1(\mathbb{R}^d)}\left(\int f\,d\nu - \int f\,d\mu\right) = \inf_{\rho\in\mathcal{P}_1(\mathbb{R}^d)}\big(C(\nu,\rho) - C(\mu,\rho)\big), \tag{1}$$

where

$$\mathcal{P}_1(\mathbb{R}^d) \coloneqq \{\rho\in\mathcal{P}(\mathbb{R}^d): \operatorname{supp}(\rho)\subseteq B_1(0)\}$$

and $B_1(0)$ denotes the unit ball in $\mathbb{R}^d$, as well as

$$\mathcal{C}_1(\mathbb{R}^d) \coloneqq \{f:\mathbb{R}^d\to\mathbb{R}\ \text{convex and 1-Lipschitz}\}.$$

Theorem 1.1 states that convex order of $\mu$ and $\nu$ is equivalent to an order relation induced by the cost functionals $C(\cdot,\cdot)$ on the space of probability measures. Contrary to standard characterizations of convex order using potential functions or cumulative distribution functions, it holds in any dimension and can be seen as a natural generalization of the following result:

Corollary 1.2

Denote the 2-Wasserstein metric by

$$W_2(\mu,\nu) \coloneqq \left(\inf_{\pi\in\Pi(\mu,\nu)}\int\|x-y\|^2\,\pi(dx,dy)\right)^{1/2}.$$

If μ and ν have finite second moment, then they are in convex order if and only if

$$W_2(\nu,\rho)^2 - W_2(\mu,\rho)^2 \le \int\|y\|^2\,\nu(dy) - \int\|x\|^2\,\mu(dx) \tag{2}$$

holds for all probability measures ρ on R d with bounded support.
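As a quick numerical sanity check of inequality (2), consider $d=1$ and equal-weight empirical measures, for which $W_2^2$ is computed exactly by the monotone (sorted) coupling. The measures below are illustrative choices, not taken from the article; here $\mu=\frac12(\delta_{-1}+\delta_1)\preceq_c\nu=\frac12(\delta_{-2}+\delta_2)$, so (2) should hold for every test measure $\rho$:

```python
import numpy as np

def w2_sq(x, y):
    # squared 2-Wasserstein distance between equal-weight empirical
    # measures in d=1: the monotone (sorted) coupling is optimal
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

# mu = 0.5(delta_{-1} + delta_1) <=_c nu = 0.5(delta_{-2} + delta_2):
# same mean, nu is more spread out
mu = np.array([-1.0, 1.0])
nu = np.array([-2.0, 2.0])
rhs = np.mean(nu ** 2) - np.mean(mu ** 2)   # second-moment difference in (2)

rng = np.random.default_rng(0)
# test inequality (2) against randomly drawn two-point measures rho
gaps = np.array([
    rhs - (w2_sq(nu, rho) - w2_sq(mu, rho))
    for rho in rng.uniform(-3, 3, size=(100, 2))
])
```

Since $\mu\preceq_c\nu$, every sampled gap is nonnegative (up to floating-point error), in line with (2).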

Corollary 1.2 has an interesting history. To the best of our knowledge, it was first stated in [11] for compactly supported measures μ and ν . The proof in [11] relies on a well known connection between convex functions and OT for the squared Euclidean distance due to [33] and [10] together with a certain probabilistic first-order condition (see [11, Proposition 1]).

Interestingly, Carlier’s result does not seem to be very well known in the literature on stochastic order. We conjecture that this is mainly due to his use of the French word “balayée” instead of convex order, so that the connection is not immediately apparent. For this reason, one aim of this note is to popularize Carlier’s result, making it accessible to a wider audience, while simultaneously showcasing potential applications. As it turns out, Corollary 1.2 is at least partially known to the mathematical finance community: indeed, the “only if” direction of Corollary 1.2 was rediscovered in [4, Equation (2.2)] for (not necessarily compactly supported) probability measures $\mu$ and $\nu$ with finite second moments. Recently, Carlier’s result has also been used in [14] for statistical estimation of convex order for compactly supported probability measures with so-called input convex maxout neural networks. This note differs from Carlier’s work in three aspects: first, as convex order is classically embedded in the space of probability measures on $\mathbb{R}^d$ with finite first moments and requires neither moments of higher order nor compact support assumptions (see, e.g., [27]), Theorem 1.1 is simultaneously more concise and arguably more natural than Corollary 1.2. Second, our proof of Theorem 1.1 (and thus also of Corollary 1.2) follows a different route than Carlier’s original argument, which works purely on the space of probability measures (i.e., the “primal side” of OT). Instead, we work directly with the result of [33] and [10] – in particular, the classical OT duality. Finally, we discuss three implications of Theorem 1.1: we first give a proof of a characterization of convex order in one dimension through quantile functions. Then, we use Theorem 1.1 to derive new computational methods for testing convex order between $\mu$ and $\nu$. For the computation, we exploit state-of-the-art computational OT methods, which are efficient for potentially high-dimensional problems and have recently seen a spike in research activity; we refer to [29] for an overview. Finally, we discuss applications of Theorem 1.1 to the theory of so-called model-independent arbitrage (see [1, Definition 1.2]).

This article is structured as follows: in Section 2, we state examples and consequences of Theorem 1.1. In particular, we connect it to some well known results in the theory of convex order. The proof of the main results is given in Section 3. Sections 4 and 5 discuss numerical and mathematical finance applications of Theorem 1.1, respectively. Remaining proofs are collected in Section 6.

2 Discussion and consequences of main results

To sharpen intuition, let us first discuss the case $d=1$. By Theorem 1.1, we can obtain a new proof of a well known representation of convex order on the real line (see, e.g., [35, Theorem 3.A.5]). Here, we denote the quantile function of a probability measure $\mu$ by

$$F_\mu^{-1}(x) \coloneqq \inf\{y\in\mathbb{R}: \mu((-\infty,y]) \ge x\}.$$

Corollary 2.1

For $d=1$, we have

$$\mu \preceq_c \nu \iff \int_0^x \big(F_\mu^{-1}(y) - F_\nu^{-1}(y)\big)\,dy \ge 0 \quad\text{for all } x\in[0,1], \text{ with equality for } x=1.$$
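Corollary 2.1 is straightforward to evaluate numerically, since quantile functions of discrete measures are step functions. A minimal sketch (the two-point measures and the grid below are illustrative, not taken from the article):

```python
import numpy as np

def quantile(atoms, weights, u):
    # left-continuous generalized inverse F^{-1}(u) of a discrete measure
    order = np.argsort(atoms)
    atoms = np.asarray(atoms, dtype=float)[order]
    cdf = np.cumsum(np.asarray(weights, dtype=float)[order])
    idx = np.searchsorted(cdf, u, side="left")
    return atoms[np.minimum(idx, len(atoms) - 1)]

# mu = 0.5(delta_{-1}+delta_1) <=_c nu = 0.5(delta_{-2}+delta_2)
u = np.linspace(0.0005, 0.9995, 1000)
diff = quantile([-1, 1], [0.5, 0.5], u) - quantile([-2, 2], [0.5, 0.5], u)
# running integral int_0^x (F_mu^{-1} - F_nu^{-1})(y) dy on the grid
running = np.cumsum(diff) * (u[1] - u[0])
```

The running integral stays nonnegative on $[0,1]$ and returns to zero at $x=1$, exactly as the corollary predicts for this convex-ordered pair.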

The proofs of all results of this section are collected in Section 6. We continue with general $d\in\mathbb{N}$ and give a geometric interpretation of Corollary 1.2 by restating it as follows: $\mu\preceq_c\nu$ holds iff

$$W_2(\nu,\rho)^2 - W_2(\mu,\rho)^2 \le W_2(\nu,\delta_z)^2 - W_2(\mu,\delta_z)^2 \tag{3}$$

for all $\rho\in\mathcal{P}(\mathbb{R}^d)$ with bounded support and all $z\in\mathbb{R}^d$, where $\delta_z$ denotes the Dirac measure at $z$. Indeed, varying $\rho$ over Dirac measures in (2) implies that the means of $\mu$ and $\nu$ have to be equal; equation (3) then follows from simple algebra. This implies, in particular, that the difference between the squared Wasserstein costs from $\nu$ and from $\mu$ to $\rho$ is maximized at point masses. More generally, it is natural to ask whether one can restrict the class $\mathcal{P}_1(\mathbb{R}^d)$ to a subclass with a more tractable representation. This is true for $d=1$, where it can easily be checked from Corollary 2.1 that it is enough to consider $\rho = x\delta_0 + (1-x)\delta_1$ for $x\in[0,1]$. For $d>1$, this question is beyond the scope of this article, and we leave it for future research. Theorem 1.1 can also be reformulated as follows: $\mu\preceq_c\nu$ iff

$$\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,z\rangle\,\pi(dx,dz) \le \sup_{\pi\in\Pi(\nu,\rho)}\int\langle y,z\rangle\,\pi(dy,dz)$$

for any $\rho\in\mathcal{P}(\mathbb{R}^d)$ with bounded support, i.e., the maximal covariance between $\mu$ and $\rho$ is less than the one between $\nu$ and $\rho$. This provides a precise meaning for the classical pedestrian description of convex order, namely that “$\nu$ is more spread out than $\mu$.”

We next give a simple example for Corollary 1.2.

Example 2.2

Let us take $\mu=\delta_0$ and $\nu$ with mean zero and finite second moment. Now, recalling (4) and bounding $W_2(\nu,\rho)^2$ from above by choosing the product coupling, we obtain that for any $\rho$ with finite second moment,

$$W_2(\nu,\rho)^2 - W_2(\mu,\rho)^2 = W_2(\nu,\rho)^2 - \int\|x\|^2\,\rho(dx) \le \int\|y\|^2\,\nu(dy) - 2\int\langle x,y\rangle\,\nu(dy)\,\rho(dx) = \int\|y\|^2\,\nu(dy) = \int\|y\|^2\,\nu(dy) - \int\|x\|^2\,\mu(dx),$$

where the first equality uses $W_2(\delta_0,\rho)^2 = \int\|x\|^2\,\rho(dx)$, and the cross term vanishes because $\nu$ has mean zero. In conclusion, we recover the well known fact that $\delta_0\preceq_c\nu$.

We now state two direct corollaries of Corollary 1.2. We consider the cost $c(x,y)\coloneqq\frac{\|x-y\|^2}{2}$ and recall that a function $f$ is $c$-concave if

$$f(x) = \inf_{y\in\mathbb{R}^d}\big(g(y) + c(x,y)\big)$$

for some function $g:\mathbb{R}^d\to\mathbb{R}$.
for some function g : R d R . We then have the following:

Corollary 2.3

We have

$$\int g\,d\nu \le \int g\,d\mu \quad\text{for all } c\text{-concave functions } g:\mathbb{R}^d\to\mathbb{R}$$

if and only if

$$W_2(\nu,\rho)^2 \le W_2(\mu,\rho)^2 \quad\text{for all } \rho\in\mathcal{P}(\mathbb{R}^d) \text{ with compact support}.$$

Corollary 1.2 also directly implies the following well known result:

Corollary 2.4

If $\mu\preceq_c\nu$, then

$$W_2(\mu,\nu)^2 \le \int\|y\|^2\,\nu(dy) - \int\|x\|^2\,\mu(dx).$$

In particular, $\mu\preceq_c\nu$ implies

$$\sup_{\pi\in\Pi(\mu,\nu)}\int\langle x,y\rangle\,\pi(dx,dy) \ge \int\|x\|^2\,\mu(dx).$$

3 Proof of Theorem 1.1 and Corollary 1.2

Let us start by setting up some notation. We denote the scalar product on $\mathbb{R}^d$ by $\langle\cdot,\cdot\rangle$ and the Euclidean norm on $\mathbb{R}^d$ by $\|\cdot\|$. The ball in $\mathbb{R}^d$ around $x$ of radius $r>0$ will be denoted by $B_r(x)$.

In order to keep this article self-contained, we summarize some properties of OT at the beginning of this section (see [38, Chapter 2.1] for a more detailed treatment). First, we set

$$\mathcal{P}_p(\mathbb{R}^d) \coloneqq \left\{\rho\in\mathcal{P}(\mathbb{R}^d): \int\|x\|^p\,\rho(dx)<\infty\right\}$$

for all $p\ge 1$. By definition, we have for any $\mu,\rho\in\mathcal{P}_2(\mathbb{R}^d)$ that

$$W_2(\mu,\rho)^2 = \int\|x\|^2\,\mu(dx) + \int\|y\|^2\,\rho(dy) - 2\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy). \tag{4}$$

In this section, we thus (re-)define the cost function $c(x,y)\coloneqq\langle x,y\rangle$ and recall that the convex conjugate $f^*:\mathbb{R}^d\to\mathbb{R}\cup\{\infty\}$ of a function $f:\mathbb{R}^d\to\mathbb{R}$ is given by

$$f^*(y) \coloneqq \sup_{x\in\mathbb{R}^d}\big(\langle y,x\rangle - f(x)\big).$$

The subdifferential of a proper convex function $f:\mathbb{R}^d\to\mathbb{R}\cup\{\infty\}$ is defined as

$$\partial f(x) \coloneqq \{y\in\mathbb{R}^d: f(x') - f(x) \ge \langle y, x'-x\rangle \text{ for all } x'\in\mathbb{R}^d\}.$$

It is non-empty if $x$ belongs to the interior of the domain of $f$. We have

$$f(x) + f^*(y) - \langle x,y\rangle = 0 \iff y\in\partial f(x). \tag{5}$$
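Relation (5) is the equality case of the Fenchel–Young inequality $f(x)+f^*(y)\ge\langle x,y\rangle$. A quick numerical illustration with the textbook example $f(x)=\|x\|^2/2$ (an illustrative choice, not from the article), which is its own convex conjugate, so the Fenchel–Young gap equals $\|x-y\|^2/2$ and vanishes precisely when $y=\nabla f(x)=x$:

```python
import numpy as np

# f(x) = x^2/2 is its own convex conjugate: f*(y) = sup_x (xy - x^2/2) = y^2/2,
# with the sup attained at x = y, i.e. equality in (5) holds iff y = x
f = lambda x: 0.5 * x ** 2
f_star = lambda y: 0.5 * y ** 2

xs = np.linspace(-2.0, 2.0, 101)
X, Y = np.meshgrid(xs, xs)
gap = f(X) + f_star(Y) - X * Y      # Fenchel-Young gap, equals (x - y)^2 / 2
```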

Finally, we recall the OT duality (see [33, (12)])

$$C(\mu,\rho) = \sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy) = \inf_{f\oplus g\ge c}\left(\int f\,d\mu + \int g\,d\rho\right) = \inf_{f\oplus g\ge c,\ f,g \text{ proper, convex}}\left(\int f\,d\mu + \int g\,d\rho\right), \tag{6}$$

where $f\oplus g\ge c$ abbreviates $f(x)+g(y)\ge\langle x,y\rangle$ for all $x,y\in\mathbb{R}^d$, and the existence of an optimal pair $(f, f^*)$ of (lower semicontinuous, proper) convex conjugate functions (see, e.g., [33, proof of Theorem 1]). Replacing $\mu$ by $\nu$ in the aforementioned display, we obtain a similar duality for $C(\nu,\rho)$.

3.1 Proof of Theorem 1.1

We now prove Theorem 1.1. As $\mu$ and $\nu$ have finite first moments and $\rho\in\mathcal{P}_1(\mathbb{R}^d)$, the domain of the optimizing potential $f$ for $C(\mu,\rho)$ (resp. $C(\nu,\rho)$) is all of $\mathbb{R}^d$ in this case. We write

$$\|\partial f\| \coloneqq \sup_{x\in\mathbb{R}^d}\,\sup_{y\in\partial f(x)}\|y\|.$$

Proof of Theorem 1.1

As $\mu$ and $\nu$ have finite first moments and $\rho$ is compactly supported, $C(\mu,\rho), C(\nu,\rho)<\infty$ follows from Hölder’s inequality. We now fix $\rho\in\mathcal{P}_1(\mathbb{R}^d)$ and take an optimal convex pair $(\hat f, \hat g)$ in (6) for $C(\nu,\rho)$. Next, we apply [33, Theorem 1], which states that $\rho = \partial\hat f_*\nu$.[1] Furthermore, as $\operatorname{supp}(\rho)\subseteq B_1(0)$, we conclude $\|\partial\hat f\|\le 1$ and

$$C(\nu,\rho) - C(\mu,\rho) \ge \int \hat f\,d\nu + \int \hat g\,d\rho - \left(\int \hat f\,d\mu + \int \hat g\,d\rho\right) = \int \hat f\,d\nu - \int \hat f\,d\mu \ge \inf_{f\in\mathcal{C}_1(\mathbb{R}^d)}\left(\int f\,d\nu - \int f\,d\mu\right).$$

Taking the infimum over $\rho\in\mathcal{P}_1(\mathbb{R}^d)$ shows that

$$\inf_{\rho\in\mathcal{P}_1(\mathbb{R}^d)}\big(C(\nu,\rho) - C(\mu,\rho)\big) \ge \inf_{f\in\mathcal{C}_1(\mathbb{R}^d)}\left(\int f\,d\nu - \int f\,d\mu\right).$$

On the other hand, fix $f\in\mathcal{C}_1(\mathbb{R}^d)$ and set $g\coloneqq f^*$. Define $\hat\rho \coloneqq \partial f_*\mu$ and note that $\hat\rho\in\mathcal{P}_1(\mathbb{R}^d)$. Then, again by [33, Theorem 1], we obtain optimality of the pair $(f,g)$ for $C(\mu,\hat\rho)$, and thus,

$$\int f\,d\nu - \int f\,d\mu = \int f\,d\nu + \int g\,d\hat\rho - \left(\int g\,d\hat\rho + \int f\,d\mu\right) \ge C(\nu,\hat\rho) - C(\mu,\hat\rho) \ge \inf_{\rho\in\mathcal{P}_1(\mathbb{R}^d)}\big(C(\nu,\rho) - C(\mu,\rho)\big).$$

Taking the infimum over $f\in\mathcal{C}_1(\mathbb{R}^d)$ shows

$$\inf_{f\in\mathcal{C}_1(\mathbb{R}^d)}\left(\int f\,d\nu - \int f\,d\mu\right) \ge \inf_{\rho\in\mathcal{P}_1(\mathbb{R}^d)}\big(C(\nu,\rho) - C(\mu,\rho)\big).$$

This concludes the proof.□

3.2 Proof of Corollary 1.2

We now detail the proof of Corollary 1.2. We start with a preliminary result, which is an immediate corollary of Theorem 1.1.

Corollary 3.1

Assume that $\mu$ and $\nu$ have finite first moments. Then, we have

$$\inf_{f\ \text{convex}}\left(\int f\,d\nu - \int f\,d\mu\right) = \inf_{\rho\in\mathcal{P}_\infty(\mathbb{R}^d)}\big(C(\nu,\rho) - C(\mu,\rho)\big), \tag{7}$$

where $\mathcal{P}_\infty(\mathbb{R}^d)$ denotes the set of probability measures with bounded support. In particular,

$$\int f\,d\mu \le \int f\,d\nu \quad\text{for all convex functions } f:\mathbb{R}^d\to\mathbb{R}$$

if and only if

$$C(\mu,\rho) \le C(\nu,\rho) \quad\text{for all } \rho\in\mathcal{P}_\infty(\mathbb{R}^d).$$

Proof

Multiplying both sides of (1) by $k>0$ yields

$$\inf_{f\in\mathcal{C}_k(\mathbb{R}^d)}\left(\int f\,d\nu - \int f\,d\mu\right) = \inf_{\rho\in\mathcal{P}_k(\mathbb{R}^d)}\big(C(\nu,\rho)-C(\mu,\rho)\big),$$

with the definitions

$$\mathcal{P}_k(\mathbb{R}^d) \coloneqq \{\rho\in\mathcal{P}(\mathbb{R}^d): \operatorname{supp}(\rho)\subseteq B_k(0)\}$$

and

$$\mathcal{C}_k(\mathbb{R}^d) \coloneqq \{f:\mathbb{R}^d\to\mathbb{R}\ \text{convex},\ \|\partial f\|\le k\}.$$

Taking $k\to\infty$, we obtain

$$\inf_{f \text{ convex, Lipschitz}}\left(\int f\,d\nu-\int f\,d\mu\right) = \inf_{\rho\in\mathcal{P}_\infty(\mathbb{R}^d)}\big(C(\nu,\rho)-C(\mu,\rho)\big).$$

Finally, any convex function $f:\mathbb{R}^d\to\mathbb{R}$ can be approximated in a pointwise sense by convex Lipschitz functions. Thus,

$$\inf_{f \text{ convex}}\left(\int f\,d\nu-\int f\,d\mu\right) = \inf_{\rho\in\mathcal{P}_\infty(\mathbb{R}^d)}\big(C(\nu,\rho)-C(\mu,\rho)\big).$$

The claim thus follows.□

Remark 3.2

If $\mu,\nu\in\mathcal{P}_p(\mathbb{R}^d)$ for some $p\ge 1$, then by Hölder’s inequality and density of finitely supported measures in the $q$-Wasserstein space, we also obtain

$$\inf_{f\ \text{convex}}\left(\int f\,d\nu - \int f\,d\mu\right) = \inf_{\rho\in\mathcal{P}_q(\mathbb{R}^d)}\big(C(\nu,\rho) - C(\mu,\rho)\big),$$

where $\frac{1}{p}+\frac{1}{q}=1$.

Proof of Corollary 1.2

Recall from (4) that

$$C(\mu,\rho) = \frac{1}{2}\left(\int\|x\|^2\,\mu(dx) + \int\|z\|^2\,\rho(dz) - W_2(\mu,\rho)^2\right), \qquad C(\nu,\rho) = \frac{1}{2}\left(\int\|y\|^2\,\nu(dy) + \int\|z\|^2\,\rho(dz) - W_2(\nu,\rho)^2\right).$$

Combining this with (7) from Corollary 3.1 yields

$$\begin{aligned} \inf_{f\ \text{convex}}\left(\int f\,d\nu - \int f\,d\mu\right) &= \inf_{\rho\in\mathcal{P}_\infty(\mathbb{R}^d)}\big(C(\nu,\rho) - C(\mu,\rho)\big) \\ &= \frac{1}{2}\inf_{\rho\in\mathcal{P}_\infty(\mathbb{R}^d)}\left(\int\|y\|^2\,\nu(dy) + \int\|z\|^2\,\rho(dz) - W_2(\nu,\rho)^2 - \int\|x\|^2\,\mu(dx) - \int\|z\|^2\,\rho(dz) + W_2(\mu,\rho)^2\right) \\ &= \frac{1}{2}\inf_{\rho\in\mathcal{P}_\infty(\mathbb{R}^d)}\big(W_2(\mu,\rho)^2 - W_2(\nu,\rho)^2\big) + \frac{1}{2}\left(\int\|y\|^2\,\nu(dy) - \int\|x\|^2\,\mu(dx)\right). \end{aligned}$$

Thus,

$$\begin{aligned} &\int f\,d\mu \le \int f\,d\nu \ \text{for all convex functions } f:\mathbb{R}^d\to\mathbb{R} \\ \iff\ & \inf_{f\ \text{convex}}\left(\int f\,d\nu - \int f\,d\mu\right) \ge 0 \\ \iff\ & \inf_{\rho\in\mathcal{P}_\infty(\mathbb{R}^d)}\big(W_2(\mu,\rho)^2 - W_2(\nu,\rho)^2\big) \ge \int\|x\|^2\,\mu(dx) - \int\|y\|^2\,\nu(dy) \\ \iff\ & \sup_{\rho\in\mathcal{P}_\infty(\mathbb{R}^d)}\big(W_2(\nu,\rho)^2 - W_2(\mu,\rho)^2\big) \le \int\|y\|^2\,\nu(dy) - \int\|x\|^2\,\mu(dx). \end{aligned}$$

The claim follows.□

4 Numerical examples

In this section, we illustrate Theorem 1.1 numerically. We focus on the following toy examples, where convex order or its absence is easy to establish.

Example 4.1

$\mu = \mathcal{N}(0,\sigma^2 I)$ and $\nu = \mathcal{N}(0,I)$ for $\sigma^2\in[0,2]$ and $d=1,2$.

Example 4.2

$d=2$, $\mu = \mathcal{N}(0,I)$, and $\nu = \mathcal{N}(0,\Sigma)$ for

$$\Sigma = \begin{pmatrix} 2 & 1+s \\ 1+s & 2 \end{pmatrix}$$

with $s\in[-1,1]$.

Example 4.3

$\mu = \frac{1}{2}(\delta_{-1-s} + \delta_{1+s})$ and $\nu = \frac{1}{2}(\delta_{-1} + \delta_{1})$ for $s\in[-1,1]$.

Example 4.4

$$\mu = \frac{1}{4}\big(\delta_{(-1-s,0)} + \delta_{(1+s,0)} + \delta_{(0,1+s)} + \delta_{(0,-1-s)}\big)$$

and

$$\nu = \frac{1}{4}\big(\delta_{(-1,0)} + \delta_{(1,0)} + \delta_{(0,-1)} + \delta_{(0,1)}\big)$$

for $s\in[-1,1]$.

A general numerical implementation for testing convex order of two measures $\mu$ and $\nu$ with finite first moments in general dimensions, together with the examples discussed here, can be found in the GitHub repository https://github.com/johanneswiesel/Convex-Order. In the implementation, we use the POT package (https://pythonot.github.io) to compute OT distances.

Let us set

$$V(\mu,\nu) \coloneqq \inf_{\rho\in\mathcal{P}_1(\mathbb{R}^d)}\big(C(\nu,\rho) - C(\mu,\rho)\big)$$

and note that by Theorem 1.1, we have the equivalence

$$\mu\preceq_c\nu \iff V(\mu,\nu)\ge 0.$$
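As a minimal sketch of this criterion in $d=1$ (replacing the Bayesian optimization used in the repository with plain random search, and with illustrative two-point measures), one can exploit that for equal-weight empirical measures the maximal covariance $C(\cdot,\cdot)$ is attained by the comonotone (sorted) coupling:

```python
import numpy as np

def cov_max(x, z):
    # C(mu, rho) for equal-weight empirical measures in d=1: the maximal
    # covariance sup_pi int xz dpi is attained by the sorted coupling
    return np.mean(np.sort(x) * np.sort(z))

def V(mu, nu, n_trials=500, seed=0):
    # crude Monte Carlo estimate of inf_rho (C(nu,rho) - C(mu,rho)) over
    # candidate measures rho supported in B_1(0) = [-1, 1]
    rng = np.random.default_rng(seed)
    return min(
        cov_max(nu, rho) - cov_max(mu, rho)
        for rho in rng.uniform(-1, 1, size=(n_trials, len(mu)))
    )

mu = np.array([-1.0, 1.0])   # 0.5(delta_{-1} + delta_1)
nu = np.array([-2.0, 2.0])   # 0.5(delta_{-2} + delta_2), so mu <=_c nu
```

For this convex-ordered pair the estimate stays nonnegative, while swapping the roles of $\mu$ and $\nu$ produces a strictly negative value, flagging the absence of convex order.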

Clearly, the computation of $V(\mu,\nu)$ hinges on the numerical exploration of the convex set of probability measures $\mathcal{P}_1(\mathbb{R}^d)$. We propose two methods for this: our first method only considers finitely supported measures $\rho$, which are dense in $\mathcal{P}_1(\mathbb{R}^d)$ in the Wasserstein topology. It relies on the Dirichlet distribution on $\mathbb{R}^{g-1}$, $g\in\mathbb{N}$, with density

$$f(x_1,\ldots,x_g;\alpha_1,\ldots,\alpha_g) = \frac{1}{B(\alpha)}\prod_{i=1}^g x_i^{\alpha_i-1}$$

for $x_1,\ldots,x_g\in[0,1]$ satisfying $\sum_{i=1}^g x_i = 1$. Here, $\alpha_1,\ldots,\alpha_g>0$, $\alpha\coloneqq(\alpha_1,\ldots,\alpha_g)$, and $B(\alpha)$ denotes the Beta function. Fixing $g$ grid points $\{k_1,\ldots,k_g\}$ in $B_1(0)$, we consider any realization of a Dirichlet random variable $(X_1,\ldots,X_g)$ as a probability distribution assigning probability mass $X_i$ to the grid point $k_i$, $i\in\{1,\ldots,g\}$. This leads to the following algorithm:

Algorithm 1. Basic algorithm for the indirect Dirichlet method
Input: probability measures $\mu,\nu$; maximal number of evaluations $N$; number of grid points $g$
Output: $V(\mu,\nu)$
Generate a grid $G$ of $B_1(0)$ consisting of $g$ equidistant points, and consider a Dirichlet random variable modeling $\rho$ supported on $G$. Use Bayesian optimization to solve
$$\inf_\rho\,\big[C(\rho,\nu) - C(\rho,\mu)\big]$$
over the set of Dirichlet distributions on $\mathbb{R}^{g-1}$. Terminate after $N$ steps.
return the minimal value of $C(\rho,\nu) - C(\rho,\mu)$ found.
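The Dirichlet parametrization of $\rho$ in Algorithm 1 can be sketched as follows (the grid size and concentration parameters below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
g = 11                                # number of grid points
grid = np.linspace(-1.0, 1.0, g)      # equidistant grid of B_1(0) for d=1
alpha = np.ones(g)                    # flat concentration parameters
weights = rng.dirichlet(alpha)        # one draw = one random rho on the grid
# (grid, weights) encodes rho = sum_i weights[i] * delta_{grid[i]}
mean_rho = float(np.sum(weights * grid))
```

Each draw yields nonnegative weights summing to one, i.e., a valid finitely supported element of $\mathcal{P}_1(\mathbb{R}^d)$.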

The main computational challenge in Algorithm 1 is the efficient evaluation of $C(\rho,\nu)$ and $C(\rho,\mu)$. For this, we aim to write $C(\rho,\nu)$ and $C(\rho,\mu)$ as linear programs. We offer two different variants of Algorithm 1:

  • Indirect Dirichlet method with histograms: if we have access to finitely supported approximations $a$ and $b$ of $\mu$ and $\nu$, respectively, and the measure $\rho$ is supported on $G$ as mentioned earlier, then we solve the linear programs for $C(a,\rho)$ and $C(b,\rho)$ as is standard in OT theory.

  • Indirect Dirichlet method with samples: here, we draw a number of samples from $\mu$ and $\nu$, respectively, and denote the respective empirical distributions of these samples by $a$ and $b$. As before, we assume that we have access to a probability measure $\rho$ supported on $G$. We then solve the linear programs for $C(a,\rho)$ and $C(b,\rho)$.
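For finitely supported $a$ and $\rho$, each evaluation of $C(a,\rho)$ is a standard OT linear program. A sketch in $d=1$ using `scipy.optimize.linprog` in place of the POT solver used in the repository (the supports and weights below are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def C_lp(x, a, z, r):
    # C(a, rho) = max sum_{ij} x_i z_j pi_ij over couplings pi of (a, r),
    # written as a linear program (minimize the negated objective)
    n, m = len(a), len(r)
    M = np.outer(x, z)                      # objective matrix x_i * z_j
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):                      # row sums of pi equal a
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):                      # column sums of pi equal r
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([a, r])
    res = linprog(-M.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return -res.fun

x = np.array([-1.0, 1.0]); a = np.array([0.5, 0.5])              # measure a
z = np.array([-1.0, 0.0, 1.0]); r = np.array([0.25, 0.5, 0.25])  # measure rho
```

Here the optimal coupling sends each atom of $a$ to the extreme atoms of $\rho$ of matching sign, giving $C(a,\rho)=0.5$.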

An alternative to Algorithm 1 is to draw samples directly from a distribution $\rho\in\mathcal{P}_1(\mathbb{R}^d)$. We call this the direct randomized Dirichlet method (see Algorithm 2).

Algorithm 2. Direct randomized Dirichlet method
Input: probability measures $\mu,\nu$; maximal number of evaluations $N$
Output: $V(\mu,\nu)$
Draw samples from $\mu$ and $\nu$ and denote the empirical distributions of these samples by $a$ and $b$, respectively. Draw samples from a Dirichlet distribution and randomize their signs, under the constraint that the empirical distribution $\rho$ of these samples is an element of $\mathcal{P}_1(\mathbb{R}^d)$. Use Bayesian optimization to solve
$$\inf_\rho\,\big[C(\rho,\nu) - C(\rho,\mu)\big]$$
over the set of these distributions. Terminate after $N$ steps.
return the minimal value of $C(\rho,\nu) - C(\rho,\mu)$ found.

We refer to the Github repository for a more detailed discussion, in particular for the implementation and further comments. For each example stated at the beginning of this section and each pair ( μ , ν ) , we plot V ( μ , ν ) for the three methods discussed earlier (see Figures 1, 2, 3).

Figure 1

Values of different estimators of V ( μ , ν ) plotted against σ for Example 4.1. Both plots use N = 100 samples.

Figure 2

Values of different estimators of V ( μ , ν ) plotted against s for Example 4.2. The plot uses N = 100 samples.

Figure 3

Values of different estimators of V ( μ , ν ) plotted against s for Example 4.3 (left) and 4.4 (right). Both plots use N = 100 samples.

Discounting numerical errors, all estimators seem to detect convex order. The direct randomized Dirichlet method is less complex; however, it does not seem to explore the space $\mathcal{P}_1(\mathbb{R}^d)$ as well as the two indirect Dirichlet methods. This can be seen in particular in Figure 2, where the direct randomized Dirichlet method does not identify convex order around $s\approx 0$ correctly. On the other hand, both indirect Dirichlet methods yield very similar results for the examples considered. As the name suggests, the “indirect Dirichlet method with samples” works on samples directly, which might be more convenient for practical applications on real data.

As can be expected from the numerical implementation, the histogram method consistently yields the lowest runtimes, while the runtimes of the other methods are much higher. Indeed, when working with samples, the weights of the empirical distributions are constant, while the OT cost matrices $M_a$ and $M_b$ in the implementation have to be re-computed in each iteration, which is very costly; for the histogram method, the weights of $\rho$ change, while the grid stays constant – and thus so do $M_a$ and $M_b$. As an alternative, one can also use the swapping algorithm of [30]. In the aforementioned examples, however, this does not lead to a significant improvement in accuracy.

5 Model-independent arbitrage strategies

Let us consider a financial market with $d$ financial assets and denote its price process by $(S_t)_{t\ge 0}$. Let us assume $S_0 = s_0\in\mathbb{R}^d$ and fix two maturities $T_1 < T_2$. If call options with these maturities are traded at all strikes, then the prices of the call options determine the distributions of $S_{T_1}$ and $S_{T_2}$ under any martingale measure; this fact was first established by [9]. Let us denote the laws of $S_{T_1}$ and $S_{T_2}$ by $\mu$ and $\nu$, respectively. If trading is only allowed at times $0$, $T_1$, and $T_2$, the following definition is natural and will be crucial for our analysis.

Definition 5.1

The triple of measurable functions $(u_1, u_2, \Delta)$ is a model-independent arbitrage if $u_1\in L^1(\mu)$, $u_2\in L^1(\nu)$, and

$$u_1(x) - \int u_1\,d\mu + u_2(y) - \int u_2\,d\nu + \langle\Delta(x), y-x\rangle > 0 \quad\text{for all } (x,y)\in\mathbb{R}^d\times\mathbb{R}^d.$$

If no such strategies exist, then we call the market free of model-independent arbitrage.

In the aforementioned inequality, $u_1$ and $u_2$ can be interpreted as payoffs of vanilla options with market prices $\int u_1\,d\mu$ and $\int u_2\,d\nu$, respectively, while the term $\langle\Delta(x), y-x\rangle$ denotes the gains or losses from buying $\Delta(x)$ assets at time $T_1$ and holding them until $T_2$.

The following theorem makes the connection between model-independent arbitrages and convex order of μ and ν apparent. It can essentially be found in [20, Theorem 3.4].

Theorem 5.2

The following are equivalent:

  1. The market is free of model-independent arbitrage.

  2. $\mathcal{M}(\mu,\nu)\neq\emptyset$, i.e., there exists a martingale law on $\mathbb{R}^d\times\mathbb{R}^d$ with marginals $\mu$ and $\nu$.

  3. μ c ν .

In particular, if $\mu\npreceq_c\nu$, then there exists a convex function $f$ such that the triple $(-f(x), f(y), -g(x))$ is a model-independent arbitrage. Here, $g$ is a measurable selector of the subdifferential $\partial f$.

The strategy $(-f(x), f(y), -g(x))$ is often called a calendar spread. As our setting is not exactly covered by [20, Theorem 3.4] and the proof is not hard, we include it here.

Proof of Theorem 5.2

(ii) $\Leftrightarrow$ (iii) is Strassen’s theorem (see [36]). If $\mu\npreceq_c\nu$, then by definition, there exists a convex function $f$ such that

$$\int f\,d\mu > \int f\,d\nu.$$

On the other hand, $f$ is convex and thus satisfies

$$f(y) - f(x) \ge \langle g(x), y-x\rangle \quad\text{for all } (x,y)\in\mathbb{R}^d\times\mathbb{R}^d$$

for any measurable selector $g$ of $\partial f$.

Combining the two aforementioned displays shows that $(-f(x), f(y), -g(x))$ is a model-independent arbitrage, and thus, (i) $\Rightarrow$ (iii). It remains to show (ii) $\Rightarrow$ (i), which is well known. Indeed, taking expectations in the inequality

$$u_1(x) - \int u_1\,d\mu + u_2(y) - \int u_2\,d\nu + \langle\Delta(x), y-x\rangle > 0 \quad\text{for all } (x,y)\in\mathbb{R}^d\times\mathbb{R}^d$$

under any martingale measure with marginals μ and ν leads to a contradiction. This concludes the proof.□
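The strict positivity of the calendar-spread payoff can be verified directly on a toy example (all choices below are illustrative), writing the triple from Theorem 5.2 with explicit signs as $(-f(x), f(y), -g(x))$: take $f(x)=|x|$, $g(x)=\operatorname{sign}(x)$, $\mu$ uniform on $\{-2,2\}$, and $\nu$ uniform on $\{-1,1\}$, so that $\int f\,d\mu = 2 > 1 = \int f\,d\nu$ and hence $\mu\npreceq_c\nu$:

```python
import numpy as np

f = np.abs            # convex payoff with int f dmu > int f dnu
g = np.sign           # a measurable selector of the subdifferential of f

mu_atoms = np.array([-2.0, 2.0])   # mu uniform on {-2, 2}
nu_atoms = np.array([-1.0, 1.0])   # nu uniform on {-1, 1}
f_mu = f(mu_atoms).mean()          # market price at T_1, equals 2
f_nu = f(nu_atoms).mean()          # market price at T_2, equals 1

# payoff of the triple (-f(x), f(y), -g(x)) from Definition 5.1
payoffs = np.array([
    -f(x) + f_mu + f(y) - f_nu - g(x) * (y - x)
    for x in mu_atoms for y in nu_atoms
])
```

By convexity, $f(y)-f(x)-g(x)(y-x)\ge 0$, so every payoff is at least $\int f\,d\mu - \int f\,d\nu = 1 > 0$.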

As a direct consequence of Theorem 5.2, we can use Theorem 1.1 to detect model-independent arbitrages in the market under consideration: indeed, Theorem 1.1 states that $\mu\npreceq_c\nu$ implies the existence of a probability measure $\rho\in\mathcal{P}_1(\mathbb{R}^d)$ satisfying

$$C(\rho,\nu) - C(\rho,\mu) < 0.$$

Next, the proof of Theorem 1.1 shows that $\rho = \partial\hat f_*\nu$ for some convex function $\hat f:\mathbb{R}^d\to\mathbb{R}$ with

$$\int \hat f\,d\nu - \int \hat f\,d\mu \le C(\rho,\nu) - C(\rho,\mu) < 0, \quad\text{i.e.,}\quad \int \hat f\,d\nu < \int \hat f\,d\mu.$$

In particular, a model-independent arbitrage strategy is given by the calendar spread $(-\hat f(x), \hat f(y), -\hat g(x))$, where $\hat g$ is a measurable selector of $\partial\hat f$. Thus, we can use the same methods as in Section 4 to find $\rho$. We then estimate $\partial\hat f$ from the optimizing transport plan $\pi\in\Pi(\rho,\nu)$ of $C(\rho,\nu)$ by taking the conditional expectation $\int x\,\pi_y(dx)$, where $(\pi_y)_{y\in\mathbb{R}^d}$ denotes the conditional probability distribution of $\pi$ with respect to its second marginal $\nu$. This is a standard technique (see, e.g., [13] for details). In conclusion, we obtain an explicit arbitrage strategy.
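The conditional-expectation step can be sketched for a discrete transport plan (the coupling matrix and supports below are illustrative): the estimate of $\partial\hat f$ at an atom $y_j$ of $\nu$ is the barycenter of the $x$-atoms that $\pi$ couples to it, often called the barycentric projection.

```python
import numpy as np

x_atoms = np.array([-1.0, 1.0])          # atoms of rho (first marginal)
y_atoms = np.array([-2.0, 0.0, 2.0])     # atoms of nu (second marginal)

# coupling matrix: pi[i, j] = mass transported from x_i to y_j
pi = np.array([[0.25, 0.15, 0.00],
               [0.00, 0.15, 0.45]])

col_mass = pi.sum(axis=0)                # second marginal (= weights of nu)
# barycentric projection: E_pi[X | Y = y_j] = sum_i x_i pi_ij / sum_i pi_ij
grad_f_hat = (pi.T @ x_atoms) / col_mass
```

For this plan the projection returns $(-1, 0, 1)$ at the atoms $(-2, 0, 2)$ of $\nu$, a monotone map, as expected of a subgradient of a convex function.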

To illustrate the ideas outlined earlier, we return to Example 4.1, i.e., $\mu = \mathcal{N}(0,\sigma^2 I)$ and $\nu = \mathcal{N}(0,I)$ for $\sigma^2>0$ and $d=1,2$; note that $\mu\npreceq_c\nu$ precisely when $\sigma^2>1$. Having determined $\rho$ such that $C(\rho,\nu) - C(\rho,\mu) < 0$, we estimate $\hat f$ numerically. We show estimates for $\hat f$ and $\partial\hat f$ in the plots in Figures 4 and 5.

Figure 4

Plot of estimates for f ˆ and f ˆ for μ = N ( 0 , 2 ) , ν = N ( 0 , 1 ) , and d = 1 . Both plots use N = 100 samples.

Figure 5

Plot of estimate for f ˆ for μ = N ( 0 , 4 I ) , ν = N ( 0 , I ) , d = 2 . Both plots use N = 100 samples.

6 Remaining proofs

Proof of Corollary 2.3

Recall that a function $g : \mathbb{R}^d \to \mathbb{R}$ is $c$-concave iff $f(x) \coloneqq \frac{\Vert x \Vert^2}{2} - g(x)$ is convex. In particular,

$$\int g \, d\mu - \int g \, d\nu = -\int \left( \frac{\Vert x \Vert^2}{2} - g(x) \right) \mu(dx) + \int \left( \frac{\Vert y \Vert^2}{2} - g(y) \right) \nu(dy) + \int \frac{\Vert x \Vert^2}{2} \, \mu(dx) - \int \frac{\Vert y \Vert^2}{2} \, \nu(dy) = \int f \, d\nu - \int f \, d\mu + \int \frac{\Vert x \Vert^2}{2} \, \mu(dx) - \int \frac{\Vert y \Vert^2}{2} \, \nu(dy).$$

By (7), we obtain

$$\inf_{g \ c\text{-concave}} \left( \int g \, d\mu - \int g \, d\nu \right) = \inf_{f \ \text{convex}} \left( \int f \, d\nu - \int f \, d\mu \right) + \int \frac{\Vert x \Vert^2}{2} \, \mu(dx) - \int \frac{\Vert y \Vert^2}{2} \, \nu(dy)$$
$$= \frac{1}{2} \inf_{\rho \in \mathcal{P}(\mathbb{R}^d)} \left( W_2^2(\mu,\rho) - W_2^2(\nu,\rho) \right) + \frac{1}{2} \int \Vert y \Vert^2 \, \nu(dy) - \frac{1}{2} \int \Vert x \Vert^2 \, \mu(dx) + \int \frac{\Vert x \Vert^2}{2} \, \mu(dx) - \int \frac{\Vert y \Vert^2}{2} \, \nu(dy)$$
$$= \frac{1}{2} \inf_{\rho \in \mathcal{P}(\mathbb{R}^d)} \left( W_2(\mu,\rho)^2 - W_2(\nu,\rho)^2 \right).$$

This concludes the proof.□
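To illustrate the identity just proven, consider a worked example (added here for illustration, not part of the original article): centered Gaussians $\mu = \mathcal{N}(0,\sigma_1^2)$ and $\nu = \mathcal{N}(0,\sigma_2^2)$ with $\sigma_1 \le \sigma_2$, so that $\mu \le_c \nu$.

```latex
% Since \mu \le_c \nu, every convex f satisfies
%   \int f \, d\nu - \int f \, d\mu \ge 0,
% with equality for affine f, so
\inf_{f \ \text{convex}} \left( \int f \, d\nu - \int f \, d\mu \right) = 0 .
% Plugging this into the identity of Corollary 2.3 yields
\inf_{\rho \in \mathcal{P}(\mathbb{R})} \left( W_2(\mu,\rho)^2 - W_2(\nu,\rho)^2 \right)
  = \sigma_1^2 - \sigma_2^2 .
% The infimum is attained at \rho = \delta_0, since
W_2\big(\mathcal{N}(0,\sigma^2), \delta_0\big)^2
  = \int \Vert x \Vert^2 \, \mathcal{N}(0,\sigma^2)(dx) = \sigma^2 .
```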

Proof of Corollary 2.4

The first claim follows from Corollary 1.2 by setting ρ = μ . By (4), the above implies

$$2 \int \Vert x \Vert^2 \, \mu(dx) \le 2 \sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y \rangle \, \pi(dx, dy),$$

so the second claim follows.□
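As a quick numerical sanity check of this inequality (an illustration added here, not part of the proof): in $d = 1$ the supremum over couplings is attained by the comonotone coupling, so $\sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y \rangle \, \pi(dx,dy) = \int_0^1 F_\mu^{-1}(u) F_\nu^{-1}(u) \, du$. The sketch below checks the inequality for $\mu = \mathcal{N}(0,1)$ and $\nu = \mathcal{N}(0,4)$, which are in convex order:

```python
from statistics import NormalDist

# Midpoint grid on (0, 1) for quantile-based integrals.
n = 20_000
us = [(i + 0.5) / n for i in range(n)]

# mu = N(0, 1) and nu = N(0, 4) (sigma = 2), so mu <=_c nu.
q_mu = NormalDist(0, 1).inv_cdf
q_nu = NormalDist(0, 2).inv_cdf

# Second moment of mu, int x^2 mu(dx), computed via its quantile function.
second_moment_mu = sum(q_mu(u) ** 2 for u in us) / n

# In d = 1 the comonotone coupling attains the supremum, so
# sup_pi int <x, y> pi(dx, dy) = int_0^1 F_mu^{-1}(u) F_nu^{-1}(u) du.
sup_coupling = sum(q_mu(u) * q_nu(u) for u in us) / n

print(second_moment_mu <= sup_coupling)  # True
```

Here `second_moment_mu` is approximately 1 while the comonotone value is approximately 2, consistent with the claim.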

Proof of Corollary 2.1

First, [41, Theorem 2 and Lemma 1] show that μ c ν iff

(8) $\displaystyle\int_0^1 \left[ F_\nu^{-1}(1-u) - F_\mu^{-1}(1-u) \right] dh(u) \ge 0$

for all concave functions h such that the aforementioned integral is finite. As any concave function is Lebesgue-almost everywhere differentiable, standard approximation arguments imply that (8) holds iff

$$\int_0^1 g(u) \left[ F_\nu^{-1}(u) - F_\mu^{-1}(u) \right] du \ge 0$$

for all bounded increasing left-continuous functions g : ( 0 , 1 ) R . But

$$\left\{ F_\rho^{-1} : \rho \in \mathcal{P}(\mathbb{R}) \text{ with bounded support} \right\}$$

is exactly the set of all bounded increasing left-continuous functions on ( 0 , 1 ) . By [38, Equation (2.47)],

$$W_2(\nu,\rho)^2 = \int_0^1 \left( F_\nu^{-1}(x) - F_\rho^{-1}(x) \right)^2 dx = \int y^2 \, \nu(dy) - 2 \int_0^1 F_\nu^{-1}(x) F_\rho^{-1}(x) \, dx + \int z^2 \, \rho(dz).$$

Hence we calculate

$$W_2(\nu,\rho)^2 - W_2(\mu,\rho)^2 = \int y^2 \, \nu(dy) - 2 \int_0^1 F_\rho^{-1}(u) F_\nu^{-1}(u) \, du + \int z^2 \, \rho(dz) - \int x^2 \, \mu(dx) + 2 \int_0^1 F_\rho^{-1}(u) F_\mu^{-1}(u) \, du - \int z^2 \, \rho(dz)$$
$$= 2 \int_0^1 F_\rho^{-1}(u) \left[ F_\mu^{-1}(u) - F_\nu^{-1}(u) \right] du + \int y^2 \, \nu(dy) - \int x^2 \, \mu(dx).$$

This concludes the proof.□
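The one-dimensional characterization can be probed numerically: in $d = 1$ the comonotone coupling attains the supremum, so $C(\mu,\rho) = \int_0^1 F_\mu^{-1}(u) F_\rho^{-1}(u) \, du$. The following minimal sketch (an illustration added here; the measures $\rho$ are ad-hoc choices with bounded support) checks the necessary direction $C(\mu,\rho) \le C(\nu,\rho)$ for $\mu = \mathcal{N}(0,1) \le_c \nu = \mathcal{N}(0,4)$:

```python
from statistics import NormalDist

def C(q_a, q_b, n=20_000):
    """Approximate C(a, b) = int_0^1 F_a^{-1}(u) F_b^{-1}(u) du by a midpoint sum.
    In d = 1 this equals sup over couplings of int <x, y> pi(dx, dy)."""
    us = [(i + 0.5) / n for i in range(n)]
    return sum(q_a(u) * q_b(u) for u in us) / n

q_mu = NormalDist(0, 1).inv_cdf  # quantile function of mu = N(0, 1)
q_nu = NormalDist(0, 2).inv_cdf  # quantile function of nu = N(0, 4); mu <=_c nu

# Two hypothetical choices of rho with bounded support:
def q_unif(u):   # rho = Uniform([-1, 1])
    return 2.0 * u - 1.0

def q_disc(u):   # rho = 0.3 * delta_{-1} + 0.4 * delta_0 + 0.3 * delta_2
    return -1.0 if u <= 0.3 else (0.0 if u <= 0.7 else 2.0)

for q_rho in (q_unif, q_disc):
    # Necessary condition: mu <=_c nu implies C(mu, rho) <= C(nu, rho).
    assert C(q_mu, q_rho) <= C(q_nu, q_rho)
print("ok")
```

Of course this only tests finitely many $\rho$; the theorem states that the inequality for all compactly supported $\rho$ is also sufficient for convex order.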


A first version of this article was submitted to arXiv on July 4, 2022, under the title "A characterisation of convex order using the 2-Wasserstein distance."


Acknowledgments

JW thanks Beatrice Acciaio, Guillaume Carlier, Max Nendel, Gudmund Pammer, and Ruodu Wang for helpful discussions.

  1. Funding information: JW acknowledges the support by NSF Grant DMS-2205534. Part of this research was performed while JW was visiting the Institute for Mathematical and Statistical Innovation (IMSI), which is supported by the National Science Foundation (Grant No. DMS-1929348). EZ acknowledges the support through the summer internship program of the Columbia University Statistics Department.

  2. Conflict of interest: There are no conflicts of interest.

  3. Data availability statement: Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

References

[1] Acciaio, B., Beiglböck, M., Penkner, F., & Schachermayer, W. (2013). A model-free version of the fundamental theorem of asset pricing and the super-replication theorem. Mathematical Finance, 26, 233–251. doi: 10.1111/mafi.12060.

[2] Alfonsi, A., Corbetta, J., & Jourdain, B. (2019). Sampling of one-dimensional probability measures in the convex order and computation of robust option price bounds. International Journal of Theoretical and Applied Finance, 22(3), 1950002. doi: 10.1142/S021902491950002X.

[3] Alfonsi, A., Corbetta, J., & Jourdain, B. (2020). Sampling of probability measures in the convex order by Wasserstein projection. Annales Henri Poincaré, 56(3), 1706–1729. doi: 10.1214/19-AIHP1014.

[4] Alfonsi, A., & Jourdain, B. (2020). Squared quadratic Wasserstein distance: optimal couplings and Lions differentiability. ESAIM: Probability and Statistics, 24, 703–717. doi: 10.1051/ps/2020013.

[5] Arnold, B. (2012). Majorization and the Lorenz Order: A Brief Introduction (Vol. 43). Berlin: Springer Science & Business Media.

[6] Beiglböck, M., Henry-Labordère, P., & Penkner, F. (2013). Model-independent bounds for option prices – a mass transport approach. Finance and Stochastics, 17(3), 477–501. doi: 10.1007/s00780-013-0205-8.

[7] Beiglböck, M., Nutz, M., & Touzi, N. (2015). Complete duality for martingale optimal transport on the line. Annals of Probability, 45(5), 3038–3074. doi: 10.1214/16-AOP1131.

[8] Bernard, C., Rüschendorf, L., & Vanduffel, S. (2017). Value-at-risk bounds with variance constraints. Journal of Risk and Insurance, 84(3), 923–959. doi: 10.1111/jori.12108.

[9] Breeden, D., & Litzenberger, R. (1978). Prices of state-contingent claims implicit in option prices. Journal of Business, 51, 621–651. doi: 10.1086/296025.

[10] Brenier, Y. (1991). Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, 44(4), 375–417. doi: 10.1002/cpa.3160440402.

[11] Carlier, G. (2008). Remarks on Toland's duality, convexity constraint and optimal transport. Pacific Journal of Optimization, 4(3), 423–432.

[12] De March, H., & Touzi, N. (2019). Irreducible convex paving for decomposition of multidimensional martingale transport plans. Annals of Probability, 47(3), 1726–1774. doi: 10.1214/18-AOP1295.

[13] Deb, N., Ghosal, P., & Sen, B. (2021). Rates of estimation of optimal transport maps using plug-in estimators via barycentric projections. Advances in Neural Information Processing Systems, 34, 29736–29753.

[14] Domingo-Enrich, C., Schiff, Y., & Mroueh, Y. (2022). Learning with stochastic orders. arXiv:2205.13684.

[15] Elton, J., & Hill, T. P. (1992). Fusions of a probability distribution. The Annals of Probability, 20, 421–454. doi: 10.1214/aop/1176989936.

[16] Elton, J., & Hill, T. P. (1998). On the basic representation theorem for convex domination of measures. Journal of Mathematical Analysis and Applications, 228(2), 449–466. doi: 10.1006/jmaa.1998.6158.

[17] Embrechts, P., Puccetti, G., & Rüschendorf, L. (2013). Model uncertainty and VaR aggregation. Journal of Banking & Finance, 37(8), 2750–2764. doi: 10.1016/j.jbankfin.2013.03.014.

[18] Galichon, A., Henry-Labordère, P., & Touzi, N. (2014). A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Annals of Applied Probability, 24(1), 312–336. doi: 10.1214/13-AAP925.

[19] Guo, G., & Obłój, J. (2019). Computational methods for martingale optimal transport problems. Annals of Applied Probability, 29(6), 3311–3347. doi: 10.1214/19-AAP1481.

[20] Guyon, J., Menegaux, R., & Nutz, M. (2017). Bounds for VIX futures given S&P 500 smiles. Finance and Stochastics, 21, 593–630. doi: 10.1007/s00780-017-0334-6.

[21] Jourdain, B., & Margheriti, W. (2022). Martingale Wasserstein inequality for probability measures in the convex order. Bernoulli, 28(2), 830–858. doi: 10.3150/21-BEJ1368.

[22] Kantorovich, L. (1958). On the translocation of masses. Management Science, 5, 1–4. doi: 10.1287/mnsc.5.1.1.

[23] Massa, M., & Siorpaes, P. (2022). How to quantise probabilities while preserving their convex order. arXiv:2206.10514.

[24] Monge, G. (1781). Mémoire sur la théorie des déblais et des remblais. Paris: De l'Imprimerie Royale.

[25] Müller, A., & Rüschendorf, L. (2001). On the optimal stopping values induced by general dependence structures. Journal of Applied Probability, 38(3), 672–684. doi: 10.1239/jap/1005091031.

[26] Müller, A., & Stoyan, D. (2002). Comparison Methods for Stochastic Models and Risks (Vol. 389). New York: Wiley.

[27] Nendel, M. (2020). A note on stochastic dominance, uniform integrability and lattice properties. Bulletin of the London Mathematical Society, 52(5), 907–923. doi: 10.1112/blms.12371.

[28] Obłój, J., & Siorpaes, P. (2017). Structure of martingale transports in finite dimensions. arXiv:1702.08433.

[29] Peyré, G., & Cuturi, M. (2019). Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning, 11(5–6), 355–607. doi: 10.1561/2200000073.

[30] Puccetti, G. (2017). An algorithm to approximate the optimal expected inner product of two vectors with given marginals. Journal of Mathematical Analysis and Applications, 451(1), 132–145. doi: 10.1016/j.jmaa.2017.02.003.

[31] Rachev, S., & Rüschendorf, L. (1998). Mass Transportation Problems: Volume I: Theory (Vol. 1). New York: Springer Science & Business Media.

[32] Ross, S. (1996). Stochastic Processes (Vol. 2). New York: Wiley.

[33] Rüschendorf, L., & Rachev, S. (1990). A characterization of random variables with minimum L2-distance. Journal of Multivariate Analysis, 32(1), 48–54. doi: 10.1016/0047-259X(90)90070-X.

[34] Rüschendorf, L., & Uckelmann, L. (2002). Variance minimization and random variables with constant sum. In: Distributions with Given Marginals and Statistical Modelling (pp. 211–222). Dordrecht: Springer. doi: 10.1007/978-94-017-0061-0_22.

[35] Shaked, M., & Shanthikumar, J. (2007). Stochastic Orders. New York: Springer. doi: 10.1007/978-0-387-34675-5.

[36] Strassen, V. (1965). The existence of probability measures with given marginals. The Annals of Mathematical Statistics, 36, 423–439. doi: 10.1214/aoms/1177700153.

[37] Tchen, A. (1980). Inequalities for distributions with given marginals. Annals of Probability, 8, 814–827. doi: 10.1214/aop/1176994668.

[38] Villani, C. (2003). Topics in Optimal Transportation (Vol. 58). Providence, Rhode Island: American Mathematical Society.

[39] Villani, C. (2008). Optimal Transport: Old and New (Vol. 338). Berlin: Springer.

[40] Wang, B., & Wang, R. (2011). The complete mixability and convex minimization problems with monotone marginal densities. Journal of Multivariate Analysis, 102(10), 1344–1360. doi: 10.1016/j.jmva.2011.05.002.

[41] Wang, Q., Wang, R., & Wei, Y. (2020). Distortion riskmetrics on general spaces. ASTIN Bulletin, 50(3), 827–851. doi: 10.1017/asb.2020.14.

Received: 2023-03-08
Revised: 2023-06-30
Accepted: 2023-08-30
Published Online: 2023-10-18

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
