
Stability estimation of some Markov controlled processes

  • Evgueni Gordienko and Juan Ruiz de Chavez
Published/Copyright: November 24, 2022

Abstract

We consider a discrete-time Markov controlled process endowed with the expected total discounted reward. We assume that the distribution of the underlying random vectors is unknown and that it is approximated by an appropriate known distribution. We find upper bounds on the decrease in reward that occurs when the policy optimal for the approximating process is applied to control the original process.

MSC 2010: 90B05; 90C31; 90C40; 93E20

1 Introduction

In the theory of discrete-time Markov processes, the term “stability” is used in various meanings. First of all, for uncontrolled processes, it refers to certain recurrence or ergodicity properties of the processes (see, e.g., [1]).

Quite a long time ago, this concept moved into the field of controlled processes, particularly, in the context of adaptive control. (Among the huge number of references, we indicate only a couple of fairly recent ones [2,3].)

The second widely used meaning of the word “stability” is close to “continuity.” Speaking of the quantitative approach to such continuity under perturbations of certain parameters, the deviations of some basic characteristics of the Markov processes (such as the limiting distribution) are estimated.

Using probability metrics, the methods of quantitative continuity of uncontrolled processes have been developed, for instance, in the works [4,5,6].

The quantitative assessment of the stability (or “continuity”) of optimal control of a Markov process has its own peculiarities. Here, the policy that is optimal for a certain “approximating process” is used to control the original (“real”) process. The underlying probability distributions of the latter are unknown and are often evaluated by statistical procedures. Such estimation leads to what we have designated as the “approximating controlled process.”

The problem is posed as finding upper bounds for the stability index, defined in (2.7) in Section 2, which expresses the decrease in the given performance index compared with applying the policy that is optimal for the original process. This problem was probably first considered in [7,8]. Since then, the authors just mentioned and others have been solving this problem for various classes of discrete-time Markov controlled processes and for different performance indexes (optimization criteria).

In this article, we consider Markov control processes with general state and action spaces, choosing the expected total discounted reward as an optimization criterion. Thus, the results given in Section 3 are related to those obtained in the previous articles [9,10,11]. In contrast to the problem setting in these articles, we focus our attention on the controlled processes with bounded one-step rewards. This allows us to obtain new stability inequalities using both the total variation metric and the Dudley metric.

The total variation distance works well under the standard compactness-continuity conditions, but to obtain the corresponding stability inequality in terms of the Dudley metric, we have to impose additional Lipschitz continuity conditions.

The Dudley metric is convenient in an important situation where the nonparametric approach is applied, i.e., when unknown probability distributions are approximated by empirical distributions (see, e.g., [12]).

It should be noted that the problem of estimating the stability of optimal control considered in this article is closely related to the problem of adaptive control of Markov processes. In the adaptive formulation, the control is accompanied by some estimation procedure, and the current control policies should approximate the optimal ones as the distribution (or its parameters) is refined. For the development of adaptive algorithms, quantitative estimates of the “stability of optimal control” can be useful. Among the vast literature, the works [13,14,15,16,17] use the expected total discounted reward as the optimization criterion and discuss the application of nonparametric estimation of “governing distributions.”

2 Setting of the problem

We consider a discrete-time Markov controlled process of the form:

(2.1) $X_t = F(X_{t-1}, a_t, \xi_t), \quad t = 1, 2, \ldots,$

where $X_t \in X$ is the state of the process at time $t$, and $\xi_1, \xi_2, \ldots$ is a sequence of independent and identically distributed (i.i.d.) random vectors with values in a complete separable metric space $(S, \rho)$. Let $A$ be a given action set. If $X_{t-1} = z \in X$, then the control (action) $a_t$ is selected from a designated compact subset $A(z) \subset A$. We assume that $X$ and $A$ are complete separable metric spaces (which are, in particular, Borel spaces). The metric in $X$ will be denoted by $d$. Finally, $F: X \times A \times S \to X$ is a measurable function.
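As a concrete illustration (not part of the original model), the recursion (2.1) can be simulated for any choice of transition map, policy, and noise law; all three below are hypothetical placeholders used only to show the scheme:

```python
import random

def F(x, a, xi):
    # Hypothetical transition map F(x, a, xi): the action a is subtracted from
    # the state, then the noise xi is added. Any measurable F fits scheme (2.1).
    return max(x - a, 0.0) + xi

def simulate(x0, f, sample_noise, T):
    """Generate X_0, ..., X_T via X_t = F(X_{t-1}, a_t, xi_t), a_t = f(X_{t-1})."""
    path = [x0]
    for _ in range(T):
        a = f(path[-1])                      # stationary policy: a_t = f(X_{t-1})
        path.append(F(path[-1], a, sample_noise()))
    return path

random.seed(0)
path = simulate(x0=5.0, f=lambda x: min(x, 1.0),
                sample_noise=lambda: random.uniform(0.0, 1.5), T=10)
print(len(path))  # 11 states: X_0, ..., X_10
```

The stationary policy here is the measurable function $f$ of the current state only, matching the definition that follows.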

A sequence $\pi = (a_1, \ldots, a_t, \ldots)$, where the control $a_t$ at time $t$ is a measurable function of the current state $X_{t-1}$ and may also depend on previous states and actions, is called a control policy, or simply a policy. A policy $\pi$ is called stationary, and denoted by $f$, if there is a measurable function $f: X \to A$ such that $a_t = f(X_{t-1}) \in A(X_{t-1})$, $t = 1, 2, \ldots$

We denote by:

  1. $\Pi$ the set of all policies;

  2. $\mathbb{F}$ the set of all stationary policies.

A policy optimization criterion, in our setting, is the expected total discounted reward:

(2.2) $V(x, \pi) = E_x^{\pi} \sum_{t=1}^{\infty} \alpha^{t-1} r(X_{t-1}, a_t), \quad \pi \in \Pi, \; x \in X,$

where $E_x^{\pi}$ denotes the expectation with respect to the probability measure corresponding to the application of the policy $\pi$ with initial state $x \in X$ (see, e.g., [18] for the construction of the corresponding probability space); $r(z,a)$ is the one-step reward received when the process is in state $z$ and action $a$ is selected, and $\alpha \in (0,1)$ is a given discount factor.

Throughout the article, we will assume that r is a measurable bounded function, that is,

(2.3) $\sup_{(x,a) \in K} |r(x,a)| \le b < \infty.$

In this inequality and further on, $K \stackrel{\mathrm{def}}{=} \{(x,a) \in X \times A : a \in A(x)\}$, which is supposed to be a measurable subset of $X \times A$.

A policy $\pi^*$ is called optimal if, for each $x \in X$,

(2.4) $V(x, \pi^*) = V^*(x) \stackrel{\mathrm{def}}{=} \sup_{\pi \in \Pi} V(x, \pi), \quad x \in X.$

In many applications, all components of the process, except for the distribution G of the random vector ξ 1 in (2.1), are known. For the distribution G , usually some approximation G ˜ is available (e.g., obtained from statistical data).

Although the controller is looking for the optimal policy $\pi^*$, she/he is forced to work with the following approximating controlled process:

(2.5) $\widetilde{X}_t = F(\widetilde{X}_{t-1}, \widetilde{a}_t, \widetilde{\xi}_t), \quad t = 1, 2, \ldots.$

The only difference between this process and the “original” process in (2.1) is that the i.i.d. random vectors $\widetilde{\xi}_1, \widetilde{\xi}_2, \ldots$ have the common distribution $\widetilde{G}$.

The expected total discounted reward $\widetilde{V}(x, \pi)$ for the process (2.5) is defined by formula (2.2), with $X_{t-1}, a_t$ replaced by $\widetilde{X}_{t-1}, \widetilde{a}_t$.

Let $B$ denote the space of all measurable bounded functions $u: X \to \mathbb{R}$, endowed with the uniform norm

$\|u\| \stackrel{\mathrm{def}}{=} \sup_{x \in X} |u(x)|.$

Let $\xi$ and $\widetilde{\xi}$ be generic vectors for $\xi_1, \xi_2, \ldots$ and $\widetilde{\xi}_1, \widetilde{\xi}_2, \ldots$, respectively.

Assumption 1

For each fixed $x \in X$:

  (a) the function $r(x, \cdot)$ is continuous on $A(x)$;

  (b) for every $u \in B$, the maps

    $a \mapsto E\, u[F(x,a,\xi)]$ and $a \mapsto E\, u[F(x,a,\widetilde{\xi})]$

    are continuous on $A(x)$.

The next assertion is well-known (see, e.g., [13, Ch. 2], and [19] for the proof).

Proposition 2.1

Under Assumption 1, there exist stationary policies $\pi^* \equiv f^*$ and $\widetilde{\pi}^* \equiv \widetilde{f}^*$, which are optimal for the “real” process (2.1) and for the approximating process (2.5), respectively.

In other words, (2.4) holds with $f^*$, and also

(2.6) $\widetilde{V}(x, \widetilde{f}^*) = \widetilde{V}^*(x) \stackrel{\mathrm{def}}{=} \sup_{\pi \in \Pi} \widetilde{V}(x, \pi), \quad x \in X.$

Remark 2.1

If we assume that $A$ is compact, $A(x) = A$ for all $x \in X$, and the one-step reward function $r(x,a)$ is continuous on $X \times A$, then Proposition 2.1 remains true if Assumption 1(b) is replaced by the following less restrictive condition:

Assumption 1

(b*): For each $x \in X$ and every continuous and bounded function $u: X \to \mathbb{R}$, the maps

$a \mapsto E\, u[F(x,a,\xi)]$ and $a \mapsto E\, u[F(x,a,\widetilde{\xi})]$

are continuous on $A$. (See [19] for the corresponding proof of Proposition 2.1.)

Assume that the controller can find the policy $\widetilde{f}^*$ and applies it to control the “original” process (2.1). In this way, $\widetilde{f}^*$ is used as a reasonable approximation to the unavailable policy $f^*$. We measure the accuracy of this approximation by evaluating the following stability index:

(2.7) $\Delta(x) \stackrel{\mathrm{def}}{=} V(x, f^*) - V(x, \widetilde{f}^*) \ge 0, \quad x \in X.$

The problem under consideration is to prove stability inequalities of the type:

$\sup_{x \in X} \Delta(x) \le C\, \mu(G, \widetilde{G}),$

where μ is either the total variation metric or the Dudley metric.

3 The results

First, we recall the definitions of two metrics on the space of distributions of random vectors with values in $(S, \mathcal{S})$. Here, $\mathcal{S}$ is the Borel $\sigma$-algebra of subsets of $S$.

The total variation metric $V$ (see, e.g., [20]):

If $\xi$ and $\widetilde{\xi}$ are random vectors with distributions $G$ and $\widetilde{G}$, then

(3.1) $V(G, \widetilde{G}) \stackrel{\mathrm{def}}{=} \sup_{\varphi \in B_1} |E \varphi(\xi) - E \varphi(\widetilde{\xi})|,$

where

$B_1 = \{\varphi: S \to \mathbb{R} : \varphi \text{ is measurable and } \|\varphi\| = \sup_{s \in S} |\varphi(s)| \le 1\}.$

The Dudley metric d (see [21]):

(3.2) $d(G, \widetilde{G}) \stackrel{\mathrm{def}}{=} \sup_{\varphi \in B_{1,L}} |E \varphi(\xi) - E \varphi(\widetilde{\xi})|,$

where

(3.3) $B_{1,L} = \left\{\varphi \in B_1 : \|\varphi\| + \sup_{s \ne s'} \frac{|\varphi(s) - \varphi(s')|}{\rho(s,s')} \le 1\right\},$ where $\rho$ is the metric in $S$.

It is well-known that the convergence in the metric d is equivalent to the weak convergence of distributions (see, e.g., [21]).
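For distributions on a finite support, the supremum in (3.1) is attained and equals $\sum_i |p_i - q_i|$, whereas the supremum in (3.2) is harder to reach exactly; it can, however, be bounded from below by restricting to a tractable one-parameter family inside $B_{1,L}$. A minimal numerical sketch (the support points and weights are arbitrary illustrations):

```python
def total_variation(p, q):
    """V(G, G~) from (3.1) for distributions p, q on a common finite support:
    the supremum over |phi| <= 1 equals sum_i |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def dudley_lower_bound(xs, p, q, n_grid=200):
    """Lower bound on d(G, G~) from (3.2): maximize |E phi - E~ phi| over the
    family phi_c(s) = clip(s - c, -1, 1) / 2, which lies in B_{1,L} because
    its sup-norm (<= 1/2) plus its Lipschitz constant (<= 1/2) is <= 1."""
    best, lo, hi = 0.0, min(xs), max(xs)
    for k in range(n_grid + 1):
        c = lo + (hi - lo) * k / n_grid
        phi = [max(-1.0, min(1.0, x - c)) / 2.0 for x in xs]
        best = max(best, abs(sum(f * (pi - qi) for f, pi, qi in zip(phi, p, q))))
    return best

xs = [0.0, 1.0, 2.0, 3.0]
p  = [0.4, 0.3, 0.2, 0.1]
q  = [0.1, 0.2, 0.3, 0.4]
V = total_variation(p, q)
d_lb = dudley_lower_bound(xs, p, q)
print(V, d_lb)
```

Since $B_{1,L} \subset B_1$, one always has $d(G, \widetilde{G}) \le V(G, \widetilde{G})$, which the printed values respect.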

Theorem 1

Under (2.3) and Assumption 1,

(3.4) $\sup_{x \in X} \Delta(x) \le \frac{2\alpha b}{(1-\alpha)^2}\, V(G, \widetilde{G}).$

Proof

In view of Proposition 2.1, we can write (2.4) and (2.6) as follows ($x \in X$):

(3.5) $V(x, f^*) = V^*(x) = \sup_{f \in \mathbb{F}} V(x, f),$

(3.6) $\widetilde{V}(x, \widetilde{f}^*) = \widetilde{V}^*(x) = \sup_{f \in \mathbb{F}} \widetilde{V}(x, f).$

Then, for arbitrary $x \in X$, by (2.7), (3.5), and (3.6),

(3.7) $\Delta(x) \le |V(x, f^*) - \widetilde{V}(x, \widetilde{f}^*)| + |\widetilde{V}(x, \widetilde{f}^*) - V(x, \widetilde{f}^*)| = \left|\sup_{f \in \mathbb{F}} V(x,f) - \sup_{f \in \mathbb{F}} \widetilde{V}(x,f)\right| + |V(x, \widetilde{f}^*) - \widetilde{V}(x, \widetilde{f}^*)| \le 2 \sup_{f \in \mathbb{F}} |V(x,f) - \widetilde{V}(x,f)|.$

Let us fix an arbitrary stationary policy $f \in \mathbb{F}$ and define two operators $T_f: B \to B$ and $\widetilde{T}_f: B \to B$ as follows ($u \in B$):

(3.8) $T_f u(x) \stackrel{\mathrm{def}}{=} r(x, f(x)) + \alpha E\, u[F(x, f(x), \xi)], \quad x \in X,$

(3.9) $\widetilde{T}_f u(x) \stackrel{\mathrm{def}}{=} r(x, f(x)) + \alpha E\, u[F(x, f(x), \widetilde{\xi})], \quad x \in X.$

The following two facts are well-known (see, e.g., [13, Ch. 2]):

  1. The functions $V_f(\cdot) \stackrel{\mathrm{def}}{=} V(\cdot, f)$ and $\widetilde{V}_f(\cdot) \stackrel{\mathrm{def}}{=} \widetilde{V}(\cdot, f)$ (where “$\cdot$” stands for $x \in X$) belong to $B$, and moreover, they are fixed points of the operators $T_f$ and $\widetilde{T}_f$, that is,

    (3.10) $T_f V_f = V_f$ and $\widetilde{T}_f \widetilde{V}_f = \widetilde{V}_f.$

  2. The operators $T_f$ and $\widetilde{T}_f$ are contractive with modulus $\alpha$, that is ($u, v \in B$):

    (3.11) $\|T_f u - T_f v\| \le \alpha \|u - v\|; \quad \|\widetilde{T}_f u - \widetilde{T}_f v\| \le \alpha \|u - v\|.$

    Therefore,

    $\|V_f - \widetilde{V}_f\| = \|T_f V_f - \widetilde{T}_f \widetilde{V}_f\| \le \|T_f V_f - T_f \widetilde{V}_f\| + \|T_f \widetilde{V}_f - \widetilde{T}_f \widetilde{V}_f\| \le \alpha \|V_f - \widetilde{V}_f\| + \|T_f \widetilde{V}_f - \widetilde{T}_f \widetilde{V}_f\|.$

    Hence,

    (3.12) $\|V_f - \widetilde{V}_f\| \le \frac{1}{1-\alpha} \|T_f \widetilde{V}_f - \widetilde{T}_f \widetilde{V}_f\|.$

Let us estimate the second factor on the right-hand side of (3.12). By (3.8) and (3.9), we have

(3.13) $\|T_f \widetilde{V}_f - \widetilde{T}_f \widetilde{V}_f\| = \alpha \sup_{x \in X} |E\, \widetilde{V}_f[F(x, f(x), \xi)] - E\, \widetilde{V}_f[F(x, f(x), \widetilde{\xi})]|.$

Using the definition of $\widetilde{V}_f$ (i.e., (2.2) with $\widetilde{X}_t, \widetilde{a}_t$), we see that

(3.14) $\sup_{x \in X} |\widetilde{V}_f(x)| \le \sum_{t=1}^{\infty} \alpha^{t-1} b = \frac{b}{1-\alpha}.$

Thus, for each fixed $x$, the function $s \mapsto \widetilde{V}_f[F(x, f(x), s)]$ in (3.13) is bounded by $b(1-\alpha)^{-1}$. Applying definitions (3.1), (3.13), and (3.12), we find that

$\sup_{x \in X} |V_f(x) - \widetilde{V}_f(x)| \le \frac{\alpha b}{(1-\alpha)^2}\, V(G, \widetilde{G}).$

Combining the last inequality with (3.7), we obtain (3.4).□

In a fairly common situation, the unknown distribution $G$ is estimated by the empirical distribution $\widetilde{G}_n$ obtained from a sample $\xi_1, \xi_2, \ldots, \xi_n$. Excluding the case of discrete $G$, $V(G, \widetilde{G}_n)$ fails to approach zero as $n \to \infty$. Thus, in many situations, inequality (3.4) is useless. On the other hand, under mild conditions, we have:

$d(G, \widetilde{G}_n) \to 0$ almost surely, and $E\, d(G, \widetilde{G}_n) \to 0$ as $n \to \infty$

(see the end of this section).
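This convergence can be watched numerically. For a sample from $U(0,1)$ (an illustrative choice), every $\varphi \in B_{1,L}$ is in particular 1-Lipschitz, so $d(G, \widetilde{G}_n)$ is dominated by the Wasserstein distance $W_1(G, \widetilde{G}_n) = \int_0^1 |F_n(t) - t|\,dt$, which is exactly computable from the order statistics:

```python
import random

def w1_vs_uniform(sample):
    """W1 distance between the empirical distribution of `sample` (values in
    [0, 1]) and U(0, 1), via W1 = integral of |F_n(t) - t| over [0, 1].
    Every phi in B_{1,L} is 1-Lipschitz, so this dominates d(G, G_n)."""
    xs = sorted(sample)
    n = len(xs)
    pts = [0.0] + xs + [1.0]
    total = 0.0
    for i in range(n + 1):
        a, b, c = pts[i], pts[i + 1], i / n    # F_n equals c on (a, b)
        if c <= a:                             # integrate |c - t| over [a, b]
            total += ((b - c) ** 2 - (a - c) ** 2) / 2
        elif c >= b:
            total += ((c - a) ** 2 - (c - b) ** 2) / 2
        else:                                  # the diagonal crosses level c
            total += ((c - a) ** 2 + (b - c) ** 2) / 2
    return total

random.seed(0)
results = {}
for n in (10, 100, 1000):
    results[n] = sum(w1_vs_uniform([random.random() for _ in range(n)])
                     for _ in range(20)) / 20
    print(n, round(results[n], 4))   # averages shrink roughly like n**-0.5
```

A sanity check: for a single observation at $0.5$, the integral evaluates to exactly $0.25$.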

To obtain the stability inequality with the Dudley metric d on the right-hand side, we need additional Lipschitz conditions.

Assumption 2

  (a) There exist a constant $L_0$ and a measurable function $\bar{L}_1: S \to [0, \infty)$ such that:

    (3.15) $|r(x,a) - r(y,a)| \le L_0\, d(x,y)$ for all $(x,a), (y,a) \in K$;

    (3.16) $d[F(x,a,\xi), F(y,a,\xi)] \le \bar{L}_1(\xi)\, d(x,y)$ for all $(x,a), (y,a) \in K$,

    where $E\, \bar{L}_1(\xi) = L_1$ and $\alpha L_1 < 1$.

  (b) There is a constant $L < \infty$ such that for each $(x,a) \in K$ and $s, s' \in S$,

    (3.17) $d[F(x,a,s), F(x,a,s')] \le L\, \rho(s,s').$

  (c) $A$ is compact and $A(x) = A$ for all $x \in X$.

Theorem 2

Under Assumptions 1 and 2,

(3.18) $\sup_{x \in X} \Delta(x) \le \frac{2\alpha}{(1-\alpha)^2} \left[ \frac{b}{1-\alpha} + \frac{L_0 L}{1 - \alpha L_1} \right] d(G, \widetilde{G}),$

where $d$ is the Dudley metric defined in (3.2).

Proof

We define the operators $T: B \to B$ and $\widetilde{T}: B \to B$ as follows ($u \in B$):

(3.19) $T u(x) \stackrel{\mathrm{def}}{=} \sup_{a \in A} \{r(x,a) + \alpha E\, u[F(x,a,\xi)]\}, \quad x \in X,$

(3.20) $\widetilde{T} u(x) \stackrel{\mathrm{def}}{=} \sup_{a \in A} \{r(x,a) + \alpha E\, u[F(x,a,\widetilde{\xi})]\}, \quad x \in X.$

In [13, Ch. 2], it was proved that:

(1) (3.21) $V^* = T V^*$ and $\widetilde{V}^* = \widetilde{T} \widetilde{V}^*,$

where $V^*$ and $\widetilde{V}^*$, defined in (3.5) and (3.6), are the value functions of the process (2.1) and of the process (2.5), respectively.

(2) Both operators $T$ and $\widetilde{T}$ are contractive (with respect to $\|\cdot\|$) with modulus $\alpha$.

Let us define the number (generally belonging to $[0, \infty]$):

(3.22) $\mu(\xi, \widetilde{\xi}) \stackrel{\mathrm{def}}{=} \sup_{(x,a) \in K} |E\, V^*[F(x,a,\xi)] - E\, V^*[F(x,a,\widetilde{\xi})]|.$

The first step in the proof is to establish the following inequality:

(3.23) $\sup_{x \in X} \Delta(x) \le \frac{2\alpha}{(1-\alpha)^2}\, \mu(\xi, \widetilde{\xi}).$

For $(x,a) \in K$, let

(3.24) $H(x,a) \stackrel{\mathrm{def}}{=} r(x,a) + \alpha E\, V^*[F(x,a,\xi)],$

(3.25) $\widetilde{H}(x,a) \stackrel{\mathrm{def}}{=} r(x,a) + \alpha E\, \widetilde{V}^*[F(x,a,\widetilde{\xi})],$

and for each $t \ge 1$, let

$\Gamma_t = \{x, a_1; X_1, a_2; \ldots; X_{t-1}, a_t\}$

be the part of a trajectory of the process (2.1) obtained by applying the stationary policy $\widetilde{f}^*$.

By the Markov property of the process (2.1) (when a stationary policy is applied) and (3.24), we have:

$\zeta_t \stackrel{\mathrm{def}}{=} E^{\widetilde{f}^*}[\alpha V^*(X_t) \mid \Gamma_t] = H(X_{t-1}, a_t) - r(X_{t-1}, a_t) = H(X_{t-1}, a_t) - r(X_{t-1}, a_t) - \sup_{a \in A} H(X_{t-1}, a) + \sup_{a \in A} H(X_{t-1}, a).$

We can see from (3.24), (3.19), and (3.21) that

$\sup_{a \in A} H(X_{t-1}, a) = V^*(X_{t-1}).$

Hence,

(3.26) $\zeta_t = H(X_{t-1}, a_t) - \sup_{a \in A} H(X_{t-1}, a) - r(X_{t-1}, a_t) + V^*(X_{t-1}) = -\Lambda_t - r(X_{t-1}, a_t) + V^*(X_{t-1}),$

where

(3.27) $\Lambda_t \stackrel{\mathrm{def}}{=} \sup_{a \in A} H(X_{t-1}, a) - H(X_{t-1}, a_t).$

Now, rewriting (3.26) as

$V^*(X_{t-1}) - r(X_{t-1}, a_t) - \zeta_t = \Lambda_t$

and taking the expectation $E_x^{\widetilde{f}^*}$ of both sides, we obtain:

$E_x^{\widetilde{f}^*} V^*(X_{t-1}) - E_x^{\widetilde{f}^*} r(X_{t-1}, a_t) - \alpha E_x^{\widetilde{f}^*} V^*(X_t) = E_x^{\widetilde{f}^*} \Lambda_t.$

Multiplying the last equality by $\alpha^{t-1}$ and summing over $t = 1, 2, \ldots, n$, we obtain:

(3.28) $V^*(x) - \alpha^n E_x^{\widetilde{f}^*} V^*(X_n) - \sum_{t=1}^n \alpha^{t-1} E_x^{\widetilde{f}^*} r(X_{t-1}, a_t) = \sum_{t=1}^n \alpha^{t-1} E_x^{\widetilde{f}^*} \Lambda_t.$

By the same argument as in (3.14), $V^*$ is a bounded function. So, letting $n \to \infty$ in (3.28), the second term on the left-hand side tends to zero, while the third term approaches $V(x, \widetilde{f}^*)$. Therefore,

(3.29) $\Delta(x) = V^*(x) - V(x, \widetilde{f}^*) = \sum_{t=1}^{\infty} \alpha^{t-1} E_x^{\widetilde{f}^*} \Lambda_t.$

Since $\widetilde{f}^*$ is the optimal policy for the process (2.5), applying (3.20), (3.21), and (3.25), we easily find that

$\sup_{a \in A} \widetilde{H}(X_{t-1}, a) = \widetilde{H}(X_{t-1}, a_t).$

Hence, by (3.27),

$\Lambda_t = \sup_{a \in A} H(X_{t-1}, a) - \sup_{a \in A} \widetilde{H}(X_{t-1}, a) + \widetilde{H}(X_{t-1}, a_t) - H(X_{t-1}, a_t)$

and

$\Lambda_t \le 2 \sup_{a \in A} |H(X_{t-1}, a) - \widetilde{H}(X_{t-1}, a)| \le 2\alpha \sup_{a \in A} |E\, V^*[F(X_{t-1}, a, \xi)] - E\, \widetilde{V}^*[F(X_{t-1}, a, \widetilde{\xi})]|,$

where the expectation in the last term is taken with respect to the random vectors $\xi$ and $\widetilde{\xi}$ (with $X_{t-1}$ fixed). From the last inequality, we obtain:

(3.30) $\Lambda_t \le 2\alpha \sup_{a \in A} |E\, V^*[F(X_{t-1}, a, \xi)] - E\, V^*[F(X_{t-1}, a, \widetilde{\xi})]| + 2\alpha \sup_{a \in A} |E\, V^*[F(X_{t-1}, a, \widetilde{\xi})] - E\, \widetilde{V}^*[F(X_{t-1}, a, \widetilde{\xi})]|.$

The first term on the right-hand side of (3.30) is not greater than $2\alpha\, \mu(\xi, \widetilde{\xi})$ (see (3.22)), and the second term is not greater than $2\alpha \|V^* - \widetilde{V}^*\|$.

Using (3.21) and the contractive property of $\widetilde{T}$, we have

$\|V^* - \widetilde{V}^*\| \le \|\widetilde{T} \widetilde{V}^* - \widetilde{T} V^*\| + \|\widetilde{T} V^* - T V^*\| \le \alpha \|V^* - \widetilde{V}^*\| + \|T V^* - \widetilde{T} V^*\|,$

or (see (3.19), (3.20))

$\|V^* - \widetilde{V}^*\| \le \frac{\alpha}{1-\alpha} \sup_{x \in X} \sup_{a \in A} |E\, V^*[F(x,a,\xi)] - E\, V^*[F(x,a,\widetilde{\xi})]| \le \frac{\alpha}{1-\alpha}\, \mu(\xi, \widetilde{\xi}).$

The last inequality and (3.30) provide that for each $t \ge 1$,

$\Lambda_t \le 2\alpha \left[1 + \frac{\alpha}{1-\alpha}\right] \mu(\xi, \widetilde{\xi}) = \frac{2\alpha}{1-\alpha}\, \mu(\xi, \widetilde{\xi}).$

Substituting this in (3.29), we obtain (3.23).

The second step in the proof of the theorem is to show that under Assumption 2, in (3.23),

(3.31) $\mu(\xi, \widetilde{\xi}) \le \left[ \frac{b}{1-\alpha} + \frac{L_0 L}{1 - \alpha L_1} \right] d(G, \widetilde{G}).$

By (3.14), the function $V^*$ in (3.22) is bounded by $b(1-\alpha)^{-1}$. Now, we will show that for all $(x,a) \in K$ and $s, s' \in S$,

(3.32) $|V^*[F(x,a,s)] - V^*[F(x,a,s')]| \le \widetilde{L}\, \rho(s,s'),$

(3.33) where $\widetilde{L} = \frac{L_0 L}{1 - \alpha L_1}.$

First, we check that the value function $V^*: X \to \mathbb{R}$ satisfies the Lipschitz condition with the constant $L_0 / (1 - \alpha L_1)$.

Let $u_0 \equiv 0$ and let $T$ be the operator defined in (3.19). Also, set $u_1 = T u_0$. Then, for any $x, y \in X$,

(3.34) $|u_1(x) - u_1(y)| = \left|\sup_{a \in A} r(x,a) - \sup_{a \in A} r(y,a)\right| \le \sup_{a \in A} |r(x,a) - r(y,a)| \le L_0\, d(x,y),$

due to (3.15) in Assumption 2.

Now let $u_2 = T u_1$. Then, in view of (3.19),

$|u_2(x) - u_2(y)| \le \sup_{a \in A} \{|r(x,a) - r(y,a)| + \alpha |E\, u_1[F(x,a,\xi)] - E\, u_1[F(y,a,\xi)]|\} \le L_0\, d(x,y) + \alpha L_0 \sup_{a \in A} E\, d(F(x,a,\xi), F(y,a,\xi)) \le L_0 (1 + \alpha L_1)\, d(x,y).$

To obtain the last inequality, we made use of (3.34) and (3.16) in Assumption 2(a).

Letting $u_n = T u_{n-1}$, $n \ge 1$, it is proved by induction that for any $x, y$,

(3.35) $|u_n(x) - u_n(y)| \le L_0 [1 + \alpha L_1 + \cdots + (\alpha L_1)^{n-1}]\, d(x,y).$

Since $V^*$ is a fixed point of the contractive operator $T$, we have $\|V^* - T^n u_0\| \to 0$ as $n \to \infty$. We see from (3.35) that for every $n \ge 1$, the function $u_n = T^n u_0$ is Lipschitz with the constant $\hat{L} = L_0 / (1 - \alpha L_1)$. Consequently, $V^*$ satisfies the Lipschitz condition with the constant $\hat{L}$.

To verify (3.32), observe that by Assumption 2(b), the function $\varphi(s) = V^*[F(x,a,s)]$ is a composition of two Lipschitz functions, hence Lipschitz with the constant $\widetilde{L} = \hat{L} L$ in (3.33).

Note that $\|\varphi\| \le b(1-\alpha)^{-1}$. Therefore, if we divide $\varphi$ by $b(1-\alpha)^{-1} + \widetilde{L}$, we obtain a function from the class $B_{1,L}$ in (3.3). Finally, to obtain inequality (3.31), it suffices to compare (3.22) with the definition of the Dudley metric given in (3.2) and (3.3).□

A natural question arises: how can one evaluate $d(G, \widetilde{G})$ in (3.18) if the distribution $G$ is assumed to be unknown? We can give an answer in one of the most important cases, when $\widetilde{G}$ is the empirical distribution used to estimate $G$.

Now, we assume that the random vectors $\xi_1, \xi_2, \ldots$ in (2.1) are observable, and let $\xi_1, \xi_2, \ldots, \xi_n$ be i.i.d. observations of a random vector $\xi$ with distribution $G$. The empirical distribution $\widetilde{G} \equiv \widetilde{G}_n$ is defined (on $(S, \mathcal{S})$) as follows:

$\widetilde{G}_n = \frac{1}{n} \sum_{k=1}^n \delta_{\xi_k}$, where for $k = 1, 2, \ldots, n$ and $B \in \mathcal{S}$,

$\delta_{\xi_k}(B) = \begin{cases} 1, & \text{if } \xi_k \in B, \\ 0, & \text{otherwise.} \end{cases}$

Assume that $S = \mathbb{R}^k$ and $\rho$ is the Euclidean norm. Also, suppose that there exist constants $K < \infty$ and $h > 0$ such that $E\, e^{h\|\xi\|} \le K$.

Then, there is a calculable constant $C = C(k, K, h)$ such that for each $n = 1, 2, \ldots$,

(3.36) $E\, d(G, \widetilde{G}_n) \le C\, \delta(k, n),$

where

$\delta(k, n) = \begin{cases} \log(1+n)\, n^{-1/2}, & \text{if } k = 1, \\ \log^2(1+n)\, n^{-1/2}, & \text{if } k = 2, \\ \log(1+n)\, n^{-1/k}, & \text{if } k \ge 3. \end{cases}$

Inequality (3.36) was shown in Proposition 2.1 of [10], but it is actually a fairly direct consequence of Proposition 3.4 in [12]. Taking the expectation of both sides of (3.18), one can apply inequality (3.36).
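For orientation, the rate $\delta(k,n)$ in (3.36) can be tabulated directly (reading $\log^2$ as the squared logarithm):

```python
import math

def delta(k, n):
    """Rate delta(k, n) from (3.36)."""
    if k == 1:
        return math.log(1 + n) / math.sqrt(n)
    if k == 2:
        return math.log(1 + n) ** 2 / math.sqrt(n)
    return math.log(1 + n) * n ** (-1.0 / k)

for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n, [round(delta(k, n), 4) for k in (1, 2, 3)])
```

The curse of dimensionality is visible: for $k \ge 3$ the sample-size exponent degrades from $n^{-1/2}$ to $n^{-1/k}$.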

Remark 3.1

There is a class of controlled Markov processes with observable “perturbations” $\xi_1, \xi_2, \ldots$ (One representative is discussed in Example 2.) Even more often, the mentioned random vectors are not observable. In such cases, one should either use some indirect method of bounding $d(G, \widetilde{G}_n)$ or look for other treatments. It is worth noting that our setting of the problem, generally speaking, does not require any estimation procedure. The distribution $\widetilde{G}$ can be, for example, some “theoretical simplification” of a known but “too complex” real distribution $G$.

4 Examples

Example 1

In fact, this is a simple counterexample showing that Assumption 2 is essential for inequality (3.18) to hold.

Let $X = [0, \infty)$, $A = \{0, 1\}$, $S = \mathbb{R}^2$, and for $\xi_t = (\xi_t^{(1)}, \xi_t^{(2)})$,

$X_t = \xi_t^{(1)} + X_{t-1}\, a_t\, \xi_t^{(2)}, \quad t = 1, 2, \ldots.$

For $a = 0$ and $a = 1$, the one-step reward function is the same and is given by the following formula:

(4.1) $r(x,a) = \begin{cases} 2, & \text{if } x = 0, \\ x, & \text{if } x \in (0,1], \\ 1, & \text{if } x > 1. \end{cases}$

For an arbitrary but fixed $\varepsilon > 0$, we set $G = \delta_{(0,1)}$, that is,

$P(\xi_t^{(1)} = 0) = 1$ and $P(\xi_t^{(2)} = 1) = 1,$

and also $\widetilde{G} = \delta_{(\varepsilon,1)}$, that is,

$P(\widetilde{\xi}_t^{(1)} = \varepsilon) = 1$ and $P(\widetilde{\xi}_t^{(2)} = 1) = 1.$

Then, the “real” process is

(4.2) $X_t = X_{t-1}\, a_t, \quad t \ge 1,$

and the approximating one is

(4.3) $\widetilde{X}_t = \varepsilon + \widetilde{X}_{t-1}\, a_t, \quad t \ge 1.$

Let α ( 0 , 1 ) be any discount factor.

Let us fix the initial state $X_0 = \widetilde{X}_0 = 1$. From (4.1) and (4.2), we see that the optimal stationary policy for the process (4.2) is $f^* = \{0, 0, \ldots\}$ (i.e., always select the action $f^*(x) = 0$). The corresponding reward is

(4.4) $V(1, f^*) = 1 + \sum_{t=2}^{\infty} \alpha^{t-1} \cdot 2 = \frac{1}{1-\alpha} + \frac{\alpha}{1-\alpha}.$

Since the process (4.3) can never reach the state $x = 0$ and $r(x,a)$ is non-decreasing on $(0, \infty)$, the optimal stationary policy for the process (4.3) is $\widetilde{f}^* = \{1, 1, \ldots\}$. The application of $\widetilde{f}^*$ to the process (4.2) gives

$V(1, \widetilde{f}^*) = \sum_{t=1}^{\infty} \alpha^{t-1} \cdot 1 = \frac{1}{1-\alpha}.$

Comparing this with (4.4), we see that the stability index in (2.7) is

$\Delta(1) = \frac{\alpha}{1-\alpha} > 0.$

On the other hand, it is easy to show that $d(G, \widetilde{G}) \le \varepsilon \to 0$ (as $\varepsilon \to 0$).

Note that $V(G, \widetilde{G}) = 2$ for all $\varepsilon > 0$.
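The gap $\Delta(1) = \alpha/(1-\alpha)$ can be confirmed by directly summing the (here deterministic) discounted rewards; the truncation horizon below is an arbitrary numerical choice:

```python
def reward(x):
    # one-step reward (4.1); it does not depend on the action
    if x == 0.0:
        return 2.0
    return x if x <= 1.0 else 1.0

def value(x0, action, alpha, T=2000):
    """Truncated discounted reward of the 'real' dynamics (4.2),
    X_t = X_{t-1} * a_t, under the constant policy a_t = action."""
    x, total = x0, 0.0
    for t in range(T):
        total += alpha ** t * reward(x)   # alpha**t here plays alpha**(t-1)
        x = x * action                    # dynamics (4.2)
    return total

alpha = 0.9
gap = value(1.0, 0, alpha) - value(1.0, 1, alpha)   # V(1, f*) - V(1, f~*)
print(round(gap, 6), alpha / (1 - alpha))  # both equal 9.0
```

With $\alpha = 0.9$ the truncation error is of order $\alpha^{2000}$ and thus entirely negligible.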

Example 2

(See, e.g., [13, Ch. 1] or [22].) In this model, related to a dam operation, the stocks of water are specified by the equations

(4.5) $X_t = \min\{X_{t-1} - a_t + \xi_t,\ M\}, \quad t = 1, 2, \ldots,$

where $M < \infty$ is the capacity of the water reservoir and $X_{t-1}$ is the stock of water at the beginning of the $t$th period (say, day). The control $a_t$ is the volume of water released during the $t$th period (e.g., for irrigation). Finally, $\xi_t$ is a non-negative random variable representing the water inflow in the $t$th period. We assume that $\xi_1, \xi_2, \ldots$ are i.i.d. random variables having density $g$.

As we see from (4.5), for this controlled process, $X = [0, M]$, $A(x) = [0, x]$ for $x \in [0, M]$, and $S = [0, \infty)$.

Choosing some bounded one-step reward function $r(x,a)$ (which in the simplest case is $(-1) \times$ the cost of a unit of water) and fixing a discount factor $\alpha \in (0,1)$, we are faced with the problem of optimal water management, posed as maximizing the expected long-term total discounted reward.

We assume that the density g (of the water inflow) is unknown, and it is approximated by some known density g ˜ (obtained, for instance, from statistical estimations).

Also, we assume the following:

  1. For each x [ 0 , M ] , the one-step reward r ( x , a ) is a continuous function of a [ 0 , x ] .

  2. Both densities g and g ˜ are bounded and continuous on ( 0 , ) .

The verification of Assumption 1(b) is a matter of simple calculations. Then, according to Proposition 2.1, there exist stationary optimal policies $f^*$ and $\widetilde{f}^*$ for the process (4.5) and, correspondingly, for the following approximating water release process:

$\widetilde{X}_t = \min\{\widetilde{X}_{t-1} - \widetilde{a}_t + \widetilde{\xi}_t,\ M\}, \quad t = 1, 2, \ldots,$

where the i.i.d. random variables $\widetilde{\xi}_1, \widetilde{\xi}_2, \ldots$ have density $\widetilde{g}$. The application, for instance, of the policy $f^*$ means that in the $t$th period, the portion $f^*(X_{t-1})$ of the current stock $X_{t-1}$ is released.

Noting that all conditions of Theorem 1 are satisfied, and that for distributions having densities,

$V(G, \widetilde{G}) = \int_0^{\infty} |g(y) - \widetilde{g}(y)|\, dy,$

by (3.4) we have

$\sup_{x \in [0,M]} \Delta(x) \le \frac{2\alpha b}{(1-\alpha)^2} \int_0^{\infty} |g(y) - \widetilde{g}(y)|\, dy,$

where $b = \sup_{(x,a) \in K} |r(x,a)|$, and $\Delta(x)$ is the stability index defined in (2.7).
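To see the stability index at work here, one can discretize (4.5) and compare the policy computed from a crude inflow model (a deterministic inflow, in the spirit of the "theoretical simplification" of Remark 3.1) with the one computed from a sample of the true inflow. The integer grid, the concave reward $r(s,a) = a - 0.1a^2$, and both inflow laws below are purely illustrative choices, and the expectation over $g$ is replaced by a sample average:

```python
import random

M, alpha = 10, 0.9            # reservoir capacity and discount factor
states = range(M + 1)         # water stocks on an integer grid

def r(s, a):
    return a - 0.1 * a * a    # bounded concave revenue from releasing a units

def step(s, a, xi):
    return min(M, round(s - a + xi))   # dynamics (4.5), projected to the grid

def q(V, s, a, inflows):
    # one-step value of action a in state s, averaging over the inflow sample
    return r(s, a) + alpha * sum(V[step(s, a, xi)] for xi in inflows) / len(inflows)

def optimal_policy(inflows, sweeps=80):
    V = [0.0] * (M + 1)
    for _ in range(sweeps):   # value iteration: V <- TV (contraction, modulus alpha)
        V = [max(q(V, s, a, inflows) for a in range(s + 1)) for s in states]
    return [max(range(s + 1), key=lambda a: q(V, s, a, inflows)) for s in states]

def evaluate(policy, inflows, sweeps=80):
    V = [0.0] * (M + 1)
    for _ in range(sweeps):   # policy evaluation: V <- T_f V
        V = [q(V, s, policy[s], inflows) for s in states]
    return V

random.seed(7)
true_inflow = [random.expovariate(1.0) for _ in range(40)]   # stands in for g
f_star = optimal_policy(true_inflow)          # optimal for the "real" process
f_tilde = optimal_policy([1.0])               # optimal for a crude model g~
V_star = evaluate(f_star, true_inflow)
V_tilde = evaluate(f_tilde, true_inflow)      # f~* applied to the real process
delta = max(vs - vt for vs, vt in zip(V_star, V_tilde))
print(round(delta, 4))        # approximate sup_x Delta(x) on the grid
```

When the crude model is close to the true inflow law, the computed gap is small (it may even vanish if both models yield the same policy), which is exactly the behavior that the stability inequality quantifies.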

Example 3

(Controlled “environmental” stochastic process) The uncontrolled version of this discrete-time stochastic process is defined by the following recurrent equation (see, e.g., [23, Ch. 9]):

(4.6) $X_t = \alpha(\xi_t) X_{t-1} + \varphi(\xi_t), \quad t = 1, 2, \ldots,$

where $\xi_1, \xi_2, \ldots$ are i.i.d. random vectors with values in the Euclidean space $\mathbb{R}^k$, and $X_t \in \mathbb{R}$ ($t = 0, 1, 2, \ldots$).

Processes of type (4.6) are used in modeling some phenomena in environmental science.

We will consider a controlled variant of (4.6), that is, the process

(4.7) $X_t = \alpha(\xi_t) X_{t-1} + \varphi(a_t, \xi_t), \quad t = 1, 2, \ldots,$

where $a_t \in A$, and $A$ is a given compact subset of the Euclidean space $\mathbb{R}^m$.

In this case, $A(x) = A$ for all $x \in X = \mathbb{R}$. In this example, the space $S$ is $\mathbb{R}^k$.

Let $r(x,a)$ be a one-step reward function, bounded by $b$, which is continuous on $\mathbb{R} \times A$ and, moreover, satisfies for some $L_0 < \infty$,

(4.8) $|r(x,a) - r(y,a)| \le L_0 |x - y|$

for all $x, y \in \mathbb{R}$ and $a \in A$.

We assume that

(4.9) $E |\alpha(\xi_1)| \le L_1$ and $\alpha L_1 < 1,$

and, for some $L < \infty$,

(4.10) $|\varphi(a,s) - \varphi(a,s')| \le L \|s - s'\|$

for all $s, s' \in \mathbb{R}^k$ and $a \in A$; also, for each $s \in \mathbb{R}^k$, the map $a \mapsto \varphi(a,s)$ is continuous. Using (4.7)–(4.10), it is easy to check that Assumption 2 is fulfilled. Also, Assumptions 1(a) and (b*) are fulfilled. Indeed, if $u: \mathbb{R} \to \mathbb{R}$ is continuous and bounded, then the map $a \mapsto E\, u[\alpha(\xi) x + \varphi(a, \xi)]$ is continuous by the dominated convergence theorem.

All of the above allows us to apply the stability inequality (3.18). Making use of the known relationship between the Dudley and Wasserstein metrics, for the particular case $k = 1$ (i.e., $\xi_t$ is a random variable), the mentioned inequality can be written as follows:

$\sup_{x \in \mathbb{R}} \Delta(x) \le \frac{2^{3/2} \alpha}{(1-\alpha)^2} \left[ \frac{b}{1-\alpha} + \frac{L_0 L}{1 - \alpha L_1} \right] \left( \int_{-\infty}^{\infty} |F_{\xi}(y) - F_{\widetilde{\xi}}(y)|\, dy \right)^{1/2},$

where $F_{\xi}$ and $F_{\widetilde{\xi}}$ are the distribution functions of $\xi$ and $\widetilde{\xi}$, respectively, and $\widetilde{\xi}$ is generic for the i.i.d. random variables $\widetilde{\xi}_1, \widetilde{\xi}_2, \ldots$ involved in the approximating process

$\widetilde{X}_t = \alpha(\widetilde{\xi}_t) \widetilde{X}_{t-1} + \varphi(\widetilde{a}_t, \widetilde{\xi}_t), \quad t = 1, 2, \ldots.$

Acknowledgement

We thank the reviewers for their careful reading of the manuscript and for their suggestions, which allowed us to correct and improve the presentation of the article.

  1. Author contributions: All authors read and approved the final manuscript.

  2. Conflict of interest: The authors state no conflict of interest.

  3. Data availability statement: No data, models, or code are generated or used during the study.

References

[1] S. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993, https://doi.org/10.1007/978-1-4471-3267-7.

[2] Ch. Andrieu, V. B. Tadić, and M. Vihola, On the stability of some controlled Markov chains and its applications to stochastic approximation with Markovian dynamic, Ann. Appl. Probab. 25 (2015), no. 1, 1–45, https://doi.org/10.1214/13-AAP953.

[3] Y. F. Atchadé and G. Fort, Limit theorems for some adaptive MCMC algorithms with subgeometric kernels: Part II, Bernoulli 18 (2012), no. 3, 975–1001, https://doi.org/10.3150/11-BEJ360.

[4] V. M. Zolotarev, On the continuity of stochastic sequences generated by recurrent processes, Theory Probab. Appl. 20 (1975), no. 4, 819–832, https://doi.org/10.1137/1120088.

[5] N. V. Kartashov, Inequalities in stability and ergodicity theorems for Markov chains with a general phase space. II, Teor. Veroyatn. Primen. 30 (1985), no. 3, 478–485 (in Russian), https://doi.org/10.1137/1130063.

[6] V. V. Kalasnikov and S. A. Anichkin, Continuity of random sequences and approximation of Markov chains, Adv. Appl. Probab. 13 (1981), no. 2, 402–414, https://doi.org/10.2307/1426691.

[7] N. M. Van Dijk, Perturbation theory for unbounded Markov reward processes with applications to queuing, Adv. Appl. Probab. 20 (1988), no. 1, 99–111, https://doi.org/10.2307/1427272.

[8] E. I. Gordienko, Stability estimates for controlled Markov chains with a minorant. Stability problems of stochastic models, J. Sov. Math. 40 (1988), 481–486, https://doi.org/10.1007/BF01083641.

[9] R. Montes-de-Oca, A. Sakhanenko, and F. Salem-Silva, Estimates for perturbations of general discounted Markov control chains, Appl. Math. 30 (2003), no. 3, 287–304, https://doi.org/10.4064/am30-3-4.

[10] E. Gordienko, E. Lemus-Rodríguez, and R. Montes-de-Oca, Discounted cost optimality problem: Stability with respect to weak metrics, Math. Methods Oper. Res. 68 (2008), no. 1, 77–96, https://doi.org/10.1007/s00186-007-0171-z.

[11] E. I. Gordienko and F. S. Salem, Robustness inequality for Markov control processes with unbounded costs, Systems Control Lett. 33 (1998), no. 2, 125–130, https://doi.org/10.1016/S0167-6911(97)00077-7.

[12] R. M. Dudley, The speed of mean Glivenko-Cantelli convergence, Ann. Math. Statist. 40 (1969), 40–50, https://doi.org/10.1214/aoms/1177697802.

[13] O. Hernández-Lerma, Adaptive Markov Control Processes, Applied Mathematical Sciences, vol. 79, Springer-Verlag, New York, 1989, https://doi.org/10.1007/978-1-4419-8714-3.

[14] M. Schäl, Estimation and control in discounted stochastic dynamic programming, Stochastics 20 (1987), no. 1, 51–71, https://doi.org/10.1080/17442508708833435.

[15] R. Cavazos-Cadena, Nonparametric adaptive control of discounted stochastic systems with compact state space, J. Optim. Theory Appl. 65 (1990), no. 2, 191–207, https://doi.org/10.1007/BF01102341.

[16] O. Hernández-Lerma and S. I. Marcus, Adaptive control of discounted Markov decision chains, J. Optim. Theory Appl. 46 (1985), no. 3, 227–235, https://doi.org/10.1007/BF00938426.

[17] E. I. Gordienko and J. A. Minjárez-Sosa, Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion, Kybernetika (Prague) 34 (1998), no. 2, 217–234.

[18] K. Hinderer, Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, Lecture Notes in Operations Research and Mathematical Systems, vol. 33, Springer-Verlag, Berlin-New York, 1970, https://doi.org/10.1007/978-3-642-46229-0.

[19] O. Hernández-Lerma and M. Muñoz de Özak, Discrete-time Markov control processes with discounted unbounded costs: optimality criteria, Kybernetika (Prague) 28 (1992), no. 3, 191–212.

[20] S. T. Rachev, Probability Metrics and the Stability of Stochastic Models, John Wiley & Sons, Ltd., Chichester, 1991.

[21] R. M. Dudley, Real Analysis and Probability, Cambridge Studies in Advanced Mathematics, vol. 74, Cambridge University Press, Cambridge, 2002 (revised reprint of the 1989 original), https://doi.org/10.1017/CBO9780511755347.

[22] S. Yakowitz, Dynamic programming applications in water resources, Water Resources 18 (1982), 673–696, https://doi.org/10.1029/WR018i004p00673.

[23] S. T. Rachev and L. Rüschendorf, Mass Transportation Problems. Vol. II: Applications, Probability and its Applications, Springer-Verlag, New York, 1998.

Received: 2022-02-11
Revised: 2022-06-13
Accepted: 2022-09-21
Published Online: 2022-11-24

© 2022 Evgueni Gordienko and Juan Ruiz de Chavez, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
