Article Open Access

Markov decision processes approximation with coupled dynamics via Markov deterministic control systems

Published/Copyright: October 24, 2023

Abstract

This article presents an approximation of discrete-time Markov decision processes with small noise on Borel spaces, with an infinite horizon and an expected total discounted cost, by the corresponding deterministic Markov control process. In both cases, the dynamics evolve through a system consisting of two coupled difference equations, and it is assumed that the difference equations of the system are perturbed by a small noise. Under our assumptions, a bound for the stability index is given, and the convergence rate of the optimal cost is estimated in terms of a small perturbation parameter. Moreover, the convergence of the optimal policy on compact subsets is verified. Finally, two examples are presented to illustrate the developed theory.

MSC 2010: 90C40; 93C55; 93C73; 93E20

1 Introduction

This article deals with the so-called discrete-time Markov decision processes (MDPs) with an infinite horizon and total discounted cost [1–5]. The importance of working with MDPs lies in the wide range of applications in various disciplines, e.g., engineering, computer science, communications, and economics [6,7]. The main problem in MDPs is to determine an optimal policy and the optimal value function. To characterize and determine the solutions of MDPs, the dynamic programming (DP) approach [2,8] is available.

In this work, the MDPs of interest are those that evolve through dynamics consisting of two coupled difference equations, as shown in equations (1) and (2). Equation (1) models the transitions of the system states, where the set of all states is denoted by X and its elements are called x-states. Similarly, equation (2) models the change of the system's parameters; the set of all parameters is denoted by Γ, and its elements are called α-states. Let ε_0 and δ_0 be positive numbers and let ε ∈ [0, ε_0] and δ ∈ [0, δ_0]. We then consider disturbances {ξ_t(ε)} and {η_t(δ)}, which are sequences of independent and identically distributed random elements with values in some Borel spaces (S_1, r_1) and (S_2, r_2) (metric perturbation spaces [9] or noise spaces [10]), respectively. Moreover, suppose that there exist s_1 ∈ S_1 and s_2 ∈ S_2 such that s_1 = ξ_t(0) and s_2 = η_t(0). Each element of the above sequences depends on the numerical parameters ε and δ in such a way that E r_1(ξ(ε), s_1) → 0 as ε → 0 and E r_2(η(δ), s_2) → 0 as δ → 0, where ξ and η are generic elements of {ξ_t(ε)} and {η_t(δ)}, respectively. In this framework, we are interested in the following problems:

  • To study approximations of MDPs by the deterministic control process (see equations (3) and (4)). In particular, we are interested in ensuring that the optimal policy of the deterministic system is asymptotically optimal for the random system (see Theorem 1 and Remark 4).

  • To analyze the convergence of the optimal value function and the optimal policy of the stochastic system as ε → 0 and δ → 0 (see Theorem 2).

The following briefly describes work related to the problems discussed in this manuscript. In a study by Liptser et al. [11], the problem of approximating a continuous-time stochastic control process by a deterministic process was considered. In this article, the authors demonstrate that the stochastic problem can be approximated by a deterministic one when the noise is small and the fluctuations become fast. In this context, it is shown that the optimal control of the deterministic problem is asymptotically optimal for stochastic problems. In the continuous case, Dupuis and Kushner [12] addressed a similar problem, i.e., when the effects of noise in a physical system are small, these authors performed an asymptotic analysis of the diffusion approximation and used it for the desired estimates in the original system. For discrete-time MDPs, these classes of problems were studied by Cruz-Suarez and Ilhuicatzi-Roldan [9] and Cruz-Suarez et al. [13], where the dynamics of the system are described by a single difference equation. Convergence between models was also addressed by Kara and Yuksel [14]. However, convergence is studied using sequences belonging to the set of admissible state-action pairs, which is assumed to be a subset of a given Euclidean space. Moreover, this study is carried out under the assumption that the action space is a compact set and that the cost function is bounded. Now, when considering MDPs that are developed with respect to equations (1) and (2), the results found in the study by Cruz-Suarez et al. [13] are generalized. The approach of using coupled equations can be applied, e.g., by considering a random discount factor [15–18], where the second difference equation refers to the evolution of the random discount factor.

The methodology for solving the problems described above is to impose Lipschitz continuity [19,20] constraints on the components of the control model and to apply DP techniques. Specifically, we assume Lipschitz conditions for the functions c , F , and G involved in the dynamic system composed of two coupled difference equations (see equations (1) and (2)). A direct consequence of this assumption is the Lipschitz continuity of the optimal cost, which corresponds to an additional contribution to the present manuscript. This approach ensures the following three important aspects:

  • The existence of an upper bound for the stability index [10,21,22] when we apply the optimal policy of the deterministic system. Consequently, it results that the optimal policy of the deterministic system is asymptotically optimal for the stochastic system (see Theorem 1 and Remark 4).

  • A convergence rate of the optimal cost function for the random system with respect to the deterministic system.

  • The uniform convergence of the optimal stochastic policy to the deterministic policy, as ε → 0 and δ → 0, on compact subsets of the state space.

This article is structured as follows: Section 2 presents the basic theory of MDPs with states evolving with dynamics consisting of two coupled difference equations; Section 3 provides the approximation problem statement for the value function and the optimal policy; Section 4 presents the results that provide the bound for the stability index Δ_{ε,δ} in terms of the noise parameter δ̂_{ε,δ}, the convergence rate of the optimal cost, and the convergence of the optimal policy on compact subsets; Section 5 illustrates the developed theory with two examples. The first relates to a consumption-investment problem [15,23]. The second example is a control problem with small additive noise. For both problems, the upper bounds for the stability index and the convergence rate of the optimal value function are given explicitly. Finally, in Section 6, concluding remarks are given.

2 Markov control model

Consider the following Markov model:

(X × Γ, A, {A(x, α) : (x, α) ∈ X × Γ}, Q, c),

where X × Γ and A are Borel spaces, called the state space and the action space, respectively; {A(x, α) : (x, α) ∈ X × Γ} is a family of non-empty measurable subsets A(x, α) of A, where A(x, α) denotes the set of feasible actions (controls) when the system is in state (x, α) ∈ X × Γ. The set K of feasible state-action pairs is defined as follows:

K ≔ {(x, α, a) : (x, α) ∈ X × Γ, a ∈ A(x, α)},

which is a measurable subset of X × Γ × A; the next component is a stochastic kernel Q on X × Γ given K, i.e., Q(· | x, α, a) is a probability measure on X × Γ for each (x, α, a) ∈ K, and Q(B | ·) is a measurable function on K for each B ∈ B(X × Γ), where B(X × Γ) denotes the Borel σ-algebra of X × Γ; finally, c : K → R is a measurable function called the one-stage cost function.

Remark 1

In the subsequent development, the metrics of the spaces X, Γ, and A will be denoted by d_x, d_α, and d_2, respectively. Consequently, the following metric is defined on X × Γ:

d_1((x, α), (x′, α′)) = max{d_x(x, x′), d_α(α, α′)}

for all (x, α), (x′, α′) ∈ X × Γ. Furthermore, on K the metric d is defined as follows:

d((x, α, a), (x′, α′, a′)) = max{d_1((x, α), (x′, α′)), d_2(a, a′)}

for all (x, α, a), (x′, α′, a′) ∈ K.

The dynamics of the system are described below. Suppose that at time t, t ∈ {0, 1, …}, the system occupies state (x_t, α_t) = (x, α) ∈ X × Γ. Then, the decision-maker (or controller) chooses a control a_t = a ∈ A(x, α). Consequently, two things happen:

  1. a cost c(x_t, α_t, a_t) is incurred, and

  2. the system jumps to a state (x_{t+1}, α_{t+1}) = (x′, α′) according to the transition law Q(· | x, α, a) (i.e., Q(B | x, α, a) = Pr((x_{t+1}, α_{t+1}) ∈ B | x_t = x, α_t = α, a_t = a), B ∈ B(X × Γ), and (x, α, a) ∈ K).

Then, the system moves to the state (x_{t+1}, α_{t+1}), and the process is repeated.

In this manuscript, the transition law Q is assumed to be induced by a system of difference equations, i.e.,

(1) x_{t+1} = F(x_t, α_t, a_t, ξ_t(ε)),

(2) α_{t+1} = G(α_t, η_t(δ)),

for t = 0, 1, …, with (x_0, α_0) ∈ X × Γ given, where F : K × S_1 → X and G : Γ × S_2 → Γ are measurable functions. Let ε_0 and δ_0 be fixed positive numbers, let ε ∈ [0, ε_0] and δ ∈ [0, δ_0], and let the disturbances {ξ_t(ε)} and {η_t(δ)} be sequences of independent and identically distributed (i.i.d.) random elements with values in some Borel spaces (S_1, r_1) and (S_2, r_2), respectively.

Remark 2

It is assumed that the random variables ξ : Ω_1 → S_1 and η : Ω_2 → S_2 are defined on the probability spaces (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2), where ξ and η are generic elements of {ξ_t(ε)} and {η_t(δ)}, respectively. Moreover, (Ω_1 × Ω_2, F_1 ⊗ F_2, P) denotes the product probability space, where F_1 ⊗ F_2 is the product σ-algebra and P is the product measure induced by the Ionescu-Tulcea theorem [3]. The expected value with respect to the probability measure P will be denoted by E.

In the following, we will consider the space S ≔ S_1 × S_2 with the metric

r((ξ(ω_1), η(ω_2)), (ξ̂(ω_1′), η̂(ω_2′))) = max{r_1(ξ(ω_1), ξ̂(ω_1′)), r_2(η(ω_2), η̂(ω_2′))}

for all (ξ(ω_1), η(ω_2)), (ξ̂(ω_1′), η̂(ω_2′)) ∈ S, where ω_1, ω_1′ ∈ Ω_1 and ω_2, ω_2′ ∈ Ω_2.

Now, consider the random vector χ_t(ε, δ) ≔ (ξ_t(ε), η_t(δ)) for all t ≥ 0; then the difference equations (1) and (2) can be expressed as follows:

(x_{t+1}, α_{t+1}) = H(x_t, α_t, a_t, χ_t(ε, δ)) ≔ (F(x_t, α_t, a_t, ξ_t(ε)), G(α_t, η_t(δ))).
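The coupled transition above can be sketched in code. The following is a minimal illustration in which F, G, and the noise laws are simple stand-ins chosen for the sketch (not the paper's model), and s_1 = s_2 = 0 play the role of the degenerate noise values:

```python
import random

# Sketch of the coupled transition
# (x_{t+1}, alpha_{t+1}) = H(x_t, alpha_t, a_t, chi_t(eps, delta)).
# F, G and the noise laws below are illustrative stand-ins.

def F(x, alpha, a, xi):
    # hypothetical state dynamics
    return x + alpha * a + xi

def G(alpha, eta):
    # hypothetical parameter dynamics
    return 0.5 * alpha + 0.5 + eta

def H(x, alpha, a, chi):
    xi, eta = chi
    return F(x, alpha, a, xi), G(alpha, eta)

def sample_chi(eps, delta, rng):
    # chi(eps, delta) = (xi(eps), eta(delta)); with eps = delta = 0
    # it degenerates to (s1, s2) = (0, 0)
    return (rng.uniform(-eps, eps), rng.uniform(-delta, delta))

rng = random.Random(0)
x, alpha = 1.0, 1.0
for t in range(5):
    a = 0.1 * x                      # some feasible action
    x, alpha = H(x, alpha, a, sample_chi(0.01, 0.01, rng))
print(x, alpha)
```

With ε = δ = 0 the same code runs the deterministic system, since the sampled noise degenerates to (s_1, s_2).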

Suppose that there exist s_1 ∈ S_1 and s_2 ∈ S_2 such that s_1 = ξ(0) and s_2 = η(0). Each element of the above sequences depends on the numerical parameters ε and δ in such a way that E r_1(ξ(ε), s_1) → 0 as ε → 0 and E r_2(η(δ), s_2) → 0 as δ → 0. On the other hand, a deterministic MDP is considered whose dynamics evolve according to the difference equations shown in equations (3) and (4):

(3) x_{t+1} = F(x_t, α_t, a_t, s_1),

(4) α_{t+1} = G(α_t, s_2),

for all t = 0, 1, …. Note that χ(0, 0) = (ξ(0), η(0)) = (s_1, s_2), so the joint dynamics given by equations (3) and (4) is denoted as follows:

(x_{t+1}, α_{t+1}) = H(x_t, α_t, a_t, χ_t(0, 0)) ≔ (F(x_t, α_t, a_t, ξ_t(0)), G(α_t, η_t(0))).

Under this framework, we are interested in the approximation of MDPs that evolve through (1) and (2) by the deterministic control process given by equations (3) and (4).

When the processes of x -states and α -states are specified by the dynamical model given by equations (1) and (2), the transition law takes the form

(5) Q(B | x, α, a) ≔ Pr[(x_{t+1}, α_{t+1}) ∈ B | x_t = x, α_t = α, a_t = a] = ∫_{S_1 × S_2} 1_B(H(x, α, a, s)) μ(ds) = μ({s ∈ S_1 × S_2 : H(x, α, a, s) ∈ B}),

where B ∈ B(X × Γ), 1_B(·) denotes the indicator function of B, and μ is the common distribution of the random vectors χ_t(ε, δ).

On the other hand, when the processes of x -states and α -states are specified by the dynamical model equations (3) and (4), the transition law takes the form

(6) Q_H(B | x, α, a) ≔ 1_B(H(x, α, a, χ(0, 0))),

where B ∈ B(X × Γ) and (x, α, a) ∈ K. Thus, the Markov control model is given by (X × Γ, A, {A(x, α) : (x, α) ∈ X × Γ}, Q_H, c).

A control policy π is a sequence {π_t : t = 0, 1, …}, where, for each t = 0, 1, …, π_t(· | h_t) is a conditional probability on the Borel σ-algebra B(A), given the history h_t ≔ (x_0, α_0, a_0, …, x_{t−1}, α_{t−1}, a_{t−1}, x_t, α_t), such that π_t(A(x_t, α_t) | h_t) = 1. The set of all policies is denoted by Π.

Let F ≔ {ϕ : X × Γ → A | ϕ is measurable and ϕ(x, α) ∈ A(x, α) for all (x, α) ∈ X × Γ}. A sequence π = {ϕ_t : t = 0, 1, …} of functions ϕ_t ∈ F is called a Markov policy. A Markov policy π = {ϕ_t : t = 0, 1, …} is called a stationary policy if ϕ_t = ϕ ∈ F for all t = 0, 1, ….

Given initial states (x_0 = x, α_0 = α) ∈ X × Γ and an arbitrary policy π ∈ Π, there exists a probability measure P^π_{(x,α)} induced by the triplet (x, α, π) on the space Ω = (X × Γ × A)^∞, with F as the product σ-algebra. The existence of this probability measure is verified in a way analogous to that in the study by Gonzalez-Hernandez et al. [18]. The corresponding expectation operator is denoted by E^π_{(x,α)}. The triplet (x, α, π) determines a stochastic process (Ω, F, P^π_{(x,α)}, {(x_t, α_t)}) called the Markov decision process. Subsequently, we write y = (x, α) and Y = X × Γ.

3 Problem statement

Consider a deterministic Markov control model (Y, A, {A(y) : y ∈ Y}, Q_H, c) as presented in Section 2. In addition, consider a stochastic control system with the same state space Y, control space A, admissible sets A(y), y ∈ Y, and cost function c, but with the dynamical system described as follows:

y_{t+1} = H(y_t, a_t, χ_t(ε, δ)), t = 0, 1, ….

Note that, as ε → 0 and δ → 0, the stochastic system with transition law (5) reduces to the deterministic system with transition law (6).

For each policy π ∈ Π and initial state (x, α) ∈ Y, consider the expected total discounted cost, denoted by V̂_{ε,δ}(x, α, π) and defined as follows:

V̂_{ε,δ}(x, α, π) = E^π_{(x,α)} ∑_{t=0}^{∞} β^t c(x_t, α_t, a_t),

where β ∈ (0, 1) is a discount factor.
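For a fixed stationary policy, this expected total discounted cost can be estimated by Monte Carlo, truncating at a horizon where β^T is negligible. The dynamics, one-stage cost, and noise laws in this sketch are illustrative assumptions, not the paper's model:

```python
import random

def cost(x, alpha, a):
    # hypothetical quadratic one-stage cost
    return x * x + a * a

def estimate_cost(x0, alpha0, policy, eps, delta, beta=0.9,
                  horizon=200, n_paths=500, seed=0):
    # averages the truncated discounted cost over n_paths sample paths
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x, alpha, acc, disc = x0, alpha0, 0.0, 1.0
        for _t in range(horizon):
            a = policy(x, alpha)
            acc += disc * cost(x, alpha, a)
            x = 0.8 * x - a + rng.uniform(-eps, eps)          # hypothetical F
            alpha = 0.5 * alpha + rng.uniform(-delta, delta)  # hypothetical G
            disc *= beta
        total += acc
    return total / n_paths

# with eps = delta = 0 this reduces to the deterministic cost of the policy
print(estimate_cost(1.0, 0.5, lambda x, alpha: 0.2 * x, 0.01, 0.01))
```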

Thus, the optimal control problem is to find a policy π * Π such that

V̂_{ε,δ}(x, α, π^*) = inf_{π ∈ Π} V̂_{ε,δ}(x, α, π),

for all (x, α) ∈ X × Γ. Then, the optimal value function (optimal cost) is defined as V_{ε,δ}(x, α) ≔ inf_{π ∈ Π} V̂_{ε,δ}(x, α, π), (x, α) ∈ X × Γ. The policy π^* is called the optimal policy, while V_{ε,δ}(x, α) is called the optimal value function, for (x, α) ∈ Y. In the deterministic case, i.e., when ε = 0 and δ = 0, V_{ε,δ} will be denoted by V.

In the next section, we establish conditions to perform an asymptotic analysis of the optimal solution for the stochastic system.

4 Conditions and results

In this section, we introduce three blocks of conditions to study the convergence of the stochastic system defined by equations (1) and (2). In addition, a bound is given for the stability index, which depends on a small-noise disturbance parameter δ̂_{ε,δ}. In the following, χ(ε, δ) denotes a generic element of {χ_t(ε, δ)}.

Condition 1

  (a) A(x, α) is a compact set for each (x, α) ∈ Y, and the set-valued mapping (x, α) ↦ A(x, α) is upper semicontinuous with respect to the Hausdorff metric.

  (b) The cost function c(y, ·) is lower semicontinuous on A(y) for every y ∈ Y.

  (c) For every bounded continuous function U : Y → R, the mapping

    (y, a) ↦ E U[H(y, a, χ(ε, δ))], (y, a) ∈ K,

    is continuous on K, where E is introduced in Remark 2.

Condition 1 is necessary to ensure the existence of minimizers in the corresponding optimality equation. Condition 1(a) is similar to Assumption 1 presented in the study by Gordienko et al. [10].

Let Z : X × Γ → [1, ∞) be a measurable function, called a weight function. If U is a real-valued function on X × Γ, then its weighted norm is defined as follows:

‖U‖_Z ≔ sup_{(x,α) ∈ X × Γ} |U(x, α)| / Z(x, α).

Let B_Z be the Banach space of measurable functions U : Y → R such that ‖U‖_Z < ∞.

Condition 2

There exist a constant γ ∈ (β, 1) and a weight function W on Y such that, for all ε ∈ [0, ε_0] and δ ∈ [0, δ_0]:

  (a) |c(y, a)| ≤ W(y), (y, a) ∈ K.

  (b) E W[H(y, a, χ(ε, δ))] ≤ (γ/β) W(y), (y, a) ∈ K.

  (c) For every state y ∈ Y, the function

    a ↦ E W[H(y, a, χ(ε, δ))]

    is continuous on A(y).

Condition 2 is used to provide the existence of solutions of the optimality equation [10]. In addition, under Conditions 1 and 2, the DP approach is valid. Thus, for each ( x , α ) X × Γ , the following relation holds:

V_{ε,δ}(x, α) = inf_{a ∈ A(x,α)} [ c(x, α, a) + β ∫_{X × Γ} V_{ε,δ}(y) Q(dy | x, α, a) ].

One method of approximating the value function is to use value iterations, which are defined as follows:

V^n_{ε,δ}(x, α) = inf_{a ∈ A(x,α)} [ c(x, α, a) + β ∫_{X × Γ} V^{n−1}_{ε,δ}(y) Q(dy | x, α, a) ],

where (x, α) ∈ X × Γ and n = 1, 2, …, with V^0_{ε,δ}(·) ≡ 0.
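For the deterministic case ε = δ = 0, the value iterations reduce to a DP recursion that can be run on a discretized model. In the sketch below, the grids, dynamics, and cost are illustrative placeholders, and a nearest-neighbor projection stands in for the exact transition:

```python
beta = 0.9
xs = [i / 10 for i in range(11)]        # discretized x-states
alphas = [0.5, 1.0]                     # discretized alpha-states
acts = [j / 10 for j in range(11)]      # discretized actions

def F(x, al, a, s1=0.0):
    # hypothetical state dynamics, clipped to the grid range
    return min(max(0.8 * x - 0.1 * a + s1, 0.0), 1.0)

def G(al, s2=0.0):
    return al                            # parameter frozen for simplicity

def cost(x, al, a):
    return x * x + 0.1 * a * a           # hypothetical one-stage cost

def nearest(grid, v):
    return min(grid, key=lambda g: abs(g - v))

V = {(x, al): 0.0 for x in xs for al in alphas}   # V^0 = 0
for _n in range(200):                             # value iterations V^n
    V = {(x, al): min(cost(x, al, a)
                      + beta * V[(nearest(xs, F(x, al, a)),
                                  nearest(alphas, G(al)))]
                      for a in acts)
         for x in xs for al in alphas}

print(V[(1.0, 1.0)])
```

Since β < 1, the iterates converge geometrically, so 200 iterations are far more than enough here.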

Condition 3

There exist constants L_0, L_1, L_{2,x}, and L_{2,α} such that:

  (a) |c(y, a) − c(y′, a)| ≤ L_0 d_1(y, y′) for each (y, a), (y′, a) ∈ K.

  (b) d_1(H(y, a, (s_1, s_2)), H(y′, a, (s_1, s_2))) ≤ L_1 d_1(y, y′) for each (y, a), (y′, a) ∈ K and all (s_1, s_2) ∈ S_1 × S_2, with L_1 ≥ 1.

  (c) The functions F and G satisfy d_x(F(x, α, a, s_1), F(x, α, a, s_1′)) ≤ L_{2,x} r_1(s_1, s_1′) for each (x, α, a) ∈ K and all s_1, s_1′ ∈ S_1.

  (d) d_α(G(α, s_2), G(α, s_2′)) ≤ L_{2,α} r_2(s_2, s_2′) for any α ∈ Γ and all s_2, s_2′ ∈ S_2.

Remark 3

Under Condition 3, the cost function and the function H involved in the dynamics of the states are Lipschitz functions with respect to the variable y Y . Furthermore, the functions F and G are Lipschitz functions with respect to ξ and η , respectively.

If Conditions 1 and 2 are satisfied, then, by arguments similar to those in [4] (taking into account the respective changes), the existence of a stationary optimal policy π_{ε,δ} = {f_{ε,δ}, f_{ε,δ}, …} is guaranteed, where f_{ε,δ} ∈ F. Moreover, the following facts hold:

  (a) V̂_{ε,δ}(x, α, π_{ε,δ}) = V_{ε,δ}(x, α) ∈ B_W.

  (b) E V_{ε,δ}[H(y, a, χ(ε, δ))] < ∞ for each ε ∈ [0, ε_0], δ ∈ [0, δ_0], and (y, a) ∈ K.

Moreover, the optimal policy for the deterministic control problem is denoted by π_0^* = {f^*, f^*, …} with f^* ∈ F.

Let L be the Kantorovich metric defined on (S, B(S)):

(7) L(χ, χ′) = sup{ |E φ(χ) − E φ(χ′)| : φ such that |φ(s) − φ(s′)| ≤ r(s, s′), s, s′ ∈ S }.

On the other hand, the stability index Δ_{ε,δ} is defined as follows:

Δ_{ε,δ}(y, π) ≔ V̂_{ε,δ}(y, π) − V_{ε,δ}(y), y ∈ Y, π ∈ Π.

The index Δ_{ε,δ}(y, π) expresses the excess of the discounted cost incurred when the policy π is applied to the stochastic control process governed by equations (1) and (2), for ε, δ > 0 and y ∈ Y. The quality of the approximation of the stochastic system by the policy π_0^* will be measured by the stability index Δ_{ε,δ}(y, π_0^*) (see [10,13]), i.e.,

Δ_{ε,δ}(y, π_0^*) ≔ V̂_{ε,δ}(y, π_0^*) − V_{ε,δ}(y), y ∈ Y.

In addition, we define a small-noise disturbance parameter δ̂_{ε,δ} as follows:

δ̂_{ε,δ} ≔ E max{ r_1(ξ(ε), ξ(0)), r_2(η(δ), η(0)) }

for ε ∈ [0, ε_0] and δ ∈ [0, δ_0]. Theorem 1 provides an upper bound for Δ_{ε,δ}(y, π_0^*) involving the noise parameter δ̂_{ε,δ}.
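Once the noise laws are fixed, δ̂_{ε,δ} can be estimated by Monte Carlo. In this sketch, ξ(ε) and η(δ) are assumed uniform on [−ε, ε] and [−δ, δ], so that ξ(0) = η(0) = 0 and r_1, r_2 are the usual metric on R; these laws are illustrative, not prescribed by the paper:

```python
import random

def delta_hat(eps, delta, n=100_000, seed=0):
    # Monte Carlo estimate of E max{ r1(xi(eps), xi(0)), r2(eta(delta), eta(0)) }
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        xi = rng.uniform(-eps, eps)       # xi(eps); xi(0) = 0
        eta = rng.uniform(-delta, delta)  # eta(delta); eta(0) = 0
        acc += max(abs(xi), abs(eta))
    return acc / n

# delta_hat shrinks with the noise parameters, which is what drives
# the stability index to zero in Theorem 1
print(delta_hat(0.1, 0.1), delta_hat(0.01, 0.01))
```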

The following lemmas are applied to prove Theorems 1 and 2.

Lemma 1

Under Conditions 1, 2, and 3(a) and (b), for each fixed ε ∈ [0, ε_0] and δ ∈ [0, δ_0], V^n_{ε,δ} is a Lipschitz function for all n = 1, 2, …. Consequently, V_{ε,δ} is a Lipschitz function with Lipschitz constant L_0/(1 − βL_1).

Proof

Let ε ∈ [0, ε_0] and δ ∈ [0, δ_0]. The proof proceeds by induction. For n = 1, note that, for all (x, α), (x′, α′) ∈ Y,

|V^1_{ε,δ}(x, α) − V^1_{ε,δ}(x′, α′)| = | inf_{a ∈ A} c(x, α, a) − inf_{a ∈ A} c(x′, α′, a) | ≤ sup_{a ∈ A} |c(x, α, a) − c(x′, α′, a)| ≤ L_0 d_1((x, α), (x′, α′)).

For n > 1, suppose that V^{n−1}_{ε,δ} is a Lipschitz function with constant L_0 ∑_{i=0}^{n−2} (βL_1)^i. Then,

|V^n_{ε,δ}(x, α) − V^n_{ε,δ}(x′, α′)|
= | inf_{a ∈ A} { c(x, α, a) + β E V^{n−1}_{ε,δ}[H(x, α, a, χ(ε, δ))] } − inf_{a ∈ A} { c(x′, α′, a) + β E V^{n−1}_{ε,δ}[H(x′, α′, a, χ(ε, δ))] } |
≤ sup_{a ∈ A} { |c(x, α, a) − c(x′, α′, a)| + β | E V^{n−1}_{ε,δ}[H(x, α, a, χ(ε, δ))] − E V^{n−1}_{ε,δ}[H(x′, α′, a, χ(ε, δ))] | }
≤ L_0 d_1((x, α), (x′, α′)) + β sup_{a ∈ A} E [ L_0 ∑_{i=0}^{n−2} (βL_1)^i d_1(H(x, α, a, χ(ε, δ)), H(x′, α′, a, χ(ε, δ))) ]
≤ L_0 d_1((x, α), (x′, α′)) + β L_0 ∑_{i=0}^{n−2} (βL_1)^i L_1 d_1((x, α), (x′, α′))
= L_0 [ 1 + ∑_{i=1}^{n−1} (βL_1)^i ] d_1((x, α), (x′, α′))
= L_0 ∑_{i=0}^{n−1} (βL_1)^i d_1((x, α), (x′, α′)).

Therefore, V^n_{ε,δ} is a Lipschitz function with constant L_0 ∑_{i=0}^{n−1} (βL_1)^i for every n ∈ N.

Now, to verify the second part, note that βL_1 < 1; thus ∑_{i=0}^{∞} (βL_1)^i = 1/(1 − βL_1). In addition, since V^n_{ε,δ} → V_{ε,δ} as n → ∞, it follows that V_{ε,δ} is a Lipschitz function with Lipschitz constant L_0/(1 − βL_1).□

Lemma 2

Under Conditions 1 and 2(b), for each ε ∈ [0, ε_0], δ ∈ [0, δ_0], and t ≥ 1, it holds that

(8) E_y^{π_0^*} sup_{a ∈ A(y_{t−1})} { E W[H(y_{t−1}, a, χ_{t−1}(ε, δ))] } ≤ (γ/β)^{t−1} W(y).

Proof

Consider ε ∈ [0, ε_0] and δ ∈ [0, δ_0]. From Condition 2(b), we have that

E W[H(y_{t−1}, a, χ_{t−1}(ε, δ))] ≤ (γ/β) W(y_{t−1})

for any fixed t ≥ 1. Then, it is obtained that

E_y^{π_0^*} sup_{a ∈ A(y_{t−1})} { E W[H(y_{t−1}, a, χ_{t−1}(ε, δ))] } ≤ (γ/β) E_y^{π_0^*} W(y_{t−1})

for any fixed t ≥ 1. Now, consider ĥ_t = {y, a_1, y_1, a_2, …, y_{t−1}, a_t}, the history of the joint process described by equations (1) and (2) under the policy π_0^* = {f^*, f^*, …}; then

E_y^{π_0^*} W(y_{t−1}) = E_y^{π_0^*} W(H(y_{t−2}, a_{t−2}, χ_{t−2}(ε, δ))) = E_y^{π_0^*} [ E[W(H(y_{t−2}, a_{t−2}, χ_{t−2}(ε, δ))) | ĥ_{t−2}] ] ≤ (γ/β) E_y^{π_0^*} [ W(y_{t−2}) | ĥ_{t−2} ] = (γ/β) E_y^{π_0^*} [ W(H(y_{t−3}, a_{t−3}, χ_{t−3}(ε, δ))) | ĥ_{t−2} ] = (γ/β) E_y^{π_0^*} W(H(y_{t−3}, a_{t−3}, χ_{t−3}(ε, δ))).

Thus,

E_y^{π_0^*} sup_{a ∈ A(y_{t−1})} { E W[H(y_{t−1}, a, χ_{t−1}(ε, δ))] } ≤ (γ/β)^2 E_y^{π_0^*} W(H(y_{t−3}, a_{t−3}, χ_{t−3}(ε, δ))).

Continuing with this procedure, it is obtained that

E_y^{π_0^*} sup_{a ∈ A(y_{t−1})} { E W[H(y_{t−1}, a, χ_{t−1}(ε, δ))] } ≤ (γ/β)^{t−1} W(y).□

The proof of the following theorem is based on Theorem 1 in [10].

Theorem 1

Under Conditions 1–3, it holds that

Δ_{ε,δ}(y, π_0^*) ≤ Ĉ(y) δ̂_{ε,δ}, y ∈ Y,

where

Ĉ(y) = (2βL_0 max{L_{2,x}, L_{2,α}} / (1 − βL_1)) [ 1/(1 − β) + (β/(1 − γ)^2) W(y) ],

for each ε ∈ [0, ε_0] and δ ∈ [0, δ_0].

Proof

Note that, for ε ∈ [0, ε_0] and δ ∈ [0, δ_0], V_{ε,δ} and f_{ε,δ} satisfy the following optimality equation:

(9) V_{ε,δ}(y) = inf_{a ∈ A(y)} { c(y, a) + β E V_{ε,δ}[H(y, a, χ(ε, δ))] } = c(y, f_{ε,δ}(y)) + β E V_{ε,δ}[H(y, f_{ε,δ}(y), χ(ε, δ))].

Denote

(10) R_{ε,δ}(y, a) ≔ c(y, a) + β E V_{ε,δ}[H(y, a, χ(ε, δ))], (y, a) ∈ K,

and consider ĥ_t = {y, a_1, y_1, a_2, …, y_{t−1}, a_t} as in the proof of Lemma 2. By the Markov property, it can be shown that

(11) E^{π_0^*}[β V_{ε,δ}(y_t) | ĥ_t] = R_{ε,δ}(y_{t−1}, a_t) − c(y_{t−1}, a_t) − inf_{a ∈ A(y_{t−1})} R_{ε,δ}(y_{t−1}, a) + inf_{a ∈ A(y_{t−1})} R_{ε,δ}(y_{t−1}, a).

Denoting Λ_t^{ε,δ} ≔ R_{ε,δ}(y_{t−1}, a_t) − inf_{a ∈ A(y_{t−1})} R_{ε,δ}(y_{t−1}, a), then, by equation (11), it is obtained that

(12) E^{π_0^*}[β V_{ε,δ}(y_t) | ĥ_t] = Λ_t^{ε,δ} − c(y_{t−1}, a_t) + V_{ε,δ}(y_{t−1}).

If we take the expected value in equation (12), we obtain that

(13) E_y^{π_0^*}[β V_{ε,δ}(y_t)] = E_y^{π_0^*}[V_{ε,δ}(y_{t−1})] − E_y^{π_0^*}[c(y_{t−1}, a_t)] + E_y^{π_0^*}[Λ_t^{ε,δ}].

Summing equation (13) over t = 1, 2, …, n with weights β^{t−1}, we obtain

(14) ∑_{t=1}^{n} β^{t−1} E_y^{π_0^*}[c(y_{t−1}, a_t)] = ∑_{t=1}^{n} β^{t−1} [ E_y^{π_0^*} V_{ε,δ}(y_{t−1}) − β E_y^{π_0^*} V_{ε,δ}(y_t) ] + ∑_{t=1}^{n} β^{t−1} E_y^{π_0^*}[Λ_t^{ε,δ}] = V_{ε,δ}(y) − β^n E_y^{π_0^*} V_{ε,δ}(y_n) + ∑_{t=1}^{n} β^{t−1} E_y^{π_0^*} Λ_t^{ε,δ}.

Since V_{ε,δ} ∈ B_W, lim_{n→∞} β^n E_y^{π_0^*} V_{ε,δ}(y_n) = 0. Thus, letting n → ∞, it follows from equation (14) that

(15) Δ_{ε,δ}(y, π_0^*) = ∑_{t=1}^{∞} β^{t−1} E_y^{π_0^*} Λ_t^{ε,δ} = ∑_{t=1}^{∞} β^{t−1} E_y^{π_0^*} c(y_{t−1}, a_t) − V_{ε,δ}(y).

Now, by equations (9) and (10), it follows that

R_{0,0}(y, f^*(y)) = inf_{a ∈ A(y)} R_{0,0}(y, a);

then, since a_t = f^*(y_{t−1}) under π_0^*,

Λ_t^{ε,δ} = R_{ε,δ}(y_{t−1}, f^*(y_{t−1})) − R_{0,0}(y_{t−1}, f^*(y_{t−1})) + inf_{a ∈ A(y_{t−1})} { R_{0,0}(y_{t−1}, a) } − inf_{a ∈ A(y_{t−1})} { R_{ε,δ}(y_{t−1}, a) },

which implies that

Λ_t^{ε,δ} ≤ | R_{ε,δ}(y_{t−1}, f^*(y_{t−1})) − R_{0,0}(y_{t−1}, f^*(y_{t−1})) | + sup_{a ∈ A(y_{t−1})} { R_{0,0}(y_{t−1}, a) − R_{ε,δ}(y_{t−1}, a) }.

Therefore,

Λ_t^{ε,δ} ≤ 2 sup_{a ∈ A(y_{t−1})} | R_{ε,δ}(y_{t−1}, a) − R_{0,0}(y_{t−1}, a) | ≤ 2β sup_{a ∈ A(y_{t−1})} | E V_{ε,δ}(H(y_{t−1}, a, χ(ε, δ))) − E V(H(y_{t−1}, a, χ(0, 0))) |,

where the expected value in the last term is taken with respect to the random vector χ(ε, δ) at fixed t. It follows from the last inequality that

(16) Λ_t^{ε,δ} ≤ 2β sup_{a ∈ A(y_{t−1})} | E V_{ε,δ}(H(y_{t−1}, a, χ(ε, δ))) − E V_{ε,δ}(H(y_{t−1}, a, χ(0, 0))) | + 2β sup_{a ∈ A(y_{t−1})} | E V_{ε,δ}(H(y_{t−1}, a, χ(0, 0))) − E V(H(y_{t−1}, a, χ(0, 0))) | ≤ 2β μ_1(χ(ε, δ), χ(0, 0)) + 2β ‖V_{ε,δ} − V‖_W sup_{a ∈ A(y_{t−1})} E W(H(y_{t−1}, a, χ(0, 0))),

where

μ_1(χ(ε, δ), χ(0, 0)) = sup_{(y,a) ∈ K} | E V_{ε,δ}(H(y, a, χ(ε, δ))) − E V_{ε,δ}(H(y, a, χ(0, 0))) |.

From Proposition 8.3.9 part (a) of [4], it can be shown that

T_{ε,δ} u(y) ≔ inf_{a ∈ A(y)} { c(y, a) + β E u(H(y, a, χ(ε, δ))) }

is a contraction operator on B_W with modulus γ, for each ε ∈ [0, ε_0] and δ ∈ [0, δ_0]. Since V_{ε,δ} and V are fixed points of the operators T_{ε,δ} and T_{0,0}, respectively, we obtain that

‖V_{ε,δ} − V‖_W ≤ ‖T_{ε,δ} V_{ε,δ} − T_{0,0} V_{ε,δ}‖_W + ‖T_{0,0} V_{ε,δ} − T_{0,0} V‖_W.

This last relation implies that

(17) ‖V_{ε,δ} − V‖_W ≤ (1 − γ)^{−1} ‖T_{ε,δ} V_{ε,δ} − T_{0,0} V_{ε,δ}‖_W ≤ β (1 − γ)^{−1} sup_{y ∈ Y} { W^{−1}(y) sup_{a ∈ A(y)} | E V_{ε,δ}[H(y, a, χ(ε, δ))] − E V_{ε,δ}[H(y, a, χ(0, 0))] | }.

Combining inequality (8) from Lemma 2 with expressions (16) and (17), it follows that

E_y^{π_0^*} Λ_t^{ε,δ} ≤ 2β [ 1 + (β/(1 − γ)) (γ/β)^{t−1} W(y) ] μ_1(χ(ε, δ), χ(0, 0)).

Finally, by equation (15), we obtain that

Δ_{ε,δ}(y, π_0^*) ≤ 2β [ 1/(1 − β) + (β/(1 − γ)^2) W(y) ] μ_1(χ(ε, δ), χ(0, 0)).

By Lemma 1 and Conditions 3(c) and (d), it follows that

(18) Δ_{ε,δ}(y, π_0^*) ≤ (2βL_0 max{L_{2,x}, L_{2,α}} / (1 − βL_1)) [ 1/(1 − β) + (β/(1 − γ)^2) W(y) ] L(χ(ε, δ), χ(0, 0)).

Considering the particular case χ′ = χ(0, 0) in equation (7), it follows that

L(χ(ε, δ), χ(0, 0)) = E r(χ(ε, δ), χ(0, 0)) = δ̂_{ε,δ}.

Therefore, substituting the previous equality into equation (18), the result follows.□
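Once the model constants are known, the bound of Theorem 1 is a closed-form expression and can be evaluated directly. The following sketch computes the constant Ĉ(y); the numerical constants are illustrative placeholders, not taken from any specific model:

```python
def c_hat(beta, gamma, L0, L1, L2x, L2a, W_y):
    # C_hat(y) = (2 beta L0 max{L2x, L2a} / (1 - beta L1))
    #            * (1/(1 - beta) + beta/(1 - gamma)^2 * W(y))
    assert 0 < beta < gamma < 1 and beta * L1 < 1
    lead = 2 * beta * L0 * max(L2x, L2a) / (1 - beta * L1)
    return lead * (1 / (1 - beta) + beta / (1 - gamma) ** 2 * W_y)

# illustrative constants: beta = 0.9, gamma = 0.95,
# all Lipschitz constants equal to 1, and W(y) = 1
print(c_hat(0.9, 0.95, 1.0, 1.0, 1.0, 1.0, 1.0))
```

Note how the bound degrades as γ → 1 (through the (1 − γ)^{−2} factor) and as βL_1 → 1, which is why Condition 3(b) is paired with the contraction requirement.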

Remark 4

Observe that Theorem 1 guarantees that the optimal policy π_0^* ∈ F of the deterministic system (see equations (3) and (4)) is asymptotically optimal for the stochastic system (see equations (1) and (2)), i.e.,

lim_{ε,δ→0} [ V̂_{ε,δ}(y, π_0^*) − V_{ε,δ}(y) ] = 0.

In the following lemma, we verify the continuity of the function f^*, under the assumption that the optimal policy π_0^* is unique. Uniqueness of the optimal policy is a restrictive assumption, but in [24] three blocks of conditions on the components of the decision model are provided that guarantee it. In particular, Cruz-Suarez et al. [24] provided conditions for uniqueness when the state space is a subset of R^n.

Lemma 3

Under Conditions 1 and 2, if, in addition, the stationary optimal policy π_0^* = {f^*, f^*, …} for the deterministic problem is unique, then f^* is a continuous function.

Proof

By contradiction, it will be shown that, for ε = 0 and δ = 0, the optimal policy f^* : Y → A is a continuous function. Under Conditions 1 and 2, we have that

(19) V(x, α) = inf_{a ∈ A} { c(x, α, a) + β V(F(x, α, a, ξ(0)), G(α, η(0))) } = c(x, α, f^*(x, α)) + β V(F(x, α, f^*(x, α), s_1), G(α, s_2)),

for all (x, α) ∈ Y. Suppose there exists (x̂, α̂) ∈ Y at which f^* is not continuous. Then, there exists a sequence {(x_n, α_n)} such that (x_n, α_n) → (x̂, α̂) but d_2(f^*(x_n, α_n), f^*(x̂, α̂)) ↛ 0 as n → ∞. After taking a subsequence, if necessary, we may assume without loss of generality that there exists τ > 0 such that d_2(f^*(x_n, α_n), f^*(x̂, α̂)) ≥ τ. Since A is compact, there exists a subsequence {z_{n_k}} of {z_n = f^*(x_n, α_n)} that converges to some z ∈ A with z ≠ f^*(x̂, α̂). Substituting (x_{n_k}, α_{n_k}) and f^*(x_{n_k}, α_{n_k}) for (x, α) and f^*(x, α) in equation (19), we obtain

V(x_{n_k}, α_{n_k}) = c(x_{n_k}, α_{n_k}, f^*(x_{n_k}, α_{n_k})) + β V(F(x_{n_k}, α_{n_k}, f^*(x_{n_k}, α_{n_k}), ξ(0)), G(α_{n_k}, η(0))).

By the continuity of the functions c, F, G, and V, letting k → ∞, we obtain

V(x̂, α̂) = c(x̂, α̂, z) + β V(F(x̂, α̂, z, ξ(0)), G(α̂, η(0))).

From Conditions 1 and 2, there exists an optimal policy f̄ with z = f̄(x̂, α̂). But f̄(x̂, α̂) ≠ f^*(x̂, α̂), which contradicts the uniqueness of the optimal policy. Therefore, f^* is a continuous function.□

Next, the main theorem is stated and proved.

Theorem 2

Under Conditions 1–3, for each ε ∈ [0, ε_0] and δ ∈ [0, δ_0], the following statements hold:

  (a) ‖V_{ε,δ} − V‖_W ≤ C_1 δ̂_{ε,δ}, where C_1 = (β/(1 − γ)) L_0 max{L_{2,x}, L_{2,α}} / (1 − βL_1).

  (b) Let K be a compact subset of Y. If the stationary optimal policy π_0^* = {f^*, f^*, …} for the deterministic problem is unique, then f_{ε,δ} → f^* uniformly on K as ε and δ go to zero.

Proof

(a) Observe that equation (17) implies that

‖V_{ε,δ} − V‖_W ≤ β (1 − γ)^{−1} sup_{y ∈ Y} { W^{−1}(y) sup_{a ∈ A(y)} | E V_{ε,δ}(H(y, a, χ(ε, δ))) − E V_{ε,δ}(H(y, a, χ(0, 0))) | }.

Now, by Lemma 1 and since W(y) ≥ 1 for all y ∈ Y, it yields that

‖V_{ε,δ} − V‖_W ≤ β (1 − γ)^{−1} (L_0/(1 − βL_1)) sup_{y ∈ Y} sup_{a ∈ A(y)} E d_1(H(y, a, χ(ε, δ)), H(y, a, χ(0, 0))).

On the other hand, the following expressions hold:

E d_1(H(y, a, χ(ε, δ)), H(y, a, χ(0, 0))) = E max{ d_x(F(x, α, a, ξ(ε)), F(x, α, a, ξ(0))), d_α(G(α, η(δ)), G(α, η(0))) } ≤ E [ L_{2,x} r_1(ξ(ε), ξ(0)) 1_{{d_α ≤ d_x}} ] + E [ L_{2,α} r_2(η(δ), η(0)) 1_{{d_x < d_α}} ],

where 1_{{d_α ≤ d_x}} denotes the indicator function of the event in which d_α(G(α, η(δ)), G(α, η(0))) ≤ d_x(F(x, α, a, ξ(ε)), F(x, α, a, ξ(0))), and 1_{{d_x < d_α}} that of the complementary event. Then, we conclude that

‖V_{ε,δ} − V‖_W ≤ β (1 − γ)^{−1} (L_0 max{L_{2,x}, L_{2,α}} / (1 − βL_1)) δ̂_{ε,δ}.

(b) Suppose, by contradiction, that there exist a compact set K ⊂ Y, a real number τ > 0, and sequences {ε_n}, {δ_n} convergent to 0 such that

(20) d_2(f_{ε_n,δ_n}(x_n, α_n), f^*(x_n, α_n)) ≥ τ/2, n = 1, 2, …,

for some convergent sequence {(x_n, α_n)} ⊂ K with (x_n, α_n) → (x, α) ∈ K as n → ∞. Since A is compact, take a subsequence {(x_m, α_m)} of {(x_n, α_n)} such that f_{ε_m,δ_m}(x_m, α_m) → a ∈ A. By the continuity of f^* given by Lemma 3 and by equation (20), we obtain that d_2(a, f^*(x, α)) ≥ τ/2.

Since $f_{\varepsilon_{m},\delta_{m}}$ is an optimal policy, we obtain by equation (19) that

(21) $V_{\varepsilon_{m},\delta_{m}}(x_{m},\alpha_{m})=c(x_{m},\alpha_{m},f_{\varepsilon_{m},\delta_{m}}(x_{m},\alpha_{m}))+\beta E\,V_{\varepsilon_{m},\delta_{m}}(F(x_{m},\alpha_{m},f_{\varepsilon_{m},\delta_{m}}(x_{m},\alpha_{m}),\xi(\varepsilon_{m})),G(\alpha_{m},\eta(\delta_{m})))$

for $m=1,2,\ldots$.

Now, abbreviating $a_{m}\coloneqq f_{\varepsilon_{m},\delta_{m}}(x_{m},\alpha_{m})$, note that

(22) $\begin{aligned}
&|E\,V_{\varepsilon_{m},\delta_{m}}(F(x_{m},\alpha_{m},a_{m},\xi(\varepsilon_{m})),G(\alpha_{m},\eta(\delta_{m})))-V(F(x,\alpha,a,\xi(0)),G(\alpha,\eta(0)))|\\
&\quad\leq|E\,V_{\varepsilon_{m},\delta_{m}}(F(x_{m},\alpha_{m},a_{m},\xi(\varepsilon_{m})),G(\alpha_{m},\eta(\delta_{m})))-V_{\varepsilon_{m},\delta_{m}}(F(x_{m},\alpha_{m},a_{m},\xi(0)),G(\alpha_{m},\eta(0)))|\\
&\qquad+|V_{\varepsilon_{m},\delta_{m}}(F(x_{m},\alpha_{m},a_{m},\xi(0)),G(\alpha_{m},\eta(0)))-V(F(x_{m},\alpha_{m},a_{m},\xi(0)),G(\alpha_{m},\eta(0)))|\\
&\qquad+|V(F(x_{m},\alpha_{m},a_{m},\xi(0)),G(\alpha_{m},\eta(0)))-V(F(x,\alpha,a,\xi(0)),G(\alpha,\eta(0)))|.
\end{aligned}$

By Lemma 1, the first term on the right-hand side of equation (22) is less than or equal to $\frac{L_{0}}{1-\beta L_{1}}\hat{\delta}_{\varepsilon_{m},\delta_{m}}$. The remaining terms converge to $0$ as $m\to\infty$, by part (a) and by the continuity of the functions $F$, $G$, and $V$. Therefore, letting $m\to\infty$ in equation (21) yields

$$V(x,\alpha)=c(x,\alpha,a)+\beta V(F(x,\alpha,a,\xi(0)),G(\alpha,\eta(0))).$$

By arguments similar to those in the proof of Lemma 3, there exists an optimal policy $\bar{f}$ with $a=\bar{f}(x,\alpha)$, but $\bar{f}(x,\alpha)\neq f^{*}(x,\alpha)$. This contradicts the uniqueness of the optimal policy. Therefore, $f_{\varepsilon_{n},\delta_{n}}$ converges uniformly to $f^{*}$ on $K$.□

5 Examples

In this section, we present two examples that illustrate the developed theory, together with two further examples that fail the conditions of Theorem 1 and therefore lead to conclusions quite different from those provided by that result. Throughout this section, $d_{x}$, $d_{\alpha}$, $d_{2}$, $r_{1}$, and $r_{2}$ denote the usual metric on $\mathbb{R}$.

5.1 Consumption-investment problem

We consider a consumption-investment problem [15,23] in which an investor must allocate the current wealth, say $x_{t}$, between consumption $a_{t}$ and investment $x_{t}-a_{t}$ at each stage $t=0,1,2,\ldots$. In addition, at each stage $t$, a discount factor $\exp(-\alpha_{t})$ is imposed, which depends on the real bank interest rate $\alpha_{t}$. The state and action spaces are $X=A=[0,\infty)$. Assuming that borrowing is not allowed, the set of admissible controls takes the form $A(x,\alpha)=[0,x]$. Furthermore, it is assumed that the bank pays at least an interest rate of $\exp(\alpha_{*})-1$ for some $\alpha_{*}>0$; thus, the discount rate space is $\Gamma=[\alpha_{*},\infty)$.

The state process { x t } and the discounting process { α t } satisfy the following difference equations:

(23) $x_{t+1}=\xi_{t}(\varepsilon)(x_{t}-a_{t}),\qquad\alpha_{t+1}=\alpha_{t}+\eta_{t}(\delta),$

for $t=0,1,2,\ldots$, with $(x_{0},\alpha_{0})\in Y$ given, where $\{\xi_{t}\}$ and $\{\eta_{t}\}$ are sequences of independent and identically distributed discrete random variables, independent of $(x_{0},\alpha_{0})$, and $S_{1}=S_{2}=[0,1]$.
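To make the coupled dynamics (23) concrete, the sketch below simulates one trajectory under an arbitrary consumption rule. The specific noise laws (uniform, parametrized so that $\xi_{t}(0)=s_{1}$ and $\eta_{t}(0)=0$), the constant $s_{1}$, and all numerical values are illustrative assumptions, not part of the model.

```python
import random

def simulate(x0, alpha0, policy, T, eps, delta, s1=0.5, seed=0):
    """Simulate the coupled dynamics (23):
    x_{t+1} = xi_t(eps) * (x_t - a_t),  alpha_{t+1} = alpha_t + eta_t(delta).

    Assumed noise laws (for illustration only): xi_t(eps) is uniform on
    [s1 - eps, s1 + eps], clipped to S1 = [0, 1] (so xi_t(0) = s1), and
    eta_t(delta) is uniform on [0, delta] (so eta_t(0) = 0)."""
    rng = random.Random(seed)
    x, alpha = x0, alpha0
    path = [(x, alpha)]
    for _ in range(T):
        a = policy(x, alpha)                  # a in A(x, alpha) = [0, x]
        xi = min(1.0, max(0.0, s1 + eps * (2.0 * rng.random() - 1.0)))
        eta = delta * rng.random()
        x = xi * (x - a)                      # return on the invested wealth
        alpha = alpha + eta                   # interest parameter drifts upward
        path.append((x, alpha))
    return path

# consume half of the current wealth at every stage
path = simulate(10.0, 0.1, lambda x, al: 0.5 * x, T=25, eps=0.1, delta=0.05)
```

Since the noise stays nonnegative and $a\in A(x,\alpha)$, the wealth path remains in $X=[0,\infty)$ and the $\alpha$-path is nondecreasing, as the model requires.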

Remark 5

In particular, if $\eta_{t}(0)=s_{2}=0$ for $t=0,1,\ldots$ in equation (23), the corresponding deterministic MDP has a constant discount factor.

The objective is to maximize the investor's total expected utility of consumption; for each $\pi\in\Pi$,

$$\hat{V}_{\varepsilon,\delta}(x,\alpha,\pi)=E_{(x,\alpha)}^{\pi}\left[\sum_{t=0}^{\infty}e^{-S_{t}}u(x_{t},\alpha_{t},a_{t})\right],$$

where $S_{t}=\alpha_{0}+\alpha_{1}+\cdots+\alpha_{t-1}$ and $u$ is a utility function. In particular, consider the utility function $u$ defined by

$$u(x,\alpha,a)=\frac{b}{\gamma_{1}}a^{\gamma_{1}},\qquad(x,\alpha,a)\in K,$$

where $b>0$ and $\gamma_{1}\in(0,1)$. In addition, suppose that $\mu_{\gamma_{1}}\coloneqq E[\xi^{\gamma_{1}}]<\infty$ with $0<\beta\mu_{\gamma_{1}}<1$, where $\beta=e^{-\alpha_{0}}$. By the definition of the utility function, Conditions 1(b) and 3(a) are immediately satisfied with $L_{0}=1$.
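The moment condition above is easy to probe numerically. The sketch below estimates $\mu_{\gamma_{1}}=E[\xi^{\gamma_{1}}]$ by Monte Carlo; the law of $\xi$ (uniform on $S_{1}=[0,1]$) and the values of $\gamma_{1}$ and $\alpha_{0}$ are assumptions chosen purely for illustration.

```python
import math
import random

# Quick numerical check (not part of the proof) of the moment condition
# 0 < beta * mu_{gamma_1} < 1, with mu_{gamma_1} = E[xi^{gamma_1}] and
# beta = e^{-alpha_0}. Assumed: xi uniform on [0, 1], gamma_1 = 0.5,
# alpha_0 = 0.1.
gamma1, alpha0 = 0.5, 0.1
beta = math.exp(-alpha0)

rng = random.Random(42)
n = 100_000
mu = sum(rng.random() ** gamma1 for _ in range(n)) / n   # close to 2/3 here

print(0.0 < beta * mu < 1.0)
```

For the uniform law, $E[\xi^{1/2}]=2/3$ exactly, so the product $\beta\mu_{\gamma_{1}}\approx 0.9\cdot 0.67$ is safely inside $(0,1)$.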

Note that $A(x,\alpha)$ is compact for all $(x,\alpha)\in X\times\Gamma$. Now, let $H_{a}$ denote the Hausdorff metric; then, for $(x,\alpha),(x^{\prime},\alpha^{\prime})\in X\times\Gamma$, we have that

$$H_{a}(A(x,\alpha),A(x^{\prime},\alpha^{\prime}))=H_{a}([0,x],[0,x^{\prime}])=|x-x^{\prime}|\leq\max\{|x-x^{\prime}|,|\alpha-\alpha^{\prime}|\}=d_{1}((x,\alpha),(x^{\prime},\alpha^{\prime})).$$

Hence, the set-valued mapping $(x,\alpha)\mapsto A(x,\alpha)$ is continuous with respect to the Hausdorff metric, so Condition 1(a) is satisfied. Furthermore, due to the continuity of $H$, the function $u(x,\alpha,a)$ is also continuous, so Condition 1(c) is valid. Consider $W:X\times\Gamma\to[1,\infty)$ defined by

$$W(x,\alpha)=\frac{b\,\mu_{\gamma_{1}}}{\gamma_{1}(1-\beta\mu_{\gamma_{1}})}x^{\gamma_{1}}+1,\qquad(x,\alpha)\in X\times\Gamma.$$

Cruz-Suárez et al. [23] verified that the function $W$ satisfies Conditions 2(a) and (b). In addition, note that

$$E\,W[H(x,\alpha,a,\chi(\varepsilon,\delta))]=\frac{b\,\mu_{\gamma_{1}}}{\gamma_{1}(1-\beta\mu_{\gamma_{1}})}(x-a)^{\gamma_{1}}E[\xi^{\gamma_{1}}(\varepsilon)]+1$$

is continuous on $K$, so Condition 2 holds.

On the other hand, for $(x,\alpha,a),(x^{\prime},\alpha^{\prime},a)\in K$ and for all $\chi(\varepsilon,\delta)(\omega_{1},\omega_{2})\in S_{1}\times S_{2}$, we have that

$$d_{1}(H(x,\alpha,a,\chi(\varepsilon,\delta)(\omega_{1},\omega_{2})),H(x^{\prime},\alpha^{\prime},a,\chi(\varepsilon,\delta)(\omega_{1},\omega_{2})))=\max\{\xi(\omega_{1})|x-x^{\prime}|,|\alpha-\alpha^{\prime}|\}\leq\max\{|x-x^{\prime}|,|\alpha-\alpha^{\prime}|\}=d_{1}((x,\alpha),(x^{\prime},\alpha^{\prime})),$$

so Condition 3(b) is valid with $L_{1}=1$. Finally, Condition 3(c) is satisfied due to the following expressions:

$$d_{x}(F(x,\alpha,a,\xi(\omega_{1})),F(x,\alpha,a,\xi(\omega_{1}^{\prime})))=|\xi(\omega_{1})-\xi(\omega_{1}^{\prime})|(x-a)\leq x\,|\xi(\omega_{1})-\xi(\omega_{1}^{\prime})|=L_{2,x}\,r_{1}(\xi(\omega_{1}),\xi(\omega_{1}^{\prime}))$$

for each $(x,\alpha,a)\in K$ and for all $\xi(\omega_{1}),\xi(\omega_{1}^{\prime})\in S_{1}$, where $L_{2,x}\coloneqq x$.

We also obtain, writing the disturbance in the parametrized form $\eta(\delta)=\delta\eta$, that

$$d_{\alpha}(G(\alpha,\eta(\omega_{2})),G(\alpha,\eta(\omega_{2}^{\prime})))=|(\alpha+\delta\eta(\omega_{2}))-(\alpha+\delta\eta(\omega_{2}^{\prime}))|=\delta|\eta(\omega_{2})-\eta(\omega_{2}^{\prime})|\leq\delta_{0}|\eta(\omega_{2})-\eta(\omega_{2}^{\prime})|=L_{2,\alpha}\,r_{2}(\eta(\omega_{2}),\eta(\omega_{2}^{\prime}))$$

for each $\alpha\in\Gamma$ and for all $\eta(\omega_{2}),\eta(\omega_{2}^{\prime})\in S_{2}$, where $L_{2,\alpha}\coloneqq\delta_{0}$.

By Theorem 1, the following inequality holds:

$$\Delta_{\varepsilon,\delta}((x,\alpha),\pi_{0}^{*})\leq\frac{2\beta\max\{x,\delta_{0}\}}{1-\beta}\left[\frac{1}{1-\beta}+\frac{\beta}{(1-\gamma)^{2}}\left(\frac{b\,\mu_{\gamma_{1}}}{\gamma_{1}(1-\beta\mu_{\gamma_{1}})}x^{\gamma_{1}}+1\right)\right]\hat{R},$$

where $\hat{R}=E\max\{|\xi(\varepsilon)-\xi(0)|,|\eta(\delta)-\eta(0)|\}$. By Theorem 2, the constant in the convergence rate of the optimal value function is

$$C_{1}=\frac{\beta\max\{x,\delta_{0}\}}{(1-\gamma)(1-\beta)}.$$

On the other hand, by Theorem 2(b), we have that

$$\sup_{(x,\alpha)\in K}|f_{\varepsilon,\delta}(x,\alpha)-f^{*}(x,\alpha)|\to 0$$

as $\varepsilon\to 0$ and $\delta\to 0$.

5.2 Control problem with small additive noise

Assume that the dynamics of the system are given by the difference equations:

(24) $x_{t+1}=\frac{1}{2}(\alpha_{t}x_{t}+a_{t}+\xi_{t}(\varepsilon)),\qquad\alpha_{t+1}=h\alpha_{t}+\eta_{t}(\delta),$

$t=0,1,2,\ldots$, where $0<h\leq 1$ and $\{\xi_{t}(\varepsilon)\}$, $\{\eta_{t}(\delta)\}$ are sequences of independent and identically distributed random variables (i.i.d.r.v.) that take values in $S_{1}=[0,\frac{B}{3}]$ and $S_{2}=[0,\frac{1}{2}]$, respectively. The $x$-states space is $X=[0,B]$, where $0<B<6(\beta^{-1}-1)$, with $\beta$ the discount factor, and the $\alpha$-states space is $\Gamma=[0,1]$, i.e., $0\leq\alpha\leq 1$. The control space is $A=[0,\frac{B}{3}]$. The set of feasible controls in the state $(x,\alpha)$ is $A(x,\alpha)=[0,x\alpha]$, and the cost function is

$$c(x,\alpha,a)=x\alpha-a,\qquad(x,\alpha,a)\in K.$$

Remark 6

In particular, if $\eta_{t}(0)=s_{2}=0$ for $t=0,1,\ldots$, and $h=1$ in equation (24), the corresponding deterministic MDP has a constant parameter $\alpha$.

For this example, Condition 1 is verified immediately. Next, Conditions 2 and 3 are checked.

Consider $W:X\times\Gamma\to[1,\infty)$ defined by $W(x,\alpha)=x+1$ for all $(x,\alpha)\in X\times\Gamma$. Then, it follows that

$$c(x,\alpha,a)=x\alpha-a\leq x\alpha\leq x<x+1=W(x,\alpha)$$

for all $(x,\alpha,a)\in K$.

We also obtain that

(25) $\begin{aligned}
E\,W[H(x,\alpha,a,(\xi(\varepsilon),\eta(\delta)))]&=E\left[\tfrac{1}{2}(\alpha x+a+\xi(\varepsilon))+1\right]=\tfrac{1}{2}(\alpha x+a+E\xi(\varepsilon))+1\\
&\leq\tfrac{1}{2}\left(2\alpha x+\tfrac{B}{3}\right)+1\leq\tfrac{1}{2}\left(2x+2+\tfrac{B}{3}\right)=x+1+\tfrac{B}{6}\\
&\leq\left(1+\tfrac{B}{6}\right)(x+1)=\tfrac{\gamma}{\beta}W(x,\alpha)
\end{aligned}$

for all $(x,\alpha,a)\in K$, where $\gamma=\beta\left(1+\frac{B}{6}\right)$. It is clear that $\gamma>\beta$. Moreover, since $B<6(\beta^{-1}-1)$, it follows that $\gamma\in(\beta,1)$.

From the second equality in equation (25), we observe that the function $E\,W[H(x,\alpha,a,\chi(\varepsilon,\delta))]$ is continuous on $K$. Therefore, Condition 2 is satisfied.
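The drift inequality (25) can also be checked numerically. The sketch below samples points of $K$ and verifies $E\,W[H(\cdot)]\leq(\gamma/\beta)W$ pointwise in the worst case $E\xi(\varepsilon)=B/3$; the values of $\beta$ and $B$ are arbitrary choices satisfying $B<6(\beta^{-1}-1)$.

```python
import random

# Numerical sanity check (illustration only) of the drift inequality (25):
# with W(x, alpha) = x + 1 and gamma = beta * (1 + B/6), one should have
# E W(H(x, alpha, a, noise)) <= (gamma / beta) * W(x, alpha) on K.
# Since E xi(eps) <= B/3 for any law supported on S1 = [0, B/3], it suffices
# to test the worst case E xi(eps) = B/3. beta and B are sample values.
beta, B = 0.9, 0.5
assert B < 6.0 * (1.0 / beta - 1.0)       # guarantees gamma < 1
gamma = beta * (1.0 + B / 6.0)

rng = random.Random(1)
ok = True
for _ in range(1000):
    x = rng.uniform(0.0, B)
    alpha = rng.random()
    a = rng.uniform(0.0, x * alpha)        # feasible: A(x, alpha) = [0, x*alpha]
    ew = 0.5 * (alpha * x + a + B / 3.0) + 1.0
    ok = ok and ew <= (gamma / beta) * (x + 1.0) + 1e-12
print(ok)
```

Every sampled point satisfies the bound, mirroring the chain of inequalities in (25).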

Finally, Condition 3 is verified. Note that

$$|c(x,\alpha,a)-c(x^{\prime},\alpha^{\prime},a)|=|(x\alpha-a)-(x^{\prime}\alpha^{\prime}-a)|=|x\alpha-x^{\prime}\alpha^{\prime}|\leq\max\{|x-x^{\prime}|,|\alpha-\alpha^{\prime}|\}=L_{0}\,d_{1}((x,\alpha),(x^{\prime},\alpha^{\prime}))$$

for all $(x,\alpha,a),(x^{\prime},\alpha^{\prime},a)\in K$, where $L_{0}=1$.

The following inequalities are valid for the joint dynamics of the states:

$$d_{1}(H(x,\alpha,a,\chi(\varepsilon,\delta)(\omega_{1},\omega_{2})),H(x^{\prime},\alpha^{\prime},a,\chi(\varepsilon,\delta)(\omega_{1},\omega_{2})))=\max\left\{\tfrac{1}{2}|\alpha x-\alpha^{\prime}x^{\prime}|,h|\alpha-\alpha^{\prime}|\right\}\leq\max\left\{\tfrac{1}{2}|x-x^{\prime}|,h|\alpha-\alpha^{\prime}|\right\}\leq L_{1}\,d_{1}((x,\alpha),(x^{\prime},\alpha^{\prime}))$$

for all $(x,\alpha,a)\in K$ and $\chi(\varepsilon,\delta)(\omega_{1},\omega_{2})\in S_{1}\times S_{2}$, where $L_{1}\coloneqq\max\{\tfrac{1}{2},h\}\leq 1$.

Finally, we verify the Lipschitz conditions for the functions $F$ and $G$ with respect to the disturbance variables:

$$d_{x}(F(x,\alpha,a,\xi(\omega_{1})),F(x,\alpha,a,\xi(\omega_{1}^{\prime})))=\left|\tfrac{1}{2}(\alpha x+a+\xi(\omega_{1}))-\tfrac{1}{2}(\alpha x+a+\xi(\omega_{1}^{\prime}))\right|=\tfrac{1}{2}|\xi(\omega_{1})-\xi(\omega_{1}^{\prime})|\leq L_{2,x}\,r_{1}(\xi(\omega_{1}),\xi(\omega_{1}^{\prime}))$$

for each $(x,\alpha,a)\in K$ and for all $\xi(\omega_{1}),\xi(\omega_{1}^{\prime})\in S_{1}$, where $L_{2,x}\coloneqq\tfrac{1}{2}$, and

$$d_{\alpha}(G(\alpha,\eta(\omega_{2})),G(\alpha,\eta(\omega_{2}^{\prime})))=|(h\alpha+\eta(\omega_{2}))-(h\alpha+\eta(\omega_{2}^{\prime}))|=|\eta(\omega_{2})-\eta(\omega_{2}^{\prime})|\leq L_{2,\alpha}\,r_{2}(\eta(\omega_{2}),\eta(\omega_{2}^{\prime}))$$

for each $\alpha\in\Gamma$ and for all $\eta(\omega_{2}),\eta(\omega_{2}^{\prime})\in S_{2}$, where $L_{2,\alpha}\coloneqq 1$. Therefore, Condition 3 is satisfied.

By Theorem 1, it follows that

$$\Delta_{\varepsilon,\delta}((x,\alpha),\pi_{0}^{*})\leq\frac{2\beta}{1-\beta\max\{\frac{1}{2},h\}}\left[\frac{1}{1-\beta}+\frac{\beta}{\left(1-\beta\left(1+\frac{B}{6}\right)\right)^{2}}(x+1)\right]\hat{R},$$

where $\hat{R}=E\max\{|\xi(\varepsilon)-\xi(0)|,|\eta(\delta)-\eta(0)|\}$. On the other hand, the constant in the convergence rate of the optimal value function given by Theorem 2 is

$$C_{1}=\frac{\beta}{\left(1-\beta\left(1+\frac{B}{6}\right)\right)\left(1-\beta\max\{\frac{1}{2},h\}\right)}.$$

In addition, by part (b) of Theorem 2, we have that $f_{\varepsilon,\delta}(x,\alpha)\to f^{*}(x,\alpha)$ when $\varepsilon\to 0$ and $\delta\to 0$.
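A small grid computation makes the continuity in the noise visible for the model (24). Theorem 2 concerns optimal value functions, but for this particular cost the optimal control $a=x\alpha$ drives every value function to zero, so as an illustration we instead evaluate the (arbitrary, suboptimal) stationary policy $a\equiv 0$ and watch its expected discounted cost approach the deterministic one as the noise parameters shrink. The grids, the two-point noise laws (with $\xi(0)=\eta(0)=0$), and the clipping in the interpolation are all assumptions made for this sketch.

```python
beta, B, h = 0.8, 1.0, 0.5          # sample values with B < 6*(1/beta - 1)
NX, NG, ITERS = 13, 9, 80           # grid sizes and fixed-point iterations

def interp(V, x, al):
    """Bilinear interpolation on the (x, alpha) grid, clipped to [0,B]x[0,1]."""
    fx = min(max(x, 0.0), B) / B * (NX - 1)
    fa = min(max(al, 0.0), 1.0) * (NG - 1)
    i, j = min(int(fx), NX - 2), min(int(fa), NG - 2)
    tx, ta = fx - i, fa - j
    return ((1 - tx) * (1 - ta) * V[i][j] + tx * (1 - ta) * V[i + 1][j]
            + (1 - tx) * ta * V[i][j + 1] + tx * ta * V[i + 1][j + 1])

def evaluate(eps, delta):
    """Iterate V <- c + beta * E V(next) for the policy a = 0 under (24)."""
    xi_vals = [0.0, eps * B / 3.0]   # assumed two-point law on S1 = [0, B/3]
    eta_vals = [0.0, delta / 2.0]    # assumed two-point law on S2 = [0, 1/2]
    V = [[0.0] * NG for _ in range(NX)]
    for _ in range(ITERS):
        V = [[x * al + beta * sum(
                  0.25 * interp(V, 0.5 * (al * x + xi), h * al + eta)
                  for xi in xi_vals for eta in eta_vals)
              for al in (j / (NG - 1) for j in range(NG))]
             for x in (B * i / (NX - 1) for i in range(NX))]
    return V

V0 = evaluate(0.0, 0.0)
err = {e: max(abs(evaluate(e, e)[i][j] - V0[i][j])
              for i in range(NX) for j in range(NG)) for e in (0.05, 0.4)}
print(err[0.05] <= err[0.4])         # smaller noise, smaller deviation
```

Because the assumed noise law for the larger parameter stochastically dominates the smaller one and the system is monotone, the sup-norm deviation from the deterministic value shrinks with the noise, in line with the linear-in-$\hat{\delta}_{\varepsilon,\delta}$ rate above.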

5.3 Importance of conditions

Finally, we present two examples in which Condition 2 and Condition 3, respectively, are not satisfied and, therefore, the conclusions of Theorem 1 are not reached.

Example 1

Let $X=[0,\infty)$, $\Gamma=[0,1]$, $A=[0,\frac{1}{\beta}]$, and $\varepsilon,\delta\in[0,1]$, and let the one-stage cost function be given as follows:

$$c(x,\alpha,0)=1,\ (x,\alpha)\in X\times\Gamma;\qquad
c(x,\alpha,a)=\begin{cases}a,&x\in[0,1],\ \alpha\in\Gamma,\\ a+x-1,&x>1,\ \alpha\in\Gamma,\end{cases}\ \ a\in\left(0,\tfrac{1}{\beta}\right);\qquad
c\left(x,\alpha,\tfrac{1}{\beta}\right)=\begin{cases}0,&x\in[0,1],\ \alpha\in\Gamma,\\ x-1,&x>1,\ \alpha\in\Gamma.\end{cases}$$

Consider the difference equations:

(26) $x_{t+1}=a_{t}x_{t}+\varepsilon\alpha_{t}\xi_{t},\qquad\alpha_{t+1}=k\alpha_{t}+\delta\eta_{t},$

$t=0,1,\ldots$, where $\{\xi_{t}\}$ is a sequence of i.i.d.r.v. with uniform distribution over $(0,1)$, $\eta_{t}=0$ for $t=1,2,\ldots$, and $k<1$. The deterministic approximation of the process (26) is given by the following equations:

(27) $x_{t+1}=a_{t}x_{t},\qquad\alpha_{t+1}=k\alpha_{t},$

$t=0,1,\ldots$. Consider $x_{0}=0$ and $\alpha_{0}=1$; then, for any control policy in (27), we have $(x_{t},\alpha_{t})=(0,k^{t})$, $t=1,2,\ldots$. Therefore, the policy $\pi_{0}^{*}=\{\frac{1}{\beta},\frac{1}{\beta},\ldots\}$ attains the minimum value $\hat{V}((0,1),\pi_{0}^{*})=V((0,1))=0$. Now, if the policy $\pi_{0}^{*}$ is applied in equation (26) with initial state $(0,1)$, it is obtained that

(28) $x_{t}=\frac{1}{\beta^{t}}+\frac{\varepsilon k}{\beta^{t-1}}\sum_{i=0}^{t-1}(\beta k)^{i}\xi_{i+1},$

$t=1,2,\ldots$. Note that the first term on the right-hand side of equation (28) is greater than $1$, so $x_{t}>1$ for $t=1,2,\ldots$. Since $c(x,\alpha,\frac{1}{\beta})=x-1$ for $x>1$, using equation (28) it follows that

$$\beta^{t}E_{(0,1)}^{\pi_{0}^{*}}\left[c\left(x_{t},\alpha_{t},\tfrac{1}{\beta}\right)\right]=\beta^{t}E_{(0,1)}^{\pi_{0}^{*}}\left[\frac{1}{\beta^{t}}+\frac{\varepsilon k}{\beta^{t-1}}\sum_{i=0}^{t-1}(\beta k)^{i}\xi_{i+1}-1\right]=1+\frac{\varepsilon k\beta}{2}\sum_{i=0}^{t-1}(\beta k)^{i}-\beta^{t}\geq\frac{\varepsilon k\beta}{2}\sum_{i=0}^{t-1}(\beta k)^{i}=\frac{\varepsilon k\beta}{2}\cdot\frac{1-(k\beta)^{t}}{1-k\beta},$$

for $t=1,2,\ldots$, where $E\xi_{i+1}=\frac{1}{2}$ and $\beta^{t}\leq 1$ are used. Now, for each $\varepsilon_{1}\in(0,\frac{1}{2})$, choose the stationary policy $\pi^{1}=\{\varepsilon_{1},\varepsilon_{1},\ldots\}$; by equation (26), we obtain that $x_{t}\in[0,1]$, $t=0,1,2,\ldots$. Note that

$$\hat{V}_{\varepsilon,\delta}((0,1),\pi^{1})=E_{(0,1)}^{\pi^{1}}\left[\sum_{t=0}^{\infty}\beta^{t}c(x_{t},\alpha_{t},a_{t})\right]=E_{(0,1)}^{\pi^{1}}\left[\sum_{t=0}^{\infty}\beta^{t}\varepsilon_{1}\right]=\frac{\varepsilon_{1}}{1-\beta}.$$

Thus,

$$0\leq V_{\varepsilon,\delta}((0,1))\leq\hat{V}_{\varepsilon,\delta}((0,1),\pi^{1})=\frac{\varepsilon_{1}}{1-\beta}\to 0$$

as $\varepsilon_{1}\to 0$; since $\varepsilon_{1}$ is arbitrary, $V_{\varepsilon,\delta}((0,1))=0$.

when ε 1 0 . On the other hand, observe that ( k β ) t k β for t 1 , then

Δ ε , δ ( ( 0 , 1 ) , π 1 ) = V ˆ ε , δ ( ( 0 , 1 ) , π 1 ) V ε , δ ( ( 0 , 1 ) ) = E ( 0 , 1 ) π 1 t = 0 β t c ( y t , α t , a t ) ε k β 2 t = 0 1 ( k β ) t 1 k β = ε k β 2 t = 1 1 ( k β ) t 1 k β k β 2 t = 1 1 k β 1 k β = .

Therefore, Δ ε , δ ( ( 0 , 1 ) , π 1 ) = .

In this example, Condition 2 is not satisfied, in particular, there does not exist a continuous function W : Y [ 1 , ) such that c ( y , a ) W ( y ) , for ( y , a ) K . In this case, it happens that Δ ε , δ ( ( 0 , 1 ) , π 1 ) = , for any ε , δ > 0 .
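The divergence in Example 1 is also visible numerically: each discounted one-stage cost under $\pi_{0}^{*}$ is bounded below by $\frac{\varepsilon k\beta}{2}\cdot\frac{1-(k\beta)^{t}}{1-k\beta}$, which tends to a positive constant instead of vanishing, so the partial sums grow without bound. The values of $\beta$, $k$, and $\varepsilon$ below are arbitrary sample choices.

```python
# Partial sums of the lower bound on the discounted costs in Example 1.
# beta, k, eps are illustrative sample values with k*beta < 1.
beta, k, eps = 0.9, 0.5, 0.1

def lower_bound(t):
    # per-stage lower bound derived after equation (28)
    return (eps * k * beta / 2.0) * (1.0 - (k * beta) ** t) / (1.0 - k * beta)

partials = [sum(lower_bound(t) for t in range(1, n + 1)) for n in (10, 100, 1000)]
print(partials[0] < partials[1] < partials[2])   # no sign of convergence
```

The per-stage bound approaches the constant $\frac{\varepsilon k\beta}{2(1-k\beta)}>0$, so the partial sums grow roughly linearly in the horizon, consistent with an infinite stability index.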

Example 2

Let $X=\mathbb{R}$, $\Gamma=[0,\infty)$, $A=\{0,1\}$, and $\varepsilon,\delta\in[0,1]$, and for $i\in\{0,1\}$, define the one-stage cost function as follows:

$$c(x,\alpha,i)=\begin{cases}1,&x>0,\ \alpha\in\Gamma,\\ 3,&\text{otherwise.}\end{cases}$$

In addition, consider the difference equations:

(29) $x_{t+1}=x_{t}\alpha_{t}-(a_{t}-\varepsilon\xi_{t}),\qquad\alpha_{t+1}=h\alpha_{t}-\delta\eta_{t},$

$t=0,1,\ldots$, where $\{\xi_{t}\}$ is a sequence of random variables with standard normal distribution, $\{\eta_{t}\}$ is a sequence of random variables with exponential distribution with parameter $1$, and $h>0$. The deterministic approximation of the process (29) is given by the following equations:

(30) $x_{t+1}=\alpha_{t}x_{t}-a_{t},\qquad\alpha_{t+1}=h\alpha_{t},$

$t=0,1,\ldots$. Consider the initial states $x_{0}=1$ and $\alpha_{0}=1$. Under this framework, the policy $\pi_{0}^{*}=\{0,0,\ldots\}$ is optimal for the deterministic process (30), and for $\varepsilon,\delta>0$, $\hat{V}_{\varepsilon,\delta}((1,1),\pi_{0}^{*})=\frac{1}{1-\beta}$. On the other hand, $\hat{V}_{\varepsilon,\delta}((1,1),\pi^{1})=\frac{3}{1-\beta}$, with $\pi^{1}=\{1,1,\ldots\}$. Therefore,

$$\Delta_{\varepsilon,\delta}((1,1),\pi^{1})=\hat{V}_{\varepsilon,\delta}((1,1),\pi^{1})-V_{\varepsilon,\delta}((1,1))\geq\hat{V}_{\varepsilon,\delta}((1,1),\pi^{1})-\hat{V}_{\varepsilon,\delta}((1,1),\pi_{0}^{*})=\frac{3}{1-\beta}-\frac{1}{1-\beta}=\frac{2}{1-\beta}.$$

In this example, Condition 3 is not satisfied; in particular, the cost function $c$ is not a Lipschitz function. In this case, we conclude that $\Delta_{\varepsilon,\delta}((1,1),\pi^{1})\geq\frac{2}{1-\beta}$ even though $\hat{\delta}_{\varepsilon,\delta}\to 0$ when $\varepsilon\to 0$ and $\delta\to 0$.

Finally, note that if $L_{1}>1$ in Condition 3(b), it is not possible to guarantee the existence of an upper bound for $\Delta_{\varepsilon,\delta}(y,\pi_{0}^{*})$. Moreover, it is also not possible to determine a convergence rate for the optimal value function when $\beta L_{1}>1$.

6 Conclusions

In this article, we established conditions under which the optimal value function and the optimal policy of a family of MDPs indexed by the parameters $\varepsilon$ and $\delta$ converge to the optimal value function and the optimal policy of an adequate deterministic MDP when $\varepsilon\to 0$ and $\delta\to 0$. These MDPs evolve according to two coupled difference equations. The first equation describes the evolution of the $x$-states through the function $F$ appearing in equation (1), while the second describes the evolution of a parameter of the model (see equation (2)). The main results of the article are Theorems 1 and 2. Theorem 1 provides an upper bound for the stability index, while Theorem 2 establishes the convergence of the sequences $\{V_{\varepsilon,\delta}\}$ and $\{f_{\varepsilon,\delta}\}$ to $V$ and $f^{*}$, respectively, as $\varepsilon$ and $\delta$ go to zero. Finally, the developed theory was illustrated with two examples confirming the conclusions of the main results. A direct consequence of Theorem 1 is that the optimal policy of the deterministic problem is asymptotically optimal for the stochastic problem. On the other hand, the results of Theorem 2 allow us to construct approximations for stochastic systems using the perturbation method. Such a methodology is well established in the literature on economic growth models for stochastic systems whose dynamics are described by a single equation of $x$-states; see, e.g., [25]. However, for stochastic systems with two coupled difference equations, the research is still ongoing.

Acknowledgments

The authors are deeply grateful to the reviewers and the Associate Editor for their careful reading of the original manuscript and for their advice to improve the paper.

Conflict of interest: The authors state that there are no conflicts of interest.

References

[1] D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete-Time Case, Athena Scientific, United States of America, 1996. Search in Google Scholar

[2] E. A. Feinberg and A. Shwartz, Handbook of Markov Decision Processes: Methods and Applications, Springer Science & Business Media, New York, 2012. Search in Google Scholar

[3] O. Hernández-Lerma and J. B. Lasserre, Discrete-Time Markov Control, Processes: Basic Optimality Criteria, Springer-Verlag, New York, 1996. Search in Google Scholar

[4] O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1999. Search in Google Scholar

[5] M. L. Puterman, Markov Decision Processes, Wiley Interscience, Hoboken, New Jersey, 1994. Search in Google Scholar

[6] R. J. Boucherie and N. M. Van Dijk, Markov Decision Processes in Practice, Springer International Publishing, Cham, Switzerland, 2017. Search in Google Scholar

[7] D. Hernández-Hernández and J. A. Minjárez-Sosa, Optimization, Control, and Applications of Stochastic Systems, Springer Science & Business Media, New York Heidelberg Dordrecht London, 2012. Search in Google Scholar

[8] R. Bellman, Dynamic Programming, Dover Publications, United States of America, 2003. Search in Google Scholar

[9] H. Cruz-Suárez and R. Ilhuicatzi-Roldán, Stochastic optimal control for small noise intensities: The discrete-time case, WSEAS Trans. Math. 9 (2010), no. 2, 120–129. Search in Google Scholar

[10] E. Gordienko, E. Lemus-Rodríguez, and R. Montes-de-Oca, Discounted cost optimality problem: stability with respect to weak metrics, Math. Methods Oper. Res. 68 (2008), no. 1, 77–96, DOI: https://doi.org/10.1007/s00186-007-0171-z. Search in Google Scholar

[11] R. S. Liptser, W. J. Runggaldier, and M. Taksar, Deterministic approximation for stochastic control problems, SIAM J. Control Optim. 34 (1996), no. 1, 161–178, DOI: https://doi.org/10.1137/S0363012993254540. Search in Google Scholar

[12] P. Dupuis and H. J. Kushner, Stochastic systems with small noise, analysis and simulation; a phase locked loop example, SIAM J. Appl. Math. 47 (1987), no. 3, 643–661, https://www.jstor.org/stable/2101805. Search in Google Scholar

[13] H. Cruz-Suárez, E. Gordienko, and R. Montes-de-Oca, A note on deterministic approximation of discounted Markov decision processes, Appl. Math. Lett. 22 (2009), no. 8, 1252–1256, DOI: https://doi.org/10.1016/j.aml.2009.01.039. Search in Google Scholar

[14] A. D. Kara and S. Yüksel, Robustness to incorrect system models in stochastic control, arXiv:1803.06046, 2020, https://doi.org/10.48550/arXiv.1803.06046.Search in Google Scholar

[15] J. González-Hernández, R. R. López-Martínez, and J. A. Minjárez-Sosa, Adaptive policies for stochastic systems under a randomized discount criterion, Bol. Soc. Mat. Mex. 14 (2008), no. 1, 149–163. Search in Google Scholar

[16] J. González-Hernández, R. R. López-Martínez, J. A. Minjárez-Sosa, and J. A. Gabriel-Arguelles, Constrained Markov control processes with randomized discounted cost criteria: occupation measures and extremal points, Risk Decis. Anal. 4 (2013), no. 3, 163–176, DOI: https://doi.org/10.3233/RDA-2012-0063. Search in Google Scholar

[17] J. González-Hernández, R. R. López-Martínez, and J. A. Minjárez-Sosa, Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion, Kybernetika 45 (2009), no. 5, 737–754, http://eudml.org/doc/37698. Search in Google Scholar

[18] J. González-Hernández, R. R. López-Martínez, and J. R. Pérez-Hernández, Markov control processes with randomized discounted cost in Borel space, Math. Methods Oper. Res. 65 (2007), no. 1, 27–44, DOI: https://doi.org/10.1007/s00186-006-0092-2. Search in Google Scholar

[19] K. Hinderer, Lipschitz continuity of value functions in Markovian decision processes, Math. Methods Oper. Res. 62 (2005), 3–22, DOI: https://doi.org/10.1007/s00186-005-0438-1. Search in Google Scholar

[20] R. Miculescu, Approximations by Lipschitz functions generated by extensions, Real Anal. Exchange 28 (2003), no. 1, 33–40, DOI: https://doi.org/10.14321/realanalexch.28.1.0033. Search in Google Scholar

[21] E. I. Gordienko, An estimate of the stability of optimal control of certain stochastic and deterministic systems, J. Sov. Math. 59 (1992), no. 4, 891–899, https://link.springer.com/content/pdf/10.1007/BF01099115.pdf. Search in Google Scholar

[22] E. I. Gordienko and F. S. Salem, Robustness inequality for Markov control processes with unbounded costs, Systems Control Lett. 33 (1998), no. 2, 125–130, DOI: https://doi.org/10.1016/S0167-6911(97)00077-7. Search in Google Scholar

[23] H. Cruz-Suárez, R. Montes-De-Oca, and G. Zacarías, A consumption-investment problem modelled as a discounted Markov decision process, Kybernetika 47 (2011), no. 6, 909–929, http://dml.cz/dmlcz/141734. Search in Google Scholar

[24] D. Cruz-Suárez, R. Montes-de-Oca, and F. Salem-Silva, Conditions for the uniqueness of optimal policies of discounted Markov decision processes, Math. Methods Oper. Res. 60 (2004), no. 3, 415–436, DOI: https://doi.org/10.1007/s001860400372. Search in Google Scholar

[25] K. L. Judd, Numerical Methods in Economics, MIT Press, United States of America, 1998. Search in Google Scholar

Received: 2023-01-31
Revised: 2023-09-13
Accepted: 2023-09-14
Published Online: 2023-10-24

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
