Article Open Access

A new refinement of Jensen’s inequality with applications in information theory

  • Lei Xiao and Guoxiang Lu
Published/Copyright: December 31, 2020

Abstract

In this paper, we present a new refinement of Jensen’s inequality with applications in information theory. The refinement of Jensen’s inequality is obtained based on the general functional in the work of Popescu et al. As the applications in information theory, we provide new tighter bounds for Shannon’s entropy and some f-divergences.

MSC 2010: 26B25; 26D15; 94A17

1 Introduction

Let $C$ be a convex subset of the linear space $X$ and $f$ a convex function on $C$. If $p = (p_1, \dots, p_n)$ is a probability sequence and $x = (x_1, \dots, x_n) \in C^n$, then the well-known Jensen's inequality

(1) $f\left(\sum_{i=1}^{n} p_i x_i\right) \le \sum_{i=1}^{n} p_i f(x_i)$

holds [1]. If $f$ is concave, then the preceding inequality is reversed.

Jensen's inequality plays a crucial role in the theory of mathematical inequalities. It is applied widely in mathematics, statistics, and information theory, and many important inequalities can be deduced from it, such as the arithmetic-geometric mean inequality, the Hölder inequality, the Minkowski inequality, and Ky Fan's inequality.

In 2010, Dragomir obtained a refinement of Jensen’s inequality as follows [2]:

Theorem 1.1

If $f$, $x$, $p$ are defined as above, then

(2) $f\left(\sum_{i=1}^{n} p_i x_i\right) \le \min_{k \in \{1,2,\dots,n\}} \left[ (1-p_k) f\left( \frac{\sum_{i=1}^{n} p_i x_i - p_k x_k}{1-p_k} \right) + p_k f(x_k) \right] \le \frac{1}{n} \sum_{k=1}^{n} \left[ (1-p_k) f\left( \frac{\sum_{i=1}^{n} p_i x_i - p_k x_k}{1-p_k} \right) + p_k f(x_k) \right] \le \max_{k \in \{1,2,\dots,n\}} \left[ (1-p_k) f\left( \frac{\sum_{i=1}^{n} p_i x_i - p_k x_k}{1-p_k} \right) + p_k f(x_k) \right] \le \sum_{i=1}^{n} p_i f(x_i).$

In the same year, Dragomir also obtained a different refinement of Jensen's inequality as follows [3]:

Theorem 1.2

(S. S. Dragomir) Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. If $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$, then for any nonempty subset $J$ of $\{1,2,\dots,n\}$, we have

(3) $\sum_{k=1}^{n} p_k f(x_k) \ge D(f,p,x;J) \ge f\left(\sum_{k=1}^{n} p_k x_k\right),$

where $D(f,p,x;J)$ is the functional defined as follows:

$D(f,p,x;J) := P_J \, f\left( \frac{1}{P_J} \sum_{i \in J} p_i x_i \right) + \bar{P}_J \, f\left( \frac{1}{\bar{P}_J} \sum_{j \in \bar{J}} p_j x_j \right)$

with $\bar{J} := \{1,2,\dots,n\} \setminus J$, $P_J := \sum_{i \in J} p_i$, and $\bar{P}_J := P_{\bar{J}} = \sum_{j \in \bar{J}} p_j = 1 - \sum_{i \in J} p_i$, where $J \subsetneq \{1,2,\dots,n\}$.

It is easy to see that if $J = \{k\}$, then inequalities (3) imply inequalities (2).
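As a quick numerical sanity check (our own illustration, not part of the original paper; the function and variable names are ours), Dragomir's functional and the chain (3) can be evaluated for a concrete convex function:

```python
def D(f, p, x, J):
    """Dragomir's functional: P_J f(weighted mean over J) + Pbar_J f(weighted mean over complement)."""
    n = len(p)
    Jbar = [i for i in range(n) if i not in J]
    PJ = sum(p[i] for i in J)
    PbarJ = sum(p[j] for j in Jbar)
    return (PJ * f(sum(p[i] * x[i] for i in J) / PJ)
            + PbarJ * f(sum(p[j] * x[j] for j in Jbar) / PbarJ))

f = lambda t: t * t          # a convex function
p = [0.1, 0.2, 0.3, 0.4]     # probability weights
x = [1.0, 2.0, 3.0, 4.0]

lhs = sum(pi * f(xi) for pi, xi in zip(p, x))   # sum_k p_k f(x_k)
rhs = f(sum(pi * xi for pi, xi in zip(p, x)))   # f(sum_k p_k x_k)
for J in ([0], [0, 1], [1, 3]):
    assert lhs >= D(f, p, x, J) >= rhs          # inequalities (3)
```

Note that $J$ and its complement $\bar{J}$ produce the same value of the functional, as the definition is symmetric in the two blocks.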

In 2016, Popescu et al. defined a new refined functional as follows [4]:

$D(f,p,x;J,J_1,J_2,\dots,J_m) := \sum_{i=1}^{m} P_{J_i} \, f\left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j x_j \right) + \bar{P}_J \, f\left( \frac{1}{\bar{P}_J} \sum_{j \in \bar{J}} p_j x_j \right),$

where $J_1, J_2, \dots, J_m$ are nonempty, pairwise disjoint subsets of $J$ with $J = \bigcup_{i} J_i$ and $P_{J_i} := \sum_{j \in J_i} p_j$. It is easy to observe that

$\sum_{i=1}^{m} P_{J_i} + \bar{P}_J = 1,$

and, for the definition to make sense, $m$ should be less than or equal to the cardinality of $J$, that is, $1 \le m \le |J|$. If $m = 1$, then

$D(f,p,x;J,J_1) = D(f,p,x;J).$

Then Theorem 1.2 can be generalized as follows:

Theorem 1.3

(P. G. Popescu et al.) Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. If $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$, then for any nonempty subset $J$ of $\{1,2,\dots,n\}$, we have

(4) $\sum_{k=1}^{n} p_k f(x_k) \ge D(f,p,x;J,J_1,J_2,\dots,J_m) \ge D(f,p,x;J) \ge f\left(\sum_{k=1}^{n} p_k x_k\right),$

where $J_1, J_2, \dots, J_m$ are nonempty, pairwise disjoint subsets of $J$ with $J = \bigcup_{i} J_i$ and $m \le |J|$.
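The whole chain (4) can likewise be probed numerically. In the following sketch (ours; `D_m` and the test data are illustrative assumptions), splitting the block $J$ into finer pieces moves the value of the functional up toward $\sum_k p_k f(x_k)$:

```python
import math

def D_m(f, p, x, parts, Jbar):
    """Popescu et al. functional for a partition {J_1..J_m} of J plus the complement Jbar."""
    total = 0.0
    for S in parts + [Jbar]:
        PS = sum(p[i] for i in S)
        total += PS * f(sum(p[i] * x[i] for i in S) / PS)
    return total

f = lambda t: math.exp(t)    # a convex function
p = [0.1, 0.2, 0.3, 0.4]
x = [0.5, 1.0, 1.5, 2.0]

J, Jbar = [0, 1, 2], [3]
coarse = D_m(f, p, x, [J], Jbar)            # m = 1: this is Dragomir's D(f,p,x;J)
fine   = D_m(f, p, x, [[0], [1, 2]], Jbar)  # m = 2: J split into J_1, J_2
lhs = sum(pi * f(xi) for pi, xi in zip(p, x))
rhs = f(sum(pi * xi for pi, xi in zip(p, x)))
assert lhs >= fine >= coarse >= rhs         # the chain (4)
```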

In [5], Horváth developed a general method to refine the discrete Jensen's inequality in the convex and mid-convex cases. The main parts of the inequalities in Theorems 1.2 and 1.3 are special cases of Theorem 1 in that paper. Recently, Horváth et al. [6] presented new upper bounds for the Shannon entropy (see Corollary 1 there) and defined an extended f-divergence functional (see Definition 2 there) by applying a cyclic refinement of Jensen's inequality. For more refinements of and applications related to Jensen's inequality, see [7-17].

The main aim of this paper is to extend the results of Dragomir [3] and Popescu et al. [4] via the aforementioned functional. In Section 2, we give refinements of Jensen's inequality associated with the general functionals. The refinements provide estimates of Jensen's gap and tighten the inequalities (4). In Section 3, we show the applications in information theory. We propose and prove new upper bounds for Shannon's entropy that are tighter than the bound given in [4]. Finally, we obtain new bounds for some f-divergences that are better than the bounds given in [3].

2 General inequalities by generalization

We continue to use the aforementioned definitions and now present the main results.

Theorem 2.1

Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. If $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$, then for any nonempty subset $J$ of $\{1,2,\dots,n\}$, we have

(5) $\sum_{k=1}^{n} p_k f(x_k) \ge \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1}) \ge \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m) \ge f\left(\sum_{k=1}^{n} p_k x_k\right).$

Proof

We assume that the value of $\max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m)$ is attained for $J_i = J_i^{(m)}$, $1 \le i \le m$.

If $m + 1 = n$ and each of the subsets $J_i^{(m)}$ ($1 \le i \le m$) and $\bar{J}^{(m)}$ contains one element, then we can easily obtain that the inequalities (5) hold as follows:

$\sum_{k=1}^{n} p_k f(x_k) = \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1}) = \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m) \ge f\left(\sum_{k=1}^{n} p_k x_k\right).$

Otherwise, there exists a subset $J_i^{(m)}$ ($1 \le i \le m$) or $\bar{J}^{(m)}$ that contains more than one element. Without loss of generality, we assume that $J_m^{(m)}$ contains more than one element. Then we can find two nonempty subsets $J_m^{(m+1)}$, $J_{m+1}^{(m+1)}$ such that $J_m^{(m+1)} \cup J_{m+1}^{(m+1)} = J_m^{(m)}$ and $J_m^{(m+1)} \cap J_{m+1}^{(m+1)} = \emptyset$. By using Jensen's inequality, we have

$\frac{P_{J_m^{(m+1)}}}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} f\left( \frac{1}{P_{J_m^{(m+1)}}} \sum_{j \in J_m^{(m+1)}} p_j x_j \right) + \frac{P_{J_{m+1}^{(m+1)}}}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} f\left( \frac{1}{P_{J_{m+1}^{(m+1)}}} \sum_{j \in J_{m+1}^{(m+1)}} p_j x_j \right) \ge f\left( \frac{P_{J_m^{(m+1)}}}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \cdot \frac{1}{P_{J_m^{(m+1)}}} \sum_{j \in J_m^{(m+1)}} p_j x_j + \frac{P_{J_{m+1}^{(m+1)}}}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \cdot \frac{1}{P_{J_{m+1}^{(m+1)}}} \sum_{j \in J_{m+1}^{(m+1)}} p_j x_j \right) = f\left( \frac{1}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \sum_{j \in J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} p_j x_j \right).$

The aforementioned inequality can be rewritten as:

$P_{J_m^{(m+1)}} f\left( \frac{1}{P_{J_m^{(m+1)}}} \sum_{j \in J_m^{(m+1)}} p_j x_j \right) + P_{J_{m+1}^{(m+1)}} f\left( \frac{1}{P_{J_{m+1}^{(m+1)}}} \sum_{j \in J_{m+1}^{(m+1)}} p_j x_j \right) \ge P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} f\left( \frac{1}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \sum_{j \in J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} p_j x_j \right).$

So, letting $J_i^{(m+1)} = J_i^{(m)}$ for $1 \le i \le m-1$, we can deduce that

$\max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1}) \ge D(f,p,x;J^{(m+1)},J_1^{(m+1)},J_2^{(m+1)},\dots,J_{m-1}^{(m+1)},J_m^{(m+1)},J_{m+1}^{(m+1)})$

$= \sum_{i=1}^{m-1} P_{J_i^{(m+1)}} f\left( \frac{1}{P_{J_i^{(m+1)}}} \sum_{j \in J_i^{(m+1)}} p_j x_j \right) + P_{J_m^{(m+1)}} f\left( \frac{1}{P_{J_m^{(m+1)}}} \sum_{j \in J_m^{(m+1)}} p_j x_j \right) + P_{J_{m+1}^{(m+1)}} f\left( \frac{1}{P_{J_{m+1}^{(m+1)}}} \sum_{j \in J_{m+1}^{(m+1)}} p_j x_j \right) + \bar{P}_{J^{(m+1)}} f\left( \frac{1}{\bar{P}_{J^{(m+1)}}} \sum_{j \in \bar{J}^{(m+1)}} p_j x_j \right)$

$\ge \sum_{i=1}^{m-1} P_{J_i^{(m+1)}} f\left( \frac{1}{P_{J_i^{(m+1)}}} \sum_{j \in J_i^{(m+1)}} p_j x_j \right) + P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} f\left( \frac{1}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \sum_{j \in J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} p_j x_j \right) + \bar{P}_{J^{(m+1)}} f\left( \frac{1}{\bar{P}_{J^{(m+1)}}} \sum_{j \in \bar{J}^{(m+1)}} p_j x_j \right)$

$= \sum_{i=1}^{m-1} P_{J_i^{(m)}} f\left( \frac{1}{P_{J_i^{(m)}}} \sum_{j \in J_i^{(m)}} p_j x_j \right) + P_{J_m^{(m)}} f\left( \frac{1}{P_{J_m^{(m)}}} \sum_{j \in J_m^{(m)}} p_j x_j \right) + \bar{P}_{J^{(m)}} f\left( \frac{1}{\bar{P}_{J^{(m)}}} \sum_{j \in \bar{J}^{(m)}} p_j x_j \right)$

$= D(f,p,x;J^{(m)},J_1^{(m)},J_2^{(m)},\dots,J_{m-1}^{(m)},J_m^{(m)}) = \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m).$

So the middle inequality in (5) holds.

The first inequality and the last inequality in (5) follow from Theorem 1.3.□

Theorem 2.2

Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. If $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$, then for any nonempty subset $J$ of $\{1,2,\dots,n\}$, we have

(6) $\sum_{k=1}^{n} p_k f(x_k) \ge \min_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1}) \ge \min_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m) \ge f\left(\sum_{k=1}^{n} p_k x_k\right).$

Proof

We assume that the value of $\min_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1})$ is attained for $J_i = J_i^{(m+1)}$, $1 \le i \le m+1$. Then we merge the two nonempty subsets $J_m^{(m+1)}$ and $J_{m+1}^{(m+1)}$ into $J_m^{(m)} = J_m^{(m+1)} \cup J_{m+1}^{(m+1)}$. Using a method similar to that of Theorem 2.1, the inequalities (6) can be obtained.□

Now we say that $S_1, S_2, \dots, S_m$ generate a partition of the set $S$ if they are pairwise disjoint, nonempty sets with $\bigcup_{i=1}^{m} S_i = S$. Then the main results above can be restated as follows:

Theorem 2.3

Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. Assume further that $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$. If $A_m$ denotes the set of all partitions of $\{1,2,\dots,n\}$ with $m$ elements ($m = 1,2,\dots,n$), then

(7) $\sum_{k=1}^{n} p_k f(x_k) \ge \max_{\{J_1,J_2,\dots,J_{n-1}\} \in A_{n-1}} D(f,p,x;J_1,J_2,\dots,J_{n-1}) \ge \cdots \ge \max_{\{J_1,J_2,\dots,J_m\} \in A_m} D(f,p,x;J_1,J_2,\dots,J_m) \ge \cdots \ge \max_{\{J_1,J_2\} \in A_2} D(f,p,x;J_1,J_2) \ge f\left(\sum_{k=1}^{n} p_k x_k\right),$

where

$D(f,p,x;J_1,J_2,\dots,J_m) := \sum_{i=1}^{m} P_{J_i} f\left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j x_j \right), \quad m = 1,2,\dots,n.$

Proof

Since the first inequality and the last inequality follow from Theorem 1.3, we can suppose that $n \ge 4$, and we need only to prove that

$\max_{\{J_1,J_2,\dots,J_{m+1}\} \in A_{m+1}} D(f,p,x;J_1,J_2,\dots,J_{m+1}) \ge \max_{\{J_1,J_2,\dots,J_m\} \in A_m} D(f,p,x;J_1,J_2,\dots,J_m)$

for every $m = 2,\dots,n-2$. It is enough to show that for each fixed $\{J_1,J_2,\dots,J_m\} \in A_m$ there exists $\{K_1,K_2,\dots,K_{m+1}\} \in A_{m+1}$ such that

$D(f,p,x;K_1,K_2,\dots,K_{m+1}) \ge D(f,p,x;J_1,J_2,\dots,J_m).$

Since $n \ge 4$ and $m \in \{2,\dots,n-2\}$, one of the sets $J_1, J_2, \dots, J_m$ contains at least two elements. We can suppose that

$J_m = K_m \cup K_{m+1},$

where $K_m$ and $K_{m+1}$ are disjoint, nonempty sets. Then $\{J_1,J_2,\dots,J_{m-1},K_m,K_{m+1}\} \in A_{m+1}$ and

$P_{J_m} f\left( \frac{1}{P_{J_m}} \sum_{j \in J_m} p_j x_j \right) = P_{J_m} f\left( \frac{P_{K_m}}{P_{J_m}} \cdot \frac{1}{P_{K_m}} \sum_{j \in K_m} p_j x_j + \frac{P_{K_{m+1}}}{P_{J_m}} \cdot \frac{1}{P_{K_{m+1}}} \sum_{j \in K_{m+1}} p_j x_j \right).$

Since $\frac{P_{K_m}}{P_{J_m}} + \frac{P_{K_{m+1}}}{P_{J_m}} = 1$, Jensen's inequality can be applied, and we obtain from the aforementioned equality that

$P_{J_m} f\left( \frac{1}{P_{J_m}} \sum_{j \in J_m} p_j x_j \right) \le P_{K_m} f\left( \frac{1}{P_{K_m}} \sum_{j \in K_m} p_j x_j \right) + P_{K_{m+1}} f\left( \frac{1}{P_{K_{m+1}}} \sum_{j \in K_{m+1}} p_j x_j \right),$

and this gives the result.

The proof is complete.□
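To illustrate Theorems 2.3 and 2.4 concretely (a brute-force sketch of ours, feasible only for small $n$; all names are illustrative), one can enumerate every partition of $\{1,\dots,n\}$, evaluate the functional $D$ on each, and check that the maxima in (7) increase with the number of blocks $m$:

```python
import math

def partitions(elems):
    """Generate all set partitions of elems; each partition is a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def D(f, p, x, blocks):
    """D(f,p,x;J_1,...,J_m) = sum_i P_{J_i} f((1/P_{J_i}) sum_{j in J_i} p_j x_j)."""
    return sum(sum(p[i] for i in B)
               * f(sum(p[i] * x[i] for i in B) / sum(p[i] for i in B))
               for B in blocks)

f = lambda t: t * math.log(t)        # a convex function on (0, infinity)
p = [0.1, 0.2, 0.3, 0.4]
x = [4.0, 3.0, 2.0, 1.0]
n = len(p)

best = {}                            # best[m] = max over partitions with m blocks
for part in partitions(list(range(n))):
    m = len(part)
    best[m] = max(best.get(m, -math.inf), D(f, p, x, part))

lhs = sum(pi * f(xi) for pi, xi in zip(p, x))
rhs = f(sum(pi * xi for pi, xi in zip(p, x)))
# best[n] equals lhs (all singletons) and best[1] equals rhs (one block),
# and the maxima are monotone in m, which is exactly the chain (7).
assert all(best[m + 1] >= best[m] - 1e-12 for m in range(1, n))
assert lhs >= best[n - 1] >= best[2] >= rhs
```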

Theorem 2.4

Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. Assume further that $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$. If $A_m$ denotes the set of all partitions of $\{1,2,\dots,n\}$ with $m$ elements ($m = 1,2,\dots,n$), then

(8) $\sum_{k=1}^{n} p_k f(x_k) \ge \min_{\{J_1,J_2,\dots,J_{n-1}\} \in A_{n-1}} D(f,p,x;J_1,J_2,\dots,J_{n-1}) \ge \cdots \ge \min_{\{J_1,J_2,\dots,J_m\} \in A_m} D(f,p,x;J_1,J_2,\dots,J_m) \ge \cdots \ge \min_{\{J_1,J_2\} \in A_2} D(f,p,x;J_1,J_2) \ge f\left(\sum_{k=1}^{n} p_k x_k\right),$

where

$D(f,p,x;J_1,J_2,\dots,J_m) := \sum_{i=1}^{m} P_{J_i} f\left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j x_j \right), \quad m = 1,2,\dots,n.$

Proof

Analyzing the proof of Theorem 2.3, we see the following: if $\{K_1,K_2,\dots,K_{m+1}\} \in A_{m+1}$ is a refinement of $\{J_1,J_2,\dots,J_m\} \in A_m$ (every element of $\{K_1,K_2,\dots,K_{m+1}\}$ is contained in an element of $\{J_1,J_2,\dots,J_m\}$), then

$D(f,p,x;K_1,K_2,\dots,K_{m+1}) \ge D(f,p,x;J_1,J_2,\dots,J_m)$

holds. Since each partition from $A_{m+1}$ is a refinement of a partition from $A_m$, the result follows.

The proof is complete.□

3 Applications in information theory

3.1 New upper bounds for Shannon’s entropy

As related work, bounds for Shannon's entropy [18] can be found in [4,8,10,15]. For further discussion, we first present the definition of Shannon's entropy. If the discrete random variable $X$ has the probability distribution $P(X = i) = p_i$, $p_i > 0$, $i = 1,2,\dots,n$, with $\sum_{i=1}^{n} p_i = 1$, then Shannon's entropy is defined as

$H(X) := \sum_{i=1}^{n} p_i \log \frac{1}{p_i}.$

In [4], Popescu et al. obtained a new upper bound for entropy as follows:

(9) $H(X) \le \min_{J,J_1,J_2,\dots,J_m} \log \left[ \prod_{i=1}^{m} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right].$

Furthermore, based on the aforementioned results, the following tighter bounds for Shannon's entropy are presented.

Theorem 3.1

Let $H(X)$ be defined as above. Under the assumptions of Theorem 2.1, the following inequalities hold:

(10) $H(X) \le \min_{J,J_1,J_2,\dots,J_{m+1}} \log \left[ \prod_{i=1}^{m+1} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \min_{J,J_1,J_2,\dots,J_m} \log \left[ \prod_{i=1}^{m} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \cdots \le \min_{J,J_1} \log \left[ \left( \frac{|J_1|}{P_{J_1}} \right)^{P_{J_1}} \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \log n.$

Proof

Taking into consideration the inequalities of Theorem 2.1 applied to the convex function $f(x) = -\log x$ and $x_i = 1/p_i$, $1 \le i \le n$, we get

$-\sum_{k=1}^{n} p_k \log \frac{1}{p_k} \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ -\sum_{i=1}^{m+1} P_{J_i} \log \left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j \cdot \frac{1}{p_j} \right) - \bar{P}_J \log \left( \frac{1}{\bar{P}_J} \sum_{j \in \bar{J}} p_j \cdot \frac{1}{p_j} \right) \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ -\sum_{i=1}^{m} P_{J_i} \log \left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j \cdot \frac{1}{p_j} \right) - \bar{P}_J \log \left( \frac{1}{\bar{P}_J} \sum_{j \in \bar{J}} p_j \cdot \frac{1}{p_j} \right) \right] \ge -\log \left( \sum_{k=1}^{n} p_k \cdot \frac{1}{p_k} \right).$

These inequalities are equivalent to

$H(X) \le \min_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} \log \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} + \log \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \min_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} \log \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} + \log \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \log n.$

Letting $m$ take the values from 1 to $n-1$, the inequalities (10) are deduced.□

Theorem 3.2

Let $H(X)$ be defined as above. Under the assumptions of Theorem 2.3, the following inequalities hold:

(11) $H(X) \le \min_{\{J_1,J_2,\dots,J_{n-1}\} \in A_{n-1}} \log \left[ \prod_{i=1}^{n-1} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \right] \le \cdots \le \min_{\{J_1,J_2,\dots,J_m\} \in A_m} \log \left[ \prod_{i=1}^{m} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \right] \le \cdots \le \min_{\{J_1,J_2\} \in A_2} \log \left[ \left( \frac{|J_1|}{P_{J_1}} \right)^{P_{J_1}} \left( \frac{|J_2|}{P_{J_2}} \right)^{P_{J_2}} \right] \le \log n.$

Proof

Taking into consideration the inequalities of Theorem 2.3, we obtain the inequalities (11) by a method similar to the one above.□
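Numerically, the partition-based entropy bounds behave exactly as stated. The following sketch (ours, with an arbitrary distribution; the helper names are illustrative) evaluates the bound appearing in (11) for two partitions:

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum p_i log p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p)

def partition_bound(p, blocks):
    """Upper bound sum_i P_{J_i} log(|J_i| / P_{J_i}) from the refined Jensen inequality."""
    total = 0.0
    for B in blocks:
        PB = sum(p[i] for i in B)
        total += PB * math.log(len(B) / PB)
    return total

p = [0.05, 0.1, 0.15, 0.3, 0.4]
n = len(p)
coarse = partition_bound(p, [list(range(n))])        # one block: recovers log n
finer  = partition_bound(p, [[0, 1], [2, 3], [4]])   # a 3-block partition
assert abs(coarse - math.log(n)) < 1e-12
assert entropy(p) <= finer <= coarse                 # inequalities (11)
```

Finer partitions never loosen the bound, mirroring the monotone chain of Theorem 2.3.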

3.2 New lower bounds for f-divergence measures

Given a convex function $f : [0,\infty) \to \mathbb{R}$, the f-divergence functional

(12) $I_f(p,q) := \sum_{i=1}^{n} q_i f\left( \frac{p_i}{q_i} \right),$

where $p = (p_1,\dots,p_n)$, $q = (q_1,\dots,q_n)$ are positive sequences, was introduced by Csiszár in [19] as a generalized measure of information, a "distance function" on the set of probability distributions $\mathbb{P}^n$. As in [19], we interpret undefined expressions by

$f(0) := \lim_{t \to 0^+} f(t); \quad 0 f\left( \frac{0}{0} \right) := 0; \quad 0 f\left( \frac{a}{0} \right) := \lim_{q \to 0^+} q f\left( \frac{a}{q} \right) = a \lim_{t \to \infty} \frac{f(t)}{t}, \quad a > 0.$

The following results were essentially given by Csiszár and Körner [20]:

  1. If $f$ is convex, then $I_f(p,q)$ is jointly convex in $p$ and $q$;

  2. For every $p, q \in \mathbb{R}_+^n$, we have

(13) $I_f(p,q) \ge \sum_{i=1}^{n} q_i \, f\left( \frac{\sum_{i=1}^{n} p_i}{\sum_{i=1}^{n} q_i} \right).$

If $f$ is strictly convex, equality holds in (13) if and only if

$\frac{p_1}{q_1} = \frac{p_2}{q_2} = \cdots = \frac{p_n}{q_n}.$

If $f$ is normalized, i.e., $f(1) = 0$, then for every $p, q \in \mathbb{R}_+^n$ with $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i$, we have the inequality

(14) $I_f(p,q) \ge 0.$

In particular, if $p, q \in \mathbb{P}^n$, then (14) holds. This is the well-known nonnegativity property of the f-divergence.
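For instance (an illustrative sketch of ours; the helper names are assumptions), the f-divergence (12) with the normalized convex function $f(t) = t \ln t$ recovers the Kullback-Leibler divergence, and the nonnegativity (14) can be observed directly:

```python
import math

def f_divergence(f, p, q):
    """Csiszar f-divergence I_f(p, q) = sum q_i f(p_i / q_i), assuming q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# f(t) = t ln t, with the convention 0 f(0/0) = 0; normalized since f(1) = 0
kl = lambda t: t * math.log(t) if t > 0 else 0.0

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
val = f_divergence(kl, p, q)
# (14): I_f(p, q) >= 0 for probability vectors, since f(1) = 0
assert val >= 0.0
assert abs(f_divergence(kl, q, q)) < 1e-12   # vanishes when p == q
```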

Dragomir introduced the corresponding concept for functions defined on a cone in a linear space as follows [3]:

First, we recall that a subset $K$ of a linear space $X$ is a cone if the following two conditions are satisfied:

  1. for any $x, y \in K$ we have $x + y \in K$;

  2. for any $x \in K$ and any $\alpha \ge 0$ we have $\alpha x \in K$.

For a given n-tuple of vectors $z = (z_1,\dots,z_n) \in K^n$ and a probability distribution $q \in \mathbb{P}^n$ with all values nonzero, we can define, for the convex function $f : K \to \mathbb{R}$, the following f-divergence of $z$ with respect to the distribution $q$:

(15) $I_f(z,q) := \sum_{i=1}^{n} q_i f\left( \frac{z_i}{q_i} \right).$

It is obvious that if $X = \mathbb{R}$, $K = [0,\infty)$, and $z = p \in \mathbb{P}^n$, then we obtain the usual concept of the f-divergence associated with a function $f : [0,\infty) \to \mathbb{R}$. Now, for a given n-tuple of vectors $x = (x_1,\dots,x_n) \in K^n$, a probability distribution $q \in \mathbb{P}^n$ with all values nonzero, and for any nonempty pairwise disjoint subsets $J_1, J_2, \dots, J_m, \bar{J}$ of $\{1,\dots,n\}$, we have

$q_J^{(m)} := (Q_{J_1}, Q_{J_2}, \dots, Q_{J_m}, \bar{Q}_J) \in \mathbb{P}^{m+1}$

and

$x_J^{(m)} := (X_{J_1}, X_{J_2}, \dots, X_{J_m}, \bar{X}_J) \in K^{m+1},$

where $Q_I := \sum_{i \in I} q_i$, $\bar{Q}_J := Q_{\bar{J}}$, and $X_I := \sum_{i \in I} x_i$, $\bar{X}_J := X_{\bar{J}}$.

Let

(16) $I_f(x_J^{(m)}, q_J^{(m)}) := \sum_{i=1}^{m} Q_{J_i} f\left( \frac{X_{J_i}}{Q_{J_i}} \right) + \bar{Q}_J f\left( \frac{\bar{X}_J}{\bar{Q}_J} \right).$

The following inequalities for the f-divergence of an n-tuple of vectors in a linear space hold; they are better than the inequalities given in [3].

Theorem 3.3

Let $f : K \to \mathbb{R}$ be a convex function on the cone $K$. Then for any n-tuple of vectors $x = (x_1,\dots,x_n) \in K^n$, a probability distribution $q \in \mathbb{P}^n$ with all values nonzero, and for any nonempty pairwise disjoint subsets $J_1, J_2, \dots, J_m, \bar{J}$ of $\{1,\dots,n\}$, we have

(17) $I_f(x,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} I_f(x_J^{(m+1)}, q_J^{(m+1)}) \ge \max_{J,J_1,J_2,\dots,J_m} I_f(x_J^{(m)}, q_J^{(m)}) \ge \cdots \ge \max_{J,J_1} I_f(x_J^{(1)}, q_J^{(1)}) \ge f(X_n),$

where $X_n := \sum_{i=1}^{n} x_i$.

Proof

The aforementioned inequalities are obtained directly from Theorem 2.1 by replacing $p_i$ with $q_i$ and $x_i$ with $x_i / q_i$.□

Theorem 3.4

Let $f : K \to \mathbb{R}$ be a convex function on the cone $K$. Then for any n-tuple of vectors $x = (x_1,\dots,x_n) \in K^n$, a probability distribution $q \in \mathbb{P}^n$ with all values nonzero, and for any nonempty pairwise disjoint subsets $J_1, J_2, \dots, J_m, \bar{J}$ of $\{1,\dots,n\}$, we have

(18) $I_f(x,q) \ge \min_{J,J_1,J_2,\dots,J_{m+1}} I_f(x_J^{(m+1)}, q_J^{(m+1)}) \ge \min_{J,J_1,J_2,\dots,J_m} I_f(x_J^{(m)}, q_J^{(m)}) \ge \cdots \ge \min_{J,J_1} I_f(x_J^{(1)}, q_J^{(1)}) \ge f(X_n),$

where $X_n := \sum_{i=1}^{n} x_i$.

Proof

The aforementioned inequalities are obtained directly from Theorem 2.2 by replacing $p_i$ with $q_i$ and $x_i$ with $x_i / q_i$.□

In the scalar case with $x = p \in \mathbb{P}^n$, a sufficient condition for the positivity of the f-divergence $I_f(p,q)$ is that $f(1) \ge 0$. The case of functions of a real variable, which is the one meaningful for applications, is addressed in the following:

Corollary 3.1

Let $I_f(x,q)$ be defined as above. Under the assumptions of Theorem 3.3, the following inequalities hold:

(19) $I_f(p,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} Q_{J_i} f\left( \frac{P_{J_i}}{Q_{J_i}} \right) + \bar{Q}_J f\left( \frac{\bar{P}_J}{\bar{Q}_J} \right) \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} Q_{J_i} f\left( \frac{P_{J_i}}{Q_{J_i}} \right) + \bar{Q}_J f\left( \frac{\bar{P}_J}{\bar{Q}_J} \right) \right] \ge \cdots \ge \max_{J,J_1} \left[ Q_{J_1} f\left( \frac{P_{J_1}}{Q_{J_1}} \right) + \bar{Q}_J f\left( \frac{\bar{P}_J}{\bar{Q}_J} \right) \right] \ge f(1) = 0.$

In what follows, we provide some lower bounds for a number of f-divergences that are used in various fields of information theory, probability theory, and statistics.

The total variation distance is defined via the convex function $f(t) = |t - 1|$, $t \in \mathbb{R}$, and given by

(20) $V(p,q) := \sum_{i=1}^{n} q_i \left| \frac{p_i}{q_i} - 1 \right| = \sum_{i=1}^{n} |p_i - q_i|.$

Proposition 3.1

For any $p, q \in \mathbb{P}^n$, we have the inequalities

(21) $V(p,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} |P_{J_i} - Q_{J_i}| + |\bar{P}_J - \bar{Q}_J| \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} |P_{J_i} - Q_{J_i}| + |\bar{P}_J - \bar{Q}_J| \right] \ge \cdots \ge 2 \max_{J,J_1} |P_{J_1} - Q_{J_1}| \ (\ge 0).$

Proof

The proof follows by applying the inequalities (19) to the convex function $f(t) = |t - 1|$, $t \in \mathbb{R}$.□
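Proposition 3.1 is easy to probe numerically. In this sketch (ours, with arbitrary distributions; the names are illustrative), the list `split` plays the role of the subsets $J_1, J_2, \bar{J}$ in (21):

```python
def tv(p, q):
    """Total variation distance V(p, q) = sum |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def tv_lower_bound(p, q, split):
    """sum_i |P_{J_i} - Q_{J_i}| + |Pbar_J - Qbar_J| over a split of {1..n}."""
    return sum(abs(sum(p[i] for i in B) - sum(q[i] for i in B))
               for B in split)

p = [0.2, 0.5, 0.25, 0.05]
q = [0.4, 0.3, 0.2, 0.1]
split = [[0], [1], [2, 3]]           # J_1, J_2, and the complement Jbar
assert tv(p, q) >= tv_lower_bound(p, q, split) >= 2 * abs(p[0] - q[0])
```

Grouping indices can only merge cancellations, so every split yields a valid lower bound on $V(p,q)$.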

The K. Pearson $\chi^2$-divergence [21] is obtained for the convex function $f(t) = (1-t)^2$, $t \in \mathbb{R}$, and given by

(22) $\chi^2(p,q) := \sum_{i=1}^{n} q_i \left( \frac{p_i}{q_i} - 1 \right)^2 = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i}.$

Proposition 3.2

For any $p, q \in \mathbb{P}^n$, we have the inequalities

(23) $\chi^2(p,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} \frac{(P_{J_i} - Q_{J_i})^2}{Q_{J_i}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} \frac{(P_{J_i} - Q_{J_i})^2}{Q_{J_i}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] \ge \cdots \ge \max_{J,J_1} \frac{(P_{J_1} - Q_{J_1})^2}{Q_{J_1}(1 - Q_{J_1})} \ge 4 \max_{J,J_1} (P_{J_1} - Q_{J_1})^2 \ (\ge 0).$

Proof

Using the inequalities (19) for the convex function $f(t) = (1-t)^2$, $t \in \mathbb{R}$, we get the inequalities

$\chi^2(p,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} \frac{(P_{J_i} - Q_{J_i})^2}{Q_{J_i}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} \frac{(P_{J_i} - Q_{J_i})^2}{Q_{J_i}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] \ge \cdots \ge \max_{J,J_1} \left[ \frac{(P_{J_1} - Q_{J_1})^2}{Q_{J_1}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] = \max_{J,J_1} \frac{(P_{J_1} - Q_{J_1})^2}{Q_{J_1}(1 - Q_{J_1})}.$

Since

$Q_{J_1}(1 - Q_{J_1}) \le \frac{1}{4} \left[ Q_{J_1} + (1 - Q_{J_1}) \right]^2 = \frac{1}{4},$

then

$\frac{(P_{J_1} - Q_{J_1})^2}{Q_{J_1}(1 - Q_{J_1})} \ge 4 (P_{J_1} - Q_{J_1})^2,$

which proves the last part of the inequalities (23).□

The Kullback-Leibler divergence [22] is obtained for the convex function $f(t) = t \ln t$, $t > 0$, and given by

(24) $KL(p,q) := \sum_{i=1}^{n} q_i \cdot \frac{p_i}{q_i} \ln \frac{p_i}{q_i} = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}.$

Proposition 3.3

For any $p, q \in \mathbb{P}^n$, we have the inequalities

(25) $KL(p,q) \ge \ln \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \prod_{i=1}^{m+1} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge \ln \max_{J,J_1,J_2,\dots,J_m} \left[ \prod_{i=1}^{m} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge \cdots \ge \ln \max_{J,J_1} \left[ \left( \frac{P_{J_1}}{Q_{J_1}} \right)^{P_{J_1}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge 0.$

Proof

Using the inequalities (19) for the convex function $f(t) = t \ln t$, $t > 0$, we get the inequalities

$KL(p,q) \ge \ln \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \prod_{i=1}^{m+1} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge \ln \max_{J,J_1,J_2,\dots,J_m} \left[ \prod_{i=1}^{m} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge \cdots \ge \ln \max_{J,J_1} \left[ \left( \frac{P_{J_1}}{Q_{J_1}} \right)^{P_{J_1}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] = \ln \max_{J,J_1} \left[ \left( \frac{P_{J_1}}{Q_{J_1}} \right)^{P_{J_1}} \left( \frac{1 - P_{J_1}}{1 - Q_{J_1}} \right)^{1 - P_{J_1}} \right].$

Utilizing the geometric-harmonic mean inequality

$x^w y^{1-w} \ge \frac{1}{\frac{w}{x} + \frac{1-w}{y}}, \quad x, y > 0, \ 0 \le w \le 1,$

we have, for $x = \frac{P_{J_1}}{Q_{J_1}}$, $y = \frac{1 - P_{J_1}}{1 - Q_{J_1}}$, and $w = P_{J_1}$, that

$\left( \frac{P_{J_1}}{Q_{J_1}} \right)^{P_{J_1}} \left( \frac{1 - P_{J_1}}{1 - Q_{J_1}} \right)^{1 - P_{J_1}} \ge \frac{1}{Q_{J_1} + (1 - Q_{J_1})} = 1,$

which proves the last part of the inequalities (25).□

The Jeffreys divergence [23], which has great importance in information theory, is obtained for the convex function $f(t) = (t-1) \ln t$, $t > 0$, and given by

(26) $J(p,q) := \sum_{i=1}^{n} q_i \left( \frac{p_i}{q_i} - 1 \right) \ln \frac{p_i}{q_i} = \sum_{i=1}^{n} (p_i - q_i) \ln \frac{p_i}{q_i}.$

Proposition 3.4

For any $p, q \in \mathbb{P}^n$, we have the inequalities

(27) $J(p,q) \ge \ln \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \prod_{i=1}^{m+1} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i} - Q_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J - \bar{Q}_J} \right] \ge \ln \max_{J,J_1,J_2,\dots,J_m} \left[ \prod_{i=1}^{m} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i} - Q_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J - \bar{Q}_J} \right] \ge \cdots \ge \ln \max_{J,J_1} \left[ \left( \frac{(1 - P_{J_1}) Q_{J_1}}{(1 - Q_{J_1}) P_{J_1}} \right)^{Q_{J_1} - P_{J_1}} \right] \ge \max_{J,J_1} \frac{2 (Q_{J_1} - P_{J_1})^2}{P_{J_1} + Q_{J_1} - 2 P_{J_1} Q_{J_1}} \ge 0.$

Proof

Applying the inequalities (19) to the convex function $f(t) = (t-1) \ln t$, $t > 0$, we get the inequalities

$J(p,q) \ge \ln \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \prod_{i=1}^{m+1} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i} - Q_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J - \bar{Q}_J} \right] \ge \ln \max_{J,J_1,J_2,\dots,J_m} \left[ \prod_{i=1}^{m} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i} - Q_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J - \bar{Q}_J} \right] \ge \cdots \ge \ln \max_{J,J_1} \left[ \left( \frac{(1 - P_{J_1}) Q_{J_1}}{(1 - Q_{J_1}) P_{J_1}} \right)^{Q_{J_1} - P_{J_1}} \right].$

Utilizing the elementary inequality for positive numbers

$\ln b - \ln a \ge \frac{2(b-a)}{a+b}, \quad b \ge a > 0,$

we have, when $Q_{J_1} \ge P_{J_1}$,

$\ln \frac{1 - P_{J_1}}{1 - Q_{J_1}} - \ln \frac{P_{J_1}}{Q_{J_1}} \ge \frac{2 \left( \frac{1 - P_{J_1}}{1 - Q_{J_1}} - \frac{P_{J_1}}{Q_{J_1}} \right)}{\frac{1 - P_{J_1}}{1 - Q_{J_1}} + \frac{P_{J_1}}{Q_{J_1}}} = \frac{2 (Q_{J_1} - P_{J_1})}{P_{J_1} + Q_{J_1} - 2 P_{J_1} Q_{J_1}}.$

Multiplying by $Q_{J_1} - P_{J_1}$ (when $Q_{J_1} < P_{J_1}$ both the elementary inequality and the sign of the factor reverse, so the resulting inequality is unchanged), this derives

$(Q_{J_1} - P_{J_1}) \left( \ln \frac{1 - P_{J_1}}{1 - Q_{J_1}} - \ln \frac{P_{J_1}}{Q_{J_1}} \right) \ge \frac{2 (Q_{J_1} - P_{J_1})^2}{P_{J_1}(1 - Q_{J_1}) + Q_{J_1}(1 - P_{J_1})} \ge 0.$

Rewriting the aforementioned inequalities, the last part of the inequalities (27) can be obtained.□

Moreover, all the aforementioned theorems, corollaries, and propositions can also be turned into comparable versions based on Theorems 2.3 and 2.4.

4 Conclusion

The classical Jensen's inequality plays a very important role in both theory and applications. In this paper, we have obtained some refinements of Jensen's inequality, (5)-(8), in a real linear space using the generalized functional of Popescu et al. Moreover, we have obtained new, sharper bounds for Shannon's entropy and several f-divergence measures in information theory. In future work, we will continue to explore other applications of the inequalities newly obtained in Section 2.

Acknowledgement

The authors would like to sincerely thank the editor and referees for their very helpful suggestions and comments on the manuscript. This work was supported by the National Social Science Fund of China (17BTJ007), "the Fundamental Research Funds for the Central Universities," Zhongnan University of Economics and Law (2722020JCT031), the MOE (Ministry of Education in China) Youth Foundation Project of Humanities and Social Sciences (19YJCZH111), the Natural Science Foundation of Hubei Province (2017CFB145), and the Technology Innovation Special Soft Science Research Program of Hubei Province (2019ADC136).

References

[1] J. L. W. V. Jensen, Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Math. 30 (1906), no. 1, 175–193, doi:10.1007/BF02418571.

[2] S. S. Dragomir, A refinement of Jensen's inequality with applications for f-divergence measures, Taiwanese J. Math. 14 (2010), no. 1, 153–164, doi:10.11650/twjm/1500405733.

[3] S. S. Dragomir, A new refinement of Jensen's inequality in linear spaces with applications, Math. Comput. Model. 52 (2010), 1497–1505, doi:10.1016/j.mcm.2010.05.035.

[4] P. G. Popescu, E. I. Slusanschi, V. Iancu, and F. Pop, A new upper bound for Shannon entropy. A novel approach in modeling of Big Data applications, Concurr. Comp.-Pract. Ex. 28 (2016), no. 2, 351–359, doi:10.1002/cpe.3444.

[5] L. Horváth, A method to refine the discrete Jensen's inequality for convex and mid-convex functions, Math. Comput. Model. 54 (2011), 2451–2459, doi:10.1016/j.mcm.2011.05.060.

[6] L. Horváth, Ɖ. Pečarić, and J. Pečarić, Estimations of f- and Rényi divergences by using a cyclic refinement of the Jensen's inequality, Bull. Malays. Math. Sci. Soc. 42 (2019), 933–946, doi:10.1007/s40840-017-0526-4.

[7] S. Simic, Best possible global bounds for Jensen's inequality, Appl. Math. Comput. 215 (2009), no. 6, 2224–2228, doi:10.1016/j.amc.2009.08.062.

[8] S. Simic, Jensen's inequality and new entropy bounds, Appl. Math. Lett. 22 (2009), no. 8, 1262–1265, doi:10.1016/j.aml.2009.01.040.

[9] L. Horváth and J. Pečarić, A refinement of the discrete Jensen's inequality, Math. Inequal. Appl. 14 (2011), no. 4, 777–791, doi:10.7153/mia-14-64.

[10] N. Ţǎpuş and P. G. Popescu, A new entropy upper bound, Appl. Math. Lett. 25 (2012), no. 11, 1887–1890, doi:10.1016/j.aml.2012.02.056.

[11] L. Horváth, Weighted form of a recent refinement of the discrete Jensen's inequality, Math. Inequal. Appl. 17 (2014), no. 3, 947–961, doi:10.7153/mia-17-69.

[12] S. G. Walker, On a lower bound for the Jensen inequality, SIAM J. Math. Anal. 46 (2014), no. 5, 3151–3157, doi:10.1137/140954015.

[13] S. S. Dragomir, M. A. Khan, and A. Abathun, Refinement of the Jensen integral inequality, Open Math. 14 (2016), no. 1, 221–228, doi:10.1515/math-2016-0020.

[14] M. Sababheh, Improved Jensen's inequality, Math. Inequal. Appl. 20 (2017), no. 2, 389–403, doi:10.7153/mia-20-27.

[15] G. Lu, New refinements of Jensen's inequality and entropy upper bounds, J. Math. Inequal. 12 (2018), no. 2, 403–421, doi:10.7153/jmi-2018-12-30.

[16] M. Adil Khan, M. Hanif, Z. A. Khan, K. Ahmad, and Y. M. Chu, Association of Jensen inequality for s-convex function, J. Inequal. Appl. 2019 (2019), art. 162, doi:10.1186/s13660-019-2112-9.

[17] M. Adil Khan, Z. Husain, and Y. M. Chu, New estimates for Csiszár divergence and Zipf-Mandelbrot entropy via Jensen-Mercer's inequality, Complexity 2020 (2020), art. 8928691, doi:10.1155/2020/8928691.

[18] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd edn., John Wiley and Sons, Inc., New York, 2006.

[19] I. Csiszár, Information-type measures of differences of probability distributions and indirect observations, Studia Sci. Math. Hung. 2 (1967), 299–318.

[20] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.

[21] K. Pearson, On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Phil. Mag. 50 (1900), no. 302, 157–172, doi:10.1080/14786440009463897.

[22] S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Statist. 22 (1951), no. 1, 79–86, doi:10.1214/aoms/1177729694.

[23] H. Jeffreys, An invariant form for the prior probability in estimation problems, Proc. Roy. Soc. Lon. Ser. A 186 (1946), no. 1007, 453–461, doi:10.1098/rspa.1946.0056.

Received: 2020-01-26
Revised: 2020-11-24
Accepted: 2020-11-25
Published Online: 2020-12-31

© 2020 Lei Xiao and Guoxiang Lu, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

  69. Almost Kenmotsu 3-h-manifolds with transversely Killing-type Ricci operators
  70. Some inequalities for star duality of the radial Blaschke-Minkowski homomorphisms
  71. Results on nonlocal stochastic integro-differential equations driven by a fractional Brownian motion
  72. On surrounding quasi-contractions on non-triangular metric spaces
  73. SEMT valuation and strength of subdivided star of K 1,4
  74. Weak solutions and optimal controls of stochastic fractional reaction-diffusion systems
  75. Gradient estimates for a weighted nonlinear parabolic equation and applications
  76. On the equivalence of three-dimensional differential systems
  77. Free nonunitary Rota-Baxter family algebras and typed leaf-spaced decorated planar rooted forests
  78. The prime and maximal spectra and the reticulation of residuated lattices with applications to De Morgan residuated lattices
  79. Explicit determinantal formula for a class of banded matrices
  80. Dynamics of a diffusive delayed competition and cooperation system
  81. Error term of the mean value theorem for binary Egyptian fractions
  82. The integral part of a nonlinear form with a square, a cube and a biquadrate
  83. Meromorphic solutions of certain nonlinear difference equations
  84. Characterizations for the potential operators on Carleson curves in local generalized Morrey spaces
  85. Some integral curves with a new frame
  86. Meromorphic exact solutions of the (2 + 1)-dimensional generalized Calogero-Bogoyavlenskii-Schiff equation
  87. Towards a homological generalization of the direct summand theorem
  88. A standard form in (some) free fields: How to construct minimal linear representations
  89. On the determination of the number of positive and negative polynomial zeros and their isolation
  90. Perturbation of the one-dimensional time-independent Schrödinger equation with a rectangular potential barrier
  91. Simply connected topological spaces of weighted composition operators
  92. Generalized derivatives and optimization problems for n-dimensional fuzzy-number-valued functions
  93. A study of uniformities on the space of uniformly continuous mappings
  94. The strong nil-cleanness of semigroup rings
  95. On an equivalence between regular ordered Γ-semigroups and regular ordered semigroups
  96. Evolution of the first eigenvalue of the Laplace operator and the p-Laplace operator under a forced mean curvature flow
  97. Noetherian properties in composite generalized power series rings
  98. Inequalities for the generalized trigonometric and hyperbolic functions
  99. Blow-up analyses in nonlocal reaction diffusion equations with time-dependent coefficients under Neumann boundary conditions
  100. A new characterization of a proper type B semigroup
  101. Constructions of pseudorandom binary lattices using cyclotomic classes in finite fields
  102. Estimates of entropy numbers in probabilistic setting
  103. Ramsey numbers of partial order graphs (comparability graphs) and implications in ring theory
  104. S-shaped connected component of positive solutions for second-order discrete Neumann boundary value problems
  105. The logarithmic mean of two convex functionals
  106. A modified Tikhonov regularization method based on Hermite expansion for solving the Cauchy problem of the Laplace equation
  107. Approximation properties of tensor norms and operator ideals for Banach spaces
  108. A multi-power and multi-splitting inner-outer iteration for PageRank computation
  109. The edge-regular complete maps
  110. Ramanujan’s function k(τ)=r(τ)r2(2τ) and its modularity
  111. Finite groups with some weakly pronormal subgroups
  112. A new refinement of Jensen’s inequality with applications in information theory
  113. Skew-symmetric and essentially unitary operators via Berezin symbols
  114. The limit Riemann solutions to nonisentropic Chaplygin Euler equations
  115. On singularities of real algebraic sets and applications to kinematics
  116. Results on analytic functions defined by Laplace-Stieltjes transforms with perfect ϕ-type
  117. New (p, q)-estimates for different types of integral inequalities via (α, m)-convex mappings
  118. Boundary value problems of Hilfer-type fractional integro-differential equations and inclusions with nonlocal integro-multipoint boundary conditions
  119. Boundary layer analysis for a 2-D Keller-Segel model
  120. On some extensions of Gauss’ work and applications
  121. A study on strongly convex hyper S-subposets in hyper S-posets
  122. On the Gevrey ultradifferentiability of weak solutions of an abstract evolution equation with a scalar type spectral operator on the real axis
  123. Special Issue on Graph Theory (GWGT 2019), Part II
  124. On applications of bipartite graph associated with algebraic structures
  125. Further new results on strong resolving partitions for graphs
  126. The second out-neighborhood for local tournaments
  127. On the N-spectrum of oriented graphs
  128. The H-force sets of the graphs satisfying the condition of Ore’s theorem
  129. Bipartite graphs with close domination and k-domination numbers
  130. On the sandpile model of modified wheels II
  131. Connected even factors in k-tree
  132. On triangular matroids induced by n3-configurations
  133. The domination number of round digraphs
  134. Special Issue on Variational/Hemivariational Inequalities
  135. A new blow-up criterion for the Nabc family of Camassa-Holm type equation with both dissipation and dispersion
  136. On the finite approximate controllability for Hilfer fractional evolution systems with nonlocal conditions
  137. On the well-posedness of differential quasi-variational-hemivariational inequalities
  138. An efficient approach for the numerical solution of fifth-order KdV equations
  139. Generalized fractional integral inequalities of Hermite-Hadamard-type for a convex function
  140. Karush-Kuhn-Tucker optimality conditions for a class of robust optimization problems with an interval-valued objective function
  141. An equivalent quasinorm for the Lipschitz space of noncommutative martingales
  142. Optimal control of a viscous generalized θ-type dispersive equation with weak dissipation
  143. Special Issue on Problems, Methods and Applications of Nonlinear analysis
  144. Generalized Picone inequalities and their applications to (p,q)-Laplace equations
  145. Positive solutions for parametric (p(z),q(z))-equations
  146. Revisiting the sub- and super-solution method for the classical radial solutions of the mean curvature equation
  147. (p,Q) systems with critical singular exponential nonlinearities in the Heisenberg group
  148. Quasilinear Dirichlet problems with competing operators and convection
  149. Hyers-Ulam-Rassias stability of (m, n)-Jordan derivations
  150. Special Issue on Evolution Equations, Theory and Applications
  151. Instantaneous blow-up of solutions to the Cauchy problem for the fractional Khokhlov-Zabolotskaya equation
  152. Three classes of decomposable distributions
DOI: 10.1515/math-2020-0123