Article Open Access

A new refinement of Jensen’s inequality with applications in information theory

  • Lei Xiao and Guoxiang Lu
Published/Copyright: December 31, 2020

Abstract

In this paper, we present a new refinement of Jensen’s inequality with applications in information theory. The refinement of Jensen’s inequality is obtained based on the general functional in the work of Popescu et al. As the applications in information theory, we provide new tighter bounds for Shannon’s entropy and some f-divergences.

MSC 2010: 26B25; 26D15; 94A17

1 Introduction

Let $C$ be a convex subset of the linear space $X$ and $f$ a convex function on $C$. If $p = (p_1, \dots, p_n)$ is a probability sequence and $x = (x_1, \dots, x_n) \in C^n$, then the well-known Jensen's inequality

(1) $f\left(\sum_{i=1}^{n} p_i x_i\right) \le \sum_{i=1}^{n} p_i f(x_i)$

holds [1]. If $f$ is concave, then the preceding inequality is reversed.

Jensen's inequality plays a crucial role in the theory of mathematical inequalities. It is applied widely in mathematics, statistics, and information theory, and many important inequalities can be deduced from it, such as the arithmetic-geometric mean inequality, the Hölder inequality, the Minkowski inequality, and Ky Fan's inequality.

In 2010, Dragomir obtained a refinement of Jensen’s inequality as follows [2]:

Theorem 1.1

If $f$, $x$, $p$ are defined as above, then

(2) $f\left(\sum_{i=1}^{n} p_i x_i\right) \le \min_{k \in \{1,2,\dots,n\}} \left[ (1-p_k) f\left( \frac{\sum_{i=1}^{n} p_i x_i - p_k x_k}{1-p_k} \right) + p_k f(x_k) \right] \le \frac{1}{n} \sum_{k=1}^{n} \left[ (1-p_k) f\left( \frac{\sum_{i=1}^{n} p_i x_i - p_k x_k}{1-p_k} \right) + p_k f(x_k) \right] \le \max_{k \in \{1,2,\dots,n\}} \left[ (1-p_k) f\left( \frac{\sum_{i=1}^{n} p_i x_i - p_k x_k}{1-p_k} \right) + p_k f(x_k) \right] \le \sum_{i=1}^{n} p_i f(x_i).$

In the same year, Dragomir also obtained a different refinement of Jensen's inequality as follows [3]:

Theorem 1.2

(S. S. Dragomir) Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. If $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$, then for any nonempty subset $J$ of $\{1,2,\dots,n\}$, we have

(3) $\sum_{k=1}^{n} p_k f(x_k) \ge D(f,p,x;J) \ge f\left(\sum_{k=1}^{n} p_k x_k\right),$

where $D(f,p,x;J)$ is the functional defined as follows:

$D(f,p,x;J) := P_J \, f\left( \frac{1}{P_J} \sum_{i \in J} p_i x_i \right) + \bar{P}_J \, f\left( \frac{1}{\bar{P}_J} \sum_{j \in \bar{J}} p_j x_j \right)$

with $\bar{J} := \{1,2,\dots,n\} \setminus J$, $P_J := \sum_{i \in J} p_i$, and $\bar{P}_J := P_{\bar{J}} = \sum_{j \in \bar{J}} p_j = 1 - \sum_{i \in J} p_i$, where $J \subsetneq \{1,2,\dots,n\}$.

It is easy to see that if $J = \{k\}$, then inequalities (3) imply inequalities (2).
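As a quick numerical sanity check (our own illustration, not part of the original paper; the function and variable names are ours), Dragomir's functional and the chain (3) can be evaluated for a concrete convex function:

```python
def D(f, p, x, J):
    """Dragomir's functional: P_J f(weighted mean over J) + Pbar_J f(weighted mean over complement)."""
    n = len(p)
    Jbar = [i for i in range(n) if i not in J]
    PJ = sum(p[i] for i in J)
    PbarJ = sum(p[j] for j in Jbar)
    return (PJ * f(sum(p[i] * x[i] for i in J) / PJ)
            + PbarJ * f(sum(p[j] * x[j] for j in Jbar) / PbarJ))

f = lambda t: t * t          # a convex function
p = [0.1, 0.2, 0.3, 0.4]     # probability weights
x = [1.0, 2.0, 3.0, 4.0]

lhs = sum(pi * f(xi) for pi, xi in zip(p, x))   # sum_k p_k f(x_k)
rhs = f(sum(pi * xi for pi, xi in zip(p, x)))   # f(sum_k p_k x_k)
for J in ([0], [0, 1], [1, 3]):
    assert lhs >= D(f, p, x, J) >= rhs          # inequalities (3)
```

Note that $J$ and its complement $\bar{J}$ produce the same value of the functional, as the definition is symmetric in the two blocks.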

In 2016, Popescu et al. defined a new refined functional as follows [4]:

$D(f,p,x;J,J_1,J_2,\dots,J_m) := \sum_{i=1}^{m} P_{J_i} \, f\left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j x_j \right) + \bar{P}_J \, f\left( \frac{1}{\bar{P}_J} \sum_{j \in \bar{J}} p_j x_j \right),$

where $J_1, J_2, \dots, J_m$ are nonempty, pairwise disjoint subsets of $J$ with $J = \bigcup_{i} J_i$ and $P_{J_i} := \sum_{j \in J_i} p_j$. It is easy to observe that

$\sum_{i=1}^{m} P_{J_i} + \bar{P}_J = 1,$

and, for the definition to make sense, $m$ should be less than or equal to the cardinality of $J$, that is, $1 \le m \le |J|$. If $m = 1$, then

$D(f,p,x;J,J_1) = D(f,p,x;J).$

Then Theorem 1.2 can be generalized as follows:

Theorem 1.3

(P. G. Popescu et al.) Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. If $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$, then for any nonempty subset $J$ of $\{1,2,\dots,n\}$, we have

(4) $\sum_{k=1}^{n} p_k f(x_k) \ge D(f,p,x;J,J_1,J_2,\dots,J_m) \ge D(f,p,x;J) \ge f\left(\sum_{k=1}^{n} p_k x_k\right),$

where $J_1, J_2, \dots, J_m$ are nonempty, pairwise disjoint subsets of $J$ with $J = \bigcup_{i} J_i$ and $m \le |J|$.
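The whole chain (4) can likewise be probed numerically. In the following sketch (ours; `D_m` and the test data are illustrative assumptions), splitting the block $J$ into finer pieces moves the value of the functional up toward $\sum_k p_k f(x_k)$:

```python
import math

def D_m(f, p, x, parts, Jbar):
    """Popescu et al. functional for a partition {J_1..J_m} of J plus the complement Jbar."""
    total = 0.0
    for S in parts + [Jbar]:
        PS = sum(p[i] for i in S)
        total += PS * f(sum(p[i] * x[i] for i in S) / PS)
    return total

f = lambda t: math.exp(t)    # a convex function
p = [0.1, 0.2, 0.3, 0.4]
x = [0.5, 1.0, 1.5, 2.0]

J, Jbar = [0, 1, 2], [3]
coarse = D_m(f, p, x, [J], Jbar)            # m = 1: this is Dragomir's D(f,p,x;J)
fine   = D_m(f, p, x, [[0], [1, 2]], Jbar)  # m = 2: J split into J_1, J_2
lhs = sum(pi * f(xi) for pi, xi in zip(p, x))
rhs = f(sum(pi * xi for pi, xi in zip(p, x)))
assert lhs >= fine >= coarse >= rhs         # the chain (4)
```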

In [5], Horváth developed a general method to refine the discrete Jensen's inequality in the convex and mid-convex cases. The main parts of the inequalities in Theorems 1.2 and 1.3 are special cases of Theorem 1 in that paper. Recently, Horváth et al. [6] presented new upper bounds for the Shannon entropy (see Corollary 1 there) and defined an extended f-divergence functional (see Definition 2 there) by applying a cyclic refinement of Jensen's inequality. For more refinements of and applications related to Jensen's inequality, see [7-17].

The main aim of this paper is to extend the results of Dragomir [3] and Popescu et al. [4] via the aforementioned functional. In Section 2, we give refinements of Jensen's inequality associated with the general functionals. The refinements provide estimates of Jensen's gap and tighten the inequalities (4). In Section 3, we show the applications in information theory. We propose and prove new upper bounds for Shannon's entropy that are tighter than the bound given in [4]. Finally, we obtain new bounds for some f-divergences that are better than the bounds given in [3].

2 General inequalities by generalization

We continue to use the aforementioned definitions and now present the main results.

Theorem 2.1

Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. If $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$, then for any nonempty subset $J$ of $\{1,2,\dots,n\}$, we have

(5) $\sum_{k=1}^{n} p_k f(x_k) \ge \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1}) \ge \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m) \ge f\left(\sum_{k=1}^{n} p_k x_k\right).$

Proof

We assume that the value of $\max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m)$ is attained for $J_i = J_i^{(m)}$, $1 \le i \le m$.

If $m + 1 = n$ and each of the subsets $J_i^{(m)}$ ($1 \le i \le m$) and $\bar{J}^{(m)}$ contains one element, then we can easily obtain that the inequalities (5) hold as follows:

$\sum_{k=1}^{n} p_k f(x_k) = \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1}) = \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m) \ge f\left(\sum_{k=1}^{n} p_k x_k\right).$

Otherwise, there exists a subset $J_i^{(m)}$ ($1 \le i \le m$) or $\bar{J}^{(m)}$ that contains more than one element. Without loss of generality, we assume that $J_m^{(m)}$ contains more than one element. Then we can find two nonempty subsets $J_m^{(m+1)}$, $J_{m+1}^{(m+1)}$ such that $J_m^{(m+1)} \cup J_{m+1}^{(m+1)} = J_m^{(m)}$ and $J_m^{(m+1)} \cap J_{m+1}^{(m+1)} = \emptyset$. By using Jensen's inequality, we have

$\frac{P_{J_m^{(m+1)}}}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} f\left( \frac{1}{P_{J_m^{(m+1)}}} \sum_{j \in J_m^{(m+1)}} p_j x_j \right) + \frac{P_{J_{m+1}^{(m+1)}}}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} f\left( \frac{1}{P_{J_{m+1}^{(m+1)}}} \sum_{j \in J_{m+1}^{(m+1)}} p_j x_j \right) \ge f\left( \frac{P_{J_m^{(m+1)}}}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \cdot \frac{1}{P_{J_m^{(m+1)}}} \sum_{j \in J_m^{(m+1)}} p_j x_j + \frac{P_{J_{m+1}^{(m+1)}}}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \cdot \frac{1}{P_{J_{m+1}^{(m+1)}}} \sum_{j \in J_{m+1}^{(m+1)}} p_j x_j \right) = f\left( \frac{1}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \sum_{j \in J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} p_j x_j \right).$

The aforementioned inequality can be rewritten as:

$P_{J_m^{(m+1)}} f\left( \frac{1}{P_{J_m^{(m+1)}}} \sum_{j \in J_m^{(m+1)}} p_j x_j \right) + P_{J_{m+1}^{(m+1)}} f\left( \frac{1}{P_{J_{m+1}^{(m+1)}}} \sum_{j \in J_{m+1}^{(m+1)}} p_j x_j \right) \ge P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} f\left( \frac{1}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \sum_{j \in J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} p_j x_j \right).$

So, letting $J_i^{(m+1)} = J_i^{(m)}$ for $1 \le i \le m-1$, we can deduce that

$\max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1}) \ge D(f,p,x;J^{(m+1)},J_1^{(m+1)},J_2^{(m+1)},\dots,J_{m-1}^{(m+1)},J_m^{(m+1)},J_{m+1}^{(m+1)})$

$= \sum_{i=1}^{m-1} P_{J_i^{(m+1)}} f\left( \frac{1}{P_{J_i^{(m+1)}}} \sum_{j \in J_i^{(m+1)}} p_j x_j \right) + P_{J_m^{(m+1)}} f\left( \frac{1}{P_{J_m^{(m+1)}}} \sum_{j \in J_m^{(m+1)}} p_j x_j \right) + P_{J_{m+1}^{(m+1)}} f\left( \frac{1}{P_{J_{m+1}^{(m+1)}}} \sum_{j \in J_{m+1}^{(m+1)}} p_j x_j \right) + \bar{P}_{J^{(m+1)}} f\left( \frac{1}{\bar{P}_{J^{(m+1)}}} \sum_{j \in \bar{J}^{(m+1)}} p_j x_j \right)$

$\ge \sum_{i=1}^{m-1} P_{J_i^{(m+1)}} f\left( \frac{1}{P_{J_i^{(m+1)}}} \sum_{j \in J_i^{(m+1)}} p_j x_j \right) + P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} f\left( \frac{1}{P_{J_m^{(m+1)} \cup J_{m+1}^{(m+1)}}} \sum_{j \in J_m^{(m+1)} \cup J_{m+1}^{(m+1)}} p_j x_j \right) + \bar{P}_{J^{(m+1)}} f\left( \frac{1}{\bar{P}_{J^{(m+1)}}} \sum_{j \in \bar{J}^{(m+1)}} p_j x_j \right)$

$= \sum_{i=1}^{m-1} P_{J_i^{(m)}} f\left( \frac{1}{P_{J_i^{(m)}}} \sum_{j \in J_i^{(m)}} p_j x_j \right) + P_{J_m^{(m)}} f\left( \frac{1}{P_{J_m^{(m)}}} \sum_{j \in J_m^{(m)}} p_j x_j \right) + \bar{P}_{J^{(m)}} f\left( \frac{1}{\bar{P}_{J^{(m)}}} \sum_{j \in \bar{J}^{(m)}} p_j x_j \right)$

$= D(f,p,x;J^{(m)},J_1^{(m)},J_2^{(m)},\dots,J_{m-1}^{(m)},J_m^{(m)}) = \max_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m).$

So the middle inequality in (5) holds.

The first inequality and the last inequality in (5) follow from Theorem 1.3.□

Theorem 2.2

Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. If $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$, then for any nonempty subset $J$ of $\{1,2,\dots,n\}$, we have

(6) $\sum_{k=1}^{n} p_k f(x_k) \ge \min_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1}) \ge \min_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_m) \ge f\left(\sum_{k=1}^{n} p_k x_k\right).$

Proof

We assume that the value of $\min_{J \subset \{1,\dots,n\}} D(f,p,x;J,J_1,J_2,\dots,J_{m+1})$ is attained for $J_i = J_i^{(m+1)}$, $1 \le i \le m+1$. Then we merge the two nonempty subsets $J_m^{(m+1)}$ and $J_{m+1}^{(m+1)}$ into $J_m^{(m)} = J_m^{(m+1)} \cup J_{m+1}^{(m+1)}$. Using a method similar to that of Theorem 2.1, the inequalities (6) can be obtained.□

Now we say that $S_1, S_2, \dots, S_m$ generate a partition of the set $S$ if they are pairwise disjoint, nonempty sets with $\bigcup_{i=1}^{m} S_i = S$. Then the main results above can be restated as follows:

Theorem 2.3

Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. Assume further that $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$. If $A_m$ denotes the set of all partitions of $\{1,2,\dots,n\}$ with $m$ elements ($m = 1,2,\dots,n$), then

(7) $\sum_{k=1}^{n} p_k f(x_k) \ge \max_{\{J_1,J_2,\dots,J_{n-1}\} \in A_{n-1}} D(f,p,x;J_1,J_2,\dots,J_{n-1}) \ge \cdots \ge \max_{\{J_1,J_2,\dots,J_m\} \in A_m} D(f,p,x;J_1,J_2,\dots,J_m) \ge \cdots \ge \max_{\{J_1,J_2\} \in A_2} D(f,p,x;J_1,J_2) \ge f\left(\sum_{k=1}^{n} p_k x_k\right),$

where

$D(f,p,x;J_1,J_2,\dots,J_m) := \sum_{i=1}^{m} P_{J_i} f\left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j x_j \right), \quad m = 1,2,\dots,n.$

Proof

Since the first inequality and the last inequality follow from Theorem 1.3, we can suppose that $n \ge 4$, and we need only to prove that

$\max_{\{J_1,J_2,\dots,J_{m+1}\} \in A_{m+1}} D(f,p,x;J_1,J_2,\dots,J_{m+1}) \ge \max_{\{J_1,J_2,\dots,J_m\} \in A_m} D(f,p,x;J_1,J_2,\dots,J_m)$

for every $m = 2,\dots,n-2$. It is enough to show that for each fixed $\{J_1,J_2,\dots,J_m\} \in A_m$ there exists $\{K_1,K_2,\dots,K_{m+1}\} \in A_{m+1}$ such that

$D(f,p,x;K_1,K_2,\dots,K_{m+1}) \ge D(f,p,x;J_1,J_2,\dots,J_m).$

Since $n \ge 4$ and $m \in \{2,\dots,n-2\}$, one of the sets $J_1, J_2, \dots, J_m$ contains at least two elements. We can suppose that

$J_m = K_m \cup K_{m+1},$

where $K_m$ and $K_{m+1}$ are disjoint, nonempty sets. Then $\{J_1,J_2,\dots,J_{m-1},K_m,K_{m+1}\} \in A_{m+1}$ and

$P_{J_m} f\left( \frac{1}{P_{J_m}} \sum_{j \in J_m} p_j x_j \right) = P_{J_m} f\left( \frac{P_{K_m}}{P_{J_m}} \cdot \frac{1}{P_{K_m}} \sum_{j \in K_m} p_j x_j + \frac{P_{K_{m+1}}}{P_{J_m}} \cdot \frac{1}{P_{K_{m+1}}} \sum_{j \in K_{m+1}} p_j x_j \right).$

Since $\frac{P_{K_m}}{P_{J_m}} + \frac{P_{K_{m+1}}}{P_{J_m}} = 1$, Jensen's inequality can be applied, and we obtain from the aforementioned equality that

$P_{J_m} f\left( \frac{1}{P_{J_m}} \sum_{j \in J_m} p_j x_j \right) \le P_{K_m} f\left( \frac{1}{P_{K_m}} \sum_{j \in K_m} p_j x_j \right) + P_{K_{m+1}} f\left( \frac{1}{P_{K_{m+1}}} \sum_{j \in K_{m+1}} p_j x_j \right),$

and this gives the result.

The proof is complete.□
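To illustrate Theorems 2.3 and 2.4 concretely (a brute-force sketch of ours, feasible only for small $n$; all names are illustrative), one can enumerate every partition of $\{1,\dots,n\}$, evaluate the functional $D$ on each, and check that the maxima in (7) increase with the number of blocks $m$:

```python
import math

def partitions(elems):
    """Generate all set partitions of elems; each partition is a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def D(f, p, x, blocks):
    """D(f,p,x;J_1,...,J_m) = sum_i P_{J_i} f((1/P_{J_i}) sum_{j in J_i} p_j x_j)."""
    return sum(sum(p[i] for i in B)
               * f(sum(p[i] * x[i] for i in B) / sum(p[i] for i in B))
               for B in blocks)

f = lambda t: t * math.log(t)        # a convex function on (0, infinity)
p = [0.1, 0.2, 0.3, 0.4]
x = [4.0, 3.0, 2.0, 1.0]
n = len(p)

best = {}                            # best[m] = max over partitions with m blocks
for part in partitions(list(range(n))):
    m = len(part)
    best[m] = max(best.get(m, -math.inf), D(f, p, x, part))

lhs = sum(pi * f(xi) for pi, xi in zip(p, x))
rhs = f(sum(pi * xi for pi, xi in zip(p, x)))
# best[n] equals lhs (all singletons) and best[1] equals rhs (one block),
# and the maxima are monotone in m, which is exactly the chain (7).
assert all(best[m + 1] >= best[m] - 1e-12 for m in range(1, n))
assert lhs >= best[n - 1] >= best[2] >= rhs
```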

Theorem 2.4

Let $C$ be a convex subset in the real linear space $X$ and assume that $f : C \to \mathbb{R}$ is a convex function on $C$. Assume further that $x_k \in C$ and $p_k > 0$, $k \in \{1,2,\dots,n\}$, with $\sum_{k=1}^{n} p_k = 1$. If $A_m$ denotes the set of all partitions of $\{1,2,\dots,n\}$ with $m$ elements ($m = 1,2,\dots,n$), then

(8) $\sum_{k=1}^{n} p_k f(x_k) \ge \min_{\{J_1,J_2,\dots,J_{n-1}\} \in A_{n-1}} D(f,p,x;J_1,J_2,\dots,J_{n-1}) \ge \cdots \ge \min_{\{J_1,J_2,\dots,J_m\} \in A_m} D(f,p,x;J_1,J_2,\dots,J_m) \ge \cdots \ge \min_{\{J_1,J_2\} \in A_2} D(f,p,x;J_1,J_2) \ge f\left(\sum_{k=1}^{n} p_k x_k\right),$

where

$D(f,p,x;J_1,J_2,\dots,J_m) := \sum_{i=1}^{m} P_{J_i} f\left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j x_j \right), \quad m = 1,2,\dots,n.$

Proof

Analyzing the proof of Theorem 2.3, we see the following: if $\{K_1,K_2,\dots,K_{m+1}\} \in A_{m+1}$ is a refinement of $\{J_1,J_2,\dots,J_m\} \in A_m$ (every element of $\{K_1,K_2,\dots,K_{m+1}\}$ is contained in an element of $\{J_1,J_2,\dots,J_m\}$), then

$D(f,p,x;K_1,K_2,\dots,K_{m+1}) \ge D(f,p,x;J_1,J_2,\dots,J_m)$

holds. Since each partition from $A_{m+1}$ is a refinement of a partition from $A_m$, the result follows.

The proof is complete.□

3 Applications in information theory

3.1 New upper bounds for Shannon’s entropy

As related work, bounds for Shannon's entropy [18] can be found in [4,8,10,15]. For further discussion, we first present the definition of Shannon's entropy. If the discrete random variable $X$ has the probability distribution $P(X = i) = p_i$, $p_i > 0$, $i = 1,2,\dots,n$, with $\sum_{i=1}^{n} p_i = 1$, then Shannon's entropy is defined as

$H(X) := \sum_{i=1}^{n} p_i \log \frac{1}{p_i}.$

In [4], Popescu et al. obtained a new upper bound for entropy as follows:

(9) $H(X) \le \min_{J,J_1,J_2,\dots,J_m} \log \left[ \prod_{i=1}^{m} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right].$

Furthermore, based on the aforementioned results, the following tighter bounds for Shannon's entropy are presented.

Theorem 3.1

Let $H(X)$ be defined as above. Under the assumptions of Theorem 2.1, the following inequalities hold:

(10) $H(X) \le \min_{J,J_1,J_2,\dots,J_{m+1}} \log \left[ \prod_{i=1}^{m+1} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \min_{J,J_1,J_2,\dots,J_m} \log \left[ \prod_{i=1}^{m} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \cdots \le \min_{J,J_1} \log \left[ \left( \frac{|J_1|}{P_{J_1}} \right)^{P_{J_1}} \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \log n.$

Proof

Taking into consideration the inequalities of Theorem 2.1 applied to the convex function $f(x) = -\log x$ and $x_i = 1/p_i$, $1 \le i \le n$, we get

$-\sum_{k=1}^{n} p_k \log \frac{1}{p_k} \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ -\sum_{i=1}^{m+1} P_{J_i} \log \left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j \cdot \frac{1}{p_j} \right) - \bar{P}_J \log \left( \frac{1}{\bar{P}_J} \sum_{j \in \bar{J}} p_j \cdot \frac{1}{p_j} \right) \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ -\sum_{i=1}^{m} P_{J_i} \log \left( \frac{1}{P_{J_i}} \sum_{j \in J_i} p_j \cdot \frac{1}{p_j} \right) - \bar{P}_J \log \left( \frac{1}{\bar{P}_J} \sum_{j \in \bar{J}} p_j \cdot \frac{1}{p_j} \right) \right] \ge -\log \left( \sum_{k=1}^{n} p_k \cdot \frac{1}{p_k} \right).$

These inequalities are equivalent to

$H(X) \le \min_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} \log \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} + \log \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \min_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} \log \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} + \log \left( \frac{|\bar{J}|}{\bar{P}_J} \right)^{\bar{P}_J} \right] \le \log n.$

Letting $m$ take the values from 1 to $n-1$, the inequalities (10) are deduced.□

Theorem 3.2

Let $H(X)$ be defined as above. Under the assumptions of Theorem 2.3, the following inequalities hold:

(11) $H(X) \le \min_{\{J_1,J_2,\dots,J_{n-1}\} \in A_{n-1}} \log \left[ \prod_{i=1}^{n-1} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \right] \le \cdots \le \min_{\{J_1,J_2,\dots,J_m\} \in A_m} \log \left[ \prod_{i=1}^{m} \left( \frac{|J_i|}{P_{J_i}} \right)^{P_{J_i}} \right] \le \cdots \le \min_{\{J_1,J_2\} \in A_2} \log \left[ \left( \frac{|J_1|}{P_{J_1}} \right)^{P_{J_1}} \left( \frac{|J_2|}{P_{J_2}} \right)^{P_{J_2}} \right] \le \log n.$

Proof

Taking into consideration the inequalities of Theorem 2.3, we obtain the inequalities (11) by a method similar to the one above.□
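Numerically, the partition-based entropy bounds behave exactly as stated. The following sketch (ours, with an arbitrary distribution; the helper names are illustrative) evaluates the bound appearing in (11) for two partitions:

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum p_i log p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p)

def partition_bound(p, blocks):
    """Upper bound sum_i P_{J_i} log(|J_i| / P_{J_i}) from the refined Jensen inequality."""
    total = 0.0
    for B in blocks:
        PB = sum(p[i] for i in B)
        total += PB * math.log(len(B) / PB)
    return total

p = [0.05, 0.1, 0.15, 0.3, 0.4]
n = len(p)
coarse = partition_bound(p, [list(range(n))])        # one block: recovers log n
finer  = partition_bound(p, [[0, 1], [2, 3], [4]])   # a 3-block partition
assert abs(coarse - math.log(n)) < 1e-12
assert entropy(p) <= finer <= coarse                 # inequalities (11)
```

Finer partitions never loosen the bound, mirroring the monotone chain of Theorem 2.3.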

3.2 New lower bounds for f-divergence measures

Given a convex function $f : [0,\infty) \to \mathbb{R}$, the f-divergence functional

(12) $I_f(p,q) := \sum_{i=1}^{n} q_i f\left( \frac{p_i}{q_i} \right),$

where $p = (p_1,\dots,p_n)$, $q = (q_1,\dots,q_n)$ are positive sequences, was introduced by Csiszár in [19] as a generalized measure of information, a "distance function" on the set of probability distributions $\mathbb{P}^n$. As in [19], we interpret undefined expressions by

$f(0) := \lim_{t \to 0^+} f(t); \quad 0 f\left( \frac{0}{0} \right) := 0; \quad 0 f\left( \frac{a}{0} \right) := \lim_{q \to 0^+} q f\left( \frac{a}{q} \right) = a \lim_{t \to \infty} \frac{f(t)}{t}, \quad a > 0.$

The following results were essentially given by Csiszár and Körner [20]:

  1. If $f$ is convex, then $I_f(p,q)$ is jointly convex in $p$ and $q$;

  2. For every $p, q \in \mathbb{R}_+^n$, we have

(13) $I_f(p,q) \ge \sum_{i=1}^{n} q_i \, f\left( \frac{\sum_{i=1}^{n} p_i}{\sum_{i=1}^{n} q_i} \right).$

If $f$ is strictly convex, equality holds in (13) if and only if

$\frac{p_1}{q_1} = \frac{p_2}{q_2} = \cdots = \frac{p_n}{q_n}.$

If $f$ is normalized, i.e., $f(1) = 0$, then for every $p, q \in \mathbb{R}_+^n$ with $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i$, we have the inequality

(14) $I_f(p,q) \ge 0.$

In particular, if $p, q \in \mathbb{P}^n$, then (14) holds. This is the well-known nonnegativity property of the f-divergence.
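For instance (an illustrative sketch of ours; the helper names are assumptions), the f-divergence (12) with the normalized convex function $f(t) = t \ln t$ recovers the Kullback-Leibler divergence, and the nonnegativity (14) can be observed directly:

```python
import math

def f_divergence(f, p, q):
    """Csiszar f-divergence I_f(p, q) = sum q_i f(p_i / q_i), assuming q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# f(t) = t ln t, with the convention 0 f(0/0) = 0; normalized since f(1) = 0
kl = lambda t: t * math.log(t) if t > 0 else 0.0

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
val = f_divergence(kl, p, q)
# (14): I_f(p, q) >= 0 for probability vectors, since f(1) = 0
assert val >= 0.0
assert abs(f_divergence(kl, q, q)) < 1e-12   # vanishes when p == q
```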

Dragomir introduced the corresponding concept for functions defined on a cone in a linear space as follows [3]:

First, we recall that a subset $K$ of a linear space $X$ is a cone if the following two conditions are satisfied:

  1. for any $x, y \in K$ we have $x + y \in K$;

  2. for any $x \in K$ and any $\alpha \ge 0$ we have $\alpha x \in K$.

For a given n-tuple of vectors $z = (z_1,\dots,z_n) \in K^n$ and a probability distribution $q \in \mathbb{P}^n$ with all values nonzero, we can define, for the convex function $f : K \to \mathbb{R}$, the following f-divergence of $z$ with respect to the distribution $q$:

(15) $I_f(z,q) := \sum_{i=1}^{n} q_i f\left( \frac{z_i}{q_i} \right).$

It is obvious that if $X = \mathbb{R}$, $K = [0,\infty)$, and $z = p \in \mathbb{P}^n$, then we obtain the usual concept of the f-divergence associated with a function $f : [0,\infty) \to \mathbb{R}$. Now, for a given n-tuple of vectors $x = (x_1,\dots,x_n) \in K^n$, a probability distribution $q \in \mathbb{P}^n$ with all values nonzero, and for any nonempty pairwise disjoint subsets $J_1, J_2, \dots, J_m, \bar{J}$ of $\{1,\dots,n\}$, we have

$q_J^{(m)} := (Q_{J_1}, Q_{J_2}, \dots, Q_{J_m}, \bar{Q}_J) \in \mathbb{P}^{m+1}$

and

$x_J^{(m)} := (X_{J_1}, X_{J_2}, \dots, X_{J_m}, \bar{X}_J) \in K^{m+1},$

where $Q_I := \sum_{i \in I} q_i$, $\bar{Q}_J := Q_{\bar{J}}$, and $X_I := \sum_{i \in I} x_i$, $\bar{X}_J := X_{\bar{J}}$.

Let

(16) $I_f(x_J^{(m)}, q_J^{(m)}) := \sum_{i=1}^{m} Q_{J_i} f\left( \frac{X_{J_i}}{Q_{J_i}} \right) + \bar{Q}_J f\left( \frac{\bar{X}_J}{\bar{Q}_J} \right).$

The following inequalities for the f-divergence of an n-tuple of vectors in a linear space hold; they are better than the inequalities given in [3].

Theorem 3.3

Let $f : K \to \mathbb{R}$ be a convex function on the cone $K$. Then for any n-tuple of vectors $x = (x_1,\dots,x_n) \in K^n$, a probability distribution $q \in \mathbb{P}^n$ with all values nonzero, and for any nonempty pairwise disjoint subsets $J_1, J_2, \dots, J_m, \bar{J}$ of $\{1,\dots,n\}$, we have

(17) $I_f(x,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} I_f(x_J^{(m+1)}, q_J^{(m+1)}) \ge \max_{J,J_1,J_2,\dots,J_m} I_f(x_J^{(m)}, q_J^{(m)}) \ge \cdots \ge \max_{J,J_1} I_f(x_J^{(1)}, q_J^{(1)}) \ge f(X_n),$

where $X_n := \sum_{i=1}^{n} x_i$.

Proof

The aforementioned inequalities are obtained directly from Theorem 2.1 by replacing $p_i$ with $q_i$ and $x_i$ with $x_i / q_i$.□

Theorem 3.4

Let $f : K \to \mathbb{R}$ be a convex function on the cone $K$. Then for any n-tuple of vectors $x = (x_1,\dots,x_n) \in K^n$, a probability distribution $q \in \mathbb{P}^n$ with all values nonzero, and for any nonempty pairwise disjoint subsets $J_1, J_2, \dots, J_m, \bar{J}$ of $\{1,\dots,n\}$, we have

(18) $I_f(x,q) \ge \min_{J,J_1,J_2,\dots,J_{m+1}} I_f(x_J^{(m+1)}, q_J^{(m+1)}) \ge \min_{J,J_1,J_2,\dots,J_m} I_f(x_J^{(m)}, q_J^{(m)}) \ge \cdots \ge \min_{J,J_1} I_f(x_J^{(1)}, q_J^{(1)}) \ge f(X_n),$

where $X_n := \sum_{i=1}^{n} x_i$.

Proof

The aforementioned inequalities are obtained directly from Theorem 2.2 by replacing $p_i$ with $q_i$ and $x_i$ with $x_i / q_i$.□

In the scalar case with $x = p \in \mathbb{P}^n$, a sufficient condition for the positivity of the f-divergence $I_f(p,q)$ is that $f(1) \ge 0$. The case of functions of a real variable, which is the one meaningful for applications, is addressed in the following:

Corollary 3.1

Let $I_f(x,q)$ be defined as above. Under the assumptions of Theorem 3.3, the following inequalities hold:

(19) $I_f(p,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} Q_{J_i} f\left( \frac{P_{J_i}}{Q_{J_i}} \right) + \bar{Q}_J f\left( \frac{\bar{P}_J}{\bar{Q}_J} \right) \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} Q_{J_i} f\left( \frac{P_{J_i}}{Q_{J_i}} \right) + \bar{Q}_J f\left( \frac{\bar{P}_J}{\bar{Q}_J} \right) \right] \ge \cdots \ge \max_{J,J_1} \left[ Q_{J_1} f\left( \frac{P_{J_1}}{Q_{J_1}} \right) + \bar{Q}_J f\left( \frac{\bar{P}_J}{\bar{Q}_J} \right) \right] \ge f(1) = 0.$

In what follows, we provide some lower bounds for a number of f-divergences that are used in various fields of information theory, probability theory, and statistics.

The total variation distance is defined via the convex function $f(t) = |t - 1|$, $t \in \mathbb{R}$, and given by

(20) $V(p,q) := \sum_{i=1}^{n} q_i \left| \frac{p_i}{q_i} - 1 \right| = \sum_{i=1}^{n} |p_i - q_i|.$

Proposition 3.1

For any $p, q \in \mathbb{P}^n$, we have the inequalities

(21) $V(p,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} |P_{J_i} - Q_{J_i}| + |\bar{P}_J - \bar{Q}_J| \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} |P_{J_i} - Q_{J_i}| + |\bar{P}_J - \bar{Q}_J| \right] \ge \cdots \ge 2 \max_{J,J_1} |P_{J_1} - Q_{J_1}| \ (\ge 0).$

Proof

The proof follows by applying the inequalities (19) to the convex function $f(t) = |t - 1|$, $t \in \mathbb{R}$.□
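Proposition 3.1 is easy to probe numerically. In this sketch (ours, with arbitrary distributions; the names are illustrative), the list `split` plays the role of the subsets $J_1, J_2, \bar{J}$ in (21):

```python
def tv(p, q):
    """Total variation distance V(p, q) = sum |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def tv_lower_bound(p, q, split):
    """sum_i |P_{J_i} - Q_{J_i}| + |Pbar_J - Qbar_J| over a split of {1..n}."""
    return sum(abs(sum(p[i] for i in B) - sum(q[i] for i in B))
               for B in split)

p = [0.2, 0.5, 0.25, 0.05]
q = [0.4, 0.3, 0.2, 0.1]
split = [[0], [1], [2, 3]]           # J_1, J_2, and the complement Jbar
assert tv(p, q) >= tv_lower_bound(p, q, split) >= 2 * abs(p[0] - q[0])
```

Grouping indices can only merge cancellations, so every split yields a valid lower bound on $V(p,q)$.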

The K. Pearson $\chi^2$-divergence [21] is obtained for the convex function $f(t) = (1-t)^2$, $t \in \mathbb{R}$, and given by

(22) $\chi^2(p,q) := \sum_{i=1}^{n} q_i \left( \frac{p_i}{q_i} - 1 \right)^2 = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i}.$

Proposition 3.2

For any $p, q \in \mathbb{P}^n$, we have the inequalities

(23) $\chi^2(p,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} \frac{(P_{J_i} - Q_{J_i})^2}{Q_{J_i}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} \frac{(P_{J_i} - Q_{J_i})^2}{Q_{J_i}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] \ge \cdots \ge \max_{J,J_1} \frac{(P_{J_1} - Q_{J_1})^2}{Q_{J_1}(1 - Q_{J_1})} \ge 4 \max_{J,J_1} (P_{J_1} - Q_{J_1})^2 \ (\ge 0).$

Proof

Using the inequalities (19) for the convex function $f(t) = (1-t)^2$, $t \in \mathbb{R}$, we get the inequalities

$\chi^2(p,q) \ge \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \sum_{i=1}^{m+1} \frac{(P_{J_i} - Q_{J_i})^2}{Q_{J_i}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] \ge \max_{J,J_1,J_2,\dots,J_m} \left[ \sum_{i=1}^{m} \frac{(P_{J_i} - Q_{J_i})^2}{Q_{J_i}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] \ge \cdots \ge \max_{J,J_1} \left[ \frac{(P_{J_1} - Q_{J_1})^2}{Q_{J_1}} + \frac{(\bar{P}_J - \bar{Q}_J)^2}{\bar{Q}_J} \right] = \max_{J,J_1} \frac{(P_{J_1} - Q_{J_1})^2}{Q_{J_1}(1 - Q_{J_1})}.$

Since

$Q_{J_1}(1 - Q_{J_1}) \le \frac{1}{4} \left[ Q_{J_1} + (1 - Q_{J_1}) \right]^2 = \frac{1}{4},$

then

$\frac{(P_{J_1} - Q_{J_1})^2}{Q_{J_1}(1 - Q_{J_1})} \ge 4 (P_{J_1} - Q_{J_1})^2,$

which proves the last part of the inequalities (23).□

The Kullback-Leibler divergence [22] is obtained for the convex function $f(t) = t \ln t$, $t > 0$, and given by

(24) $KL(p,q) := \sum_{i=1}^{n} q_i \cdot \frac{p_i}{q_i} \ln \frac{p_i}{q_i} = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}.$

Proposition 3.3

For any $p, q \in \mathbb{P}^n$, we have the inequalities

(25) $KL(p,q) \ge \ln \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \prod_{i=1}^{m+1} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge \ln \max_{J,J_1,J_2,\dots,J_m} \left[ \prod_{i=1}^{m} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge \cdots \ge \ln \max_{J,J_1} \left[ \left( \frac{P_{J_1}}{Q_{J_1}} \right)^{P_{J_1}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge 0.$

Proof

Using the inequalities (19) for the convex function $f(t) = t \ln t$, $t > 0$, we get the inequalities

$KL(p,q) \ge \ln \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \prod_{i=1}^{m+1} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge \ln \max_{J,J_1,J_2,\dots,J_m} \left[ \prod_{i=1}^{m} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] \ge \cdots \ge \ln \max_{J,J_1} \left[ \left( \frac{P_{J_1}}{Q_{J_1}} \right)^{P_{J_1}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J} \right] = \ln \max_{J,J_1} \left[ \left( \frac{P_{J_1}}{Q_{J_1}} \right)^{P_{J_1}} \left( \frac{1 - P_{J_1}}{1 - Q_{J_1}} \right)^{1 - P_{J_1}} \right].$

Utilizing the geometric-harmonic mean inequality

$x^w y^{1-w} \ge \frac{1}{\frac{w}{x} + \frac{1-w}{y}}, \quad x, y > 0, \ 0 \le w \le 1,$

we have, for $x = \frac{P_{J_1}}{Q_{J_1}}$, $y = \frac{1 - P_{J_1}}{1 - Q_{J_1}}$, and $w = P_{J_1}$, that

$\left( \frac{P_{J_1}}{Q_{J_1}} \right)^{P_{J_1}} \left( \frac{1 - P_{J_1}}{1 - Q_{J_1}} \right)^{1 - P_{J_1}} \ge \frac{1}{Q_{J_1} + (1 - Q_{J_1})} = 1,$

which proves the last part of the inequalities (25).□

The Jeffreys divergence [23], which has great importance in information theory, is obtained for the convex function $f(t) = (t-1) \ln t$, $t > 0$, and given by

(26) $J(p,q) := \sum_{i=1}^{n} q_i \left( \frac{p_i}{q_i} - 1 \right) \ln \frac{p_i}{q_i} = \sum_{i=1}^{n} (p_i - q_i) \ln \frac{p_i}{q_i}.$

Proposition 3.4

For any $p, q \in \mathbb{P}^n$, we have the inequalities

(27) $J(p,q) \ge \ln \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \prod_{i=1}^{m+1} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i} - Q_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J - \bar{Q}_J} \right] \ge \ln \max_{J,J_1,J_2,\dots,J_m} \left[ \prod_{i=1}^{m} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i} - Q_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J - \bar{Q}_J} \right] \ge \cdots \ge \ln \max_{J,J_1} \left[ \left( \frac{(1 - P_{J_1}) Q_{J_1}}{(1 - Q_{J_1}) P_{J_1}} \right)^{Q_{J_1} - P_{J_1}} \right] \ge \max_{J,J_1} \frac{2 (Q_{J_1} - P_{J_1})^2}{P_{J_1} + Q_{J_1} - 2 P_{J_1} Q_{J_1}} \ge 0.$

Proof

Applying the inequalities (19) to the convex function $f(t) = (t-1) \ln t$, $t > 0$, we get the inequalities

$J(p,q) \ge \ln \max_{J,J_1,J_2,\dots,J_{m+1}} \left[ \prod_{i=1}^{m+1} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i} - Q_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J - \bar{Q}_J} \right] \ge \ln \max_{J,J_1,J_2,\dots,J_m} \left[ \prod_{i=1}^{m} \left( \frac{P_{J_i}}{Q_{J_i}} \right)^{P_{J_i} - Q_{J_i}} \left( \frac{\bar{P}_J}{\bar{Q}_J} \right)^{\bar{P}_J - \bar{Q}_J} \right] \ge \cdots \ge \ln \max_{J,J_1} \left[ \left( \frac{(1 - P_{J_1}) Q_{J_1}}{(1 - Q_{J_1}) P_{J_1}} \right)^{Q_{J_1} - P_{J_1}} \right].$

Utilizing the elementary inequality for positive numbers

$\ln b - \ln a \ge \frac{2(b-a)}{a+b}, \quad b \ge a > 0,$

we have, when $Q_{J_1} \ge P_{J_1}$,

$\ln \frac{1 - P_{J_1}}{1 - Q_{J_1}} - \ln \frac{P_{J_1}}{Q_{J_1}} \ge \frac{2 \left( \frac{1 - P_{J_1}}{1 - Q_{J_1}} - \frac{P_{J_1}}{Q_{J_1}} \right)}{\frac{1 - P_{J_1}}{1 - Q_{J_1}} + \frac{P_{J_1}}{Q_{J_1}}} = \frac{2 (Q_{J_1} - P_{J_1})}{P_{J_1} + Q_{J_1} - 2 P_{J_1} Q_{J_1}}.$

Multiplying by $Q_{J_1} - P_{J_1}$ (when $Q_{J_1} < P_{J_1}$ both the elementary inequality and the sign of the factor reverse, so the resulting inequality is unchanged), this derives

$(Q_{J_1} - P_{J_1}) \left( \ln \frac{1 - P_{J_1}}{1 - Q_{J_1}} - \ln \frac{P_{J_1}}{Q_{J_1}} \right) \ge \frac{2 (Q_{J_1} - P_{J_1})^2}{P_{J_1}(1 - Q_{J_1}) + Q_{J_1}(1 - P_{J_1})} \ge 0.$

Rewriting the aforementioned inequalities, the last part of the inequalities (27) can be obtained.□

Moreover, all the aforementioned theorems, corollaries, and propositions can also be turned into comparable versions based on Theorems 2.3 and 2.4.

4 Conclusion

The classical Jensen's inequality plays a very important role in both theory and applications. In this paper, we have obtained some refinements of Jensen's inequality, (5)-(8), in a real linear space using the generalized functional of Popescu et al. Moreover, we have obtained new, sharper bounds for Shannon's entropy and several f-divergence measures in information theory. In future work, we will continue to explore other applications of the inequalities newly obtained in Section 2.

Acknowledgement

The authors would like to sincerely thank the editor and referees for their very helpful suggestions and comments on the manuscript. This work was supported by the National Social Science Fund of China (17BTJ007), "the Fundamental Research Funds for the Central Universities," Zhongnan University of Economics and Law (2722020JCT031), the MOE (Ministry of Education in China) Youth Foundation Project of Humanities and Social Sciences (19YJCZH111), the Natural Science Foundation of Hubei Province (2017CFB145), and the Technology Innovation Special Soft Science Research Program of Hubei Province (2019ADC136).

References

[1] J. L. W. V. Jensen, Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Math. 30 (1906), no. 1, 175–193, doi:10.1007/BF02418571.

[2] S. S. Dragomir, A refinement of Jensen's inequality with applications for f-divergence measures, Taiwanese J. Math. 14 (2010), no. 1, 153–164, doi:10.11650/twjm/1500405733.

[3] S. S. Dragomir, A new refinement of Jensen's inequality in linear spaces with applications, Math. Comput. Model. 52 (2010), 1497–1505, doi:10.1016/j.mcm.2010.05.035.

[4] P. G. Popescu, E. I. Slusanschi, V. Iancu, and F. Pop, A new upper bound for Shannon entropy. A novel approach in modeling of Big Data applications, Concurr. Comp.-Pract. Ex. 28 (2016), no. 2, 351–359, doi:10.1002/cpe.3444.

[5] L. Horváth, A method to refine the discrete Jensen's inequality for convex and mid-convex functions, Math. Comput. Model. 54 (2011), 2451–2459, doi:10.1016/j.mcm.2011.05.060.

[6] L. Horváth, Ɖ. Pečarić, and J. Pečarić, Estimations of f- and Rényi divergences by using a cyclic refinement of the Jensen's inequality, Bull. Malays. Math. Sci. Soc. 42 (2019), 933–946, doi:10.1007/s40840-017-0526-4.

[7] S. Simic, Best possible global bounds for Jensen's inequality, Appl. Math. Comput. 215 (2009), no. 6, 2224–2228, doi:10.1016/j.amc.2009.08.062.

[8] S. Simic, Jensen's inequality and new entropy bounds, Appl. Math. Lett. 22 (2009), no. 8, 1262–1265, doi:10.1016/j.aml.2009.01.040.

[9] L. Horváth and J. Pečarić, A refinement of the discrete Jensen's inequality, Math. Inequal. Appl. 14 (2011), no. 4, 777–791, doi:10.7153/mia-14-64.

[10] N. Ţǎpuş and P. G. Popescu, A new entropy upper bound, Appl. Math. Lett. 25 (2012), no. 11, 1887–1890, doi:10.1016/j.aml.2012.02.056.

[11] L. Horváth, Weighted form of a recent refinement of the discrete Jensen's inequality, Math. Inequal. Appl. 17 (2014), no. 3, 947–961, doi:10.7153/mia-17-69.

[12] S. G. Walker, On a lower bound for the Jensen inequality, SIAM J. Math. Anal. 46 (2014), no. 5, 3151–3157, doi:10.1137/140954015.

[13] S. S. Dragomir, M. A. Khan, and A. Abathun, Refinement of the Jensen integral inequality, Open Math. 14 (2016), no. 1, 221–228, doi:10.1515/math-2016-0020.

[14] M. Sababheh, Improved Jensen's inequality, Math. Inequal. Appl. 20 (2017), no. 2, 389–403, doi:10.7153/mia-20-27.

[15] G. Lu, New refinements of Jensen's inequality and entropy upper bounds, J. Math. Inequal. 12 (2018), no. 2, 403–421, doi:10.7153/jmi-2018-12-30.

[16] M. Adil Khan, M. Hanif, Z. A. Khan, K. Ahmad, and Y. M. Chu, Association of Jensen inequality for s-convex function, J. Inequal. Appl. 2019 (2019), art. 162, doi:10.1186/s13660-019-2112-9.

[17] M. Adil Khan, Z. Husain, and Y. M. Chu, New estimates for Csiszár divergence and Zipf-Mandelbrot entropy via Jensen-Mercer's inequality, Complexity 2020 (2020), art. 8928691, doi:10.1155/2020/8928691.

[18] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd edn., John Wiley and Sons, Inc., New York, 2006.

[19] I. Csiszár, Information-type measures of differences of probability distributions and indirect observations, Studia Sci. Math. Hung. 2 (1967), 299–318.

[20] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.

[21] K. Pearson, On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Phil. Mag. 50 (1900), no. 302, 157–172, doi:10.1080/14786440009463897.

[22] S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Statist. 22 (1951), no. 1, 79–86, doi:10.1214/aoms/1177729694.

[23] H. Jeffreys, An invariant form for the prior probability in estimation problems, Proc. Roy. Soc. Lon. Ser. A 186 (1946), no. 1007, 453–461, doi:10.1098/rspa.1946.0056.

Received: 2020-01-26
Revised: 2020-11-24
Accepted: 2020-11-25
Published Online: 2020-12-31

© 2020 Lei Xiao and Guoxiang Lu, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

  69. Almost Kenmotsu 3-h-manifolds with transversely Killing-type Ricci operators
  70. Some inequalities for star duality of the radial Blaschke-Minkowski homomorphisms
  71. Results on nonlocal stochastic integro-differential equations driven by a fractional Brownian motion
  72. On surrounding quasi-contractions on non-triangular metric spaces
  73. SEMT valuation and strength of subdivided star of K 1,4
  74. Weak solutions and optimal controls of stochastic fractional reaction-diffusion systems
  75. Gradient estimates for a weighted nonlinear parabolic equation and applications
  76. On the equivalence of three-dimensional differential systems
  77. Free nonunitary Rota-Baxter family algebras and typed leaf-spaced decorated planar rooted forests
  78. The prime and maximal spectra and the reticulation of residuated lattices with applications to De Morgan residuated lattices
  79. Explicit determinantal formula for a class of banded matrices
  80. Dynamics of a diffusive delayed competition and cooperation system
  81. Error term of the mean value theorem for binary Egyptian fractions
  82. The integral part of a nonlinear form with a square, a cube and a biquadrate
  83. Meromorphic solutions of certain nonlinear difference equations
  84. Characterizations for the potential operators on Carleson curves in local generalized Morrey spaces
  85. Some integral curves with a new frame
  86. Meromorphic exact solutions of the (2 + 1)-dimensional generalized Calogero-Bogoyavlenskii-Schiff equation
  87. Towards a homological generalization of the direct summand theorem
  88. A standard form in (some) free fields: How to construct minimal linear representations
  89. On the determination of the number of positive and negative polynomial zeros and their isolation
  90. Perturbation of the one-dimensional time-independent Schrödinger equation with a rectangular potential barrier
  91. Simply connected topological spaces of weighted composition operators
  92. Generalized derivatives and optimization problems for n-dimensional fuzzy-number-valued functions
  93. A study of uniformities on the space of uniformly continuous mappings
  94. The strong nil-cleanness of semigroup rings
  95. On an equivalence between regular ordered Γ-semigroups and regular ordered semigroups
  96. Evolution of the first eigenvalue of the Laplace operator and the p-Laplace operator under a forced mean curvature flow
  97. Noetherian properties in composite generalized power series rings
  98. Inequalities for the generalized trigonometric and hyperbolic functions
  99. Blow-up analyses in nonlocal reaction diffusion equations with time-dependent coefficients under Neumann boundary conditions
  100. A new characterization of a proper type B semigroup
  101. Constructions of pseudorandom binary lattices using cyclotomic classes in finite fields
  102. Estimates of entropy numbers in probabilistic setting
  103. Ramsey numbers of partial order graphs (comparability graphs) and implications in ring theory
  104. S-shaped connected component of positive solutions for second-order discrete Neumann boundary value problems
  105. The logarithmic mean of two convex functionals
  106. A modified Tikhonov regularization method based on Hermite expansion for solving the Cauchy problem of the Laplace equation
  107. Approximation properties of tensor norms and operator ideals for Banach spaces
  108. A multi-power and multi-splitting inner-outer iteration for PageRank computation
  109. The edge-regular complete maps
  110. Ramanujan’s function k(τ)=r(τ)r2(2τ) and its modularity
  111. Finite groups with some weakly pronormal subgroups
  112. A new refinement of Jensen’s inequality with applications in information theory
  113. Skew-symmetric and essentially unitary operators via Berezin symbols
  114. The limit Riemann solutions to nonisentropic Chaplygin Euler equations
  115. On singularities of real algebraic sets and applications to kinematics
  116. Results on analytic functions defined by Laplace-Stieltjes transforms with perfect ϕ-type
  117. New (p, q)-estimates for different types of integral inequalities via (α, m)-convex mappings
  118. Boundary value problems of Hilfer-type fractional integro-differential equations and inclusions with nonlocal integro-multipoint boundary conditions
  119. Boundary layer analysis for a 2-D Keller-Segel model
  120. On some extensions of Gauss’ work and applications
  121. A study on strongly convex hyper S-subposets in hyper S-posets
  122. On the Gevrey ultradifferentiability of weak solutions of an abstract evolution equation with a scalar type spectral operator on the real axis
  123. Special Issue on Graph Theory (GWGT 2019), Part II
  124. On applications of bipartite graph associated with algebraic structures
  125. Further new results on strong resolving partitions for graphs
  126. The second out-neighborhood for local tournaments
  127. On the N-spectrum of oriented graphs
  128. The H-force sets of the graphs satisfying the condition of Ore’s theorem
  129. Bipartite graphs with close domination and k-domination numbers
  130. On the sandpile model of modified wheels II
  131. Connected even factors in k-tree
  132. On triangular matroids induced by n3-configurations
  133. The domination number of round digraphs
  134. Special Issue on Variational/Hemivariational Inequalities
  135. A new blow-up criterion for the Nabc family of Camassa-Holm type equation with both dissipation and dispersion
  136. On the finite approximate controllability for Hilfer fractional evolution systems with nonlocal conditions
  137. On the well-posedness of differential quasi-variational-hemivariational inequalities
  138. An efficient approach for the numerical solution of fifth-order KdV equations
  139. Generalized fractional integral inequalities of Hermite-Hadamard-type for a convex function
  140. Karush-Kuhn-Tucker optimality conditions for a class of robust optimization problems with an interval-valued objective function
  141. An equivalent quasinorm for the Lipschitz space of noncommutative martingales
  142. Optimal control of a viscous generalized θ-type dispersive equation with weak dissipation
  143. Special Issue on Problems, Methods and Applications of Nonlinear analysis
  144. Generalized Picone inequalities and their applications to (p,q)-Laplace equations
  145. Positive solutions for parametric (p(z),q(z))-equations
  146. Revisiting the sub- and super-solution method for the classical radial solutions of the mean curvature equation
  147. (p,Q) systems with critical singular exponential nonlinearities in the Heisenberg group
  148. Quasilinear Dirichlet problems with competing operators and convection
  149. Hyers-Ulam-Rassias stability of (m, n)-Jordan derivations
  150. Special Issue on Evolution Equations, Theory and Applications
  151. Instantaneous blow-up of solutions to the Cauchy problem for the fractional Khokhlov-Zabolotskaya equation
  152. Three classes of decomposable distributions
DOI: 10.1515/math-2020-0123