
Upper bound for variance of finite mixtures of power exponential distributions

Hok Shing Kwong and Saralees Nadarajah
Published/Copyright: June 11, 2022

Abstract

Both variance and entropy are commonly used measures of uncertainty. There exist many cases where the variance is infinite while the entropy is finite. In this note, we derive an upper bound illustrating the relationship between the variance and the entropy of random variables having a special class of distributions. We also derive an upper bound for the kth absolute central moment, proportional to the entropy power, for a special class of distributions.

MSC 2010: Primary 62E99

1 Introduction

Both variance and differential entropy are commonly used metrics for the uncertainty of random variables. While variance measures how far a random variable spreads out from its expected value, differential entropy takes a different approach: it measures uncertainty from an information-theoretic viewpoint, quantifying the average amount of information carried by a random variable. Although variance and entropy are generally not the same, there is one case in which they are equivalent measures: among all distributions with a given variance, the normal distribution attains the maximum entropy, which suggests that variance and entropy are related to some extent. For comprehensive reviews on entropy, we refer to [2] and [7]. See also [4].

When the underlying probability distribution is multimodal, variance becomes inefficient in representing the uncertainty of a random variable. For example, consider a finite mixture of two equally weighted univariate normal distributions with unit variance and different means $\mu_1$ and $\mu_2$. The variance of the overall distribution approaches $\infty$ as $|\mu_1 - \mu_2| \to \infty$. On the other hand, the differential entropy remains stable in this situation, approaching $\frac{1}{2}\log(2\pi e) + \log 2$ as $|\mu_1 - \mu_2| \to \infty$. Therefore, differential entropy appears to be a more appropriate measure of uncertainty for multimodal random variables.
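To make this concrete, the following minimal Python sketch (not from the paper; it uses standard numpy/scipy quadrature, and the separation values are illustrative) computes the variance and differential entropy of the equal-weight two-normal mixture for increasing $|\mu_1 - \mu_2|$; the variance grows without bound while the entropy approaches $\frac{1}{2}\log(2\pi e) + \log 2 \approx 2.112$.

```python
import numpy as np
from scipy.integrate import quad

def entropy_and_variance(mu1, mu2):
    # Equal-weight mixture of two unit-variance normal densities.
    phi = lambda x, m: np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)
    f = lambda x: 0.5 * phi(x, mu1) + 0.5 * phi(x, mu2)
    lo, hi = min(mu1, mu2) - 40.0, max(mu1, mu2) + 40.0   # effective support
    pts = sorted({mu1, mu2})                              # help quad find both modes
    h = quad(lambda x: -f(x) * np.log(f(x)) if f(x) > 0 else 0.0,
             lo, hi, points=pts, limit=200)[0]
    mean = quad(lambda x: x * f(x), lo, hi, points=pts, limit=200)[0]
    var = quad(lambda x: (x - mean) ** 2 * f(x), lo, hi, points=pts, limit=200)[0]
    return h, var

for d in [0.0, 2.0, 5.0, 10.0, 20.0]:
    h, var = entropy_and_variance(-d / 2.0, d / 2.0)
    print(f"|mu1 - mu2| = {d:4.1f}:  Var = {var:7.2f},  h = {h:.4f}")

# Entropy limit as the separation grows: log(2*pi*e)/2 + log(2).
print("limit:", 0.5 * np.log(2.0 * np.pi * np.e) + np.log(2.0))
```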

Let X denote a random variable with probability density function f. It is well known that given the value of differential entropy,

$$h(f) = -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx,$$

the lower bound on the variance of X is achieved by the normal distribution. That is, there exists a sharp lower bound on variance proportional to the entropy power $e^{2h(f)}$, as shown in (1):

(1) $\dfrac{1}{2\pi e}\, e^{2h(f)} \le \operatorname{Var}(X).$

This result can be extended (see, for example, [6]) to lower bounds on the kth absolute central moment, with the lower bound proportional to $e^{kh(f)}$, as shown in (2):

(2) $\dfrac{1}{ke}\left(\dfrac{k}{2\,\Gamma(1/k)}\right)^{k} e^{kh(f)} \le E\,|X-\mu|^{k}.$

In this case, the lower bound is attained by the power exponential distribution (also known as the generalized normal distribution) with shape parameter β = k. The probability density function of the power exponential distribution is

$$f(x) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\, e^{-\left(|x-\mu|/\alpha\right)^{\beta}},$$

where $\beta > 0$ is the shape parameter, $\alpha > 0$ is the scale parameter, and $-\infty < \mu < \infty$ is the location parameter.
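As a numerical check of (2), the sketch below (illustrative parameter values; standard scipy routines) evaluates the entropy and the kth absolute moment of a centred power exponential density by quadrature; equality in (2) should hold when β = k, with strict inequality otherwise.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def pe_pdf(x, alpha, beta, mu=0.0):
    # Power exponential (generalized normal) density.
    return beta / (2.0 * alpha * gamma(1.0 / beta)) * np.exp(-(abs(x - mu) / alpha) ** beta)

def check_lower_bound(k, beta, alpha=1.0):
    f = lambda x: pe_pdf(x, alpha, beta)
    h = quad(lambda x: -f(x) * np.log(f(x)) if f(x) > 0 else 0.0,
             -60.0, 60.0, points=[0.0], limit=200)[0]
    mk = quad(lambda x: abs(x) ** k * f(x), -60.0, 60.0, points=[0.0], limit=200)[0]
    lower = (k / (2.0 * gamma(1.0 / k))) ** k / (k * np.e) * np.exp(k * h)
    return lower, mk

for k, beta in [(2, 2.0), (2, 1.0), (3, 3.0), (3, 1.5)]:
    lower, mk = check_lower_bound(k, beta)
    print(f"k = {k}, beta = {beta}: lower = {lower:.4f} <= E|X|^k = {mk:.4f}")
```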

In general, a similar form of upper bound is not known. There are many cases where the variance is not bounded given finite differential entropy. For instance, non-logconcave distributions (for example, Pareto distributions with shape parameter ⩽ 2) and finite mixture distributions do not generally admit an upper bound on variance given the entropy. [1] considered a finite mixture of power exponential distributions with equal means; they showed that the variance has an upper bound proportional to the entropy power if the maximum ratio of variances between component distributions is finite.

Although the variances of multimodal distributions are generally not upper bounded by the entropy power, we believe it would be useful to derive an upper bound under some specific conditions, giving further insight into the relationship between differential entropy and variance. In this note, we consider a special class of multimodal distributions, namely finite mixtures of power exponential distributions. The entropy of a finite mixture distribution generally cannot be expressed in closed form, and its evaluation relies on various approximation methods. For example, [3] proposed a method based on Taylor-series expansion to approximate the entropy of Gaussian mixtures. We show that the variance has an upper bound proportional to $e^{2h(f)}$ if the ratios of variances and the ratios of squared means between all component distributions are finite.

The main result on the upper bound of variance is given in Section 2. Special cases of the bound are discussed in Section 3. An extension to kth absolute central moments is given in Section 4. Conclusions are given in Section 5.

2 Upper bound on variance for multimodal distributions

Consider f(x) specified as follows:

(3) $f(x) = \displaystyle\sum_{i=1}^{n} \omega_i f_i(x) = \sum_{i=1}^{n} \omega_i\, \frac{\beta_i}{2\alpha_i\,\Gamma(1/\beta_i)}\, e^{-\left(|x-\mu_i|/\alpha_i\right)^{\beta_i}},$

where $\beta_i > 0$ are the shape parameters, $\alpha_i > 0$ are the scale parameters, $-\infty < \mu_i < \infty$ are the location parameters, n is the number of components, and $0 < \omega_i < 1$ are weights with $\omega_1 + \cdots + \omega_n = 1$.

Let

$$r_{\sigma^2} = \max_{i,j} \frac{\sigma_i^2}{\sigma_j^2}, \qquad r_{\mu^2} = \max_{i,j} \frac{\mu_i^2}{\mu_j^2}, \qquad M(r) = \frac{(r-1)\, r^{1/(r-1)}}{e\,\log r}.$$

Theorem 2.1

([5, p. 79]) If $a_i$ are strictly positive quantities, then there exists an inequality between the weighted geometric mean and the weighted arithmetic mean as follows:

$$\prod_{i=1}^{n} a_i^{\omega_i} \ge \frac{1}{M(r)} \sum_{i=1}^{n} \omega_i a_i,$$

where

$$r = \max_{i,j} \frac{a_i}{a_j}.$$

The value of M(r) is monotonically increasing in r; that is, the larger the spread r, the larger the possible gap between the weighted arithmetic mean and the weighted geometric mean.
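A small sketch illustrating Theorem 2.1 (the random positive quantities and Dirichlet weights are arbitrary illustrative choices): the weighted geometric mean always dominates the weighted arithmetic mean divided by M(r).

```python
import numpy as np

def M(r):
    # M(r) from above; M(1) is interpreted as the limiting value 1.
    return 1.0 if np.isclose(r, 1.0) else (r - 1.0) * r ** (1.0 / (r - 1.0)) / (np.e * np.log(r))

rng = np.random.default_rng(0)
for _ in range(5):
    a = rng.uniform(0.5, 5.0, size=4)   # strictly positive quantities
    w = rng.dirichlet(np.ones(4))       # weights in (0, 1) summing to one
    gm = np.prod(a ** w)                # weighted geometric mean
    am = np.sum(w * a)                  # weighted arithmetic mean
    r = a.max() / a.min()
    print(f"GM = {gm:.4f}  >=  AM / M(r) = {am / M(r):.4f}")
```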

Proposition 2.1.1

Let X have the probability density function (3) with n = 1, $\alpha_1 = \alpha$, $\beta_1 = \beta$ and $\mu_1 = \mu$. The kth absolute central moment of X is

$$E\,|X-\mu|^{k} = \int_{-\infty}^{\infty} |x-\mu|^{k}\, \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\, e^{-\left(|x-\mu|/\alpha\right)^{\beta}}\, dx = \frac{\alpha^{k}}{\Gamma(1/\beta)} \int_{0}^{\infty} y^{(k+1)/\beta - 1}\, e^{-y}\, dy = \frac{\alpha^{k}\, \Gamma\!\left(\frac{k+1}{\beta}\right)}{\Gamma(1/\beta)}.$$

Proposition 2.1.2

Let X have the probability density function (3) with n = 1, $\alpha_1 = \alpha$, $\beta_1 = \beta$ and $\mu_1 = \mu$. The differential entropy of X is

$$h(f) = -\int_{-\infty}^{\infty} \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\, e^{-\left(|x-\mu|/\alpha\right)^{\beta}} \log\!\left[\frac{\beta}{2\alpha\,\Gamma(1/\beta)}\, e^{-\left(|x-\mu|/\alpha\right)^{\beta}}\right] dx = -\log\frac{\beta}{2\alpha\,\Gamma(1/\beta)} + \frac{1}{\Gamma(1/\beta)} \int_{0}^{\infty} y^{1/\beta}\, e^{-y}\, dy = \log\frac{2\alpha\,\Gamma(1/\beta)}{\beta} + \frac{1}{\beta}.$$

Proposition 2.1.3

From Propositions 2.1.1 and 2.1.2, we have the following relationship between the differential entropy and the kth absolute central moment of X:

$$e^{kh(f)} = \frac{2^{k} \alpha^{k}\, \Gamma(1/\beta)^{k}}{\beta^{k}}\, e^{k/\beta} = E\,|X-\mu|^{k}\, A(k,\beta),$$

where

$$A(k,\beta) = \frac{2^{k}\, e^{k/\beta}\, \Gamma(1/\beta)^{k+1}}{\beta^{k}\, \Gamma\!\left(\frac{k+1}{\beta}\right)}.$$
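The closed forms in Propositions 2.1.1–2.1.3 can be checked numerically. The sketch below (with illustrative values of α, β, μ and k) compares quadrature values against the formulas and verifies the identity $e^{kh(f)} = A(k,\beta)\, E\,|X-\mu|^{k}$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

# Illustrative parameter values (not from the paper).
alpha, beta, mu, k = 1.5, 0.8, 0.3, 2

f = lambda x: beta / (2.0 * alpha * gamma(1.0 / beta)) * np.exp(-(abs(x - mu) / alpha) ** beta)
lo, hi = mu - 200.0, mu + 200.0   # effectively the whole real line here

# Proposition 2.1.1: kth absolute central moment.
mk_num = quad(lambda x: abs(x - mu) ** k * f(x), lo, hi, points=[mu], limit=400)[0]
mk_cf = alpha ** k * gamma((k + 1.0) / beta) / gamma(1.0 / beta)

# Proposition 2.1.2: differential entropy.
h_num = quad(lambda x: -f(x) * np.log(f(x)) if f(x) > 0 else 0.0,
             lo, hi, points=[mu], limit=400)[0]
h_cf = np.log(2.0 * alpha * gamma(1.0 / beta) / beta) + 1.0 / beta

# Proposition 2.1.3: e^{k h(f)} = A(k, beta) * E|X - mu|^k.
A = 2.0 ** k * np.exp(k / beta) * gamma(1.0 / beta) ** (k + 1) / (beta ** k * gamma((k + 1.0) / beta))
print(f"moment:   quadrature {mk_num:.6f}  vs  closed form {mk_cf:.6f}")
print(f"entropy:  quadrature {h_num:.6f}  vs  closed form {h_cf:.6f}")
print(f"identity: e^(k h) = {np.exp(k * h_cf):.6f}  vs  A * moment = {A * mk_cf:.6f}")
```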

Lemma 2.1.1

Let X have the probability density function (3). By the concavity of the entropy functional, the entropy of a finite mixture distribution has the following lower bound:

$$h(f) \ge \sum_{i=1}^{n} \omega_i\, h(f_i).$$

Proof. The result follows because

$$h(f) - \sum_{i=1}^{n} \omega_i h(f_i) = -\int f(x)\log f(x)\, dx + \sum_{i=1}^{n} \omega_i \int f_i(x)\log f_i(x)\, dx = \sum_{i=1}^{n} \omega_i \int f_i(x)\log\frac{f_i(x)}{f(x)}\, dx = \sum_{i=1}^{n} \omega_i\, D_{KL}\!\left(f_i \,\|\, f\right) \ge 0,$$

where

$$D_{KL}\!\left(f_i \,\|\, f\right) = \int f_i(x)\log\frac{f_i(x)}{f(x)}\, dx$$ is the Kullback–Leibler divergence, which is nonnegative.
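The sketch below checks Lemma 2.1.1 for a two-component power exponential mixture (hypothetical parameters), computing the mixture entropy by quadrature and the weighted component entropies from the closed form in Proposition 2.1.2.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

# Hypothetical two-component mixture: (alpha_i, beta_i, mu_i) and weights.
params = [(1.0, 2.0, -3.0), (0.5, 1.0, 4.0)]
w = [0.3, 0.7]

def pe(x, a, b, m):
    return b / (2.0 * a * gamma(1.0 / b)) * np.exp(-(abs(x - m) / a) ** b)

f = lambda x: sum(wi * pe(x, a, b, m) for wi, (a, b, m) in zip(w, params))

# Mixture entropy by quadrature.
h_mix = quad(lambda x: -f(x) * np.log(f(x)) if f(x) > 0 else 0.0,
             -80.0, 80.0, points=[m for _, _, m in params], limit=200)[0]

# Lower bound: weighted component entropies from Proposition 2.1.2.
h_lb = sum(wi * (np.log(2.0 * a * gamma(1.0 / b) / b) + 1.0 / b)
           for wi, (a, b, m) in zip(w, params))
print(f"h(f) = {h_mix:.4f}  >=  sum_i w_i h(f_i) = {h_lb:.4f}")
```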

Theorem 2.2

Let X have the probability density function (3). An upper bound on the variance of X, proportional to the entropy power, can be written as follows:

$$\operatorname{Var}(X) \le \left[ M\!\left(r_{\sigma^2}\right) + M\!\left(r_{\mu^2}\right) \prod_{i=1}^{n} \left(r_i^2\right)^{\omega_i} \right] \left[ \prod_{i=1}^{n} \frac{1}{A(2,\beta_i)^{\omega_i}} \right] e^{2h(f)},$$

where $r_i^2$ is defined by $\mu_i^2 = \sigma_i^2\, r_i^2$.

Proof. From Theorem 2.1, Proposition 2.1.3 and Lemma 2.1.1, we can write

$$e^{2h(f)} \ge e^{2\sum_{i=1}^{n}\omega_i h(f_i)} = \prod_{i=1}^{n} e^{2\omega_i h(f_i)} = \prod_{i=1}^{n} A(2,\beta_i)^{\omega_i} \left(\sigma_i^2\right)^{\omega_i} \ge \frac{1}{M\!\left(r_{\sigma^2}\right)} \prod_{i=1}^{n} A(2,\beta_i)^{\omega_i} \sum_{i=1}^{n} \omega_i \sigma_i^2,$$

implying that

(4) $\displaystyle\sum_{i=1}^{n} \omega_i \sigma_i^2 \le M\!\left(r_{\sigma^2}\right) \left[\prod_{i=1}^{n} \frac{1}{A(2,\beta_i)^{\omega_i}}\right] e^{2h(f)}.$

Also

$$e^{2h(f)} \ge \prod_{i=1}^{n} A(2,\beta_i)^{\omega_i} \left(\sigma_i^2\right)^{\omega_i}$$

can be rewritten as follows

$$e^{2h(f)} \ge \prod_{i=1}^{n} A(2,\beta_i)^{\omega_i} \left(\frac{\mu_i^2}{r_i^2}\right)^{\omega_i} = \prod_{i=1}^{n} A(2,\beta_i)^{\omega_i} \left(\frac{1}{r_i^2}\right)^{\omega_i} \prod_{i=1}^{n} \left(\mu_i^2\right)^{\omega_i} \ge \frac{1}{M\!\left(r_{\mu^2}\right)} \prod_{i=1}^{n} A(2,\beta_i)^{\omega_i} \left(\frac{1}{r_i^2}\right)^{\omega_i} \sum_{i=1}^{n} \omega_i \mu_i^2,$$

implying that

$$\sum_{i=1}^{n} \omega_i \mu_i^2 \le M\!\left(r_{\mu^2}\right) \left[\prod_{i=1}^{n} \frac{\left(r_i^2\right)^{\omega_i}}{A(2,\beta_i)^{\omega_i}}\right] e^{2h(f)}.$$

Therefore,

(5) $\operatorname{Var}(X) = E\!\left(X^2\right) - \left[E(X)\right]^2 = \displaystyle\sum_{i=1}^{n} \omega_i \sigma_i^2 + \sum_{i=1}^{n} \omega_i \mu_i^2 - \left(\sum_{i=1}^{n} \omega_i \mu_i\right)^{2} \le \sum_{i=1}^{n} \omega_i \sigma_i^2 + \sum_{i=1}^{n} \omega_i \mu_i^2 \le \left[ M\!\left(r_{\sigma^2}\right) + M\!\left(r_{\mu^2}\right) \prod_{i=1}^{n} \left(r_i^2\right)^{\omega_i} \right] \left[ \prod_{i=1}^{n} \frac{1}{A(2,\beta_i)^{\omega_i}} \right] e^{2h(f)}.$
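Theorem 2.2 can also be verified numerically. The sketch below (a hypothetical two-component mixture with nonzero means) computes the exact mixture variance from the component moments, obtains h(f) by quadrature, and compares the variance against the bound (5).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def M(r):
    return 1.0 if np.isclose(r, 1.0) else (r - 1.0) * r ** (1.0 / (r - 1.0)) / (np.e * np.log(r))

def A(k, b):
    return 2.0 ** k * np.exp(k / b) * gamma(1.0 / b) ** (k + 1) / (b ** k * gamma((k + 1.0) / b))

# Hypothetical two-component mixture with nonzero means.
w = np.array([0.4, 0.6])
alpha = np.array([1.0, 1.5])
beta = np.array([2.0, 1.2])
mu = np.array([1.0, -2.0])

sig2 = alpha ** 2 * gamma(3.0 / beta) / gamma(1.0 / beta)  # component variances
r2 = mu ** 2 / sig2                                        # r_i^2 with mu_i^2 = sigma_i^2 r_i^2

def pe(x, a, b, m):
    return b / (2.0 * a * gamma(1.0 / b)) * np.exp(-(abs(x - m) / a) ** b)

f = lambda x: float(np.sum(w * pe(x, alpha, beta, mu)))
h = quad(lambda x: -f(x) * np.log(f(x)) if f(x) > 0 else 0.0,
         -80.0, 80.0, points=sorted(mu), limit=200)[0]

var = w @ sig2 + w @ mu ** 2 - (w @ mu) ** 2               # exact mixture variance
bound = (M(sig2.max() / sig2.min()) + M((mu ** 2).max() / (mu ** 2).min())
         * np.prod(r2 ** w)) * np.prod(A(2, beta) ** (-w)) * np.exp(2.0 * h)
print(f"Var(X) = {var:.4f}  <=  bound = {bound:.4f}")
```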

3 Special cases

It is easy to see that $r_i^2 \to 0$ as $\mu_i \to 0$, in which case (5) reduces to (4), the analogue of the original upper bound shown in [1]. The bound (5) is attained with equality if all component distributions are identical with zero mean, i.e., $\sigma_i^2 = \sigma_j^2$, $\beta_i = \beta_j$ for all i, j = 1, …, n and $\mu_i^2 = 0$ for i = 1, …, n. In this case, the variance equals $\frac{1}{A(2,\beta)}\, e^{2h(f)}$.

If only $\sigma_i^2 = \sigma_j^2$ for all i, j = 1, …, n, then (5) reduces to

$$\left[ 1 + M\!\left(r_{\mu^2}\right) \prod_{i=1}^{n} \left(r_i^2\right)^{\omega_i} \right] \left[ \prod_{i=1}^{n} \frac{1}{A(2,\beta_i)^{\omega_i}} \right] e^{2h(f)}.$$

If only $\beta_i = \beta_j$ for all i, j = 1, …, n, then (5) reduces to

$$\left[ M\!\left(r_{\sigma^2}\right) + M\!\left(r_{\mu^2}\right) \prod_{i=1}^{n} \left(r_i^2\right)^{\omega_i} \right] \frac{1}{A(2,\beta)}\, e^{2h(f)}.$$

If only $\mu_i = \mu_j$ for all i, j = 1, …, n, then (5) reduces to

$$\left[ M\!\left(r_{\sigma^2}\right) + \prod_{i=1}^{n} \left(r_i^2\right)^{\omega_i} \right] \left[ \prod_{i=1}^{n} \frac{1}{A(2,\beta_i)^{\omega_i}} \right] e^{2h(f)}.$$

If $\mu_i = 0$ for all i = 1, …, n, then (5) reduces to

$$M\!\left(r_{\sigma^2}\right) \left[ \prod_{i=1}^{n} \frac{1}{A(2,\beta_i)^{\omega_i}} \right] e^{2h(f)}.$$

4 Upper bound on the kth absolute central moment for unimodal distributions

The variance upper bound for finite mixtures of generalized normal distributions can be easily extended to the kth absolute central moment. We again consider the finite mixture of generalized normal distributions in (3).

Theorem 4.1

Let X have the probability density function (3) with $\mu_i = \mu$ for all i = 1, …, n. An upper bound for the kth absolute central moment of X can be written proportional to the entropy power as follows:

$$E\,|X-\mu|^{k} \le M\!\left(r_{\xi}\right) \left[ \prod_{i=1}^{n} \frac{1}{A(k,\beta_i)^{\omega_i}} \right] e^{kh(f)}.$$

Proof. As shown in Proposition 2.1.1, the kth absolute central moment for given α and β is $\alpha^{k}\, \Gamma\!\left(\frac{k+1}{\beta}\right) / \Gamma(1/\beta)$. Let $\xi_i(k)$ denote the kth absolute central moment of the ith component distribution. Then

$$e^{kh(f)} \ge e^{k\sum_{i=1}^{n}\omega_i h(f_i)} = \prod_{i=1}^{n} e^{k\omega_i h(f_i)} = \prod_{i=1}^{n} A(k,\beta_i)^{\omega_i}\, \xi_i(k)^{\omega_i} \ge \frac{1}{M\!\left(r_{\xi}\right)} \prod_{i=1}^{n} A(k,\beta_i)^{\omega_i} \sum_{i=1}^{n} \omega_i\, \xi_i(k),$$

implying that

(6) $\displaystyle\sum_{i=1}^{n} \omega_i\, \xi_i(k) \le M\!\left(r_{\xi}\right) \left[ \prod_{i=1}^{n} \frac{1}{A(k,\beta_i)^{\omega_i}} \right] e^{kh(f)}, \quad \text{where } r_{\xi} = \max_{i,j} \frac{\xi_i(k)}{\xi_j(k)}.$

Since all components share the common mean μ, we have $E\,|X-\mu|^{k} = \sum_{i=1}^{n} \omega_i\, \xi_i(k)$. Therefore, using (6), we have the following bound:

$$E\,|X-\mu|^{k} \le M\!\left(r_{\xi}\right) \left[ \prod_{i=1}^{n} \frac{1}{A(k,\beta_i)^{\omega_i}} \right] e^{kh(f)}.$$

A lower bound can also be obtained in the setting of Theorem 4.1. Applying (2) to each component, we obtain

(7) $\dfrac{1}{ke}\left(\dfrac{k}{2\,\Gamma(1/k)}\right)^{k} \displaystyle\sum_{i=1}^{n} \omega_i\, e^{kh(f_i)} \le E\,|X-\mu|^{k}.$

If k=2, (7) reduces to

$$\frac{1}{2\pi e} \sum_{i=1}^{n} \omega_i\, e^{2h(f_i)} \le \operatorname{Var}(X).$$
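The two bounds of this section can be checked together. The sketch below (a hypothetical equal-mean mixture) verifies that the kth absolute central moment lies between the lower bound (7) and the upper bound of Theorem 4.1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def M(r):
    return 1.0 if np.isclose(r, 1.0) else (r - 1.0) * r ** (1.0 / (r - 1.0)) / (np.e * np.log(r))

def A(k, b):
    return 2.0 ** k * np.exp(k / b) * gamma(1.0 / b) ** (k + 1) / (b ** k * gamma((k + 1.0) / b))

# Hypothetical mixture with common mean mu, as required by Theorem 4.1.
k, mu = 3, 0.0
w = np.array([0.5, 0.5])
alpha = np.array([1.0, 2.0])
beta = np.array([2.0, 1.0])

xi = alpha ** k * gamma((k + 1.0) / beta) / gamma(1.0 / beta)  # component kth moments

def pe(x, a, b):
    return b / (2.0 * a * gamma(1.0 / b)) * np.exp(-(abs(x - mu) / a) ** b)

f = lambda x: float(np.sum(w * pe(x, alpha, beta)))
h = quad(lambda x: -f(x) * np.log(f(x)) if f(x) > 0 else 0.0,
         -100.0, 100.0, points=[mu], limit=200)[0]

mk = float(w @ xi)                       # E|X - mu|^k, since all means coincide
upper = M(xi.max() / xi.min()) * np.prod(A(k, beta) ** (-w)) * np.exp(k * h)
h_i = np.log(2.0 * alpha * gamma(1.0 / beta) / beta) + 1.0 / beta  # component entropies
lower = (k / (2.0 * gamma(1.0 / k))) ** k / (k * np.e) * float(w @ np.exp(k * h_i))
print(f"{lower:.4f}  <=  E|X - mu|^k = {mk:.4f}  <=  {upper:.4f}")
```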

5 Conclusions

In this note, we have derived upper bounds on the variance for a special class of multimodal distributions under some constraints. The tightness of the upper bound under different conditions has been examined. We have also derived upper bounds on the kth absolute central moment for a special class of unimodal distributions.

(Communicated by Gejza Wimmer)

Acknowledgement

The authors would like to thank the Editor and the referee for careful reading and comments which greatly improved the paper.

References

[1] Chung, H. W.—Sadler, B. M.—Hero, A. O.: Bounds on variance for unimodal distributions, IEEE Transactions on Information Theory 63 (2017), 6936–6949. DOI: 10.1109/TIT.2017.2749310

[2] Cover, T. M.—Thomas, J. A.: Elements of Information Theory, 2nd ed., Wiley, New York, 2006.

[3] Huber, M. F.—Bailey, T.—Durrant-Whyte, H.—Hanebeck, U. D.: On entropy approximation for Gaussian mixture random vectors. In: Proceedings of the 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2008, pp. 181–188. DOI: 10.1109/MFI.2008.4648062

[4] Kumar, P.—Hooda, D. S.: On generalized measures of entropy and dependence, Math. Slovaca 58 (2008), 377–386. DOI: 10.2478/s12175-008-0081-4

[5] Mitrinovic, D. S.—Vasic, P. M.: Analytic Inequalities, Vol. 1, Springer, New York, 1970. DOI: 10.1007/978-3-642-99970-3

[6] Zamir, R.—Feder, M.: On universal quantization by randomized uniform/lattice quantizers, IEEE Transactions on Information Theory 38 (1992), 428–436. DOI: 10.1109/18.119699

[7] Zidek, J. V.—van Eeden, C.: Uncertainty, entropy, variance and the effect of partial information, Lecture Notes–Monograph Series 42 (2003), 155–167. DOI: 10.1214/lnms/1215091936

Received: 2020-11-28
Accepted: 2021-06-15
Published Online: 2022-06-11
Published in Print: 2022-06-27

© 2022 Mathematical Institute Slovak Academy of Sciences

This work is licensed under the Creative Commons Attribution 4.0 International License.
