Some common and dynamic properties of logarithmic Pareto distribution with applications

Mohamed Kayid

doi:10.1515/phys-2021-0082

Article Open Access

Published/Copyright: November 15, 2021

From the journal Open Physics Volume 19 Issue 1

Abstract

The Pareto distribution satisfies the power law, which is widely used in physics, biology, earth and planetary sciences, economics, finance, computer science, and many other fields. In this article, the logarithmic Pareto distribution, a logarithmic transformation of the Pareto distribution, is presented and studied. The moments, percentiles, skewness, kurtosis, and some dynamic measures such as hazard rate, mean residual life, and quantile residual life are discussed. The parameters were estimated by quantile and maximum likelihood methods. A simulation study was conducted to investigate the efficiency, consistency, and behavior of the maximum likelihood estimator. Finally, the proposed distribution was fitted to some datasets to show its usefulness.

Keywords: Pareto distribution; logarithmic transformation; maximum likelihood estimation

1 Introduction

The Pareto distribution, a power law model, is useful in analyzing observations from physics, biology, earth and planetary sciences, economics, finance, computer science, social science, geophysics, actuarial science, quality control, and many other fields [1].

The Pareto model can be applied in situations where there is an equilibrium in the distribution of “small” to “large” values. There are many such cases, e.g., the sizes of files transmitted over the Internet via TCP/IP (many small files and few large ones), the error rates of hard disks (many small and few large error rates), the sizes of human settlements (many small values for hamlets and villages and few large values for cities), and the oil reserves in oil fields (many small and few large fields). The sizes of solar flares, lunar craters, earthquakes, and wars also follow the power law.

Among the many studies on this topic, Schroeder et al. [2] fitted plate fault data to this distribution, and Burroughs and Tebbens [3] analyzed earthquake and wildfire observations. From the long list of research in this area, we can also refer to Sornette [4], Olami et al. [5], Bak and Sneppen [6], and Carlson and Doyle [7].

In recent decades, many generalizations of the Pareto distribution have been proposed to make complicated real-world situations more understandable. Mead [8] generalized a variant of the Pareto model. Ghitany et al. [9] used the incomplete gamma function to propose an alternative to the Pareto distribution. Ihtisham et al. [10] provided the power transformation of the single-parameter Pareto distribution. Haj Ahmad and Almetwally [11] used the generalization of Marshall and Olkin [12] to extend the Pareto distribution. Many other research papers have used the Pareto distribution as a basic model.

Vasileios et al. [13] introduced the logarithmic transformation to provide a transformed version of an extension of the Weibull distribution. Dey et al. [14] and Nassar et al. [15] applied this idea to extend a generalized exponential distribution and a generalized Weibull model, respectively. In this article, we apply the logarithmic transformation to the Pareto distribution.

The aim of this article is to present and study a logarithmic transformation of the Pareto distribution. The article is organized as follows. Section 2 introduces the transformed Pareto distribution and studies its basic properties. Section 3 is devoted to parameter estimation. Section 4 studies the behavior of the maximum likelihood estimator (MLE) by simulation. In Section 5, the model is fitted to some real data sets to illustrate its applicability. Finally, Section 6 briefly summarizes the results.

2 Alpha logarithmic Pareto distribution

Consider a distribution $G$ with reliability function $\bar{G}(x) = 1 - G(x)$, $x > 0$. Vasileios et al. [13] applied the logarithmic transformation to $G$ through the reliability function

(1) $$\bar{F}(x) = \frac{\ln(1 - \bar{\alpha}\,\bar{G}(x))}{\ln(\alpha)},$$

where $\bar{\alpha} = 1 - \alpha$, and studied a Weibull extension based on it. Here, we apply the Pareto distribution with the cumulative distribution function (CDF)

(2) $$G(x) = 1 - \left(1 + \frac{\theta x}{\lambda}\right)^{-\frac{1}{\theta}},$$

as the baseline model. This model is itself a generalization of the exponential distribution: as $\theta \to 0$, it tends to the exponential distribution with mean $\lambda$.
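The limiting behavior of the baseline CDF (2) can be checked numerically. The following is only a minimal sketch; the function name `pareto_cdf` and the chosen parameter values are illustrative assumptions, not part of the article.

```python
import math

def pareto_cdf(x, theta, lam):
    """Baseline Pareto CDF (2): 1 - (1 + theta*x/lam)**(-1/theta)."""
    # log1p keeps the computation accurate when theta is very small
    return 1.0 - math.exp(-math.log1p(theta * x / lam) / theta)

def exponential_cdf(x, lam):
    """CDF of the exponential distribution with mean lam."""
    return 1.0 - math.exp(-x / lam)

# For a tiny theta the two CDFs should be nearly indistinguishable.
gap = abs(pareto_cdf(3.0, 1e-6, 2.0) - exponential_cdf(3.0, 2.0))
print(gap)
```

The printed gap shrinks as `theta` is made smaller, in line with the stated limit.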

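So, we define the logarithmic Pareto distribution, $\mathrm{LP}(\alpha, \theta, \lambda)$, with the CDF

(3) $$F(x) = 1 - \frac{\ln\left(1 - \bar{\alpha}\left(1 + \frac{\theta x}{\lambda}\right)^{-\frac{1}{\theta}}\right)}{\ln(\alpha)}, \quad x \ge 0,$$

where $\alpha > 0$, $\alpha \ne 1$, $\theta > 0$, $\lambda > 0$. The probability density function (PDF) is

(4) $$f(x) = \frac{\alpha - 1}{\lambda \ln(\alpha)} \cdot \frac{\left(1 + \frac{\theta x}{\lambda}\right)^{-\frac{1}{\theta} - 1}}{1 - (1 - \alpha)\left(1 + \frac{\theta x}{\lambda}\right)^{-\frac{1}{\theta}}}, \quad x \ge 0.$$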
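As an illustration, the CDF (3) and PDF (4) can be coded directly. This is a minimal sketch using only the Python standard library; the function names `lp_cdf` and `lp_pdf` are assumptions introduced here, and no input validation is attempted.

```python
import math

def lp_cdf(x, alpha, theta, lam):
    """CDF (3) of LP(alpha, theta, lam); requires alpha > 0, alpha != 1."""
    g_bar = (1.0 + theta * x / lam) ** (-1.0 / theta)  # baseline Pareto reliability
    return 1.0 - math.log(1.0 - (1.0 - alpha) * g_bar) / math.log(alpha)

def lp_pdf(x, alpha, theta, lam):
    """PDF (4), i.e. the derivative of lp_cdf with respect to x."""
    base = 1.0 + theta * x / lam
    g_bar = base ** (-1.0 / theta)
    numerator = (alpha - 1.0) * base ** (-1.0 / theta - 1.0)
    denominator = lam * math.log(alpha) * (1.0 - (1.0 - alpha) * g_bar)
    return numerator / denominator

# Sanity check: a central difference of the CDF should be close to the PDF.
h = 1e-5
slope = (lp_cdf(1.0 + h, 2.5, 0.5, 2.0) - lp_cdf(1.0 - h, 2.5, 0.5, 2.0)) / (2 * h)
print(slope, lp_pdf(1.0, 2.5, 0.5, 2.0))
```

The same code covers both $0 < \alpha < 1$ and $\alpha > 1$, because $(\alpha - 1)/\ln(\alpha)$ is positive in either case.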

The following elementary lemma simplifies the proofs of the propositions below.

Lemma 1

Let $0 < \beta_1 < 1$ and $\beta_2 > 0$. The integrals

(5) $$I_1 = \int_0^{\beta_1} \frac{du}{u^{a}(1 - u)},$$

and

(6) $$I_2 = \int_0^{\beta_2} \frac{du}{u^{a}(1 + u)},$$

are finite when $a < 1$ and infinite when $a \ge 1$.

Proof

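When $a = 1$, it is straightforward to see that $I_1 = \infty$. Since $u^{a} \le u$ for $0 < u < 1$ and $a \ge 1$, we have

$$\int_0^{\beta_1} \frac{du}{u^{a}(1 - u)} \ge \int_0^{\beta_1} \frac{du}{u(1 - u)} = \infty,$$

for any $a \ge 1$. On the other hand, it is clear that for any $a < 1$

$$\int_0^{\beta_1} \frac{du}{u^{a}(1 - u)} < \frac{1}{1 - \beta_1} \int_0^{\beta_1} \frac{du}{u^{a}} < \infty,$$

which completes the argument for $I_1$.

Similarly, it is easy to check that $I_2 = \infty$ when $a = 1$. So, writing $\beta = \min(\beta_2, 1)$ and using again $u^{a} \le u$ on $(0, \beta]$,

$$\int_0^{\beta_2} \frac{du}{u^{a}(1 + u)} \ge \int_0^{\beta} \frac{du}{u^{a}(1 + u)} \ge \int_0^{\beta} \frac{du}{u(1 + u)} = \infty,$$

for any $a \ge 1$. On the other hand,

$$\int_0^{\beta_2} \frac{du}{u^{a}(1 + u)} < \int_0^{\beta_2} \frac{du}{u^{a}} < \infty,$$

for any $a < 1$, which shows the result.□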

Proposition 1

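The $k$th moment of $\mathrm{LP}(\alpha, \theta, \lambda)$ for $0 < \alpha < 1$ is

(7) $$E(X^{k}) = -\frac{1}{\ln(\alpha)} \left(\frac{\lambda}{\theta}\right)^{k} \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} \bar{\alpha}^{\theta j} \int_0^{\bar{\alpha}} \frac{du}{u^{\theta j}(1 - u)},$$

and for $\alpha > 1$ is

(8) $$E(X^{k}) = \frac{1}{\ln(\alpha)} \left(\frac{\lambda}{\theta}\right)^{k} \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} (-\bar{\alpha})^{\theta j} \int_0^{-\bar{\alpha}} \frac{du}{u^{\theta j}(1 + u)}.$$

Moreover, the $k$th moment exists and is finite iff $\theta < \frac{1}{k}$.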

Proof

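The $k$th moment of $\mathrm{LP}(\alpha, \theta, \lambda)$ is

(9) $$E(X^{k}) = \int_0^{\infty} x^{k}\, \frac{\alpha - 1}{\lambda \ln(\alpha)} \cdot \frac{\left(1 + \frac{\theta x}{\lambda}\right)^{-\frac{1}{\theta} - 1}}{1 - (1 - \alpha)\left(1 + \frac{\theta x}{\lambda}\right)^{-\frac{1}{\theta}}}\, dx.$$

When $0 < \alpha < 1$, the substitution $u = \bar{\alpha}\left(1 + \frac{\theta x}{\lambda}\right)^{-\frac{1}{\theta}}$ simplifies it as follows:

(10) $$E(X^{k}) = -\frac{1}{\ln(\alpha)} \left(\frac{\lambda}{\theta}\right)^{k} \int_0^{\bar{\alpha}} \frac{(\bar{\alpha}^{\theta} u^{-\theta} - 1)^{k}}{1 - u}\, du = -\frac{1}{\ln(\alpha)} \left(\frac{\lambda}{\theta}\right)^{k} \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} \bar{\alpha}^{\theta j} \int_0^{\bar{\alpha}} \frac{du}{u^{\theta j}(1 - u)}.$$

By Lemma 1 with $a = \theta j$, $j = 0, 1, \ldots, k$, all integrals in the last expression are finite iff $\theta < \frac{1}{k}$. If $\theta \ge \frac{1}{k}$, some of these integrals are infinite by Lemma 1.

Let $\alpha > 1$. Then, with the substitution $u = -\bar{\alpha}\left(1 + \frac{\theta x}{\lambda}\right)^{-\frac{1}{\theta}}$, (9) can be simplified as follows:

(11) $$E(X^{k}) = \frac{1}{\ln(\alpha)} \left(\frac{\lambda}{\theta}\right)^{k} \int_0^{-\bar{\alpha}} \frac{((-\bar{\alpha})^{\theta} u^{-\theta} - 1)^{k}}{1 + u}\, du = \frac{1}{\ln(\alpha)} \left(\frac{\lambda}{\theta}\right)^{k} \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} (-\bar{\alpha})^{\theta j} \int_0^{-\bar{\alpha}} \frac{du}{u^{\theta j}(1 + u)}.$$

By Lemma 1, all integrals in the last expression are finite iff $\theta < \frac{1}{k}$. If $\theta \ge \frac{1}{k}$, some of these integrals are infinite.□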

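Applying the fact that

$$e^{tX} = 1 + \sum_{k=1}^{\infty} \frac{(tX)^{k}}{k!}$$

and Proposition 1, which shows that the moments of order $k \ge \frac{1}{\theta}$ are infinite, it follows that the moment generating function is not finite for $t > 0$.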

The quantile function is

(12) $$F^{-1}(p) = \frac{\lambda}{\theta} \left[ \left( \frac{1 - \alpha^{\bar{p}}}{1 - \alpha} \right)^{-\theta} - 1 \right],$$

where $\bar{p} = 1 - p$. Because the moments may not be finite for some parameter values, the quantiles can be used instead to describe the model and to estimate the parameters.
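The closed form (12) is easy to evaluate. Below is a minimal sketch (the names `lp_quantile` and `lp_cdf` are assumptions introduced here) that also verifies numerically that the quantile function inverts the CDF (3).

```python
import math

def lp_cdf(x, alpha, theta, lam):
    """CDF (3) of LP(alpha, theta, lam)."""
    g_bar = (1.0 + theta * x / lam) ** (-1.0 / theta)
    return 1.0 - math.log(1.0 - (1.0 - alpha) * g_bar) / math.log(alpha)

def lp_quantile(p, alpha, theta, lam):
    """Quantile function (12) for 0 <= p < 1."""
    level = (1.0 - alpha ** (1.0 - p)) / (1.0 - alpha)
    return lam / theta * (level ** (-theta) - 1.0)

# Round trip: F(F^{-1}(p)) should return p.
for p in (0.1, 0.5, 0.9):
    x = lp_quantile(p, 0.8, 0.1, 2.0)
    print(p, x, lp_cdf(x, 0.8, 0.1, 2.0))
```

Because the quantile function is available in closed form, quartiles and the median exist for every admissible parameter value, even when moments do not.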

There are various measures of the skewness of a distribution. The Pearson moment coefficient of skewness is

(13) $$\tilde{\mu}_3 = E\left[ \left( \frac{X - \mu}{\sigma} \right)^{3} \right],$$

where $\mu = E(X)$ and $\sigma = \sqrt{\mathrm{Var}(X)}$. So, $E(X)$, $E(X^{2})$, and $E(X^{3})$ should be finite. Thus, this measure can be computed only for $\theta < \frac{1}{3}$. Another measure, which can be computed for all parameter values, is the Bowley measure of skewness, see ref. [16],

$$B = \frac{F^{-1}(0.25) + F^{-1}(0.75) - 2F^{-1}(0.50)}{F^{-1}(0.75) - F^{-1}(0.25)},$$

which, by (12), can be simplified to

$$B = \frac{(1 - p_{0.25})^{-\theta} + (1 - p_{0.75})^{-\theta} - 2(1 - p_{0.50})^{-\theta}}{(1 - p_{0.75})^{-\theta} - (1 - p_{0.25})^{-\theta}},$$

where $1 - p_{q} = \frac{1 - \alpha^{1 - q}}{1 - \alpha}$. Note that $B$ does not depend on $\lambda$.
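A short sketch of the quartile-based skewness follows. It assumes the standard form of Bowley's coefficient, with denominator $F^{-1}(0.75) - F^{-1}(0.25)$; the function names are illustrative and not taken from the article.

```python
def lp_quantile(p, alpha, theta, lam):
    """Quantile function (12) of LP(alpha, theta, lam)."""
    level = (1.0 - alpha ** (1.0 - p)) / (1.0 - alpha)
    return lam / theta * (level ** (-theta) - 1.0)

def bowley_skewness(alpha, theta, lam=1.0):
    """Bowley skewness (Q1 + Q3 - 2*Q2) / (Q3 - Q1) computed from (12)."""
    q1, q2, q3 = (lp_quantile(p, alpha, theta, lam) for p in (0.25, 0.50, 0.75))
    return (q1 + q3 - 2.0 * q2) / (q3 - q1)

# The scale parameter cancels, so both calls should print (nearly) the same value.
print(bowley_skewness(2.5, 0.5, lam=1.0))
print(bowley_skewness(2.5, 0.5, lam=7.0))
```

Since the scale $\lambda$ cancels in the ratio, it is given a default value here; only $\alpha$ and $\theta$ influence the result.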

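The kurtosis of LP is

(14) $$K = E\left[ \left( \frac{X - \mu}{\sigma} \right)^{4} \right],$$

which is finite for $\theta < \frac{1}{4}$.

For $\theta < 1$, the Lorenz curve is finite and equals

$$L(p) = \frac{1}{\mu} \int_0^{F^{-1}(p)} x f(x)\, dx = \frac{\bar{\alpha}}{\mu \ln(\alpha)} \cdot \frac{\lambda}{\theta} \int_{l(p, \alpha)}^{1} \frac{1 - u^{-\theta}}{1 - \bar{\alpha} u}\, du,$$

where $l(p, \alpha) = \frac{1 - \alpha^{\bar{p}}}{1 - \alpha}$.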

2.1 Dynamic measures

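Dynamic measures such as the failure rate (FR), reversed failure rate (RFR), mean residual life (MRL), mean inactivity time (MIT), p-quantile residual life (p-QRL), and p-quantile inactivity time (p-QIT) play a key role in reliability physics and survival analysis. The FR and RFR functions of $\mathrm{LP}(\alpha, \theta, \lambda)$ are, respectively,

$$h(t) = \frac{f(t)}{\bar{F}(t)} = \frac{\alpha - 1}{\lambda} \times \frac{\left(1 + \frac{\theta t}{\lambda}\right)^{-\frac{1}{\theta} - 1}}{\left[1 - \bar{\alpha}\left(1 + \frac{\theta t}{\lambda}\right)^{-\frac{1}{\theta}}\right] \ln\left[1 - \bar{\alpha}\left(1 + \frac{\theta t}{\lambda}\right)^{-\frac{1}{\theta}}\right]}, \quad t > 0,$$

and

$$r_h(t) = \frac{f(t)}{F(t)} = \frac{\alpha - 1}{\lambda} \cdot \frac{1}{1 - \bar{\alpha}\left(1 + \frac{\theta t}{\lambda}\right)^{-\frac{1}{\theta}}} \times \frac{\left(1 + \frac{\theta t}{\lambda}\right)^{-\frac{1}{\theta} - 1}}{\ln(\alpha) - \ln\left[1 - \bar{\alpha}\left(1 + \frac{\theta t}{\lambda}\right)^{-\frac{1}{\theta}}\right]}, \quad t > 0.$$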
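The FR expression can be checked against its definition $h(t) = -\frac{d}{dt} \ln \bar{F}(t)$. The sketch below uses illustrative function names (`lp_sf`, `lp_hazard`) and a simple central difference; it is a numerical sanity check rather than a proof.

```python
import math

def lp_sf(t, alpha, theta, lam):
    """Reliability function of LP: ln(1 - (1-alpha)*G_bar(t)) / ln(alpha)."""
    g_bar = (1.0 + theta * t / lam) ** (-1.0 / theta)
    return math.log(1.0 - (1.0 - alpha) * g_bar) / math.log(alpha)

def lp_hazard(t, alpha, theta, lam):
    """Failure rate h(t) = f(t) / F_bar(t) in the closed form given above."""
    base = 1.0 + theta * t / lam
    w = 1.0 - (1.0 - alpha) * base ** (-1.0 / theta)
    return (alpha - 1.0) / lam * base ** (-1.0 / theta - 1.0) / (w * math.log(w))

# Compare with a central difference of -ln F_bar(t).
t, h = 1.5, 1e-5
args = (2.5, 0.5, 2.0)
numeric = -(math.log(lp_sf(t + h, *args)) - math.log(lp_sf(t - h, *args))) / (2 * h)
print(numeric, lp_hazard(t, *args))
```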

In the following proposition, $\bar{G}$ denotes the reliability function of the baseline Pareto distribution.

Proposition 2

The MRL function of $\mathrm{LP}(\alpha, \theta, \lambda)$ for $0 < \alpha < 1$ is

$$m(t) = \frac{\lambda \bar{\alpha}^{\theta}}{\theta \ln(1 - \bar{\alpha}\bar{G}(t))} \int_0^{\bar{\alpha}\bar{G}(t)} \frac{\bar{\alpha}^{-\theta} - u^{-\theta}}{1 - u}\, du - t, \quad t > 0,$$

and for $\alpha > 1$ is of the form

$$m(t) = \frac{\lambda (-\bar{\alpha})^{\theta}}{\theta \ln(1 - \bar{\alpha}\bar{G}(t))} \times \int_0^{-\bar{\alpha}\bar{G}(t)} \frac{u^{-\theta} - (-\bar{\alpha})^{-\theta}}{1 + u}\, du - t, \quad t > 0.$$

Moreover, the MRL function is finite for $0 < \theta < 1$ and infinite for $\theta \ge 1$.

Proof

The MRL can be written as

(15) $$m(t) = \frac{1}{\bar{F}(t)} \int_t^{\infty} x f(x)\, dx - t,$$

where $f$ and $\bar{F}$ are the density function and reliability function of the LP distribution, respectively. Let $0 < \alpha < 1$; then we have

$$\int_t^{\infty} x f(x)\, dx = \frac{\lambda \bar{\alpha}^{\theta}}{\theta \ln(\alpha)} \int_0^{\bar{\alpha}\bar{G}(t)} \frac{\bar{\alpha}^{-\theta} - u^{-\theta}}{1 - u}\, du.$$

Similarly, for $\alpha > 1$, we have

$$\int_t^{\infty} x f(x)\, dx = \frac{\lambda (-\bar{\alpha})^{\theta}}{\theta \ln(\alpha)} \int_0^{-\bar{\alpha}\bar{G}(t)} \frac{u^{-\theta} - (-\bar{\alpha})^{-\theta}}{1 + u}\, du.$$

Now, by Lemma 1, it follows that these integrals are finite iff $\theta < 1$.□

The MIT can be written as

(16) $$\nu(t) = t - \frac{1}{F(t)} \int_0^{t} x f(x)\, dx, \quad t > 0.$$

Similar to Proposition 1, for $0 < \alpha < 1$, the MIT is of the form

$$\nu(t) = t - \frac{\lambda \bar{\alpha}^{\theta}}{\theta \left( \ln \alpha - \ln(1 - \bar{\alpha}\bar{G}(t)) \right)} \times \int_{\bar{\alpha}\bar{G}(t)}^{\bar{\alpha}} \frac{\bar{\alpha}^{-\theta} - u^{-\theta}}{1 - u}\, du, \quad t > 0,$$

and for $\alpha > 1$ it is of the form

$$\nu(t) = t - \frac{\lambda (-\bar{\alpha})^{\theta}}{\theta \left( \ln \alpha - \ln(1 - \bar{\alpha}\bar{G}(t)) \right)} \times \int_{-\bar{\alpha}\bar{G}(t)}^{-\bar{\alpha}} \frac{u^{-\theta} - (-\bar{\alpha})^{-\theta}}{1 + u}\, du, \quad t > 0.$$

Moreover, because the integrands are bounded on the intervals of integration, the MIT function is finite for all parameter values.

The p-QRL function of LP is

$$q_p(t) = \bar{F}^{-1}(\bar{p}\bar{F}(t)) - t = \frac{\lambda}{\theta} \left[ \left( \frac{1 - \alpha^{\bar{p}\bar{F}(t)}}{1 - \alpha} \right)^{-\theta} - 1 \right] - t, \quad t > 0.$$

The integral that appears in the MRL is finite only for $\theta < 1$ and must be computed numerically. In contrast, the p-QRL has a closed form that is simpler to compute than the MRL and is finite for all parameter values.

Also, the p-QIT function is of the form

$$\mathrm{qit}_p(t) = t - F^{-1}(\bar{p}F(t)) = t - \frac{\lambda}{\theta} \left[ \left( \frac{1 - \alpha^{1 - \bar{p}F(t)}}{1 - \alpha} \right)^{-\theta} - 1 \right], \quad t > 0.$$
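Both quantile-based measures are available in closed form, so they can be evaluated without numerical integration. The following sketch (function names are assumptions made for illustration) implements the p-QRL and p-QIT and checks their defining properties $\bar{F}(t + q_p(t)) = \bar{p}\bar{F}(t)$ and $F(t - \mathrm{qit}_p(t)) = \bar{p}F(t)$.

```python
import math

def lp_sf(t, alpha, theta, lam):
    """Reliability function of LP(alpha, theta, lam)."""
    g_bar = (1.0 + theta * t / lam) ** (-1.0 / theta)
    return math.log(1.0 - (1.0 - alpha) * g_bar) / math.log(alpha)

def lp_quantile(p, alpha, theta, lam):
    """Quantile function (12)."""
    level = (1.0 - alpha ** (1.0 - p)) / (1.0 - alpha)
    return lam / theta * (level ** (-theta) - 1.0)

def lp_qrl(p, t, alpha, theta, lam):
    """p-quantile residual life: F_bar^{-1}((1-p) * F_bar(t)) - t."""
    s = (1.0 - p) * lp_sf(t, alpha, theta, lam)   # target reliability level
    return lp_quantile(1.0 - s, alpha, theta, lam) - t

def lp_qit(p, t, alpha, theta, lam):
    """p-quantile inactivity time: t - F^{-1}((1-p) * F(t))."""
    cdf_t = 1.0 - lp_sf(t, alpha, theta, lam)
    return t - lp_quantile((1.0 - p) * cdf_t, alpha, theta, lam)

# Median residual life and median inactivity time at t = 2.
args = (2.5, 0.5, 6.0)
print(lp_qrl(0.5, 2.0, *args), lp_qit(0.5, 2.0, *args))
```

At $t = 0$ the p-QRL reduces to the ordinary quantile $F^{-1}(p)$, which gives a quick consistency check.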

The density and FR functions are plotted for some parameter values in Figure 1. The density is decreasing for all parameter values. However, the FR may either decrease throughout or first increase and then decrease, i.e., it is eventually decreasing. Figure 2 plots the MRL and the median residual life ($q_{0.5}(t)$) for some parameter values; they exhibit increasing and bathtub shapes.

Figure 1

The PDF and FR function of $\mathrm{LP}(\alpha, \theta, \lambda)$ for some values of the parameters.

Figure 2

The MRL function and the median residual life function of $\mathrm{LP}(\alpha, \theta, \lambda)$ for some values of the parameters.

3 Estimation of the parameters

By Proposition 1, the $k$th moment of this distribution has different formulas for $0 < \alpha < 1$ and $\alpha > 1$, and some of the moments may be infinite. So, the method of moments is not suitable for estimating the parameters. Instead, we can use the quartiles of the data and equation (12). Let $X_i$, $i = 1, 2, \ldots, n$, represent an independent and identically distributed (iid) sample from $\mathrm{LP}(\alpha, \theta, \lambda)$. We then solve the following equations for $\alpha$, $\theta$, and $\lambda$ simultaneously:

(17) $$Q_{0.25} = \frac{\lambda}{\theta} \left[ \left( \frac{1 - \alpha^{0.75}}{1 - \alpha} \right)^{-\theta} - 1 \right], \quad Q_{0.50} = \frac{\lambda}{\theta} \left[ \left( \frac{1 - \alpha^{0.50}}{1 - \alpha} \right)^{-\theta} - 1 \right], \quad Q_{0.75} = \frac{\lambda}{\theta} \left[ \left( \frac{1 - \alpha^{0.25}}{1 - \alpha} \right)^{-\theta} - 1 \right].$$

The sample quartiles $Q_{0.25}$, $Q_{0.50}$, and $Q_{0.75}$ are computed from the data. The ratio of the first and second equations (and likewise of the second and third) is free of $\lambda$, which reduces the problem to a system of two equations in the two unknowns $\alpha$ and $\theta$. Note that the quartile estimate of the parameters is a preliminary estimate and can be used as an initial value for computing the MLE or a more robust estimate.
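The elimination of $\lambda$ in (17) can be sketched as follows. The helper names (`level`, `quartile_ratios`, `lam_from_median`) are hypothetical; solving the two ratio equations for $(\alpha, \theta)$ would additionally need a root finder or grid search, which is not shown here.

```python
def level(p, alpha):
    """The term (1 - alpha**(1-p)) / (1 - alpha) appearing in (12) and (17)."""
    return (1.0 - alpha ** (1.0 - p)) / (1.0 - alpha)

def quartile_ratios(alpha, theta):
    """Model values of Q_0.50/Q_0.25 and Q_0.75/Q_0.50; lam cancels in both."""
    z25, z50, z75 = (level(p, alpha) ** (-theta) - 1.0 for p in (0.25, 0.50, 0.75))
    return z50 / z25, z75 / z50

def lam_from_median(q50, alpha, theta):
    """Given alpha and theta, recover lam from the median equation in (17)."""
    return theta * q50 / (level(0.50, alpha) ** (-theta) - 1.0)

# Matching these two ratios to the sample ratios gives two equations in
# (alpha, theta); lam then follows from the sample median.
print(quartile_ratios(2.5, 0.5))
```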

Let $\bar{G}(x_i) = \left(1 + \frac{\theta x_i}{\lambda}\right)^{-\frac{1}{\theta}}$ be the reliability function of the underlying Pareto distribution. If $x_1, x_2, \ldots, x_n$ is a realization from $\mathrm{LP}(\alpha, \theta, \lambda)$, the log-likelihood function is

(18) $$l(\alpha, \theta, \lambda; x) = n \ln\left( \frac{\alpha - 1}{\ln \alpha} \right) - n \ln \lambda + \sum_{i=1}^{n} (\theta + 1) \ln \bar{G}(x_i) - \sum_{i=1}^{n} \ln(1 - \bar{\alpha}\bar{G}(x_i)).$$
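A direct transcription of (18) is sketched below. The function name `lp_loglik` and the sample values are illustrative assumptions; the function could be handed (with its sign flipped) to a general-purpose numerical optimizer, which is not shown.

```python
import math

def lp_loglik(params, xs):
    """Log-likelihood (18) of LP(alpha, theta, lam) for observations xs."""
    alpha, theta, lam = params
    n = len(xs)
    total = n * math.log((alpha - 1.0) / math.log(alpha)) - n * math.log(lam)
    for x in xs:
        log_g_bar = -math.log1p(theta * x / lam) / theta   # ln G_bar(x)
        total += (theta + 1.0) * log_g_bar
        total -= math.log(1.0 - (1.0 - alpha) * math.exp(log_g_bar))
    return total

# Hypothetical sample and parameter values, for illustration only.
sample = [0.047, 0.5, 1.2, 4.0]
print(lp_loglik((0.6, 0.3, 1.5), sample))
```

The factor $(\alpha - 1)/\ln \alpha$ is positive for every admissible $\alpha$, so the first logarithm is well defined on both sides of $\alpha = 1$.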

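The likelihood equations are as follows:

(19) $$\frac{\partial l}{\partial \alpha} = \frac{n}{\alpha - 1} \left( 1 - \frac{\alpha - 1}{\alpha \ln \alpha} \right) - \sum_{i=1}^{n} \frac{\bar{G}(x_i)}{1 - \bar{\alpha}\bar{G}(x_i)} = 0,$$

(20) $$\frac{\partial l}{\partial \theta} = \sum_{i=1}^{n} \left[ \ln \bar{G}(x_i) + (\theta + 1) \frac{\partial \bar{G}(x_i)}{\partial \theta} \frac{1}{\bar{G}(x_i)} \right] + \sum_{i=1}^{n} \bar{\alpha}\, \frac{\partial \bar{G}(x_i)}{\partial \theta} \frac{1}{1 - \bar{\alpha}\bar{G}(x_i)} = 0,$$

and

(21) $$\frac{\partial l}{\partial \lambda} = -\frac{n}{\lambda} + \sum_{i=1}^{n} (\theta + 1) \frac{x_i}{\lambda^{2}} \bar{G}^{\theta}(x_i) + \sum_{i=1}^{n} \bar{\alpha}\, \frac{x_i}{\lambda^{2}} \cdot \frac{\bar{G}^{\theta + 1}(x_i)}{1 - \bar{\alpha}\bar{G}(x_i)} = 0,$$

where

$$\frac{\partial \bar{G}(x_i)}{\partial \theta} = -\frac{1}{\theta^{2}} \bar{G}(x_i) \left[ \ln \bar{G}^{\theta}(x_i) + \frac{\theta x_i}{\lambda + \theta x_i} \right].$$

Here, we maximize the log-likelihood function directly, by numerical methods, to calculate the MLE.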

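Let $l = \ln f(X)$; then the Fisher information matrix for $(\alpha, \theta, \lambda)$ is

(22) $$K = \begin{pmatrix} E\left(-\frac{\partial^{2} l}{\partial \alpha^{2}}\right) & E\left(-\frac{\partial^{2} l}{\partial \alpha\, \partial \theta}\right) & E\left(-\frac{\partial^{2} l}{\partial \alpha\, \partial \lambda}\right) \\ E\left(-\frac{\partial^{2} l}{\partial \theta\, \partial \alpha}\right) & E\left(-\frac{\partial^{2} l}{\partial \theta^{2}}\right) & E\left(-\frac{\partial^{2} l}{\partial \theta\, \partial \lambda}\right) \\ E\left(-\frac{\partial^{2} l}{\partial \lambda\, \partial \alpha}\right) & E\left(-\frac{\partial^{2} l}{\partial \lambda\, \partial \theta}\right) & E\left(-\frac{\partial^{2} l}{\partial \lambda^{2}}\right) \end{pmatrix}.$$

Let $K^{-1}$ be the inverse of the information matrix; then $\mathrm{Var}(\hat{\alpha}, \hat{\theta}, \hat{\lambda}) \approx n^{-1} K^{-1}$. Moreover, if $X_i$, $i = 1, 2, \ldots, n$, is an iid sample from $\mathrm{LP}(\alpha_0, \theta_0, \lambda_0)$, then

(23) $$\sqrt{n}\, (\hat{\alpha} - \alpha_0, \hat{\theta} - \theta_0, \hat{\lambda} - \lambda_0)^{T} \sim N(0, K^{-1}),$$

asymptotically, which can be used for constructing confidence intervals for the parameters of the model or for testing hypotheses about them.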

3.1 Right censored data

Let $X_i$, $i = 1, 2, \ldots, n$, be censored from the right by a censoring random variable $C_i$. So, the observed data are $T_i = \min(X_i, C_i)$ and $d_i$, where $d_i = 1$ when $X_i$ is not censored ($X_i \le C_i$) and $d_i = 0$ when it is censored ($X_i > C_i$). It is assumed that the distribution function of $C_i$ is unknown. All the information that a censored element $(t_i, 0)$ gives is that the event time is greater than $t_i$, i.e., $X_i > t_i$. Thus, the log-likelihood function, which does not depend on the distribution of the censoring random variable, is

(24) $$l_c(\alpha, \theta, \lambda; t, d) = \sum_{i=1}^{n} d_i \ln f(t_i) + \sum_{i=1}^{n} (1 - d_i) \ln \bar{F}(t_i),$$

where $f$ and $\bar{F}$ represent the density and reliability functions of the LP distribution, respectively. We can maximize this function with respect to the parameters to obtain the MLE. Moreover, the variance matrix can be approximated by the inverse of the Fisher information matrix.
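The censored log-likelihood (24) can be transcribed as below. This is a sketch under the notation of this section; the function name `lp_censored_loglik` and the argument layout are assumptions made for illustration.

```python
import math

def lp_censored_loglik(params, times, events):
    """Log-likelihood (24): events[i] is 1 for an observed failure, 0 if censored."""
    alpha, theta, lam = params
    log_alpha = math.log(alpha)
    total = 0.0
    for t, d in zip(times, events):
        base = 1.0 + theta * t / lam
        w = 1.0 - (1.0 - alpha) * base ** (-1.0 / theta)
        if d:
            # observed failure: contributes ln f(t), with f from (4)
            total += math.log((alpha - 1.0) / (lam * log_alpha))
            total += (-1.0 / theta - 1.0) * math.log(base) - math.log(w)
        else:
            # right-censored at t: contributes ln F_bar(t)
            total += math.log(math.log(w) / log_alpha)
    return total

# Two observations, the second one censored at time 2.0.
print(lp_censored_loglik((2.5, 0.5, 2.0), [1.0, 2.0], [1, 0]))
```

When every `events[i]` equals 1, the expression reduces to the uncensored log-likelihood (18).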

4 Simulation

To generate one realization of $\mathrm{LP}(\alpha, \theta, \lambda)$, we set $X = F^{-1}(U)$, where $F^{-1}$ is the quantile function (12) and $U$ is a simulated random variable from the standard uniform distribution. To generate a right-censored random sample, we must assume a distribution for $C_i$ and a censoring rate $p$. Here, $C_i$ is taken to follow the degenerate distribution at the point $M$. For a censoring rate $p$, we can compute $M$ by the relation $M = F^{-1}(\bar{p})$, where $F^{-1}$ is defined in (12).
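The sampling scheme described above can be sketched with inverse-transform sampling. The function names and the use of `random.Random` with a fixed seed are illustrative choices, not the author's code; `math.expm1` is used only to keep $1 - \alpha^{s}$ accurate when $s$ is tiny.

```python
import math
import random

def lp_quantile(p, alpha, theta, lam):
    """Quantile function (12), written with expm1 for numerical stability."""
    level = -math.expm1((1.0 - p) * math.log(alpha)) / (1.0 - alpha)
    return lam / theta * (level ** (-theta) - 1.0)

def simulate_censored(n, alpha, theta, lam, p_cens, rng):
    """Draw n LP variates and censor them at the fixed point M = F^{-1}(1 - p_cens)."""
    m = lp_quantile(1.0 - p_cens, alpha, theta, lam) if p_cens > 0 else math.inf
    times, events = [], []
    for _ in range(n):
        x = lp_quantile(rng.random(), alpha, theta, lam)  # X = F^{-1}(U)
        times.append(min(x, m))
        events.append(1 if x <= m else 0)
    return times, events

rng = random.Random(2021)  # fixed seed so the run is reproducible
times, events = simulate_censored(1000, 2.5, 0.01, 6.0, 0.2, rng)
print(1.0 - sum(events) / len(events))  # observed censoring proportion
```

With `p_cens = 0` the sample is uncensored, which corresponds to the $p = 0$ rows of the simulation tables.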

In each run of this study, $r = 500$ random samples were drawn from the LP distribution with the selected parameters, for each of the sample sizes $n = 80$, 120, 400, and 1,000. For each of the 500 replicates, the MLEs of the parameters were computed by maximizing the log-likelihood function directly. Then, the quadruples $(B_\alpha, AB_\alpha, RAB_\alpha, MSE_\alpha)$, $(B_\theta, AB_\theta, RAB_\theta, MSE_\theta)$, and $(B_\lambda, AB_\lambda, RAB_\lambda, MSE_\lambda)$ were computed, where

(25) $$B_\alpha = \frac{1}{r} \sum_{i=1}^{r} (\hat{\alpha}_i - \alpha),$$

(26) $$AB_\alpha = \frac{1}{r} \sum_{i=1}^{r} |\hat{\alpha}_i - \alpha|,$$

(27) $$RAB_\alpha = \frac{1}{r} \sum_{i=1}^{r} \frac{|\hat{\alpha}_i - \alpha|}{\alpha},$$

and

(28) $$MSE_\alpha = \frac{1}{r} \sum_{i=1}^{r} (\hat{\alpha}_i - \alpha)^{2},$$

and the measures for $\theta$ and $\lambda$ are defined similarly. Here, $B_\alpha$, $AB_\alpha$, $RAB_\alpha$, and $MSE_\alpha$ represent the bias, the absolute bias, the relative absolute bias, and the mean squared error of the estimator of $\alpha$, respectively. Because the relative absolute bias is scaled by the parameter value, it is less affected by the magnitude of the parameter and is therefore better suited to assessing the accuracy of the estimator (Table 1).
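The four accuracy measures (25)-(28) are simple averages over the replicates. A minimal sketch, with a hypothetical function name and toy input values:

```python
def accuracy_measures(estimates, true_value):
    """Return (B, AB, RAB, MSE) as in (25)-(28) for a positive true parameter."""
    r = len(estimates)
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / r
    abs_bias = sum(abs(e) for e in errors) / r
    rel_abs_bias = abs_bias / true_value
    mse = sum(e * e for e in errors) / r
    return bias, abs_bias, rel_abs_bias, mse

# Toy example: two replicate estimates of a parameter whose true value is 2.
print(accuracy_measures([1.0, 3.0], 2.0))  # prints (0.0, 1.0, 0.5, 1.0)
```

In the toy example the signed errors cancel, so the bias is zero while the absolute bias and the MSE are not, which illustrates why the tables report all four quantities.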

Table 1

Simulation results for MLE of the parameters of LP distribution

p	α	θ	λ	n = 80	n = 120
0	2.5	0.01	6	(0.89174, 1.71082, 0.68432, 3.80379)	(0.92590, 1.61671, 0.64668, 3.54421)
				(0.01465, 0.02871, 2.87198, 0.00259)	(0.01690, 0.03000, 3.00039, 0.00249)
				(−0.15260, 1.06665, 1.77776, 1.73402)	(−0.24555, 0.98249, 0.16374, 1.43174)
	1.2	0.01	3	(0.50030, 0.80417, 0.67014, 0.85992)	(0.45354, 0.74937, 0.62448, 0.77589)
				(0.02160, 0.03549, 3.54977, 0.00422)	(0.02216, 0.03490, 3.49035, 0.00360)
				(−0.18255, 0.52506, 0.17502, 0.40837)	(−0.17792, 0.48792, 0.16264, 0.33693)
	0.8	0.1	2	(0.16873, 0.50522, 0.63153, 0.33917)	(0.12730, 0.47109, 0.58886, 0.30042)
				(−0.03485, 0.09025, 0.90256, 0.01013)	(−0.02775, 0.08500, 0.85005, 0.00898)
				(0.15910, 0.50561, 0.25280, 0.44575)	(0.12446, 0.45755, 0.22877, 0.32284)
0.2	2.5	0.01	6	(1.22873, 1.89350, 0.75740, 4.39057)	(1.31658, 1.85531, 0.74212, 4.26110)
				(0.06785, 0.07971, 7.97135, 0.02186)	(0.04911, 0.06101, 6.10162, 0.01226)
				(−0.36531, 1.41673, 0.23612, 4.07246)	(−0.56778, 1.27720, 0.21286, 2.33137)
	1.2	0.01	3	(0.70674, 0.99847, 0.83205, 1.15670)	(0.75526, 0.98150, 0.81792, 1.14132)
				(0.09370, 0.10498, 10.49829, 0.03502)	(0.08372, 0.09432, 9.43294, 0.02669)
				(−0.29865, 0.79823, 0.26607, 0.98154)	(−0.35685, 0.71145, 0.23715, 0.71591)
	0.8	0.1	2	(0.38695, 0.63007, 0.78758, 0.47145)	(0.35472, 0.61527, 0.76909, 0.45566)
				(0.05126, 0.16572, 1.65724, 0.05239)	(0.04857, 0.15382, 1.53827, 0.03961)
				(−0.06967, 0.58250, 0.29125, 0.62841)	(−0.05599, 0.56583, 0.28291, 0.49795)

Every cell consists of the quadruples $(B_\alpha, AB_\alpha, RAB_\alpha, MSE_\alpha)$, $(B_\theta, AB_\theta, RAB_\theta, MSE_\theta)$, and $(B_\lambda, AB_\lambda, RAB_\lambda, MSE_\lambda)$, from top to bottom.

The study was conducted for uncensored data ($p = 0$) and for censored data with censoring rate $p = 0.2$. The results of the simulation study are summarized in Tables 1 and 2, which show the efficiency and consistency of the MLE. Every cell of the tables shows the results of one run. The following observations follow immediately.

As the sample size increases, the absolute bias and the MSE decrease, which is in line with the consistency of the MLE.
All measures for $\alpha$ except the relative absolute bias show larger values for larger values of $\alpha$. Similar statements hold for $\theta$ and $\lambda$. The relative absolute bias is not affected by the magnitude of the parameters.
For censored samples, the absolute bias, the relative absolute bias, and the MSE show larger values.

Table 2

Simulation results for MLE of the parameters of LP distribution

p	α	θ	λ	n = 400	n = 1,000
0	2.5	0.01	6	(0.76272, 1.27136, 0.50854, 2.52256)	(0.53958, 0.94178, 0.37671, 1.63359)
				(0.01309, 0.02503, 2.50366, 0.00140)	(0.01381, 0.02477, 2.47780, 0.00127)
				(−0.27271, 0.71880, 0.11980, 0.76439)	(−0.24674, 0.55354, 0.09225, 0.50746)
	1.2	0.01	3	(0.31080, 0.52708, 0.43923, 0.46621)	(0.25235, 0.38621, 0.32184, 0.28918)
				(0.01582, 0.02750, 2.75041, 0.00181)	(0.01603, 0.02611, 2.61128, 0.00143)
				(−0.13972, 0.35020, 0.11673, 0.18359)	(−0.13638, 0.26340, 0.08780, 0.11692)
	0.8	0.1	2	(0.14517, 0.40042, 0.50053, 0.24135)	(0.05472, 0.29209, 0.36512, 0.13692)
				(−0.01009, 0.06514, 0.65144, 0.00575)	(−0.00970, 0.05074, 0.50749, 0.00360)
				(0.03282, 0.36875, 0.18437, 0.19381)	(0.04850, 0.28627, 0.14313, 0.11543)
0.2	2.5	0.01	6	(1.16702, 1.59334, 0.63733, 3.49978)	(1.15371, 1.46776, 0.58710, 3.17303)
				(0.04031, 0.05117, 5.11732, 0.00694)	(0.04256, 0.05201, 5.20183, 0.00571)
				(−0.53834, 0.98509, 0.16418, 1.33335)	(−0.59639, 0.89226, 0.14871, 1.09865)
	1.2	0.01	3	(0.69566, 0.86318, 0.71931, 0.97805)	(0.55288, 0.69740, 0.58116, 0.74195)
				(0.06552, 0.07530, 7.53080, 0.01388)	(0.05474, 0.06438, 6.43874, 0.00922)
				(−0.37650, 0.57143, 0.19047, 0.43260)	(−0.32186, 0.47367, 0.15789, 0.32068)
	0.8	0.1	2	(0.28161, 0.51082, 0.63853, 0.36350)	(0.26390, 0.45907, 0.57383, 0.31362)
				(0.02262, 0.12045, 1.20457, 0.02045)	(0.02012, 0.10935, 1.09352, 0.01534)
				(−0.06567, 0.44902, 0.22451, 0.26132)	(−0.09006, 0.39631, 0.19815, 0.20169)

Every cell consists of the quadruples $(B_\alpha, AB_\alpha, RAB_\alpha, MSE_\alpha)$, $(B_\theta, AB_\theta, RAB_\theta, MSE_\theta)$, and $(B_\lambda, AB_\lambda, RAB_\lambda, MSE_\lambda)$, from top to bottom.

5 Applications

Table 3 shows the intervals between successive failures of the air conditioning system for some airplanes in a fleet of 13 Boeing 720 jet airplanes. The data were reported by Proschan [17] and were analyzed by Kus [18], Tahir et al. [19], and many others. The LP distribution has been fitted to the data. The MLE of the parameters is $(\hat{\alpha}, \hat{\theta}, \hat{\lambda}) = (10.501539, 0.376145, 27.845052)$. The Akaike information criterion (AIC) is 1960.1484. The Kolmogorov–Smirnov (K–S) statistic and its p-value are 0.041423 and 0.8764, respectively, which indicate a good fit. Three alternative models, namely, the Pareto distribution with CDF (2), the power Pareto distribution with CDF

$$F(x) = 1 - \left(1 + \frac{\theta x^{\gamma}}{\lambda}\right)^{-\frac{1}{\theta}}, \quad \theta, \lambda, \gamma > 0, \quad x \ge 0,$$

and the Gumbel–Lomax distribution proposed by Tahir et al. [19] with CDF

$$F(x) = \exp\left\{ -\left[ \left(1 + \frac{x}{\lambda}\right)^{\alpha} - 1 \right]^{-\frac{1}{\theta}} \right\}, \quad \alpha, \theta, \lambda > 0, \quad x \ge 0,$$

were fitted to this data set. The results of the fits are gathered in Table 5. Among these models, the LP distribution has the smallest AIC and the smallest K–S statistic. Thus, based on the AIC and the K–S statistic, the LP distribution outperforms the other models considered. Figure 3, left side, shows the empirical and fitted CDFs of the data and graphically confirms a good fit. Also, Figure 4, left side, shows the CDFs of all fitted models.
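The two comparison criteria used here can be computed as in the following sketch. It assumes the usual definitions, $\mathrm{AIC} = 2k - 2\ell$ for $k$ parameters and maximized log-likelihood $\ell$, and the one-sample K–S statistic $D = \sup_x |F_n(x) - F(x)|$; the function names are illustrative, and no p-value computation is attempted.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2*loglik (usual definition, assumed)."""
    return 2.0 * n_params - 2.0 * loglik

def ks_statistic(data, cdf):
    """One-sample K-S statistic: largest gap between the empirical CDF and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # the empirical CDF jumps from (i-1)/n to i/n at x
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

# Toy check against the uniform CDF on [0, 1].
print(ks_statistic([0.1, 0.5, 0.9], lambda x: x))
```

To apply this to a fitted model, `cdf` would be the fitted LP CDF (3) evaluated at the estimated parameters.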

Table 3

Interval between successive failures of the air conditioning system

50	130	487	57	102	15	14	10	57	320	261	51	44	9	254
493	33	18	209	41	58	60	48	56	87	11	102	12	5	14
14	29	37	186	29	104	7	4	72	270	283	7	61	100	61
502	220	120	141	22	603	35	98	54	100	11	181	65	49	12
239	14	18	39	3	12	5	32	9	438	43	134	184	20	386
182	71	80	188	230	152	5	36	79	59	33	246	1	79	3
27	201	84	27	156	21	16	88	130	14	118	44	15	42	106
46	230	26	59	153	104	20	206	5	66	34	29	26	35	5
82	31	118	326	12	54	36	34	18	25	120	31	22	18	216
139	67	310	3	46	210	57	76	14	111	97	62	39	30	7
44	11	63	23	22	23	14	18	13	34	16	18	130	90	163
208	1	24	70	16	101	52	208	95	62	11	191	14	71

Figure 3

The empirical distribution and fitted LP distribution for data set of Table 3 (left side) and data set of Table 4 (right side).

Figure 4

Some alternative models fitted to data set of Table 3 (left side) and data set of Table 4 (right side).

The second data set consists of the survival times (in years) of a group of patients given chemotherapy treatment, which was reported and analyzed by Tahir et al. [19] and Bekker et al. [20]. Table 4 shows the data. The LP distribution has been fitted to this data set in the same way, and the MLE is $(\hat{\alpha}, \hat{\theta}, \hat{\lambda}) = (0.591632, 0.0000002, 1.547971)$. The AIC value is 120.2418. The K–S statistic and its p-value are 0.069681 and 0.9703, respectively. The small K–S statistic and the p-value close to one indicate a good fit. The empirical and fitted CDFs of LP are plotted in Figure 3, right side. Moreover, the Pareto, the power Pareto, and the Gumbel–Lomax distributions are fitted to this data set. The results, which are reported in Table 5, show that the LP distribution describes the data better than these alternatives. Figure 4, right side, shows the CDFs of all fitted models. Note that Figure 3 shows only that the CDF of the data is well fitted by the model. In particular, it does not imply that the parameter values are well captured. The fitted parameter values may deviate substantially from the true parameter values even when the fitted CDF is close to the CDF of the data.

Table 4

Survival times (in years) of a group of patients

0.047	0.115	0.121	0.132	0.164	0.197	0.203	0.260	0.282	0.296
0.334	0.395	0.458	0.466	0.501	0.507	0.529	0.534	0.540	0.641
0.644	0.696	0.841	0.863	1.099	1.219	1.271	1.326	1.447	1.485
1.553	1.581	1.589	2.178	2.343	2.416	2.444	2.825	2.830	3.578
3.658	3.743	3.978	4.003	4.033

Table 5

Results of fitting the LP distribution and some alternatives to data sets

Data set	Model	$\hat{\alpha}$ ($\hat{\gamma}$)	$\hat{\theta}$	$\hat{\lambda}$	AIC	K–S statistic	p-value (K–S)
Table 3	LP	10.5015	0.376145	27.8450	1960.1484	0.041423	0.8764
	Pareto		0.203456	71.4876	1963.444	0.045621	0.8503
	Power Pareto	1.094655	0.341468	97.9318	1964.86	0.047856	0.8069
	Gumbel–Lomax	21.17544	1.950514	995.2855	1961.591	0.048029	0.8034
Table 4	LP	0.591632	0.0000002	1.547971	119.2418	0.069681	0.9703
	Pareto		$4.9397 \times 10^{-8}$	1.341056	120.4372	0.090757	0.8200
	Power Pareto	1.053597	7.893881	1.393271	122.2474	0.10952	0.6138
	Gumbel–Lomax	20545234	1.719971	18486148	120.0156	0.096074	0.7646

6 Conclusion

The Pareto model has been applied in many fields, including finance, survival analysis, insurance, and many others. Because of its importance, researchers have proposed several generalizations of the model in recent decades. Here, a new distribution, namely LP, has been developed by applying the logarithmic transformation, and its basic properties have been studied. Also, the quantile and MLE methods for estimating the parameters have been discussed. The simulation results show that the MLE method is consistent and efficient. Two data sets were considered, and it was found that the LP model describes them very well. Overall, the LP distribution can be useful in both theoretical and applied situations.

Acknowledgements

The author would like to thank the editor and two anonymous reviewers for their suggestions and very constructive comments that improved the presentation and readability of the article.

Funding information: This work was supported by Researchers Supporting Project number (RSP-2021/392), King Saud University, Riyadh, Saudi Arabia.
Conflict of interest: Author states no conflict of interest.
Data availability statement: The data used to support the findings of this study are included within the article.

References

[1] Newman MEJ. Power laws, Pareto distributions and Zipf’s law. Contemp Phys. 2005;46:323–51. doi:10.1080/00107510500052444.

[2] Schroeder B, Damouras S, Gill P. Understanding latent sector errors and how to protect against them. ACM Trans Storage. 2010;6(3):8. doi:10.1145/1837915.1837917.

[3] Burroughs SM, Tebbens SF. Upper-truncated power law distributions. Fractals. 2001;9:209–22. doi:10.1142/S0218348X01000658.

[4] Sornette D. Multiplicative processes and power laws. Phys Rev E. 1998;57:4811–13. doi:10.1103/PhysRevE.57.4811.

[5] Olami Z, Feder HJS, Christensen K. Self organized criticality in a continuous, nonconservative cellular automaton modeling earthquakes. Phys Rev Lett. 1992;68:1244–7. doi:10.1103/PhysRevLett.68.1244.

[6] Bak P, Sneppen K. Punctuated equilibrium and criticality in a simple model of evolution. Phys Rev Lett. 1993;71:4083–6. doi:10.1103/PhysRevLett.71.4083.

[7] Carlson JM, Doyle J. Highly optimized tolerance: a mechanism for power laws in designed systems. Phys Rev E. 1999;60:1412–27. doi:10.1103/PhysRevE.60.1412.

[8] Mead M. An extended Pareto distribution. Pakistan J Statist Operat Res. 2014;10(3):313–29. doi:10.18187/pjsor.v10i3.766.

[9] Ghitany ME, Gómez-Déniz E, Nadarajah S. A new generalization of the Pareto distribution and its application to insurance data. J Risk Financ Manag. 2018;11(1):10. doi:10.3390/jrfm11010010.

[10] Ihtisham S, Khalil A, Manzoor S, Khan SA, Ali A. Alpha-Power Pareto distribution: Its properties and applications. PLoS One. 2019;14(6):e0218027. doi:10.1371/journal.pone.0218027.

[11] Haj Ahmad H, Almetwally E. Marshall-Olkin generalized Pareto distribution: Bayesian and non-Bayesian estimation. Pakistan J Statist Operat Res. 2020;16(1):21–3. doi:10.18187/pjsor.v16i1.2935.

[12] Marshall AW, Olkin I. A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika. 1997;84:641–52. doi:10.1093/biomet/84.3.641.

[13] Vasileios P, Konstantinos A, Sotirios L. A family of lifetime distributions. J Qual Reliab Eng. 2012;2012:760687. doi:10.1155/2012/760687.

[14] Dey S, Nassar M, Kumar D. α Logarithmic transformed family of distributions with application. Ann Data Sci. 2017;4:457–82. doi:10.1007/s40745-017-0115-2.

[15] Nassar M, Afify AZ, Dey S, Kumar D. A new extension of Weibull distribution: properties and different methods of estimation. J Comput Appl Math. 2018;336:439–57. doi:10.1016/j.cam.2017.12.001.

[16] Bowley AL. Elements of statistics. London: P.S. King and Son; 1901.

[17] Proschan F. Theoretical explanation of observed decreasing failure rate. Technometrics. 1963;5:375–83. doi:10.1080/00401706.1963.10490105.

[18] Kus C. A new lifetime distribution. Comput Stat Data Anal. 2007;51:4497–509. doi:10.1016/j.csda.2006.07.017.

[19] Tahir MH, Adnan Hussain M, Cordeiro GM, Hamedani GG, Mansoor M, Zubair M. The Gumbel-Lomax distribution: properties and applications. J Stat Theory Appl. 2016;15(1):61–79. doi:10.2991/jsta.2016.15.1.6.

[20] Bekker A, Roux J, Mostert P. A generalization of the compound Rayleigh distribution: using a Bayesian methods on cancer survival times. Commun Stat Theory Method. 2000;29:1419–33. doi:10.1080/03610920008832554.

Received: 2021-07-18

Revised: 2021-10-24

Accepted: 2021-10-29

Published Online: 2021-11-15

This work is licensed under the Creative Commons Attribution 4.0 International License.
