Abstract
In this paper we show that a regression-based approach can be used to estimate generalised entropy and Atkinson inequality indices and their associated standard errors. The applicability of this approach is demonstrated using the health expenditure data from the United States (US) medical expenditure panel survey (MEPS).
1 Introduction
Measures of inequality are widely used in economics to study income distribution and the overall well-being of a society. They quantify the degree of disparity within a population, helping researchers and policymakers understand how income and resources are distributed among individuals or households. The Atkinson (1970) and generalised entropy families of indices are among the most commonly used measures of income inequality.
The necessity for statistical inference becomes apparent when dealing with small samples, but it remains relevant even for large samples as it can yield statistical measures of precision. Conducting statistical inference for many of the inequality measures used in the literature poses a challenge due to their non-linear nature as functions of a random variable, typically income. Consequently, the standard error provided by asymptotic theory may not accurately capture the true variability.
Several authors have proposed methods for calculating the sampling variances of inequality measures. Notably, Ogwang (2000) and Karagiannis and Kovacevic (2000) proposed formulae for easily computable jackknife variance estimators for the Gini coefficient, and Karoly (1989) provided jackknife formulae for other indices, including the Atkinson and generalised entropy (GE) classes. Mills and Zandvakili (1997) proposed bootstrap resampling methods for the Gini coefficient and the two Theil measures, which Biewen (2002) later extended to all Atkinson, GE, and Kolm indices, allowing for weighting. Biewen and Jenkins (2006) derived the sampling variances of the GE and Atkinson indices in the presence of complex survey design. Additionally, Clarke and Roy (2012) examined inference for generalised entropy and Atkinson inequality measures computed from complex survey data, using Wald statistics with variance–covariance matrices estimated by a linearization approximation.
Davidson and Flachaire (2007) examine the finite-sample performance of asymptotic and bootstrap inference for the Theil inequality measure. Their simulations indicate that neither asymptotic nor standard bootstrap inference performs satisfactorily, even in very large samples. Investigating the causes of this poor performance, they find that both asymptotic and bootstrap inference are highly sensitive to the shape of the upper tail of the income distribution. In their concluding section, Davidson and Flachaire (2007) also note that many parametric income distributions are heavy-tailed, with an upper tail that decays like a power function. If the tail decays slowly enough, the variance may be infinite, in which case neither asymptotic nor bootstrap methods are consistent. Even when the variance is finite, the frequent presence of extreme observations in sample data poses problems for the bootstrap.
Ogwang (2000) utilised a regression-based method to estimate the Gini coefficient and proposed a simple algorithm for calculating its jackknife standard error. On the other hand, Giles (2004) chose not to employ the jackknife approximation and instead derived an exact analytical expression for the standard error of the Gini index. Modarres and Gastwirth (2006) showed that the regression-based approach for estimating the standard error of the Gini index can lead to inaccurate results, as it fails to account for the correlations[1] introduced in the error terms when the data is ordered. Unlike the Gini coefficient, which is sensitive to the ranking of observations, the Atkinson and GE indices are rank-independent measures. As a result, the regression errors can be assumed to be independent. In this paper, we expand upon the regression framework to estimate both the Atkinson and GE measures of inequality, while also accounting for their respective standard errors.
The structure of the paper is as follows. Section 2 presents the formulation of regression-based estimators for GE and Atkinson inequality measures. Section 3 offers an empirical demonstration of the proposed estimators using health expenditure data from the US medical expenditure panel survey (MEPS). Finally, Section 4 concludes the paper with closing remarks.
2 Estimating Generalised Entropy and Atkinson Indices
We suppose that
where $y_i$ is some function of $x_i$, say,
For the generalised entropy measure of inequality, the parameter α in the above equation determines how much weight is given to distances between incomes in different parts of the income distribution. When α is large, the index is highly sensitive to large incomes; when α is small, it is particularly sensitive to small incomes.
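For reference, the textbook definition of the GE index for a sample $x_1,\dots,x_n$ with mean $\mu$ is written below as the sample mean of a transformed variable $y_i$. This is a sketch under the assumption that equation (1) regresses such a $y_i$ on a constant only, with $\mu$ fixed at the sample mean; in that case the OLS coefficient is exactly the sample mean of $y_i$, i.e. the index itself.

```latex
\widehat{GE}(\alpha) = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
y_i =
\begin{cases}
\dfrac{1}{\alpha(\alpha-1)}\left[\left(\dfrac{x_i}{\mu}\right)^{\alpha} - 1\right], & \alpha \neq 0, 1,\\[2ex]
-\ln\left(\dfrac{x_i}{\mu}\right), & \alpha = 0,\\[2ex]
\dfrac{x_i}{\mu}\ln\left(\dfrac{x_i}{\mu}\right), & \alpha = 1.
\end{cases}
```

GE(0) is the mean log deviation, GE(1) is the Theil index, and GE(2) equals half the squared coefficient of variation (computed with the divisor-$n$ variance).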
The expression for
Hence, the ordinary least squares (OLS) estimates from equation (1) for different values of parameter α would be
and
The standard errors for the entropy indices for different values of parameter α can be calculated as
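A minimal sketch of the regression route, assuming (as above) that $y_i$ is regressed on an intercept only, that incomes are strictly positive, and that $\mu$ is fixed at the sample mean. The function names `ge_transform` and `ols_on_constant` are hypothetical, and the log-normal data are simulated purely for illustration; they are not the MEPS sample.

```python
import numpy as np


def ge_transform(x: np.ndarray, alpha: float) -> np.ndarray:
    """Per-observation contribution y_i whose sample mean is GE(alpha).

    Assumes strictly positive x and mu fixed at the sample mean.
    """
    r = x / x.mean()
    if alpha == 0:
        return -np.log(r)                      # mean log deviation
    if alpha == 1:
        return r * np.log(r)                   # Theil index
    return (r ** alpha - 1.0) / (alpha * (alpha - 1.0))


def ols_on_constant(y: np.ndarray) -> tuple[float, float]:
    """OLS of y on an intercept only: returns (coefficient, standard error)."""
    n = y.size
    X = np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 1)           # residual variance, n - k with k = 1
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    return float(beta[0]), float(se)


# Hypothetical data: simulated log-normal expenditures, not the MEPS sample.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=8.0, sigma=1.0, size=5_000)

results = {}
for alpha in (-1.0, 0.0, 1.0, 2.0, 2.5):
    results[alpha] = ols_on_constant(ge_transform(x, alpha))
    est, se = results[alpha]
    print(f"GE({alpha:g}) = {est:.4f}  (s.e. {se:.4f})")
```

Because the regressor is a column of ones, the coefficient reduces to the sample mean of $y_i$ and its standard error to $s_y/\sqrt{n}$. In this sketch $\mu$ is treated as known, so the standard error does not reflect sampling variation in $\mu$.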
In the context of the Atkinson index of inequality, the parameter α is referred to as the “inequality aversion parameter”: it determines how sensitive the index is to the social welfare losses caused by inequality. The Atkinson index is defined[3] with respect to a specific social welfare function (W): the product of mean income and one minus the Atkinson index is the equally distributed equivalent income, that is, the income which, if received by every individual, would yield the same social welfare as the actual distribution. Consequently, the Atkinson index gives the proportion of current income that could be forgone without reducing social welfare if income were distributed completely equally. Under the utilitarian ethical standard, and with additional assumptions such as a homogeneous population and constant elasticity of substitution utility, α equals the elasticity of the marginal utility of income in absolute value.
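In its textbook form, with $\mu$ the mean of $x_1,\dots,x_n$ and α the inequality aversion parameter, the index compares the equally distributed equivalent income $x_{\mathrm{EDE}}$ with $\mu$. The block below states this standard definition as a reference point for the two-step estimator described next; it is not a reproduction of the paper's own equations.

```latex
A(\alpha) = 1 - \frac{x_{\mathrm{EDE}}}{\mu}, \qquad
x_{\mathrm{EDE}} =
\begin{cases}
\left[\dfrac{1}{n}\sum_{i=1}^{n} x_i^{1-\alpha}\right]^{1/(1-\alpha)}, & \alpha \ge 0,\ \alpha \neq 1,\\[2ex]
\exp\!\left(\dfrac{1}{n}\sum_{i=1}^{n} \ln x_i\right), & \alpha = 1.
\end{cases}
```

Since $x_{\mathrm{EDE}}$ is a power mean of order $1-\alpha$, it cannot increase as α rises, so $A(\alpha)$ is non-decreasing in α.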
The Atkinson inequality measure is estimated in two steps. In the first step
and OLS estimates based on equation (1) are obtained as
and
In the second step, estimates of the corresponding Atkinson indices are derived as
and
The standard errors for the aforementioned indices are computed using a Taylor series approximation:[4]
and
The standard errors for the
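The two steps and the Taylor series (delta-method) standard error can be sketched as follows. This assumes that the first step regresses $y_i = (x_i/\mu)^{1-\alpha}$ (or $\ln(x_i/\mu)$ when α = 1) on a constant only, with $\mu$ fixed at the sample mean and strictly positive incomes; the function name `atkinson_two_step` and the simulated data are hypothetical.

```python
import numpy as np


def atkinson_two_step(x: np.ndarray, alpha: float) -> tuple[float, float]:
    """Two-step Atkinson estimate with a first-order Taylor (delta-method) s.e.

    Step 1: regress y_i on a constant (OLS coefficient = sample mean of y_i).
    Step 2: map the coefficient beta to A(alpha) = g(beta).
    Assumes strictly positive x and mu fixed at the sample mean.
    """
    n = x.size
    r = x / x.mean()
    y = np.log(r) if alpha == 1 else r ** (1.0 - alpha)

    # Step 1: OLS on an intercept only reduces to the sample mean and s / sqrt(n).
    beta = y.mean()
    se_beta = y.std(ddof=1) / np.sqrt(n)

    # Step 2: A = g(beta) and s.e.(A) ~= |g'(beta)| * s.e.(beta).
    if alpha == 1:
        a = 1.0 - np.exp(beta)
        grad = np.exp(beta)                     # |d/dbeta (1 - e^beta)|
    else:
        p = 1.0 / (1.0 - alpha)
        a = 1.0 - beta ** p
        grad = abs(p) * beta ** (p - 1.0)       # |d/dbeta (1 - beta^p)|, beta > 0
    return float(a), float(grad * se_beta)


# Hypothetical data: simulated log-normal expenditures, not the MEPS sample.
rng = np.random.default_rng(1)
x = rng.lognormal(mean=8.0, sigma=1.0, size=5_000)
for alpha in (0.25, 0.5, 1.0, 1.5, 2.0):
    a, se = atkinson_two_step(x, alpha)
    print(f"A({alpha:g}) = {a:.4f}  (s.e. {se:.4f})")
```

The point estimate coincides with the direct formula $1 - x_{\mathrm{EDE}}/\mu$; only the standard error relies on the first-order approximation.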
3 An Empirical Illustration
This study utilises total health expenditure[5] data derived from the 2020 Medical Expenditure Panel Survey (MEPS) of the USA. The MEPS survey, administered by the Department of Health and Human Services’ Agency for Healthcare Research and Quality (AHRQ), is a comprehensive and nationally representative survey conducted among the civilian non-institutionalized population across the United States. It aims to assess their health status, insurance coverage, as well as healthcare utilization and expenditures. MEPS gathers data on distinct health services, recording their frequencies, costs, payment methods, and information regarding health insurance coverage among US workers. For our study, we have utilised the publicly available dataset from the 2020 full year consolidated data file, which was released in August 2022. This dataset encompasses variables and frequency distributions pertaining to 27,805 individuals who participated in the MEPS Household Component of the Medical Expenditure Panel Survey in 2020 (for further details see MEPS background, data and associated code book documents listed in the reference section below). In this sample, the average health expenditure was $6,309, accompanied by a standard deviation of $21,986. Within this survey, 17.6 % (comprising 4,889 individuals) reported zero health expenditure, while the highest expenditure recorded was $1,662,894.
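As a quick consistency check, the summary statistics quoted above can be related to Table 1 through the identity GE(2) = CV²/2, where CV is the coefficient of variation computed with the divisor-n variance. This sketch assumes the reported standard deviation uses the n − 1 divisor; since the mean and standard deviation are rounded, agreement is expected only to about two decimal places.

```python
# Summary statistics quoted above for the 2020 MEPS sample.
n = 27_805            # individuals
n_zero = 4_889        # individuals with zero health expenditure
mean_exp = 6_309.0    # mean health expenditure, US dollars
sd_exp = 21_986.0     # standard deviation, US dollars (assumed divisor n - 1)

zero_share = n_zero / n

# GE(2) = 0.5 * CV^2 with the population (divisor n) variance,
# so the assumed (n - 1)-divisor standard deviation is rescaled first.
cv_squared = (sd_exp / mean_exp) ** 2 * (n - 1) / n
ge2_implied = 0.5 * cv_squared

print(f"share with zero expenditure: {100 * zero_share:.1f} %")
print(f"GE(2) implied by the mean and s.d.: {ge2_implied:.4f}")
```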
We compared the OLS standard errors with those produced by the jackknife[6] estimator, using the individual-level US health expenditure data described above. Table 1 displays the results. In our sample of 27,805 individuals, the OLS and jackknife standard errors differ only minimally. Column 6 reports the percentage distortion (D) of the jackknife standard error relative to the OLS standard error.
Table 1: Health expenditure inequality in the US, 2020 – generalised entropy and Atkinson index estimates, with standard errors.
| Index | OLS estimate | OLS standard error | Jackknife estimate | Jackknife standard error | D (%) |
|---|---|---|---|---|---|
| (1) | (2) | (3) | (4) | (5) | (6) |
| GE(-1) | 562.6614 | 7.1893 | 562.6614 | 7.1888 | −0.0071 |
| GE(0) | 2.5992 | 0.0196 | 2.5992 | 0.0196 | 0.0224 |
| GE(1) | 1.5058 | 0.0835 | 1.5058 | 0.0835 | 0.0001 |
| GE(2) | 6.0707 | 1.3483 | 6.0707 | 1.3483 | −0.0001 |
| GE(2.5) | 26.2696 | 11.1191 | 26.2696 | 11.1191 | 0.0000 |
| A(0.25) | 0.3322 | 0.0108 | 0.3322 | 0.0108 | 0.0045 |
| A(0.5) | 0.5913 | 0.0059 | 0.5913 | 0.0059 | −0.0043 |
| A(1) | 0.9257 | 0.0015 | 0.9257 | 0.0015 | 0.0224 |
| A(1.5) | 0.9962 | 0.0001 | 0.9962 | 0.0001 | −0.0042 |
| A(2) | 0.9991 | 0.0000 | 0.9991 | 0.0000 | −0.0070 |
Notes: Health expenditure is defined in the main text above. The sample consists of 27,805 individuals, of whom 4,889 had zero health expenditure. The generalised entropy indices GE(-1), GE(0), and GE(1), as well as the Atkinson indices A(1), A(1.5), and A(2), cannot be computed when some health expenditures are zero. To retain these observations, we added one to each zero health expenditure.
In Table 1, columns 2 and 3 report the OLS estimates and their standard errors, while columns 4 and 5 report the corresponding jackknife estimates. All indices are estimated with high precision: every estimate except GE(2.5) is statistically significant at the 1 % level, and GE(2.5) is significant at the 2 % level. Moreover, the percentage distortions between the jackknife and OLS standard errors (column 6) are all less than 0.1 % in absolute value.
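The significance statements can be checked directly from the rounded Table 1 values. This sketch assumes a two-sided test of a zero index value based on a normal approximation, z = estimate / standard error; A(2) is left out because its standard error rounds to 0.0000 in the table.

```python
import math

# (estimate, OLS standard error) as reported in Table 1.
table1 = {
    "GE(-1)": (562.6614, 7.1893),
    "GE(0)": (2.5992, 0.0196),
    "GE(1)": (1.5058, 0.0835),
    "GE(2)": (6.0707, 1.3483),
    "GE(2.5)": (26.2696, 11.1191),
    "A(0.25)": (0.3322, 0.0108),
    "A(0.5)": (0.5913, 0.0059),
    "A(1)": (0.9257, 0.0015),
    "A(1.5)": (0.9962, 0.0001),
    # A(2) is omitted: its standard error rounds to 0.0000 in Table 1.
}


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal null: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


p_values = {}
for name, (est, se) in table1.items():
    z = est / se
    p_values[name] = two_sided_p(z)
    print(f"{name:8s} z = {z:10.2f}  p = {p_values[name]:.2e}")
```

For GE(2.5), z is about 2.36, which gives a two-sided p-value of roughly 0.018: below 0.02 but above 0.01, matching the statement above.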
4 Conclusions
The generalised entropy and Atkinson indices are widely used measures of inequality. Constructing confidence intervals or conducting tests for these indices requires estimates of their standard errors. Statistical inference for these measures is, however, complicated by the fact that they are non-linear functions of a random variable, typically income. Consequently, some researchers have recommended using the jackknife technique to estimate standard errors, especially when dealing with large sample sizes. However, since these indices can be obtained through a simple OLS regression-based approach, calculating their standard errors is straightforward: for the GE indices they are the OLS standard errors, and for the Atkinson indices they follow from these via a Taylor series approximation.
Using health expenditure data from the US Medical Expenditure Panel Survey (MEPS), we calculate the GE and Atkinson indices together with their standard errors. In this illustrative application, the OLS and jackknife standard errors differ only minimally. The OLS method is, however, computationally more efficient than the jackknife, which requires re-estimating each index repeatedly on subsamples of the data.
Acknowledgment
I extend my gratitude to Satya Paul for his comments on an earlier draft of this paper. I also thank the anonymous referee for their constructive feedback.
References
Atkinson, A. B. 1970. “On the Measurement of Inequality.” Journal of Economic Theory 2 (3): 244–63. https://doi.org/10.1016/0022-0531(70)90039-6.
Biewen, M. 2002. “Bootstrap Inference for Inequality, Mobility and Poverty Measurement.” Journal of Econometrics 108 (2): 317–42. https://doi.org/10.1016/s0304-4076(01)00138-5.
Biewen, M., and S. P. Jenkins. 2006. “Variance Estimation for Generalized Entropy and Atkinson Inequality Indices: The Complex Survey Data Case.” Oxford Bulletin of Economics & Statistics 68 (3): 371–83. https://doi.org/10.1111/j.1468-0084.2006.00166.x.
Clarke, J. A., and N. Roy. 2012. “On Statistical Inference for Inequality Measures Calculated from Complex Survey Data.” Empirical Economics 43: 499–524. https://doi.org/10.1007/s00181-011-0499-3.
Davidson, R., and E. Flachaire. 2007. “Asymptotic and Bootstrap Inference for Inequality and Poverty Measures.” Journal of Econometrics 141 (1): 141–66. https://doi.org/10.1016/j.jeconom.2007.01.009.
Giles, D. E. 2004. “Calculating a Standard Error for the Gini Coefficient: Some Further Results.” Oxford Bulletin of Economics & Statistics 66 (3): 425–33. https://doi.org/10.1111/j.1468-0084.2004.00086.x.
Karagiannis, E., and M. Kovacevic. 2000. “A Method to Calculate the Jackknife Variance Estimator for the Gini Coefficient.” Oxford Bulletin of Economics & Statistics 62 (1): 119–22. https://doi.org/10.1111/1468-0084.00163.
Karoly, L. 1989. Computing Standard Errors for Measures of Inequality Using the Jackknife. Unpublished Paper. Santa Monica: RAND Corporation.
Medical Expenditure Panel Survey Background. Available at: https://meps.ahrq.gov/mepsweb/about_meps/survey_back.jsp (accessed July 28, 2023).
MEPS HC-224 2020 Full Year Consolidated Data Codebook. Available at: https://meps.ahrq.gov/data_stats/download_data/pufs/h224/h224cb.pdf (accessed July 28, 2023).
MEPS HC-224 2020 Full Year Consolidated Data File. Available at: https://meps.ahrq.gov/data_stats/download_data/pufs/h224/h224doc.pdf (accessed July 28, 2023).
Mills, J. A., and S. Zandvakili. 1997. “Statistical Inference via Bootstrapping for Measures of Inequality.” Journal of Applied Econometrics 12 (2): 133–50. https://doi.org/10.1002/(sici)1099-1255(199703)12:2<133::aid-jae433>3.3.co;2-8.
Modarres, R., and J. L. Gastwirth. 2006. “A Cautionary Note on Estimating the Standard Error of the Gini Index of Inequality.” Oxford Bulletin of Economics & Statistics 68 (3): 385–90. https://doi.org/10.1111/j.1468-0084.2006.00167.x.
Ogwang, T. 2000. “A Convenient Method of Computing the Gini Index and its Standard Error.” Oxford Bulletin of Economics & Statistics 62 (1): 123–9. https://doi.org/10.1111/1468-0084.00164.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/snde-2024-0021).
© 2024 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.
Articles in the same Issue
- Frontmatter
- Research Articles
- Multivariate Stochastic Volatility with Co-Heteroscedasticity
- Extreme Return Spillover Between the WTI, the VIX, and Six Latin American Stock Markets: A Quantile Connectedness Approach
- A Regression-based Method for Estimating Generalised Entropy and Atkinson Inequality Indices and their Standard Errors
- Homogeneity Pursuit in the Functional-Coefficient Quantile Regression Model for Panel Data with Censored Data
- Asymptotic Efficiency of Joint Estimator Relative to Two-Stage Estimator Under Misspecified Likelihoods
- Chinese Crude Oil Futures and Sectoral Stocks: Copula-Based Dependence Structure and Connectedness