Asymmetric Laplace Regression: Maximum Likelihood, Maximum Entropy and Quantile Regression

Anil K. Bera; Antonio F. Galvao; Gabriel V. Montes-Rojas; Sung Y. Park

doi:10.1515/jem-2014-0018


Published/Copyright: March 3, 2015


From the journal Journal of Econometric Methods, Volume 5, Issue 1

Abstract

This paper studies the connections among the asymmetric Laplace probability density (ALPD), maximum likelihood, maximum entropy and quantile regression. We show that the maximum likelihood problem is equivalent to the solution of a maximum entropy problem where we impose moment constraints given by the joint consideration of the mean and median. The ALPD score functions lead to joint estimating equations that deliver estimates for the slope parameters together with a representative quantile. Asymptotic properties of the estimator are derived under the framework of quasi-maximum likelihood estimation. With a limited simulation experiment we evaluate the finite sample properties of our estimator. Finally, we illustrate the use of the estimator with an application to US wage data to evaluate the effect of training on wages.

Keywords: asymmetric Laplace distribution; quantile regression; treatment effects

JEL Classification: C14; C31

Corresponding author: Gabriel V. Montes-Rojas, Department of Economics, City University London, 10 Northampton Square, London EC1V 0HB, UK, E-mail: Gabriel.Montes-Rojas.1@city.ac.uk

Acknowledgments

We are very grateful to the Editor, two anonymous referees, Arnold Zellner, Jushan Bai, Rong Chen, Daniel Gervini, Yongmiao Hong, Carlos Lamarche, Ehsan Soofi, Liang Wang, Zhijie Xiao, and the participants in seminars at University of Wisconsin-Milwaukee, City University London, Info-Metrics Institute Conference, September 2010, World Congress of the Econometric Society, Shanghai, August 2010, Latin American Meeting of the Econometric Society, Argentina, October 2009, Summer Workshop in Econometrics, Tsinghua University, Beijing, China, May 2009, for helpful comments and discussions. However, we retain the responsibility for any remaining errors.

Appendix

A. Interpretation of the Z-estimator

In order to interpret θ₀, we take the expectation of the estimating equations with respect to the unknown true density. To simplify the exposition we consider a simple model without covariates: y_i=α+u_i. Our estimating equation vector is defined as:

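$$
E[\Psi_\theta(y)] = E\begin{pmatrix} \dfrac{1}{\sigma}\left(\tau - \mathbf{1}(y<\alpha)\right) \\[1ex] \dfrac{1-2\tau}{\tau(1-\tau)} - \dfrac{y-\alpha}{\sigma} \\[1ex] -\dfrac{1}{\sigma} + \dfrac{1}{\sigma^2}\rho_\tau(y-\alpha) \end{pmatrix} = 0,
$$

and the estimator is such that

$$
\frac{1}{n}\sum_{i=1}^{n} \Psi_\theta(y_i) = 0.
$$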

Let F(y) be the cdf of the random variable y. Now we need to find E[Ψ_θ(y)].

For the first component we have

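$$
\frac{1}{\sigma}E\left[\tau - \mathbf{1}(y<\alpha)\right] = \frac{1}{\sigma}\left(\int_{\mathbb{R}} \left(\tau - \mathbf{1}(y<\alpha)\right) dF(y)\right) = \frac{1}{\sigma}\left(\tau - \int_{-\infty}^{\alpha} dF(y)\right) = \frac{1}{\sigma}\left(\tau - F(\alpha)\right).
$$

Thus if we set this equal to zero, we have

$$
\alpha = F^{-1}(\tau),
$$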

which is the usual quantile. Thus, the interpretation of the parameter α is analogous to QR if covariates are included.

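For the third term in the vector, −1/σ+(1/σ²)ρ_τ(y−α), we have

$$
E\left[-\frac{1}{\sigma} + \frac{1}{\sigma^2}\rho_\tau(y-\alpha)\right] = 0,
$$

that is,

$$
\sigma = E\left[\rho_\tau(y-\alpha)\right].
$$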

Thus, as in the least squares case, the scale parameter σ can be interpreted as the expected value of the loss function.

Finally, we can interpret τ using the second equation,

$$
E\left[\frac{1-2\tau}{\tau(1-\tau)} - \frac{y-\alpha}{\sigma}\right] = 0,
$$

which implies that

$$
\frac{1-2\tau}{\tau(1-\tau)} = \frac{E[y] - F^{-1}(\tau)}{\sigma}.
$$

Note that s(τ)≡(1−2τ)/(τ(1−τ)) is a measure of the skewness of the distribution. Thus, τ should be chosen to set s(τ) equal to a measure of asymmetry of the underlying distribution F(·), given by the difference between the mean and the τ-quantile (standardized by σ). In the special case of a symmetric distribution, the mean coincides with the median and mode, such that E[y]=F^{−1}(1/2) and τ=1/2, which is the most probable quantile and a solution to our Z-estimator.
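As a rough numerical illustration of the three moment conditions above, the following Python sketch evaluates their sample analogues at a candidate θ=(α, τ, σ) in the model without covariates. The function names (`check_loss`, `estimating_equations`) and the toy sample are assumptions made for illustration only and are not taken from the paper; the check function is taken to be the usual ρ_τ(u)=u(τ−1(u<0)).

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def estimating_equations(y, alpha, tau, sigma):
    """Sample averages of the three components of Psi_theta(y).

    Returns the (location, skewness, scale) moment functions for the
    model y_i = alpha + u_i without covariates.
    """
    n = len(y)
    # First component: (1/sigma) * (tau - 1{y < alpha})
    location = sum(tau - (1.0 if yi < alpha else 0.0) for yi in y) / (n * sigma)
    # Second component: s(tau) - (y - alpha) / sigma
    skew = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    skewness = sum(skew - (yi - alpha) / sigma for yi in y) / n
    # Third component: -1/sigma + rho_tau(y - alpha) / sigma^2
    scale = sum(-1.0 / sigma + check_loss(yi - alpha, tau) / sigma**2 for yi in y) / n
    return location, skewness, scale


if __name__ == "__main__":
    # Hypothetical symmetric sample: mean = median = 0.
    y = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
    tau, alpha = 0.5, 0.0
    # Third equation: sigma equals the average check loss at (alpha, tau).
    sigma = sum(check_loss(yi - alpha, tau) for yi in y) / len(y)
    print(estimating_equations(y, alpha, tau, sigma))
```

For this symmetric toy sample all three moment functions vanish at τ=1/2, α=0 and σ equal to the average check loss, in line with the symmetric special case discussed above.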

B. Lemma A1

In this appendix we state an auxiliary result that establishes Donskerness and stochastic equicontinuity. Let ℱ≡{ψ_θ(y, x), θ∈Θ}, and define the following empirical process notation for w=(y, x):

$$
f \mapsto \mathbb{E}_n[f(w)] = \frac{1}{n}\sum_{i=1}^{n} f(w_i), \qquad f \mapsto \mathbb{G}_n[f(w)] = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left(f(w_i) - E f(w_i)\right).
$$

We follow the empirical process literature, exploiting the monotonicity and boundedness of the indicator function, the boundedness of the moments of x and y, and the fact that the problem is a parametric one.

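Lemma A1 Under Assumptions A1–A4, ℱ is Donsker. Furthermore,

$$
\theta \mapsto \mathbb{G}_n \psi_\theta(y, x)
$$

is stochastically equicontinuous, that is

$$
\sup_{\|\theta-\theta_0\|\le\delta_n} \left\|\mathbb{G}_n\psi_\theta(y, x) - \mathbb{G}_n\psi_{\theta_0}(y, x)\right\| = o_p(1),
$$

for any δ_n↓0.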
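To give a feel for what stochastic equicontinuity means, the following Python sketch simulates the increment of the empirical process for the indicator part of the first score component in the simplest possible setting. Everything specific here is an assumption for illustration: there are no covariates, y is drawn from a Uniform(0, 1) distribution so that E 1(y<a)=a, the supremum is approximated on a finite grid, and the helper names `gn_indicator` and `sup_increment` are hypothetical.

```python
import math
import random


def gn_indicator(y, a, tau=0.5):
    """Empirical process G_n applied to f_a(y) = tau - 1{y < a}.

    Assumes y_i ~ Uniform(0, 1) and 0 <= a <= 1, so that E 1{y < a} = a.
    """
    n = len(y)
    centered = ((tau - (1.0 if yi < a else 0.0)) - (tau - a) for yi in y)
    return sum(centered) / math.sqrt(n)


def sup_increment(y, a0, delta, step=0.001):
    """Grid approximation of sup_{|a - a0| <= delta} |G_n f_a - G_n f_a0|."""
    m = int(round(delta / step))
    base = gn_indicator(y, a0)
    return max(abs(gn_indicator(y, a0 + j * step) - base) for j in range(-m, m + 1))


if __name__ == "__main__":
    random.seed(0)
    y = [random.random() for _ in range(2000)]
    # Shrinking neighbourhoods around a0 = 0.5.
    for delta in (0.2, 0.05, 0.01):
        print(delta, sup_increment(y, a0=0.5, delta=delta))
```

Because the grids are nested, the approximated supremum cannot increase as δ shrinks. The lemma makes the stronger statement that it converges to zero in probability as n grows and δ_n↓0.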

Proof: The proof of this result follows similar steps to those in Chernozhukov and Hansen (2006). To prove the lemma we check the conditions for independent but not identically distributed processes stated in Theorem 2.11.1 of van der Vaart and Wellner (1996). It is important to note that a class ℱ of vector-valued functions f: x↦ℝ^k is Donsker if each of the classes of coordinates f_j: x↦ℝ, with f=(f₁, …, f_k) ranging over ℱ (j=1, 2, …, k), is Donsker (van der Vaart 1998, 270).

First, one can check the random-entropy condition by checking that ℱ satisfies a uniform entropy condition and that the envelope is square integrable. The first element of the vector is ψ_{1θ}(y, x)=(τ−1(y<x′β))x/σ. Note that the functional class 𝔄={τ−1{y<x′β}, τ∈T, β∈ℬ} is a VC subgraph class, with envelope 2. Its product with x also forms a class with a square integrable envelope 2 max_j|x_j|. Finally, the class ℱ₁ is defined as the product of the latter with 1/σ, which is bounded by Assumption A3. Thus, by Assumption A4, ℱ₁ satisfies a uniform entropy condition and its envelope function is square integrable. Therefore, the random entropy condition (2.11.2) in van der Vaart and Wellner (1996) is satisfied.

The second element of the vector is ψ_{2θ}(y, x)=((1−2τ)/(τ(1−τ))−(y−x′β)/σ). Define 𝕳={(y−x′β), β∈ℬ}. Note that

$$
\left|(y - x'\beta_1) - (y - x'\beta_2)\right| = \left|x'(\beta_2 - \beta_1)\right| \le \|x\|\,\|\beta_2 - \beta_1\|,
$$

where the inequality follows from the Cauchy-Schwarz inequality. Thus, by Assumptions A3–A4, the class 𝕳 has a square integrable envelope function. In addition, note that 𝕳 belongs to a VC class satisfying a uniform entropy condition, since this class is a subset of the vector space of functions spanned by (y, x₁, …, x_p), where p is the fixed dimension of x (see, e.g., Example 19.7 in van der Vaart (1998)). Thus, the class defined by (1/σ)𝕳 has, up to a constant factor, the envelope of 𝕳, namely |y|+const·|x|, which is square integrable by Assumptions A3–A4. Therefore, ℱ₂ satisfies the random entropy condition.

The third element of the vector is ψ_{3θ}(y, x)=(−1/σ+(1/σ²)ρ_τ(y−x′β)). Consider the class 𝕵={ρ_τ(y−x′β), τ∈T, β∈ℬ}, which satisfies a uniform entropy condition and has a square integrable envelope function. The latter follows from Assumptions A3–A4 and the properties of the quantile regression check function, ρ_τ(x+y)–ρ_τ(y)≤2|x| and ρ_{τ₁}(y−x′t)−ρ_{τ₂}(y−x′t)=(τ₁−τ₂)(y−x′t). The former follows from the fact that this is a parametric class of measurable functions indexed by a bounded subset. Hence, ℱ₃ satisfies the random entropy condition and its envelope function is square integrable.
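The two check function properties invoked here can be verified numerically. The Python sketch below assumes the standard definition ρ_τ(u)=u(τ−1(u<0)); under that definition the difference across quantile indices works out to ρ_{τ₁}(u)−ρ_{τ₂}(u)=(τ₁−τ₂)u. The helper names and the random draws are illustrative assumptions, not part of the paper.

```python
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def lipschitz_gap(x, y, tau):
    """2|x| - |rho_tau(x + y) - rho_tau(y)|; nonnegative when the bound holds."""
    return 2.0 * abs(x) - abs(check_loss(x + y, tau) - check_loss(y, tau))


def tau_difference_error(u, tau1, tau2):
    """|rho_tau1(u) - rho_tau2(u) - (tau1 - tau2) * u|; zero up to rounding."""
    return abs(check_loss(u, tau1) - check_loss(u, tau2) - (tau1 - tau2) * u)


if __name__ == "__main__":
    random.seed(1)
    draws = [(random.gauss(0, 1), random.gauss(0, 1),
              random.uniform(0.05, 0.95), random.uniform(0.05, 0.95))
             for _ in range(10000)]
    # Smallest slack in the Lipschitz-type bound over all draws.
    print(min(lipschitz_gap(x, y, t1) for x, y, t1, _ in draws))
    # Largest deviation from the linear-in-tau identity over all draws.
    print(max(tau_difference_error(y, t1, t2) for _, y, t1, t2 in draws))
```

Only absolute values of these differences enter the envelope and distance bounds, so the sign of the second identity does not affect the argument.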

Now we turn our attention to the second condition of Theorem 2.11.1 in van der Vaart and Wellner (1996). The process θ↦G_nψ_θ(y, x) is stochastically equicontinuous over Θ with respect to an L₂(P) pseudo-metric. First, as in Angrist, Chernozhukov, and Fernández-Val (2006) and Chernozhukov and Hansen (2006), we define the distance d as the following L₂(P) pseudo-metric

$$
d(\theta', \theta'') = \sqrt{E\left(\left[\psi_{\theta'} - \psi_{\theta''}\right]^2\right)}.
$$

Thus, as ||θ–θ₀||→0 we need to show that

$$
d(\theta, \theta_0) \to 0, \tag{28}
$$

and the final result follows from Theorem 2.11.1 of van der Vaart and Wellner (1996).

To show (28), first note that for each i=1, …, n,

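$$
\begin{aligned}
d_{1i}(\theta', \theta) &= \left(E\left[\psi_{1\theta'} - \psi_{1\theta}\right]^2\right)^{1/2} \\
&= \left(E\left[\left(\tau' - \mathbf{1}(y_i \le x_i'\beta')\right)\frac{x_i}{\sigma'} - \left(\tau - \mathbf{1}(y_i \le x_i'\beta)\right)\frac{x_i}{\sigma}\right]^2\right)^{1/2} \\
&\le \left[\left(E\left|\frac{1}{\sigma'}\left(\tau' - \mathbf{1}(y_i \le x_i'\beta')\right) - \frac{1}{\sigma}\left(\tau - \mathbf{1}(y_i \le x_i'\beta)\right)\right|^{\frac{2(2+\epsilon)}{\epsilon}}\right)^{\frac{\epsilon}{2+\epsilon}} \cdot \left(E\left(|x_i|^2\right)^{\frac{2+\epsilon}{2}}\right)^{\frac{2}{2+\epsilon}}\right]^{1/2} \\
&= \left(E\left|\left(\frac{\tau'}{\sigma'} - \frac{\tau}{\sigma}\right) + \left(\frac{1}{\sigma}\mathbf{1}(y_i \le x_i'\beta) - \frac{1}{\sigma'}\mathbf{1}(y_i \le x_i'\beta')\right)\right|^{\frac{2(2+\epsilon)}{\epsilon}}\right)^{\frac{\epsilon}{2(2+\epsilon)}} \cdot \left(E\left(|x_i|^2\right)^{\frac{2+\epsilon}{2}}\right)^{\frac{1}{2+\epsilon}} \\
&\le \left[\left(\left|\frac{\tau'}{\sigma'} - \frac{\tau}{\sigma}\right|^{\frac{2(2+\epsilon)}{\epsilon}}\right)^{\frac{\epsilon}{2(2+\epsilon)}} + \left(E\left|\frac{1}{\sigma}\mathbf{1}(y_i \le x_i'\beta) - \frac{1}{\sigma'}\mathbf{1}(y_i \le x_i'\beta')\right|^{\frac{2(2+\epsilon)}{\epsilon}}\right)^{\frac{\epsilon}{2(2+\epsilon)}}\right] \cdot \left(E\left(|x_i|^2\right)^{\frac{2+\epsilon}{2}}\right)^{\frac{1}{2+\epsilon}} \\
&\le \left[\left|\frac{\tau'}{\sigma'} - \frac{\tau}{\sigma}\right| + \left(E\left|\bar{g}_i \cdot x_i'\left(\frac{\beta'}{\sigma'} - \frac{\beta}{\sigma}\right)\right|\right)^{\frac{\epsilon}{2(2+\epsilon)}}\right] \cdot \left(E\|x_i\|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}} \\
&\le \left[\left|\frac{\tau'}{\sigma'} - \frac{\tau}{\sigma}\right| + \left(E\,\bar{g}_i \|x_i\| \left\|\frac{\beta'}{\sigma'} - \frac{\beta}{\sigma}\right\|\right)^{\frac{\epsilon}{2(2+\epsilon)}}\right] \cdot \left(E\|x_i\|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}},
\end{aligned}
$$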

where the first inequality is Hölder’s inequality, the second is Minkowski’s inequality, the third is a Taylor expansion as in Angrist, Chernozhukov, and Fernández-Val (2006, p. 560), where g̅_i is the upper bound of g_i(y_i|x) (using A2), and the last is the Cauchy-Schwarz inequality. Therefore, by Assumptions A2–A4, sup_{||θ′−θ||<δ_n} ∑_{i=1}^n d_{1i}→0 when δ_n→0.

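Now rewrite ψ_{2θ}(y, x)=(σ(1−2τ)/(τ(1−τ))−(y−x′β)) and note that

$$
\begin{aligned}
d_{2i}(\theta', \theta) &= \left(E\left[\psi_{2\theta'} - \psi_{2\theta}\right]^2\right)^{1/2} \\
&= \left(E\left[\sigma'\frac{1-2\tau'}{\tau'(1-\tau')} - (y_i - x_i'\beta') - \sigma\frac{1-2\tau}{\tau(1-\tau)} + (y_i - x_i'\beta)\right]^2\right)^{1/2} \\
&= \left(E\left|\sigma'\frac{1-2\tau'}{\tau'(1-\tau')} - \sigma\frac{1-2\tau}{\tau(1-\tau)} + x_i'(\beta' - \beta)\right|^2\right)^{1/2} \\
&\le \left(\left|\sigma'\frac{1-2\tau'}{\tau'(1-\tau')} - \sigma\frac{1-2\tau}{\tau(1-\tau)}\right|^2\right)^{1/2} + \left(E\left|x_i'(\beta' - \beta)\right|^2\right)^{1/2} \\
&\le \left(\left|\sigma'\frac{1-2\tau'}{\tau'(1-\tau')} - \sigma\frac{1-2\tau}{\tau(1-\tau)}\right|^2\right)^{1/2} + \|\beta' - \beta\|\left(E\|x_i\|^2\right)^{1/2},
\end{aligned}
$$

where the first inequality is given by Minkowski’s inequality, (E|X+Y|^p)^{1/p}≤(E|X|^p)^{1/p}+(E|Y|^p)^{1/p} for p≥1, and the second inequality is the Cauchy-Schwarz inequality. Hence, Assumptions A3–A4 ensure that sup_{||θ′−θ||<δ_n} ∑_{i=1}^n d_{2i}→0 when δ_n→0.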
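The bound on d_{2i} can be checked numerically by replacing the expectation with a sample average, since Minkowski's and the Cauchy-Schwarz inequalities hold for the empirical measure as well. The Python sketch below does this for the rescaled second component σ(1−2τ)/(τ(1−τ))−(y−x′β); the data-generating choices, parameter values and function names are hypothetical and serve only as an illustration.

```python
import math
import random


def skew_term(tau):
    """s(tau) = (1 - 2 tau) / (tau (1 - tau))."""
    return (1.0 - 2.0 * tau) / (tau * (1.0 - tau))


def psi2(y, x, theta):
    """Rescaled second score component: sigma * s(tau) - (y - x'beta)."""
    beta, tau, sigma = theta
    xb = sum(xj * bj for xj, bj in zip(x, beta))
    return sigma * skew_term(tau) - (y - xb)


def d2_empirical(data, theta_a, theta_b):
    """Sample analogue of the L2 distance between psi2 at two parameter values."""
    total = sum((psi2(y, x, theta_a) - psi2(y, x, theta_b)) ** 2 for y, x in data)
    return math.sqrt(total / len(data))


def d2_bound(data, theta_a, theta_b):
    """Right-hand side of the Minkowski plus Cauchy-Schwarz bound."""
    (beta_a, tau_a, sigma_a), (beta_b, tau_b, sigma_b) = theta_a, theta_b
    shift = abs(sigma_a * skew_term(tau_a) - sigma_b * skew_term(tau_b))
    beta_gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(beta_a, beta_b)))
    x_norm = math.sqrt(sum(sum(xj ** 2 for xj in x) for _, x in data) / len(data))
    return shift + beta_gap * x_norm


if __name__ == "__main__":
    random.seed(3)
    # Hypothetical design: intercept plus one standard normal covariate.
    data = [(random.gauss(0, 1), (1.0, random.gauss(0, 1))) for _ in range(500)]
    theta_a = ((0.5, 1.0), 0.4, 1.2)   # (beta, tau, sigma)
    theta_b = ((0.3, 1.4), 0.6, 0.9)
    print(d2_empirical(data, theta_a, theta_b), d2_bound(data, theta_a, theta_b))
```

Both terms of the bound shrink to zero as θ′ approaches θ, which is the continuity property the proof relies on.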

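Finally, rewrite ψ_{3θ}(y, x)=(–σ+ρ_τ(y–x′β)), and thus

$$
\begin{aligned}
d_{3i}(\theta', \theta) &= \left(E\left[\psi_{3\theta'} - \psi_{3\theta}\right]^2\right)^{1/2} \\
&= \left(E\left[-\sigma' + \rho_{\tau'}(y_i - x_i'\beta') + \sigma - \rho_\tau(y_i - x_i'\beta)\right]^2\right)^{1/2} \\
&= \left(E\left[-\sigma' + \sigma + \rho_{\tau'}(y_i - x_i'\beta') - \rho_\tau(y_i - x_i'\beta)\right]^2\right)^{1/2} \\
&\le \left((-\sigma' + \sigma)^2\right)^{1/2} + \left(E\left[\rho_{\tau'}(y_i - x_i'\beta') - \rho_\tau(y_i - x_i'\beta)\right]^2\right)^{1/2} \\
&= |\sigma - \sigma'| + \left(E\left[\rho_{\tau'}(y_i - x_i'\beta') - \rho_{\tau'}(y_i - x_i'\beta) + \rho_{\tau'}(y_i - x_i'\beta) - \rho_\tau(y_i - x_i'\beta)\right]^2\right)^{1/2} \\
&\le |\sigma - \sigma'| + \left(E\left[\|x_i'(\beta' - \beta)\| + |\tau' - \tau|\,|y_i - x_i'\beta|\right]^2\right)^{1/2} \\
&\le |\sigma - \sigma'| + \left(E\left[\|x_i\|\,\|\beta' - \beta\| + |\tau' - \tau|\,|y_i - x_i'\beta|\right]^2\right)^{1/2} \\
&\le |\sigma - \sigma'| + \left(E\left[\|x_i\|\,\|\beta' - \beta\|\right]^2\right)^{1/2} + \left(E\left[|\tau' - \tau|\,|y_i - x_i'\beta|\right]^2\right)^{1/2} \\
&= |\sigma - \sigma'| + \|\beta' - \beta\|\left(E\|x_i\|^2\right)^{1/2} + |\tau' - \tau|\left(E\left[y_i - x_i'\beta\right]^2\right)^{1/2} \\
&\le \text{const} \cdot \left(|\sigma - \sigma'| + \|\beta' - \beta\| + |\tau' - \tau|\right),
\end{aligned}
$$

where the first inequality is given by Minkowski’s inequality, and the second is given again by the QR check function properties ρ_τ(x+y)–ρ_τ(y)≤2|x| and ρ_{τ₁}(y−x′t)−ρ_{τ₂}(y−x′t)=(τ₁−τ₂)(y−x′t). The third inequality is the Cauchy-Schwarz inequality, and the fourth is Minkowski’s inequality. The last inequality uses Assumption A4, and finally we have that sup_{||θ′−θ||<δ_n} ∑_{i=1}^n d_{3i}→0 when δ_n→0.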

Thus, ||θ′–θ||→0 implies that d(θ′, θ)→0 in every case, and therefore sup_{||θ′−θ||<δ_n} ∑_{i=1}^n d_i→0 when δ_n→0. The final condition in Theorem 2.11.1 in van der Vaart and Wellner (1996) is a Lindeberg condition, which is guaranteed by Assumptions A1–A4. Therefore, we conclude that ℱ is Donsker and

$$
\sup_{\|\theta-\theta_0\|\le\delta_n} \left\|\mathbb{G}_n\psi_\theta(y, x) - \mathbb{G}_n\psi_{\theta_0}(y, x)\right\| = o_p(1).
$$

■

References

Abadie, A., J. Angrist, and G. Imbens. 2002. “Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings.” Econometrica 70: 91–117. doi:10.1111/1468-0262.00270

Angrist, J., V. Chernozhukov, and I. Fernández-Val. 2006. “Quantile Regression Under Misspecification, with an Application to the U.S. Wage Structure.” Econometrica 74: 539–563. doi:10.1111/j.1468-0262.2006.00671.x

Bloom, H. S. B., L. L. Orr, S. H. Bell, G. Cave, F. Doolittle, W. Lin, and J. M. Bos. 1997. “The Benefits and Costs of JTPA Title II-A Programs. Key Findings from the National Job Training Partnership Act Study.” Journal of Human Resources 32: 549–576. doi:10.2307/146183

Chernozhukov, V., and C. Hansen. 2006. “Instrumental Quantile Regression Inference for Structural and Treatment Effects Models.” Journal of Econometrics 132: 491–525. doi:10.1016/j.jeconom.2005.02.009

Chernozhukov, V., and C. Hansen. 2008. “Instrumental Variable Quantile Regression: A Robust Inference Approach.” Journal of Econometrics 142: 379–398. doi:10.1016/j.jeconom.2007.06.005

Chernozhukov, V., I. Fernández-Val, and B. Melly. 2009. “Inference on Counterfactual Distributions.” CEMMAP Working Paper CWP09/09. doi:10.2139/ssrn.1235529

Ebrahimi, N., E. S. Soofi, and R. Soyer. 2008. “Multivariate Maximum Entropy Identification, Transformation, and Dependence.” Journal of Multivariate Analysis 99: 1217–1231. doi:10.1016/j.jmva.2007.08.004

Firpo, S. 2007. “Efficient Semiparametric Estimation of Quantile Treatment Effects.” Econometrica 75: 259–276. doi:10.1111/j.1468-0262.2007.00738.x

Geraci, M., and M. Bottai. 2007. “Quantile Regression for Longitudinal Data Using the Asymmetric Laplace Distribution.” Biostatistics 8: 140–154. doi:10.1093/biostatistics/kxj039

He, X., and Q.-M. Shao. 1996. “A General Bahadur Representation of M-Estimators and its Applications to Linear Regressions with Nonstochastic Designs.” Annals of Statistics 24: 2608–2630. doi:10.1214/aos/1032181172

He, X., and Q.-M. Shao. 2000. “Quantile Regression Estimates for a Class of Linear and Partially Linear Errors-in-Variables Models.” Statistica Sinica 10: 129–140.

Hinkley, D. V., and N. S. Revankar. 1977. “Estimation of the Pareto Law from Underreported Data: A Further Analysis.” Journal of Econometrics 5: 1–11. doi:10.1016/0304-4076(77)90031-8

Huber, P. J. 1967. “The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions.” In Fifth Symposium on Mathematical Statistics and Probability, 179–195. California: University of California, Berkeley.

Kitamura, Y., and M. Stutzer. 1997. “An Information-Theoretic Alternative to Generalized Method of Moments Estimation.” Econometrica 65: 861–874. doi:10.2307/2171942

Koenker, R. 2005. Quantile Regression. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511754098

Koenker, R., and G. W. Bassett. 1978. “Regression Quantiles.” Econometrica 46: 33–49. doi:10.2307/1913643

Koenker, R., and J. A. F. Machado. 1999. “Goodness of Fit and Related Inference Processes for Quantile Regression.” Journal of the American Statistical Association 94: 1296–1310. doi:10.1080/01621459.1999.10473882

Koenker, R., and Z. Xiao. 2002. “Inference on the Quantile Regression Process.” Econometrica 70: 1583–1612. doi:10.1111/1468-0262.00342

Komunjer, I. 2005. “Quasi-Maximum Likelihood Estimation for Conditional Quantiles.” Journal of Econometrics 128: 137–164. doi:10.1016/j.jeconom.2004.08.010

Komunjer, I. 2007. “Asymmetric Power Distribution: Theory and Applications to Risk Measurement.” Journal of Applied Econometrics 22: 891–921. doi:10.1002/jae.961

Kosorok, M. R. 2008. Introduction to Empirical Processes and Semiparametric Inference. New York, New York: Springer-Verlag Press. doi:10.1007/978-0-387-74978-5

Kotz, S., T. J. Kozubowski, and K. Podgórski. 2002a. “Maximum Entropy Characterization of Asymmetric Laplace Distribution.” International Mathematical Journal 1: 31–35.

Kotz, S., T. J. Kozubowski, and K. Podgórski. 2002b. “Maximum Likelihood Estimation of Asymmetric Laplace Distributions.” Annals of the Institute of Statistical Mathematics 54: 816–826. doi:10.1023/A:1022467519537

LaLonde, R. J. 1995. “The Promise of Public-Sponsored Training Programs.” Journal of Economic Perspectives 9: 149–168. doi:10.1257/jep.9.2.149

Machado, J. A. F. 1993. “Robust Model Selection and M-Estimation.” Econometric Theory 9: 478–493. doi:10.1017/S0266466600007775

Manski, C. F. 1991. “Regression.” Journal of Economic Literature 29: 34–50.

Park, S. Y., and A. K. Bera. 2009. “Maximum Entropy Autoregressive Conditional Heteroskedasticity Model.” Journal of Econometrics 150: 219–230. doi:10.1016/j.jeconom.2008.12.014

Schennach, S. M. 2008. “Quantile Regression with Mismeasured Covariates.” Econometric Theory 24: 1010–1043. doi:10.1017/S0266466608080390

Soofi, E. S., and J. J. Retzer. 2002. “Information Indices: Unification and Applications.” Journal of Econometrics 107: 17–40. doi:10.1016/S0304-4076(01)00111-7

van der Vaart, A. 1998. Asymptotic Statistics. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802256

van der Vaart, A., and J. A. Wellner. 1996. Weak Convergence and Empirical Processes. New York, New York: Springer-Verlag. doi:10.1007/978-1-4757-2545-2

Wei, Y., and R. J. Carroll. 2009. “Quantile Regression with Measurement Error.” Journal of the American Statistical Association 104: 1129–1143. doi:10.1198/jasa.2009.tm08420

Yu, K., and R. A. Moyeed. 2001. “Bayesian Quantile Regression.” Statistics & Probability Letters 54: 437–447. doi:10.1016/S0167-7152(01)00124-9

Yu, K., and J. Zhang. 2005. “A Three-Parameter Asymmetric Laplace Distribution and Its Extension.” Communications in Statistics – Theory and Methods 34: 1867–1879. doi:10.1080/03610920500199018

Zhao, Z., and Z. Xiao. 2011. “Efficient Regressions Via Optimally Combining Quantile Information.” Manuscript, University of Illinois at Urbana-Champaign.

Zou, H., and M. Yuan. 2008. “Composite Quantile Regression and the Oracle Model Selection Theory.” Annals of Statistics 36: 1108–1126. doi:10.1214/07-AOS507

Published Online: 2015-3-3

Published in Print: 2016-1-1
