Right-censored partially linear regression model with error in variables: application with carotid endarterectomy dataset

  • Dursun Aydın, Ersin Yılmaz, Nur Chamidah and Budi Lestari
Published/Copyright: May 31, 2023

Abstract

This paper considers a partially linear regression model that relates a right-censored response variable to predictors and an extra covariate measured with error. The main difficulty is that both the censorship and the measurement error must be handled to estimate the model correctly. To this end, we propose three modified semiparametric estimators, obtained from local polynomial regression, kernel smoothing, and B-spline smoothing, based on a kernel deconvolution approach and a synthetic data transformation. The kernel deconvolution technique resolves the measurement error in the model, while the synthetic data transformation, a very common device in the literature, incorporates the effect of censorship into the estimation procedure. The performances of the introduced estimators are compared in a detailed Monte Carlo simulation study. In addition, a carotid endarterectomy dataset is analyzed as a real-world example and the results are presented. The results show that the deconvoluted local polynomial method yields better estimates than the other two methods.


Corresponding author: Ersin Yılmaz, Department of Statistics, Faculty of Science, Mugla Sitki Kocman University, Mugla, Türkiye

  1. Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Conflict of interest statement: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix A1: Proof of Lemma 2.1

Proof

From Assumption 2.1 (ii), we obtain

$$
\begin{aligned}
E\left[Y^{*}_{iG} \mid X, W\right]
&= E\left[\frac{\delta_i Y_i}{\bar{G}(Y_i)} \,\middle|\, X, W\right]
 = E\left[\frac{I(Z_i \leq C_i)\,\min(Z_i, C_i)}{\bar{G}\left(\min(Z_i, C_i)\right)} \,\middle|\, X, W\right] \\
&= E\left[\frac{I(Z_i \leq C_i)\,Z_i}{\bar{G}(Y_i)} \,\middle|\, X, W\right]
 = E\left[E\left\{\frac{Z_i}{\bar{G}(Y_i)}\,I(Z_i \leq C_i) \,\middle|\, X, W, Z\right\} \,\middle|\, X, W\right] \\
&= E\left[\frac{Z_i\,\bar{G}(Y_i)}{\bar{G}(Y_i)} \,\middle|\, X, W\right]
 = E\left[Z_i \mid X, W\right].
\end{aligned}
$$

This completes the proof of Lemma 2.1.
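To make the transformation concrete, the following Python sketch computes the synthetic responses $Y^{*}_{i\hat{G}} = \delta_i Y_i / \hat{\bar{G}}(Y_i)$ with a Kaplan–Meier-type estimate of the censoring survival function $\bar{G}$. The function names are ours, the treatment of ties and left limits is simplified, and this is a sketch rather than the authors' code.

```python
import numpy as np

def km_censoring_survival(y, delta):
    """Kaplan-Meier-type estimate of the censoring survival G_bar(t) = P(C > t);
    censored observations (delta == 0) play the role of the 'events' here."""
    order = np.argsort(y)
    y_s, d_s = y[order], delta[order]
    n = len(y)
    at_risk = n - np.arange(n)                        # risk set size at ordered times
    factors = np.where(d_s == 0, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.cumprod(factors)                        # G_bar at the ordered times

    def G_bar(t):
        idx = np.searchsorted(y_s, t, side="right") - 1
        return 1.0 if idx < 0 else surv[idx]

    return np.vectorize(G_bar)

def synthetic_response(y, delta):
    """Synthetic data transform: Y*_i = delta_i * Y_i / G_bar_hat(Y_i)."""
    G_bar = km_censoring_survival(y, delta)
    g = np.maximum(G_bar(y), 1e-10)                   # guard against G_bar = 0
    return delta * y / g
```

By the unbiasedness shown above, the transformed responses can then be treated like complete data, which is how they enter the criteria of Appendices A2–A4.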

Appendix A2: Derivations of Equations (3.10a–d)

Let us consider the deconvoluted kernel smoother matrix $S_h^{DK}$, with entries $S_{ih}^{DK}(t_i)$, given in (3.7). Using this smoother matrix, we obtain the following partial residuals from the matrix $X$ and the synthetic response vector $Y^{*}_{\hat{G}}$, respectively:

(A2.1) $\left(I - S_h^{DK}\right)X = \tilde{X} \quad \text{and} \quad \left(I - S_h^{DK}\right)Y^{*}_{\hat{G}} = \tilde{Y}^{*}_{\hat{G}}$

We also note that the matrix $S_h^{DK}$ performs the same task as the weight matrix in the Nadaraya [19] and Watson [20] estimator. In analogy with Speckman [1], this leads to the following kernel estimator:

(A2.2) $\hat{f}^{DK} = S_h^{DK}\left(Y^{*}_{\hat{G}} - X\beta\right)$

Supposing that $\tilde{X}$ has full rank, one can estimate the parameter vector $\beta$ in (A2.2) from the partial residuals in (A2.1). The deconvoluted kernel estimator $\hat{\beta}^{DK}$ is defined as the value of $\beta$ that minimizes the sum of squares of the partial residuals:

$$WLS^{DK}(\beta) = \left(\tilde{Y}^{*}_{\hat{G}} - \tilde{X}\beta\right)'\left(\tilde{Y}^{*}_{\hat{G}} - \tilde{X}\beta\right)$$

Simplifying,

$$
\begin{aligned}
WLS^{DK}(\beta)
&= \tilde{Y}^{*\prime}_{\hat{G}}\tilde{Y}^{*}_{\hat{G}} - \tilde{Y}^{*\prime}_{\hat{G}}\tilde{X}\beta - \beta'\tilde{X}'\tilde{Y}^{*}_{\hat{G}} + \beta'\tilde{X}'\tilde{X}\beta \\
&= \tilde{Y}^{*\prime}_{\hat{G}}\tilde{Y}^{*}_{\hat{G}} - 2\beta'\tilde{X}'\tilde{Y}^{*}_{\hat{G}} + \beta'\tilde{X}'\tilde{X}\beta
\end{aligned}
$$

If we take the derivative with respect to β and set it equal to zero:

(A2.3) $\partial WLS^{DK}(\beta)/\partial\beta = \tilde{X}'\tilde{X}\beta - \tilde{X}'\tilde{Y}^{*}_{\hat{G}} = 0$

Replacing $\beta$ in (A2.3) with $\hat{\beta}^{DK}$, the normal equations are defined as

(A2.4) $\tilde{X}'\tilde{X}\,\hat{\beta}^{DK} = \tilde{X}'\tilde{Y}^{*}_{\hat{G}}$

The solution to the normal equations (A2.4) is

(A2.5) $\hat{\beta}^{DK} = \left(\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}'\tilde{Y}^{*}_{\hat{G}}$

as defined in (3.10a). If we replace $\beta$ in (A2.2) with $\hat{\beta}^{DK}$ from (A2.5), the deconvoluted kernel estimator of the function $f(\cdot)$ in matrix form can be re-expressed as

(A2.6) $\hat{f}^{DK} = S_h^{DK}\left(Y^{*}_{\hat{G}} - X\hat{\beta}^{DK}\right)$

and the vector of fitted values based on the deconvoluted kernel estimator is

(A2.7)
$$
\begin{aligned}
\mu &= X\hat{\beta}^{DK} + \hat{f}^{DK}
   = X\left(\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}'\tilde{Y}^{*}_{\hat{G}} + S_h^{DK}\left(Y^{*}_{\hat{G}} - X\hat{\beta}^{DK}\right) \\
&= X\left(\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}'\left(I - S_h^{DK}\right)Y^{*}_{\hat{G}} + S_h^{DK}Y^{*}_{\hat{G}} - S_h^{DK}X\left(\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}'\left(I - S_h^{DK}\right)Y^{*}_{\hat{G}} \\
&= S_h^{DK}Y^{*}_{\hat{G}} + \left(I - S_h^{DK}\right)X\left(\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}'\left(I - S_h^{DK}\right)Y^{*}_{\hat{G}} \\
&= \left[S_h^{DK} + \left(I - S_h^{DK}\right)X\left(\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}'\left(I - S_h^{DK}\right)\right]Y^{*}_{\hat{G}}
   = H_h^{DK}\,Y^{*}_{\hat{G}} = \hat{Y}^{*}_{\hat{G}}
\end{aligned}
$$

where

(A2.8) $H_h^{DK} = S_h^{DK} + \left(I - S_h^{DK}\right)X\left(\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}'\left(I - S_h^{DK}\right)$

as stated in Section 3.1.
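As an illustration of how (A2.1)–(A2.8) translate into computation, here is a minimal Python sketch of the profiling steps. It assumes the smoother matrix S (standing in for $S_h^{DK}$ from (3.7)) and the synthetic response vector have already been constructed; all names are illustrative, not the paper's code.

```python
import numpy as np

def speckman_fit(S, X, y_star):
    """Profiling steps of (A2.1)-(A2.8) for a given n x n smoother matrix S
    (here playing the role of S_h^DK) and synthetic responses y_star."""
    n = len(y_star)
    I = np.eye(n)
    X_t = (I - S) @ X                                 # X_tilde in (A2.1)
    y_t = (I - S) @ y_star                            # Y*_tilde in (A2.1)
    A = np.linalg.solve(X_t.T @ X_t, X_t.T)           # (X~'X~)^{-1} X~'
    beta = A @ y_t                                    # beta_hat^DK, (A2.5)
    f_hat = S @ (y_star - X @ beta)                   # f_hat^DK, (A2.6)
    H = S + (I - S) @ X @ A @ (I - S)                 # hat matrix, (A2.8)
    return beta, f_hat, H
```

The matrix H returned here reproduces (A2.8), so the fitted values satisfy $HY^{*}_{\hat{G}} = X\hat{\beta}^{DK} + \hat{f}^{DK}$, matching (A2.7).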

Appendix A3: Derivations of Equations (3.18a–d)

Let $\Omega = \operatorname{diag}\left\{K_U\left((W_i - t_0)/h\right)\right\}$ be the $n \times n$ smoothing matrix of weights based on a bandwidth parameter $h$. We assume that, for a given matrix $\Omega$, the deconvoluted local polynomial estimators are obtained by minimizing the weighted least squares criterion

(A3.1) $WLS^{DL}(b, \beta) = \min_{b,\,\beta}\left(Y^{*}_{\hat{G}} - Qb - X\beta\right)'\Omega\left(Y^{*}_{\hat{G}} - Qb - X\beta\right)$

After algebraic operations, this expression can be written as

$$
\begin{aligned}
WLS^{DL}(b, \beta)
&= Y^{*\prime}_{\hat{G}}\Omega Y^{*}_{\hat{G}} - Y^{*\prime}_{\hat{G}}\Omega Qb - Y^{*\prime}_{\hat{G}}\Omega X\beta - b'Q'\Omega Y^{*}_{\hat{G}} + b'Q'\Omega Qb + b'Q'\Omega X\beta \\
&\quad - \beta'X'\Omega Y^{*}_{\hat{G}} + \beta'X'\Omega Qb + \beta'X'\Omega X\beta \\
&= Y^{*\prime}_{\hat{G}}\Omega Y^{*}_{\hat{G}} - 2b'Q'\Omega Y^{*}_{\hat{G}} - 2\beta'X'\Omega Y^{*}_{\hat{G}} + b'Q'\Omega Qb + 2b'Q'\Omega X\beta + \beta'X'\Omega X\beta
\end{aligned}
$$

Taking the derivative with respect to $b$ and setting it equal to zero:

$$
\partial WLS^{DL}(b, \beta)/\partial b = -2Q'\Omega Y^{*}_{\hat{G}} + 2Q'\Omega Qb + 2Q'\Omega X\beta = 0
\;\Longrightarrow\;
Q'\Omega Q\,b + Q'\Omega X\beta = Q'\Omega Y^{*}_{\hat{G}}
$$

Hence, we obtain the weighted least squares normal equations

(A3.2) $Q'\Omega Q\,b = Q'\Omega\left(Y^{*}_{\hat{G}} - X\beta\right)$

The solution to (A3.2) is

(A3.3) $\hat{b} = \left(Q'\Omega Q\right)^{-1}Q'\Omega\left(Y^{*}_{\hat{G}} - X\beta\right)$

Note that (A3.3) gives the coefficients of the Taylor expansion (3.11), but we need to select the first element of the vector $\hat{b} = (\hat{b}_0, \ldots, \hat{b}_p)'$ in order to obtain $\hat{f}(t_0) = \hat{b}_0$. Similar to (A2.2), a convenient local polynomial estimator providing $\hat{f}(t_0)$ can be written in the form

(A3.4) $\hat{f}^{DL}(t_0; h) = \sum_{i=1}^{n}\omega_i\left(Q_i'\Omega_i Q_i\right)^{-1}Q_i'\Omega_i\left(Y^{*}_{i\hat{G}} - X_i\beta\right) = S_h^{DL}\left(Y^{*}_{\hat{G}} - X\beta\right) = \hat{f}^{DL}$

as claimed. Then, as in (A2.1), if $\tilde{X} = \left(I - S_h^{DL}\right)X$ and $\tilde{Y}^{*}_{\hat{G}} = \left(I - S_h^{DL}\right)Y^{*}_{\hat{G}}$ are written based on the smoothing matrix of the DL method given in (3.18b), criterion (A3.1) is rewritten as follows:

$$WLS^{DL}(\beta) = \left(\tilde{Y}^{*}_{\hat{G}} - \tilde{X}\beta\right)'\Omega\left(\tilde{Y}^{*}_{\hat{G}} - \tilde{X}\beta\right)$$

$$
\begin{aligned}
WLS^{DL}(\beta)
&= \tilde{Y}^{*\prime}_{\hat{G}}\Omega\tilde{Y}^{*}_{\hat{G}} - \tilde{Y}^{*\prime}_{\hat{G}}\Omega\tilde{X}\beta - \beta'\tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}} + \beta'\tilde{X}'\Omega\tilde{X}\beta \\
&= \tilde{Y}^{*\prime}_{\hat{G}}\Omega\tilde{Y}^{*}_{\hat{G}} - 2\beta'\tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}} + \beta'\tilde{X}'\Omega\tilde{X}\beta
\end{aligned}
$$

The derivative of $WLS^{DL}(\beta)$ with respect to $\beta$ is given by

(A3.5) $\partial WLS^{DL}(\beta)/\partial\beta = \tilde{X}'\Omega\tilde{X}\beta - \tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}} = 0$

From (A3.5), $\hat{\beta}^{DL}$ can be obtained as

(A3.6) $\hat{\beta}^{DL} = \left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}}$

By using (A3.5) and (A3.6), the derivation of the hat matrix $H_h^{DL}$ can be shown as follows:

(A3.7)
$$
\begin{aligned}
\mu &= X\hat{\beta}^{DL} + \hat{f}^{DL}
   = X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}} + S_h^{DL}\left(Y^{*}_{\hat{G}} - X\hat{\beta}^{DL}\right) \\
&= X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DL}\right)Y^{*}_{\hat{G}} + S_h^{DL}Y^{*}_{\hat{G}} - S_h^{DL}X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DL}\right)Y^{*}_{\hat{G}} \\
&= S_h^{DL}Y^{*}_{\hat{G}} + \left(I - S_h^{DL}\right)X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DL}\right)Y^{*}_{\hat{G}} \\
&= \left[S_h^{DL} + \left(I - S_h^{DL}\right)X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DL}\right)\right]Y^{*}_{\hat{G}}
   = H_h^{DL}\,Y^{*}_{\hat{G}} = \hat{Y}^{*}_{\hat{G}}
\end{aligned}
$$

where

(A3.8) $H_h^{DL} = S_h^{DL} + \left(I - S_h^{DL}\right)X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DL}\right)$

as stated in Section 3.2.
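To see how the rows of a smoother of the form (A3.4) can be assembled in practice, consider the following Python sketch. The Gaussian kernel below is only a placeholder for the deconvoluting kernel $K_U$ (which the main text computes by Fourier inversion), and the helper name and defaults are illustrative assumptions.

```python
import numpy as np

def local_poly_smoother(w, grid, h, p=1, kernel=None):
    """Builds the rows of a local polynomial smoother in the spirit of (A3.4):
    row j is e_1'(Q' Omega Q)^{-1} Q' Omega at t0 = grid[j]. The Gaussian
    kernel here is a stand-in for the deconvoluting kernel K_U."""
    if kernel is None:
        kernel = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    S = np.zeros((len(grid), len(w)))
    for j, t0 in enumerate(grid):
        d = w - t0
        Q = np.vander(d, N=p + 1, increasing=True)   # columns 1, d, ..., d^p
        omega = kernel(d / h)                        # diagonal of Omega at t0
        A = Q.T @ (omega[:, None] * Q)               # Q' Omega Q
        B = Q.T * omega                              # Q' Omega
        S[j] = np.linalg.solve(A, B)[0]              # first row extracts b_0
    return S
```

Calling local_poly_smoother(w, w, h) returns the square $n \times n$ matrix playing the role of $S_h^{DL}$, after which $\hat{\beta}^{DL}$ and $H_h^{DL}$ follow from the same profiling steps sketched for Appendix A2, with $\Omega$ inserted in each cross-product as in (A3.6) and (A3.8).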

Appendix A4: Derivations of Equations (3.27a–d)

The minimization criterion for the B-spline estimator is given in Equation (3.23). Accordingly, the estimates $\hat{c}$ and $\hat{\beta}^{DB}$ can be obtained by rewriting (3.23) as $WLS^{DB}$:

(A4.1) $WLS^{DB}(c, \beta) = \min_{c,\,\beta}\left(Y^{*}_{\hat{G}} - Bc - X\beta\right)'\Omega\left(Y^{*}_{\hat{G}} - Bc - X\beta\right)$

Expansion of (A4.1) is given by:

$$
\begin{aligned}
WLS^{DB}(c, \beta)
&= Y^{*\prime}_{\hat{G}}\Omega Y^{*}_{\hat{G}} - Y^{*\prime}_{\hat{G}}\Omega Bc - Y^{*\prime}_{\hat{G}}\Omega X\beta - c'B'\Omega Y^{*}_{\hat{G}} + c'B'\Omega Bc + c'B'\Omega X\beta \\
&\quad - \beta'X'\Omega Y^{*}_{\hat{G}} + \beta'X'\Omega Bc + \beta'X'\Omega X\beta \\
&= Y^{*\prime}_{\hat{G}}\Omega Y^{*}_{\hat{G}} - 2c'B'\Omega Y^{*}_{\hat{G}} - 2\beta'X'\Omega Y^{*}_{\hat{G}} + c'B'\Omega Bc + 2c'B'\Omega X\beta + \beta'X'\Omega X\beta
\end{aligned}
$$

Differentiating the expression above with respect to $c$ yields the normal equations for $c$:

$$
\partial WLS^{DB}(c, \beta)/\partial c = -2B'\Omega Y^{*}_{\hat{G}} + 2B'\Omega Bc + 2B'\Omega X\beta = 0
$$

$$
B'\Omega B\,c + B'\Omega X\beta = B'\Omega Y^{*}_{\hat{G}}
$$

(A4.2) $B'\Omega B\,c = B'\Omega\left(Y^{*}_{\hat{G}} - X\beta\right)$

Using (A4.2), $\hat{c}$ is obtained as follows:

(A4.3) $\hat{c} = \left(B'\Omega B\right)^{-1}B'\Omega\left(Y^{*}_{\hat{G}} - X\beta\right)$

Similar to (A3.3), the deconvoluted B-spline estimator of the unknown smooth function $\hat{f}(t_0)$ can be written in the form

(A4.4) $\hat{f}^{DB}(t_0; h) = \sum_{i=1}^{n} B_i\left(B_i'\Omega_i B_i\right)^{-1}B_i'\Omega_i\left(Y^{*}_{i\hat{G}} - X_i\beta\right) = S_h^{DB}\left(Y^{*}_{\hat{G}} - X\beta\right) = \hat{f}^{DB}$

as claimed. The partial residuals for the DB method are computed with $\tilde{X} = \left(I - S_h^{DB}\right)X$ and $\tilde{Y}^{*}_{\hat{G}} = \left(I - S_h^{DB}\right)Y^{*}_{\hat{G}}$, where $S_h^{DB}$ is provided in (3.25). Based on $\tilde{X}$ and $\tilde{Y}^{*}_{\hat{G}}$, minimization criterion (A4.1) is rewritten as follows, and the estimate of $\beta$ is obtained similarly to (A3.5):

$$WLS^{DB}(\beta) = \left(\tilde{Y}^{*}_{\hat{G}} - \tilde{X}\beta\right)'\Omega\left(\tilde{Y}^{*}_{\hat{G}} - \tilde{X}\beta\right)$$

$$
\begin{aligned}
WLS^{DB}(\beta)
&= \tilde{Y}^{*\prime}_{\hat{G}}\Omega\tilde{Y}^{*}_{\hat{G}} - \tilde{Y}^{*\prime}_{\hat{G}}\Omega\tilde{X}\beta - \beta'\tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}} + \beta'\tilde{X}'\Omega\tilde{X}\beta \\
&= \tilde{Y}^{*\prime}_{\hat{G}}\Omega\tilde{Y}^{*}_{\hat{G}} - 2\beta'\tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}} + \beta'\tilde{X}'\Omega\tilde{X}\beta
\end{aligned}
$$

Then, $\partial WLS^{DB}(\beta)/\partial\beta$ is calculated as

(A4.5) $\partial WLS^{DB}(\beta)/\partial\beta = \tilde{X}'\Omega\tilde{X}\beta - \tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}} = 0$

Finally, $\hat{\beta}^{DB}$ is given by

(A4.6) $\hat{\beta}^{DB} = \left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}}$

Also, by using (A4.3) and (A4.6), the derivation of the hat matrix $H_h^{DB}$ can be shown as follows:

(A4.7)
$$
\begin{aligned}
\mu &= X\hat{\beta}^{DB} + \hat{f}^{DB}
   = X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\tilde{Y}^{*}_{\hat{G}} + S_h^{DB}\left(Y^{*}_{\hat{G}} - X\hat{\beta}^{DB}\right) \\
&= X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DB}\right)Y^{*}_{\hat{G}} + S_h^{DB}Y^{*}_{\hat{G}} - S_h^{DB}X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DB}\right)Y^{*}_{\hat{G}} \\
&= S_h^{DB}Y^{*}_{\hat{G}} + \left(I - S_h^{DB}\right)X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DB}\right)Y^{*}_{\hat{G}} \\
&= \left[S_h^{DB} + \left(I - S_h^{DB}\right)X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DB}\right)\right]Y^{*}_{\hat{G}}
   = H_h^{DB}\,Y^{*}_{\hat{G}} = \hat{Y}^{*}_{\hat{G}}
\end{aligned}
$$

where

(A4.8) $H_h^{DB} = S_h^{DB} + \left(I - S_h^{DB}\right)X\left(\tilde{X}'\Omega\tilde{X}\right)^{-1}\tilde{X}'\Omega\left(I - S_h^{DB}\right)$

as explained in Section 3.3.
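The sketch below mirrors (A4.3) and (A4.6) in Python under the simplifying assumption $\Omega = I$; in the paper, $\Omega$ enters every cross-product ($B'\Omega B$, $\tilde{X}'\Omega\tilde{X}$, and so on). The knot placement, basis size, and function names are illustrative choices, not the authors' settings.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(w, n_knots=10, degree=3):
    """B-spline design matrix B evaluated at the points w; illustrative
    equally spaced knots with repeated boundary knots."""
    t = np.concatenate([np.repeat(w.min(), degree),
                        np.linspace(w.min(), w.max(), n_knots),
                        np.repeat(w.max(), degree)])
    m = len(t) - degree - 1                           # number of basis functions
    B = np.empty((len(w), m))
    for j in range(m):
        coef = np.zeros(m)
        coef[j] = 1.0
        B[:, j] = BSpline(t, coef, degree, extrapolate=False)(w)
    return np.nan_to_num(B)                           # zero outside the support

def db_fit(B, X, y_star):
    """Estimates c_hat and beta_hat^DB as in (A4.3) and (A4.6), with
    Omega = I for brevity; assumes B and X_tilde have full column rank."""
    S = B @ np.linalg.solve(B.T @ B, B.T)             # smoother playing S_h^DB
    X_t = X - S @ X                                   # (I - S) X
    y_t = y_star - S @ y_star                         # (I - S) Y*
    beta = np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)  # (A4.6) with Omega = I
    c = np.linalg.solve(B.T @ B, B.T @ (y_star - X @ beta))   # (A4.3)
    return beta, c, B @ c                             # f_hat^DB = B c
```

Reinstating a non-identity $\Omega$ only changes each cross-product (for example, B.T @ B becomes B.T @ Omega @ B), leaving the structure of the computation intact.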

Appendix A5: Censoring mechanism in simulation design

Algorithm 1

Generation of the censoring variable $C_i$.

Input: Completely observed $Z_i$
Output: Right-censored dependent variable $Y_i$
1: For a given censoring level (C.L.), generate $\delta_i = I(Z_i \leq C_i)$ from the Bernoulli distribution
2: for $i$ in $1$ to $n$
3:  if ($\delta_i = 0$)
4:   while ($Z_i \leq C_i$)
5:    generate $C_i \sim N(\mu_Z, \sigma_Z^2)$
6:  else
7:   $C_i = Z_i$
8: end (for loop in Step 2)
9: for $i$ in $1$ to $n$
10:  if ($Z_i \leq C_i$)
11:   $Y_i = Z_i$
12:  else
13:   $Y_i = C_i$
14: end (for loop in Step 9)
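A direct Python transcription of Algorithm 1 might look as follows. Treating the Bernoulli success probability as $1 - \text{C.L.}$ and using the sample mean and standard deviation of $Z$ for $(\mu_Z, \sigma_Z^2)$ are our reading of the pseudocode, not the authors' exact implementation.

```python
import numpy as np

def right_censor(z, censoring_level, seed=1):
    """Sketch of Algorithm 1: given fully observed Z_i, produce right-censored
    Y_i and censoring indicators delta_i for a target censoring level."""
    rng = np.random.default_rng(seed)
    n = len(z)
    # Step 1: delta_i = I(Z_i <= C_i); success prob 1 - C.L. is our assumption
    delta = rng.binomial(1, 1.0 - censoring_level, size=n)
    mu, sigma = z.mean(), z.std()        # assumed stand-ins for (mu_Z, sigma_Z)
    c = z.astype(float).copy()           # Step 7: C_i = Z_i when delta_i = 1
    for i in range(n):
        if delta[i] == 0:                # Steps 3-5: redraw until C_i < Z_i
            ci = rng.normal(mu, sigma)
            while z[i] <= ci:
                ci = rng.normal(mu, sigma)
            c[i] = ci
    y = np.minimum(z, c)                 # Steps 9-13: Y_i = min(Z_i, C_i)
    return y, delta
```

The rejection loop in Steps 3–5 guarantees $C_i < Z_i$ whenever $\delta_i = 0$, so the realized censoring rate matches the Bernoulli draws exactly.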

References

1. Speckman, P. Kernel smoothing in partial linear models. J Roy Stat Soc B 1988;50:413–36. https://doi.org/10.1111/j.2517-6161.1988.tb01738.x.

2. Green, PJ, Silverman, BW. Nonparametric regression and generalized linear models. Monographs on statistics and applied probability, no. 58. New York, NY: CRC Press; 1994. https://doi.org/10.1007/978-1-4899-4473-3.

3. Ruppert, D, Wand, MP, Carroll, RJ. Semiparametric regression (no. 12). Cambridge: Cambridge University Press; 2003. https://doi.org/10.1017/CBO9780511755453.

4. Fuller, WA. Measurement error models. New York: Wiley; 1987. https://doi.org/10.1002/9780470316665.

5. Carroll, RJ, Küchenhoff, H, Lombard, F, Stefanski, LA. Asymptotics for the SIMEX estimator in nonlinear measurement error models. J Am Stat Assoc 1996;91:242–50. https://doi.org/10.1080/01621459.1996.10476682.

6. Liang, H, Härdle, W, Carroll, RJ. Estimation in a semiparametric partially linear errors-in-variables model. Ann Stat 1999;27:1519–35. https://doi.org/10.1214/aos/1017939140.

7. Orbe, J, Ferreira, E, Núñez-Antón, V. Censored partial regression. Biostatistics 2003;4:109–21. https://doi.org/10.1093/biostatistics/4.1.109.

8. Qin, G, Jing, BY. Asymptotic properties for estimation of partial linear models with censored data. J Stat Plann Inference 2000;84:95–110. https://doi.org/10.1016/s0378-3758(99)00141-x.

9. Aydin, D, Yilmaz, E. Modified estimators in semiparametric regression models with right-censored data. J Stat Comput Simulat 2018;88:1470–98. https://doi.org/10.1080/00949655.2018.1439032.

10. Koul, H, Susarla, V, Van Ryzin, J. Regression analysis with randomly right-censored data. Ann Stat 1981;9:1276–88. https://doi.org/10.1214/aos/1176345644.

11. Stute, W. Almost sure representations of the product-limit estimator for truncated data. Ann Stat 1993;21:146–56. https://doi.org/10.1214/aos/1176349019.

12. Stute, W. Nonlinear censored regression. Stat Sin 1999;9:1089–102.

13. Kaplan, EL, Meier, P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53:457–81. https://doi.org/10.1080/01621459.1958.10501452.

14. Fan, J, Truong, YK. Nonparametric regression with errors in variables. Ann Stat 1993;21:1900–25. https://doi.org/10.1214/aos/1176349402.

15. Carroll, RJ, Ruppert, D, Stefanski, LA, Crainiceanu, CM. Measurement error in nonlinear models: a modern perspective. New York, NY: Chapman and Hall/CRC; 2006. https://doi.org/10.1201/9781420010138.

16. Delaigle, A, Meister, A. Nonparametric regression estimation in the heteroscedastic errors-in-variables problem. J Am Stat Assoc 2007;102:1416–26. https://doi.org/10.1198/016214507000000987.

17. Wang, XF, Wang, B. Deconvolution estimation in measurement error models: the R package decon. J Stat Software 2011;39:i10. https://doi.org/10.18637/jss.v039.i10.

18. Stefanski, LA, Carroll, RJ. Deconvolving kernel density estimators. Statistics 1990;21:169–84. https://doi.org/10.1080/02331889008802238.

19. Nadaraya, EA. On estimating regression. Theor Probab Appl 1964;9:141–2. https://doi.org/10.1137/1109020.

20. Watson, GS. Smooth regression analysis. Sankhya Indian J Stat 1964;26:359–72.

21. Fan, J, Gijbels, I, Hu, TC, Huang, LS. A study of variable bandwidth selection for local polynomial regression. Stat Sin 1996;6:113–27.

22. De Boor, C. A practical guide to splines. New York: Springer-Verlag; 1978, vol. 27. https://doi.org/10.1007/978-1-4612-6333-3.

23. Theobald, CM. Generalizations of mean square error applied to ridge regression. J R Stat Soc Ser B Methodol 1974;36:103–6. https://doi.org/10.1111/j.2517-6161.1974.tb00990.x.

24. Fan, J. On the optimal rates of convergence for nonparametric deconvolution problems. Ann Stat 1991;19:1257–72. https://doi.org/10.1214/aos/1176348248.

25. Li, T, Vuong, Q. Nonparametric estimation of the measurement error model using multiple indicators. J Multivariate Anal 1998;65:139–65. https://doi.org/10.1006/jmva.1998.1741.

26. Heckman, NE. Spline smoothing in a partly linear model. J Roy Stat Soc B 1986;48:244–8. https://doi.org/10.1111/j.2517-6161.1986.tb01407.x.

27. Rice, J. Convergence rates for partially splined models. Stat Probab Lett 1986;4:203–8. https://doi.org/10.1016/0167-7152(86)90067-2.

28. Härdle, W, Liang, H, Gao, J. Partially linear models. Berlin: Springer Science & Business Media; 2000. https://doi.org/10.1007/978-3-642-57700-0.

29. Han, K, Park, BU. Smooth backfitting for errors-in-variables additive models. Ann Stat 2018;46:2216–50. https://doi.org/10.1214/17-aos1617.

30. Lee, ER, Han, K, Park, BU. Estimation of errors-in-variables partially linear additive models. Stat Sin 2018;28:2353–73. https://doi.org/10.5705/ss.202017.0101.

31. Efron, B. Computers and the theory of statistics: thinking the unthinkable. SIAM Rev 1979;21:460–80. https://doi.org/10.1137/1021092.

32. Hurvich, CM, Simonoff, JS, Tsai, CL. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J Roy Stat Soc B 1998;60:271–93. https://doi.org/10.1111/1467-9868.00125.

Received: 2022-02-07
Accepted: 2023-05-11
Published Online: 2023-05-31

© 2023 Walter de Gruyter GmbH, Berlin/Boston
