The Youden Index in the Generalized Receiver Operating Characteristic Curve Context

Pablo Martínez-Camblor; Juan Carlos Pardo-Fernández

doi:10.1515/ijb-2018-0060


Published/Copyright: 3 April 2019


From the journal The International Journal of Biostatistics, Volume 15, Issue 1

Abstract

The receiver operating characteristic (ROC) curve and its associated summary indices, such as the Youden index, are statistical tools commonly used to analyze the ability of a (bio)marker to discriminate between two populations. This paper presents the concept of the Youden index in the context of the generalized ROC (gROC) curve for non-monotone relationships. Interval estimation of the Youden index and of the associated cutoff points is considered in both a parametric (binormal) and a non-parametric setting. Monte Carlo simulations and a real-world application illustrate the proposed methodology.

Keywords: (Bio)markers; cutoff point; generalized receiver-operating characteristic (gROC) curve; receiver-operating characteristic (ROC) curve; Youden index

Acknowledgements

This work is financially supported by Grants MTM2014-55966-P and MTM2017-89422-P of the Spanish Ministry of Economy, Industry and Competitiveness; the State Research Agency; and FEDER funds. J.C. Pardo-Fernández also acknowledges funding from Banco Santander and Complutense University of Madrid (project PR26/16-5B-1). P. Martínez-Camblor is also supported by Grant FC-GRUPIN-IDI/2018/000132 of the Asturies Government. The authors thank two anonymous reviewers for their constructive comments and suggestions.

Appendix

In this Appendix we prove the asymptotic distributions, stated in the main text, of both the parametric (binormal) and the non-parametric (empirical) estimators of the Youden index and its associated cutoff points in the gROC curve context. Results for the parametric estimator (Theorem 1) are similar to those derived for the Youden index in the standard ROC curve context [23]. Theorem 2 deals with the empirical estimator; the structure of its proof is similar to the one used by Hsieh and Turnbull [24] in the standard ROC curve context.

Theorem 1

Let $Y_0$ and $Y_1$ denote the biomarker in the negative and in the positive population, respectively. Assume that $Y_i$ is normally distributed with mean $\mu_i$ and standard deviation $\sigma_i$, for $i\in\{0,1\}$, and that two independent samples are available: $m$ i.i.d. observations from $Y_0$ and $n$ i.i.d. observations from $Y_1$. The sample sizes satisfy

A1. $n/m\to\lambda>0$ as $n\to\infty$.

Then, we have the following weak convergences,

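$$\text{(i)}\quad \sqrt{n}\,\{\hat J_g^P-J_g\}\xrightarrow[n\to\infty]{\mathcal{L}}\sigma_J^P\cdot Z,\qquad \text{(ii)}\quad \sqrt{n}\,\{\hat t_Y^P-t_Y\}\xrightarrow[n\to\infty]{\mathcal{L}}\sigma_{t_Y}^P\cdot Z,$$

where $Z$ is a standard normal random variable and $\sigma_J^P$ and $\sigma_{t_Y}^P$ are given in eqs. (3) and (4), respectively.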

Proof

Results in [14] show that the binormal estimate of the gROC curve is based on intervals of the form $(-\infty,a/(1-b^2)-x]\cup[a/(1-b^2)+x,\infty)$, where $x$ is a non-negative real number, $a=(\mu_1-\mu_0)/\sigma_0$ and $b=\sigma_1/\sigma_0$, on the scale standardized by the negative population, $(Y-\mu_0)/\sigma_0$. Direct calculations then show that the interval at which $J_g$ is attained is of the form

$$\frac{1}{1-b^2}\left(a+b\sqrt{2(b^2-1)\log(b)+a^2},\;a-b\sqrt{2(b^2-1)\log(b)+a^2}\right).$$

These endpoints are the points where the normal densities of the positive and negative populations cross each other. Writing $x_J$ for the half-length of this interval, which is centered at $a/(1-b^2)$, the Youden index is

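$$J_g=\Phi\left(\frac{ab^2-x_J(1-b^2)}{b(1-b^2)}\right)-\Phi\left(\frac{ab^2+x_J(1-b^2)}{b(1-b^2)}\right)+\Phi\left(\frac{a+x_J(1-b^2)}{1-b^2}\right)-\Phi\left(\frac{a-x_J(1-b^2)}{1-b^2}\right)=J_g(\mu_1,\sigma_1,\mu_0,\sigma_0).$$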
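As a numerical illustration of the closed form above, the following Python sketch (standard library only) computes the crossing points and $J_g$ on the standardized scale and compares the result with a brute-force grid search over intervals. It assumes $\sigma_1>\sigma_0$ (that is, $b>1$), so that values outside the interval are classified as positive; the function names and grid settings are illustrative choices, not part of the paper or of any package.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def binormal_youden_g(mu1, sigma1, mu0, sigma0):
    """Closed-form gROC Youden index for Y0 ~ N(mu0, sigma0^2), Y1 ~ N(mu1, sigma1^2).

    Works on the scale Z = (Y - mu0) / sigma0, where negatives are N(0, 1)
    and positives are N(a, b^2). Assumes sigma1 > sigma0 (b > 1).
    Returns (J_g, lower, upper), with [lower, upper] the interval between
    the two density crossing points.
    """
    a = (mu1 - mu0) / sigma0
    b = sigma1 / sigma0
    if b <= 1.0:
        raise ValueError("this sketch assumes sigma1 > sigma0")
    center = a / (1.0 - b * b)
    # Half-length x_J; for b > 1 the square-root argument is positive.
    x_j = b * math.sqrt(a * a + 2.0 * (b * b - 1.0) * math.log(b)) / (b * b - 1.0)
    lower, upper = center - x_j, center + x_j
    # Values outside [lower, upper] are classified as positive, so
    # J_g = P(negative inside) - P(positive inside).
    inside_neg = PHI(upper) - PHI(lower)
    inside_pos = PHI((upper - a) / b) - PHI((lower - a) / b)
    return inside_neg - inside_pos, lower, upper


def brute_force_youden_g(mu1, sigma1, mu0, sigma0, lo=-8.0, hi=8.0, steps=321):
    """Grid search of P(negative inside) - P(positive inside) over intervals."""
    a = (mu1 - mu0) / sigma0
    b = sigma1 / sigma0
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    cdf_neg = [PHI(z) for z in grid]
    cdf_pos = [PHI((z - a) / b) for z in grid]
    best = 0.0
    for i in range(steps):
        for j in range(i, steps):
            best = max(best, (cdf_neg[j] - cdf_neg[i]) - (cdf_pos[j] - cdf_pos[i]))
    return best


j_g, lower, upper = binormal_youden_g(mu1=1.0, sigma1=2.0, mu0=0.0, sigma0=1.0)
print(f"J_g = {j_g:.4f}, interval = [{lower:.4f}, {upper:.4f}]")
print(f"grid search = {brute_force_youden_g(1.0, 2.0, 0.0, 1.0):.4f}")
```

The grid search can only approach the closed form from below, because the interval between the crossing points is exactly the set where the negative density exceeds the positive one.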

Let $\hat\xi_i$ and $\hat S_i$ be the sample mean and the sample standard deviation, which estimate $\mu_i$ and $\sigma_i$, respectively ($i\in\{0,1\}$). Then $\hat J_g^P=J_g(\hat\xi_1,\hat S_1,\hat\xi_0,\hat S_0)$. In addition, the following convergences in distribution hold:

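$$\sqrt{n}(\hat S_1-\sigma_1)\xrightarrow[n\to\infty]{\mathcal{L}}N(0,\sigma_1^2/2),\quad \sqrt{n}(\hat S_0-\sigma_0)\xrightarrow[n\to\infty]{\mathcal{L}}N(0,\sigma_0^2\lambda/2),\quad \sqrt{n}(\hat\xi_1-\mu_1)\xrightarrow[n\to\infty]{\mathcal{L}}N(0,\sigma_1^2),\quad \sqrt{n}(\hat\xi_0-\mu_0)\xrightarrow[n\to\infty]{\mathcal{L}}N(0,\sigma_0^2\lambda).$$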

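Moreover, these four estimators are mutually independent, since the sample mean and the sample standard deviation of a normal sample are independent and the two samples are independent [25]. Hence, the multivariate delta method ensures the weak convergence stated in (i).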
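The closed form of $\sigma_J^P$ is given in eq. (3) of the main text and is not reproduced here. As a hedged alternative, the sketch below evaluates the delta-method standard deviation with a numerical (central-difference) gradient of $J_g(\mu_1,\sigma_1,\mu_0,\sigma_0)$ and the diagonal asymptotic covariance listed above, and compares it with a small Monte Carlo experiment. It again assumes $\sigma_1>\sigma_0$; the helper names, the step size `h` and the simulation settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

PHI = NormalDist().cdf


def youden_g(mu1, s1, mu0, s0):
    """Binormal gROC Youden index J_g(mu1, s1, mu0, s0); assumes s1 > s0."""
    a = (mu1 - mu0) / s0
    b = s1 / s0
    center = a / (1.0 - b * b)
    x_j = b * math.sqrt(a * a + 2.0 * (b * b - 1.0) * math.log(b)) / (b * b - 1.0)
    inside_neg = PHI(center + x_j) - PHI(center - x_j)
    inside_pos = PHI((center + x_j - a) / b) - PHI((center - x_j - a) / b)
    return inside_neg - inside_pos


def delta_method_sd(mu1, s1, mu0, s0, lam, h=1e-5):
    """Delta-method sd of sqrt(n) * (J_hat^P - J_g) using a numerical gradient.

    Asymptotic variances of sqrt(n) * (xi1, S1, xi0, S0) under normality:
    s1^2, s1^2 / 2, lam * s0^2, lam * s0^2 / 2 (independent components).
    """
    theta = [mu1, s1, mu0, s0]
    avar = [s1 ** 2, s1 ** 2 / 2.0, lam * s0 ** 2, lam * s0 ** 2 / 2.0]
    total = 0.0
    for k in range(4):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        grad_k = (youden_g(*up) - youden_g(*down)) / (2.0 * h)  # central difference
        total += grad_k ** 2 * avar[k]
    return math.sqrt(total)


def simulated_sd(mu1, s1, mu0, s0, n, m, reps=400, seed=0):
    """Monte Carlo sd of sqrt(n) * (J_hat^P - J_g), for comparison."""
    rng = random.Random(seed)
    target = youden_g(mu1, s1, mu0, s0)
    draws = []
    for _ in range(reps):
        y1 = [rng.gauss(mu1, s1) for _ in range(n)]
        y0 = [rng.gauss(mu0, s0) for _ in range(m)]
        estimate = youden_g(fmean(y1), stdev(y1), fmean(y0), stdev(y0))
        draws.append(math.sqrt(n) * (estimate - target))
    return stdev(draws)


print(delta_method_sd(1.0, 2.0, 0.0, 1.0, lam=1.0))
print(simulated_sd(1.0, 2.0, 0.0, 1.0, n=200, m=200))
```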

To prove (ii), consider the value of 1 − specificity at which the Youden index is attained,

$$t_Y=\Phi\left(\frac{a-x_J(1-b^2)}{1-b^2}\right)+1-\Phi\left(\frac{a+x_J(1-b^2)}{1-b^2}\right)=t_Y(\mu_1,\sigma_1,\mu_0,\sigma_0).$$

Arguing as above, and taking into account that $\hat t_Y^P=t_Y(\hat\xi_1,\hat S_1,\hat\xi_0,\hat S_0)$, the result follows again from the multivariate delta method. □
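A minimal sketch of $t_Y$ and of its plug-in estimator $\hat t_Y^P$, again assuming $\sigma_1>\sigma_0$. It also maps the two standardized endpoints back to the biomarker scale through $\mu_0+\sigma_0 z$, which gives the associated cutoff points (values at or below the lower cutoff, or at or above the upper one, are classified as positive). The function names are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

PHI = NormalDist().cdf


def interval_endpoints(mu1, s1, mu0, s0):
    """Standardized crossing points center -/+ x_J; assumes s1 > s0."""
    a = (mu1 - mu0) / s0
    b = s1 / s0
    center = a / (1.0 - b * b)
    x_j = b * math.sqrt(a * a + 2.0 * (b * b - 1.0) * math.log(b)) / (b * b - 1.0)
    return center - x_j, center + x_j


def t_youden(mu1, s1, mu0, s0):
    """1 - specificity at the Youden point: P(standardized negative outside)."""
    lower, upper = interval_endpoints(mu1, s1, mu0, s0)
    return PHI(lower) + 1.0 - PHI(upper)


def cutoffs_original_scale(mu1, s1, mu0, s0):
    """Cutoff points on the biomarker scale: mu0 + s0 * z."""
    lower, upper = interval_endpoints(mu1, s1, mu0, s0)
    return mu0 + s0 * lower, mu0 + s0 * upper


def plug_in_t_youden(y1, y0):
    """Binormal plug-in estimate of t_Y from samples y1 (positives) and y0 (negatives)."""
    return t_youden(fmean(y1), stdev(y1), fmean(y0), stdev(y0))


rng = random.Random(2019)
y1 = [rng.gauss(3.0, 1.5) for _ in range(300)]
y0 = [rng.gauss(2.0, 0.5) for _ in range(300)]
print(t_youden(3.0, 1.5, 2.0, 0.5), plug_in_t_youden(y1, y0))
print(cutoffs_original_scale(3.0, 1.5, 2.0, 0.5))
```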

Theorem 2

Let $Y_0$ and $Y_1$ be the random variables modelling the biomarker behavior in the negative and positive populations, respectively. Assume that the resulting gROC curve, $R_g$, satisfies

A2. $R_g$ has two continuous derivatives on some subinterval $(a_0,b_0)\subset[0,1]$ such that $t_Y\in(a_0,b_0)$, where $t_Y=\arg\max_{t\in[0,1]}\{R_g(t)-t\}$.

A3. $|r_g'(t_Y)|=a>0$, where $r_g(t)=\partial R_g(t)/\partial t$ and $r_g'(t)=\partial r_g(t)/\partial t$.

Assume that two independent samples are available: $m$ i.i.d. observations from $Y_0$ and $n$ i.i.d. observations from $Y_1$.

Then, if the sample sizes satisfy A1, the following weak convergences hold

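$$\text{(i)}\quad \sqrt{n}\,\{\hat J_g-J_g\}\xrightarrow[n\to\infty]{\mathcal{L}}\sigma_J\cdot Z,\qquad \text{(ii)}\quad \left(\frac{r_g'(t_Y)^2}{4(1+\lambda)}\right)^{1/3}n^{1/3}\{\hat t_Y-t_Y\}\xrightarrow[n\to\infty]{\mathcal{L}}\arg\max_{z\in\mathbb{R}}\{\mathcal{Z}^{(n)}(z)-z^2\},$$

where $Z$ is a standard normal random variable, $\{\mathcal{Z}^{(n)}(z)\}_{-\infty<z<\infty}$ is a two-sided Brownian motion and $\sigma_J^2=R_g(t_Y)[1-R_g(t_Y)]+\lambda\,t_Y[1-t_Y]$.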
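Part (i) suggests a Wald-type interval for $J_g$. The sketch below plugs in $\hat J_g$, $\hat t_Y$ and $\lambda\approx n/m$, and uses $R_g(t_Y)=J_g+t_Y$, which holds because $J_g=R_g(t_Y)-t_Y$. It is only an illustration under these plug-in assumptions: it ignores the finite-sample bias discussed in the Remark at the end of this Appendix, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def youden_g_wald_ci(j_hat, t_hat, n, m, level=0.95):
    """Normal-approximation CI for J_g based on sigma_J^2 from Theorem 2 (i).

    Plug-in choices (assumptions of this sketch): lambda ~ n / m and
    R_g(t_Y) ~ j_hat + t_hat.
    """
    lam = n / m
    r_at_t = j_hat + t_hat  # R_g(t_Y) = J_g + t_Y at the Youden point
    var_j = r_at_t * (1.0 - r_at_t) + lam * t_hat * (1.0 - t_hat)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(var_j / n)
    return j_hat - half_width, j_hat + half_width


print(youden_g_wald_ci(0.4, 0.2, n=100, m=100))
```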

Proof

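Let $C[R_g]$ denote the set of all pairs $(u_L,u_U)$ such that there exists $t\in[0,1]$ satisfying $R_g(t)=F(u_L)+1-F(u_U)$, where $F$ and $G$ are the cumulative distribution functions of $Y_1$ and $Y_0$, and $\hat F_n$ and $\hat G_m$ the corresponding empirical distribution functions. Then,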

$$\hat J_g=\max_{(u_L,u_U)\in C[\hat R_g]}\{\hat F_n(u_L)-\hat F_n(u_U)+\hat G_m(u_U)-\hat G_m(u_L)\}.$$
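The empirical estimator can be computed without enumerating $C[\hat R_g]$ explicitly. Any pair $(u_L,u_U)$ with 1 − specificity $t$ has sensitivity at most $\hat R_g(t)$, so maximizing over all pairs with $u_L\le u_U$ gives the same value; moreover, the objective equals the proportion of negatives minus the proportion of positives lying in $(u_L,u_U]$. The sketch below therefore treats $\hat J_g$ as a maximum-subarray problem on the pooled, sorted sample (Kadane's algorithm), an implementation choice made for this sketch and not necessarily the authors' procedure. It assumes no ties between the two samples; when several pairs attain the maximum, the returned $\hat t_Y$ corresponds to the first one found.

```python
import random


def empirical_youden_g(pos, neg):
    """Empirical gROC Youden index and 1 - specificity at its maximizer.

    pos: sample from Y1 (size n); neg: sample from Y0 (size m).
    Observations inside (u_L, u_U] are classified as negative, so the
    objective is (share of negatives inside) - (share of positives inside).
    Assumes no ties between the two samples.
    """
    n, m = len(pos), len(neg)
    pooled = sorted([(y, -1.0 / n, 0) for y in pos] + [(y, 1.0 / m, 1) for y in neg])
    best, best_neg = 0.0, 0      # empty interval: J = 0 and t = 1
    cur, cur_neg = 0.0, 0
    for _value, weight, is_neg in pooled:
        if cur <= 0.0:           # Kadane: restart the run at this observation
            cur, cur_neg = 0.0, 0
        cur += weight
        cur_neg += is_neg
        if cur > best:
            best, best_neg = cur, cur_neg
    return best, 1.0 - best_neg / m


rng = random.Random(1)
pos = [rng.gauss(0.0, 2.0) for _ in range(60)]
neg = [rng.gauss(0.5, 1.0) for _ in range(50)]
print(empirical_youden_g(pos, neg))
```

The sort dominates the cost, so the estimate takes $O((n+m)\log(n+m))$ operations instead of the $O((n+m)^2)$ of a search over all pairs.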

Let $(x_L,x_U)$ be the pair of points at which $J_g$ is attained, that is, $J_g=F(x_L)-F(x_U)+G(x_U)-G(x_L)$. Therefore,

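$$\begin{aligned}\{\hat J_g-J_g\}&=\max_{(u_L,u_U)\in C[\hat R_g]}\Big\{[\hat F_n(u_L)-\hat F_n(u_U)+\hat G_m(u_U)-\hat G_m(u_L)]-[F(x_L)-F(x_U)+G(x_U)-G(x_L)]\Big\}\\&=\hat Z_n(x_L,x_U)+\max_{(u_L,u_U)\in C[\hat R_g]}\{\hat H_n(u_L,u_U,x_L,x_U)-H(u_L,u_U,x_L,x_U)+H(u_L,u_U,x_L,x_U)\},\end{aligned}\tag{5}$$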

where

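$$\begin{aligned}\hat Z_n(u,v)&=[\hat F_n(u)-\hat F_n(v)+\hat G_m(v)-\hat G_m(u)]-[F(u)-F(v)+G(v)-G(u)],\\ \hat H_n(v,w,x,z)&=[\hat F_n(v)-\hat F_n(w)]-[\hat F_n(x)-\hat F_n(z)]+[\hat G_m(w)-\hat G_m(v)]-[\hat G_m(z)-\hat G_m(x)],\\ H(v,w,x,z)&=[F(v)-F(w)]-[F(x)-F(z)]+[G(w)-G(v)]-[G(z)-G(x)].\end{aligned}$$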

On the one hand, assumption A2 allows us to apply the Hungarian embedding [26] to show that the random variables $\hat Z_n(x_L,x_U)$ and

$$n^{-1/2}\{B_1^{(n)}(F(x_L))-B_1^{(n)}(F(x_U))\}+m^{-1/2}\{B_2^{(m)}(G(x_L))-B_2^{(m)}(G(x_U))\},\tag{6}$$

where $\{B_1^{(n)}(t)\}_{0\le t\le 1}$ and $\{B_2^{(m)}(t)\}_{0\le t\le 1}$ are two independent Brownian bridges, have the same asymptotic distribution. Taking into account basic properties of the Brownian bridge and A1, it is easy to show that the random variable in eq. (6) and

$$n^{-1/2}\,[R_g(t_Y)(1-R_g(t_Y))+\lambda\,t_Y(1-t_Y)]^{1/2}\cdot Z,$$

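where $Z$ is a standard normal random variable, also coincide in distribution. Indeed, for a Brownian bridge $B$ and $0\le s\le u\le 1$, $\mathrm{Var}\{B(u)-B(s)\}=(u-s)\{1-(u-s)\}$; here $F(x_U)-F(x_L)=1-R_g(t_Y)$ and $G(x_U)-G(x_L)=1-t_Y$, and $m^{-1}=(n/m)\,n^{-1}$ with $n/m\to\lambda$ by A1.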
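The variance identity can be checked by simulation. For a fixed pair $(x_L,x_U)$ the counts behind $\hat F_n$ and $\hat G_m$ are binomial, so $n\,\mathrm{Var}\{\hat Z_n(x_L,x_U)\}$ equals $R(1-R)+(n/m)\,t(1-t)$ exactly, with $R$ and $t$ the sensitivity and 1 − specificity of that pair; the normal limit is the only asymptotic ingredient. The sketch assumes, purely for illustration, $Y_1\sim N(0,2^2)$, $Y_0\sim N(0,1)$ and a fixed pair that need not be optimal.

```python
import math
import random
from statistics import NormalDist, pvariance

# Illustrative distributions (assumptions of this sketch, not from the paper).
POS = NormalDist(0.0, 2.0)   # Y1, cdf F
NEG = NormalDist(0.0, 1.0)   # Y0, cdf G


def ecdf(sample, v):
    """Right-continuous empirical cdf evaluated at v."""
    return sum(1 for y in sample if y <= v) / len(sample)


def theoretical_variance(x_lower, x_upper, n, m):
    """R(1 - R) + (n/m) t(1 - t) for the fixed pair (x_lower, x_upper)."""
    sens = POS.cdf(x_lower) + 1.0 - POS.cdf(x_upper)   # plays the role of R_g(t_Y)
    fpr = NEG.cdf(x_lower) + 1.0 - NEG.cdf(x_upper)    # plays the role of t_Y
    return sens * (1.0 - sens) + (n / m) * fpr * (1.0 - fpr)


def simulated_variance(x_lower, x_upper, n, m, reps=2000, seed=0):
    """Monte Carlo variance of sqrt(n) * Z_n(x_lower, x_upper)."""
    rng = random.Random(seed)
    target = (POS.cdf(x_lower) - POS.cdf(x_upper)
              + NEG.cdf(x_upper) - NEG.cdf(x_lower))
    draws = []
    for _ in range(reps):
        y1 = [rng.gauss(POS.mean, POS.stdev) for _ in range(n)]
        y0 = [rng.gauss(NEG.mean, NEG.stdev) for _ in range(m)]
        estimate = (ecdf(y1, x_lower) - ecdf(y1, x_upper)
                    + ecdf(y0, x_upper) - ecdf(y0, x_lower))
        draws.append(math.sqrt(n) * (estimate - target))
    return pvariance(draws)


print(theoretical_variance(-1.2, 1.2, n=50, m=100))
print(simulated_variance(-1.2, 1.2, n=50, m=100))
```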

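On the other hand, from A2 and A3, and since $r_g(t_Y)=1$ (the first-order condition for the maximum of $R_g(t)-t$), for $t$ close to $t_Y$ the difference $R_g(t)-R_g(t_Y)$ can be approximated by $t-t_Y$, where $t$ denotes the 1 − specificity associated with $(u_L,u_U)$. The properties of the Brownian bridge and of the two-sided Brownian motion [27] guarantee that, for $t$ close to $t_Y$ and under A3, the random variable $\{\hat H_n(u_L,u_U,x_L,x_U)-H(u_L,u_U,x_L,x_U)\}$ has the same asymptotic distribution as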

$$n^{-1/2}\,\mathcal{Z}_1^{(n)}(t-t_Y)-(n/\lambda)^{-1/2}\,\mathcal{Z}_2^{(m)}(t-t_Y)=\sqrt{(1+\lambda)/n}\;\mathcal{Z}^{(n)}(t-t_Y),\tag{7}$$

where $\{\mathcal{Z}_1^{(n)}(z)\}_{-\infty<z<\infty}$ and $\{\mathcal{Z}_2^{(m)}(z)\}_{-\infty<z<\infty}$ are independent two-sided Brownian motions and $\{\mathcal{Z}^{(n)}(z)\}_{-\infty<z<\infty}$, a normalized weighted sum of them, is therefore also a two-sided Brownian motion.

Also, by assumptions A2 and A3, we have the approximation

$$H(u_L,u_U,x_L,x_U)=(R_g(t)-t)-(R_g(t_Y)-t_Y)\approx-\tfrac{1}{2}\,|r_g'(t_Y)|\,(t-t_Y)^2.\tag{8}$$

Therefore, from eqs. (7) and (8), we have that the random variable

$$\max_{(u_L,u_U)\in C[\hat R_g]}\{\hat H_n(u_L,u_U,x_L,x_U)-H(u_L,u_U,x_L,x_U)+H(u_L,u_U,x_L,x_U)\}$$

weakly converges to the distribution of the random variable

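$$\max_{t\in[0,1]}\left\{\sqrt{(1+\lambda)/n}\;\mathcal{Z}^{(n)}(t-t_Y)-\tfrac{1}{2}\,|r_g'(t_Y)|\,(t-t_Y)^2\right\}=\kappa\cdot n^{-2/3}\cdot\max_{z\in\mathbb{R}}\{\mathcal{Z}^{(n)}(z)-z^2\},\tag{9}$$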

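where $z=(t-t_Y)/\gamma$ with $\gamma=\left(4(1+\lambda)/r_g'(t_Y)^2\right)^{1/3}n^{-1/3}$, $\kappa=\left(2(1+\lambda)^2/|r_g'(t_Y)|\right)^{1/3}$, and $\{\mathcal{Z}^{(n)}(z)\}_{-\infty<z<\infty}$ is a two-sided Brownian motion; the equality uses the scaling property $\mathcal{Z}^{(n)}(\gamma z)\overset{d}{=}\sqrt{\gamma}\,\mathcal{Z}^{(n)}(z)$. Result (i) follows directly from eq. (5) and the convergences in eqs. (6) and (9): the term in eq. (9) is of order $n^{-2/3}$, so it vanishes after multiplication by $\sqrt{n}$.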

On the other hand, if $\hat t_Y$ is the point that maximizes $\hat R_g(t)-t$, then, from eqs. (5) and (9), $(\hat t_Y-t_Y)/\gamma$ has the same asymptotic distribution as $\arg\max_{z\in\mathbb{R}}\{\mathcal{Z}^{(n)}(z)-z^2\}$. Result (ii) follows from the equality

$$\frac{\hat t_Y-t_Y}{\gamma}=\left(\frac{r_g'(t_Y)^2}{4(1+\lambda)}\right)^{1/3}n^{1/3}\{\hat t_Y-t_Y\}.$$
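The constants $\gamma$ and $\kappa$ can be checked numerically: after the change of variable $t-t_Y=\gamma z$, Brownian scaling turns the coefficient $\sqrt{(1+\lambda)/n}$ into $\sqrt{(1+\lambda)\gamma/n}$, and both it and the quadratic coefficient $\tfrac12|r_g'(t_Y)|\gamma^2$ should equal $\kappa\,n^{-2/3}$. The sketch below verifies this and computes the normalized cutoff error of Theorem 2 (ii). It uses $|r_g'(t_Y)|$, as in assumption A3; the function names are illustrative.

```python
import math


def scaling_constants(lam, r_prime, n):
    """gamma (including its n^(-1/3) factor) and kappa from eq. (9), using |r_g'(t_Y)|."""
    a = abs(r_prime)
    gamma = (4.0 * (1.0 + lam) / a ** 2) ** (1.0 / 3.0) * n ** (-1.0 / 3.0)
    kappa = (2.0 * (1.0 + lam) ** 2 / a) ** (1.0 / 3.0)
    return gamma, kappa


def rescaled_coefficients(lam, r_prime, n):
    """Coefficients after t - t_Y = gamma * z; both should equal kappa * n^(-2/3)."""
    gamma, kappa = scaling_constants(lam, r_prime, n)
    brownian = math.sqrt((1.0 + lam) / n) * math.sqrt(gamma)  # Z(gamma z) ~ sqrt(gamma) Z(z)
    quadratic = 0.5 * abs(r_prime) * gamma ** 2
    return brownian, quadratic, kappa * n ** (-2.0 / 3.0)


def normalized_cutoff_error(t_hat, t_y, lam, r_prime, n):
    """(r_g'(t_Y)^2 / (4 (1 + lam)))^(1/3) * n^(1/3) * (t_hat - t_y), i.e. (t_hat - t_y) / gamma."""
    return (r_prime ** 2 / (4.0 * (1.0 + lam))) ** (1.0 / 3.0) * n ** (1.0 / 3.0) * (t_hat - t_y)


print(rescaled_coefficients(lam=1.0, r_prime=-2.5, n=200))
```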

Remark. It is worth noting that, from eqs. (5) and (9), it is easy to derive that

$$E[\sqrt{n}\,\{\hat J_g-J_g\}]\approx\kappa\cdot n^{-1/6}\cdot E\left[\max_{z\in\mathbb{R}}\{\mathcal{Z}^{(n)}(z)-z^2\}\right].$$

This bias, although asymptotically negligible, is relevant for small and moderate sample sizes.
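The constant $E[\max_{z\in\mathbb{R}}\{\mathcal{Z}^{(n)}(z)-z^2\}]$ in the Remark can be approximated by simulation. Because the maximum is at least its value at $z=0$, which is 0, the expectation is positive, so $\hat J_g$ tends to overestimate $J_g$. The sketch below discretizes a two-sided Brownian motion on $[-3,3]$ with a step of 0.005; both the truncation and the grid slightly underestimate the true maximum, so the result is only a rough approximation. The value of $\kappa$ must be supplied by the user, and the function names and settings are hypothetical.

```python
import math
import random


def max_of_drifted_bm(rng, half_width=3.0, step=0.005):
    """One draw of max_{|z| <= half_width} {W(z) - z^2} for a two-sided
    Brownian motion W with W(0) = 0, observed on a grid of mesh `step`."""
    n_steps = int(round(half_width / step))
    sd = math.sqrt(step)
    best = 0.0                      # value at z = 0
    for _arm in range(2):           # two independent arms: z > 0 and z < 0
        w = 0.0
        for k in range(1, n_steps + 1):
            w += rng.gauss(0.0, sd)
            z = k * step            # z^2 is the same on both arms
            best = max(best, w - z * z)
    return best


def approximate_bias(n, kappa, reps=500, seed=0):
    """Monte Carlo version of kappa * n^(-1/6) * E[max_z {W(z) - z^2}]."""
    rng = random.Random(seed)
    mean_max = sum(max_of_drifted_bm(rng) for _ in range(reps)) / reps
    return kappa * n ** (-1.0 / 6.0) * mean_max, mean_max


print(approximate_bias(n=100, kappa=1.5))
```

The returned bias refers to $\sqrt{n}\{\hat J_g-J_g\}$, as in the Remark; dividing it by $\sqrt{n}$ gives the corresponding approximation for $\hat J_g-J_g$ itself.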

References

[1] Zhou XH, Obuchowski NA, McClish DK. Statistical methods in diagnostic medicine. New York: Wiley Blackwell, 2002. doi:10.1002/9780470317082

[2] Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36. doi:10.1148/radiology.143.1.7063747

[3] Martínez-Camblor P, Carleos C, Corral N. General nonparametric ROC curve comparison. J Korean Stat Soc 2013;42:71–81. doi:10.1016/j.jkss.2012.05.002

[4] Martínez-Camblor P, Corral N, Rey C, Pascual J, Cernuda-Morollón E. Receiver operating characteristic curve generalization for non-monotone relationships. Stat Methods Med Res 2017;26:113–23. doi:10.1177/0962280214541095

[5] Martínez-Camblor P, Pardo-Fernández JC. Parametric estimates for the receiver-operating characteristic curve generalization for non-monotone relationships. Stat Methods Med Res 2017. doi:10.1177/0962280217747009

[6] Youden WJ. Index for rating diagnostic tests. Cancer 1950;3:32–5. doi:10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3

[7] Coffin M, Sukhatme S. Receiver operating characteristic studies and measurement errors. Biometrics 1997;53:823–37. doi:10.2307/2533545

[8] Liu X. Classification accuracy and cut point selection. Stat Med 2012;31:2676–86. doi:10.1002/sim.4509

[9] López-Ratón M, Cadarso-Suárez C, Molanes-López EM, Letón E. Confidence intervals for the symmetry point: an optimal cutpoint in continuous diagnostic tests. Pharm Stat 2016;15:178–92. doi:10.1002/pst.1734

[10] Martínez-Camblor P. Nonparametric cutoff point estimation for diagnostic decisions with weighted errors. Revista Colombiana de Estadística 2011;34:133–46.

[11] Khanafer N, Sicot N, Vanhems P, Dumitrescu O, Meyssonier V, Tristan A, Bes M, Lina G, Vandenesch F, Gillet Y, Etienne J. Severe leukopenia in Staphylococcus aureus-necrotizing, community-acquired pneumonia: risk factors and impact on survival. BMC Infect Dis 2013;13:359. doi:10.1186/1471-2334-13-359

[12] Mossman D. Three-way ROCs. Med Decis Making 1999;19:78–89. doi:10.1177/0272989X9901900110

[13] Nakas CT, Alonzo TA, Yiannoutsos CT. Accuracy and cut-off point selection in three-class classification problems using a generalization of the Youden index. Stat Med 29:2946–55. doi:10.1002/sim.4044

[14] Martínez-Camblor P, Pérez-Fernández S, Díaz-Coto S. Improving the biomarker diagnostic capacity via functional transformations. J Appl Stat 2018; in press. doi:10.1080/02664763.2018.1554628

[15] Pérez-Fernández S, Martínez-Camblor P, Filzmoser P, Corral N. nsROC: an R package for non-standard ROC curve analysis. R J 2018. doi:10.32614/RJ-2018-043

[16] Martínez-Camblor P, de Uña Álvarez J. Studying the bandwidth in k-sample smooth tests. Comput Stat 2013;28:875–92. doi:10.1007/s00180-012-0333-1

[17] Spanos A, Harrell FE, Durack DT. Differential diagnosis of acute meningitis: an analysis of the predictive value of initial observations. J Am Med Assoc 1989;262:2700–07. doi:10.1001/jama.1989.03430190084036

[18] Zhou H, Qin G. Confidence intervals for the difference in paired Youden indices. Pharm Stat 2013;12:17–27. doi:10.1002/pst.1543

[19] Lai CY, Tian L, Schisterman EF. Exact confidence interval estimation for the Youden index and its corresponding optimal cut-point. Comput Stat Data Anal 2012;56:1103–14. doi:10.1016/j.csda.2010.11.023

[20] Zhou H, Qin G. New nonparametric confidence intervals for the Youden index. J Biopharm Stat 2012;22:1244–57. doi:10.1080/10543406.2011.592234

[21] Shan G. Improved confidence intervals for the Youden index. PLoS ONE 2015;10:1–19. doi:10.1371/journal.pone.0127272

[22] Hall P, DiCiccio TJ, Romano JP. On smoothing and the bootstrap. Ann Stat 1989;17:692–704. doi:10.1214/aos/1176347135

[23] Schisterman EF, Perkins N. Confidence intervals for the Youden index and corresponding optimal cut-point. Commun Stat Simul Comput 2007;36:549–63. doi:10.1080/03610910701212181

[24] Hsieh F, Turnbull BW. Nonparametric methods for evaluating diagnostic tests. Stat Sin 1996;6:47–62.

[25] DasGupta A. Asymptotic theory of statistics and probability. New York: Springer, 2008.

[26] van der Vaart AW. Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press, 1998.

[27] Borodin AN, Salminen P. Handbook of Brownian motion: facts and formulae. 2nd ed. Probability and its Applications. Basel: Birkhäuser Verlag, 2002. MR-1912205. doi:10.1007/978-3-0348-8163-0

Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/ijb-2018-0060).

Received: 2018-06-20

Revised: 2019-03-13

Accepted: 2019-03-13

Published Online: 2019-04-03
