Inference on overlap index: with an application to cancer data

  • Raju Dey, Arne C. Bathke and Somesh Kumar
Published/Copyright: September 5, 2025

Abstract

The quantification of overlap between two distributions has applications in biological, medical, genetic, and ecological research. In this article, new overlap and containment indices are considered for quantifying the niche overlap between two species or populations. Some new properties of these indices are established, and the estimation problem is studied when the two distributions are exponential with different scale parameters. We propose several estimators and compare their relative performance with respect to different loss functions. The asymptotic normality of the maximum likelihood estimators of these indices is proved under certain conditions. We also obtain confidence intervals for the indices based on three different approaches and compare their average lengths and coverage probabilities. The point and interval estimation procedures developed here are applied to a breast cancer data set to analyze the similarity between the survival times of patients undergoing two different types of surgery. The similarity between the relapse-free times of these two sets of patients is also studied.


Corresponding author: Somesh Kumar, Department of Mathematics, Indian Institute of Technology Kharagpur, Kharagpur 721302, India, E-mail: 

Award Identifier / Grant number: 20204-WISS/225/197-2019, 20102-F1901166-KZP

Award Identifier / Grant number: 09/0081(11244)/2021-EMR-I

Acknowledgments

All three authors thank an associate editor and two anonymous reviewers for their valuable comments and constructive suggestions that have significantly enhanced the quality and the content of the manuscript. The first author gratefully acknowledges the financial assistance provided by the “Council of Scientific and Industrial Research (CSIR), Government of India” under grant No. 09/0081(11244)/2021-EMR-I. The second author gratefully acknowledges the generous support through the State of Salzburg WISS 2025 project “IDA-Lab Salzburg” (20204-WISS/225/197–2019 and 20102-F1901166-KZP).

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Use of Large Language Models, AI and Machine Learning Tools: None declared.

  5. Conflict of interest: The authors state no conflict of interest.

  6. Research funding: Council of Scientific and Industrial Research (CSIR), Government of India. State of Salzburg WISS 2025 project “IDA-Lab Salzburg” (20204-WISS/225/197–2019 and 20102-F1901166-KZP).

  7. Data availability: A real data set of 2,509 breast cancer patients is used in this article to illustrate the applicability and importance of estimating the overlap and containment indices. Two subsets of this data set are used to study the survival times and relapse-free times of patients who underwent one of two surgeries: breast-conserving surgery (with radiotherapy) or mastectomy (with radiotherapy). The data sets are available at https://github.com/RajuProbStat/Overlap-Index-Data.

Appendix A

Proof. To prove that $K_2(\eta) = 0$ has only one root in $(0, \infty)$, it is enough to show that

(A.1) $$\frac{2^{1-\eta}\,\eta\,\log_e(2)}{\eta+1} - \frac{2\eta\left(1-2^{-\eta}\right)}{(\eta+1)^2} = 0$$

has only one root in $(0, \infty)$. The above equation reduces to $\phi_2(\eta) = 0$, which has only one root in $(0, \infty)$ as already proven in Lemma 2.4, so $\eta^*$ is the only solution of $K_2(\eta) = 0$ in $(0, \infty)$. Similarly, $K_1(\eta) = 0$ reduces to the equation $\phi_1(\eta) = 0$, and hence $1/\eta^*$ is the only root of $K_1(\eta) = 0$ in $(0, \infty)$.

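We write $K(\theta_1, \theta_2)$ as a function of $\eta$ in the following way:

(A.2) $$K(\eta) = \frac{256\,\eta^2}{\lambda_1(1-\lambda_1)} \left[ \frac{\left(\eta\left(1-2^{-1/\eta}\right)2^{-\eta} - \frac{1}{\eta}\left(1-2^{-\eta}\right)2^{-1/\eta}\right)\log_e(2)}{(\eta+1)^2} + \frac{(1-\eta)\left(1-2^{-1/\eta}\right)\left(1-2^{-\eta}\right)}{(\eta+1)^3} \right]^2.$$

For $0 < \eta < 1$, $(1-\eta)\left(1-2^{-1/\eta}\right)\left(1-2^{-\eta}\right) > 0$ and $\eta\left(1-2^{-1/\eta}\right)2^{-\eta} - \frac{1}{\eta}\left(1-2^{-\eta}\right)2^{-1/\eta} > 0$, because $\frac{2^x-1}{x}$ is increasing in $(0, \infty)$ and $x < \frac{1}{x}$ for $0 < x < 1$ (multiplying the second expression by $2^{\eta+1/\eta}$ gives $\frac{2^{1/\eta}-1}{1/\eta} - \frac{2^{\eta}-1}{\eta}$). Again, for $\eta > 1$, $(1-\eta)\left(1-2^{-1/\eta}\right)\left(1-2^{-\eta}\right) < 0$ and $\eta\left(1-2^{-1/\eta}\right)2^{-\eta} - \frac{1}{\eta}\left(1-2^{-\eta}\right)2^{-1/\eta} < 0$, because $\frac{2^x-1}{x}$ is increasing in $(0, \infty)$ and $x > \frac{1}{x}$ for $x > 1$. In both cases the two terms inside the square have the same sign, so their sum is nonzero. Therefore, $K(\eta) > 0$ for $\eta \in (0, 1) \cup (1, \infty)$. Again, $K(1) = 0$, and hence $K(\eta) = 0$ has only one root, $\eta = 1$, in $(0, \infty)$. □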
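
The sign argument in this proof can be checked numerically. The sketch below is an illustration only, not part of the authors' proof. It assumes the two expressions compared in the proof are $(1-\eta)(1-2^{-1/\eta})(1-2^{-\eta})$ and $\eta(1-2^{-1/\eta})2^{-\eta} - \frac{1}{\eta}(1-2^{-\eta})2^{-1/\eta}$, and that they enter $K(\eta)$ divided by $(\eta+1)^3$ and $(\eta+1)^2$ respectively, the latter multiplied by $\log_e(2)$. The function names `g`, `term_a`, `term_b`, `bracket` and `sign_check` are hypothetical.

```python
import math

LOG2 = math.log(2.0)

def g(x):
    """(2**x - 1) / x, the function the proof takes to be increasing on (0, inf)."""
    return math.expm1(x * LOG2) / x

def term_a(eta):
    """Assumed first expression: (1 - eta)(1 - 2**(-1/eta))(1 - 2**(-eta))."""
    return (1.0 - eta) * (1.0 - 2.0 ** (-1.0 / eta)) * (1.0 - 2.0 ** (-eta))

def term_b(eta):
    """Assumed second expression:
    eta(1 - 2**(-1/eta))2**(-eta) - (1/eta)(1 - 2**(-eta))2**(-1/eta)."""
    return (eta * (1.0 - 2.0 ** (-1.0 / eta)) * 2.0 ** (-eta)
            - (1.0 / eta) * (1.0 - 2.0 ** (-eta)) * 2.0 ** (-1.0 / eta))

def bracket(eta):
    """Assumed quantity that is squared in K(eta), without the positive prefactor."""
    return term_b(eta) * LOG2 / (eta + 1.0) ** 2 + term_a(eta) / (eta + 1.0) ** 3

def sign_check(etas):
    """True if both expressions are > 0 for eta < 1 and < 0 for eta > 1."""
    for eta in etas:
        sign = 1.0 if eta < 1.0 else -1.0
        if sign * term_a(eta) <= 0.0 or sign * term_b(eta) <= 0.0:
            return False
    return True

# Grid on (0, 10) with step 0.05, excluding eta = 1 where both expressions vanish.
grid = [k / 20.0 for k in range(1, 200) if k != 20]

print("signs as claimed:", sign_check(grid))
print("g increasing on grid:", all(g(a) < g(b) for a, b in zip(grid, grid[1:])))
print("bracket at eta = 1:", bracket(1.0))
```

A grid check of this kind only supports the inequalities at the sampled points. The monotonicity of $(2^x-1)/x$ is what establishes them on the whole of $(0, \infty)$.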


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/ijb-2024-0106).


Received: 2024-11-23
Accepted: 2025-07-23
Published Online: 2025-09-05

© 2025 Walter de Gruyter GmbH, Berlin/Boston
