Abstract
Common statistical approaches are not designed to deal with so-called “short fat data” in biomarker pilot studies, where the number of biomarker candidates exceeds the sample size by orders of magnitude. High-throughput technologies for omics data enable the measurement of tens of thousands or more biomarker candidates for specific diseases or disease states. Because of the limited availability of study participants, ethical considerations, and the high costs of sample processing and analysis, researchers often prefer to start with a small-sample pilot study in order to judge the potential of finding biomarkers that, usually in combination, enable a sufficiently reliable classification of the disease state under consideration. We developed a user-friendly tool called HiPerMAb that allows pilot studies to be evaluated on the basis of performance measures such as the multiclass AUC, entropy, area above the cost curve, hypervolume under manifold, and misclassification rate, using Monte Carlo simulations to compute p-values and confidence intervals. The number of “good” biomarker candidates is compared to the number expected in a data set with no association to the considered disease states. This makes it possible to judge the potential of the pilot study even if statistical tests with correction for multiple testing fail to provide any hint of significance.
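The comparison described in the abstract can be illustrated with a minimal sketch. It is not the HiPerMAb implementation: it assumes a two-class problem, uses the ordinary AUC as the only performance measure, and simulates the no-association case with independent noise for every biomarker candidate. The function names, the AUC threshold of 0.85, the sample sizes, and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def auc(values, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def count_good(data, labels, threshold):
    """Count candidates whose AUC, in either direction, reaches the threshold."""
    aucs = (auc(column, labels) for column in data)
    return sum(max(a, 1 - a) >= threshold for a in aucs)


def null_counts(n_markers, labels, threshold, n_sim, rng):
    """Monte Carlo counts of 'good' candidates when no candidate is associated with the class.

    Assumption: candidates are independent. Gaussian noise is used, but the AUC
    depends only on ranks, so any continuous distribution gives the same null.
    """
    counts = []
    for _ in range(n_sim):
        data = [[rng.gauss(0, 1) for _ in labels] for _ in range(n_markers)]
        counts.append(count_good(data, labels, threshold))
    return counts


def monte_carlo_p_value(observed, simulated):
    """Share of simulated counts at least as large as the observed one (with +1 correction)."""
    return (1 + sum(c >= observed for c in simulated)) / (1 + len(simulated))


# Hypothetical pilot study: 20 samples, 100 candidates, 5 of them truly shifted.
rng = random.Random(0)
labels = [0] * 10 + [1] * 10
pilot = [
    [rng.gauss(1.5 if (j < 5 and y == 1) else 0.0, 1) for y in labels]
    for j in range(100)
]

threshold = 0.85
observed = count_good(pilot, labels, threshold)
simulated = null_counts(100, labels, threshold, 200, rng)
expected = sum(simulated) / len(simulated)
p_value = monte_carlo_p_value(observed, simulated)

print(f"observed good candidates: {observed}")
print(f"expected under no association: {expected:.2f}")
print(f"Monte Carlo p-value: {p_value:.3f}")
```

Counting candidates above a threshold, rather than testing each candidate separately, is what lets such a comparison remain informative when no single candidate survives a multiple-testing correction.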
Funding source: LEGaTO Project (legato-project.eu)
Award Identifier / Grant number: 780681
Funding source: Lower Saxony Ministry of Science and Culture within the programme Big Data in Modern Life Science, project i.Vacc.
- Author contributions: All authors have accepted responsibility for the entire content of this submitted manuscript and approved its submission.
- Research funding: This work was partly funded by the European Union's Horizon 2020 research and innovation programme under the LEGaTO Project (legato-project.eu), grant agreement No 780681, and by the Lower Saxony Ministry of Science and Culture within the programme Big Data in Modern Life Science, project i.Vacc.
- Institutional Review Board Statement: The study protocol was approved by the local ethics committee (Bayerische Landesärztekammer, Munich, Germany).
- Informed Consent Statement: Written consent was obtained from all participants.
- Conflict of interest statement: The authors declare that they have no competing interests.
© 2023 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- Survival analysis using deep learning with medical imaging
- Using a population-based Kalman estimator to model the COVID-19 epidemic in France: estimating associations between disease transmission and non-pharmaceutical interventions
- Approximate reciprocal relationship between two cause-specific hazard ratios in COVID-19 data with mutually exclusive events
- Sensitivity of estimands in clinical trials with imperfect compliance
- Highly robust causal semiparametric U-statistic with applications in biomedical studies
- Hierarchical Bayesian bootstrap for heterogeneous treatment effect estimation
- Penalized logistic regression with prior information for microarray gene expression classification
- Bayesian learners in gradient boosting for linear mixed models
- Unequal allocation of sample/event sizes with considerations of sampling cost for testing equality, non-inferiority/superiority, and equivalence of two Poisson rates
- HiPerMAb: a tool for judging the potential of small sample size biomarker pilot studies
- Heterogeneity in meta-analysis: a comprehensive overview
- On stochastic dynamic modeling of incidence data
- Power of testing for exposure effects under incomplete mediation
- Exact correction factor for estimating the OR in the presence of sparse data with a zero cell in 2 × 2 tables
- Right-censored partially linear regression model with error in variables: application with carotid endarterectomy dataset
- Assessing HIV-infected patient retention in a program of differentiated care in sub-Saharan Africa: a G-estimation approach
- Prediction-based variable selection for component-wise gradient boosting