Abstract
In the context of finite mixture models, this article considers the problem of classifying as many observations as possible into the classes of interest while controlling the classification error rate in these same classes. As in statistical test theory, different type I-like and type II-like classification error rates can be defined, along with their associated optimal rules, where optimality means minimizing the type II error rate while controlling the type I error rate at some nominal level. It is first shown that finding an optimal classification rule reduces to finding an optimal region of the observation space in which to apply the classical Maximum A Posteriori (MAP) rule. Depending on the misclassification rate to be controlled, the shape of the optimal region is provided, along with a heuristic to compute the optimal classification rule in practice. In particular, a multiclass FDR-like optimal rule is defined and compared to the thresholded MAP rule that is used in most applications. It is shown on both simulated and real datasets that the FDR-like optimal rule may be significantly less conservative than the thresholded MAP rule.
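To make the contrast between the two kinds of rule concrete, here is a minimal sketch in Python. It is not the optimal rule derived in the article. It assumes the posterior class probabilities are already known (in practice they would be estimated from a fitted mixture model), and it uses a generic heuristic: rank observations by their largest posterior and classify as many as possible while the average of (1 - largest posterior) among the classified observations stays below a nominal level. The function names, the level `alpha`, and the toy posteriors are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Posteriors = Sequence[Sequence[float]]


def map_labels(posteriors: Posteriors) -> List[Tuple[int, float]]:
    """Return (MAP label, largest posterior) for each observation."""
    out = []
    for row in posteriors:
        k = max(range(len(row)), key=lambda j: row[j])
        out.append((k, row[k]))
    return out


def thresholded_map(posteriors: Posteriors, threshold: float) -> List[Optional[int]]:
    """Apply MAP only where the largest posterior reaches `threshold`.

    Observations below the threshold are left unclassified (None).
    """
    return [k if p >= threshold else None for k, p in map_labels(posteriors)]


def fdr_like_map(posteriors: Posteriors, alpha: float) -> List[Optional[int]]:
    """Illustrative FDR-like heuristic (an assumption, not the article's rule).

    Observations are visited from most to least confident. Each one is
    classified by MAP as long as the running mean of (1 - largest posterior)
    over the classified observations does not exceed `alpha`.
    """
    labelled = map_labels(posteriors)
    order = sorted(range(len(labelled)), key=lambda i: -labelled[i][1])
    decisions: List[Optional[int]] = [None] * len(labelled)
    running_error = 0.0
    for rank, i in enumerate(order, start=1):
        k, p = labelled[i]
        running_error += 1.0 - p
        # The running mean is non-decreasing in rank, so stopping here is safe.
        if running_error / rank > alpha:
            break
        decisions[i] = k
    return decisions


# Toy posteriors for four observations and three classes (made-up numbers).
posteriors = [
    [0.99, 0.01, 0.00],
    [0.05, 0.93, 0.02],
    [0.10, 0.05, 0.85],
    [0.40, 0.35, 0.25],
]

print(thresholded_map(posteriors, threshold=0.9))  # [0, 1, None, None]
print(fdr_like_map(posteriors, alpha=0.1))         # [0, 1, 2, None]
```

In this toy example, thresholding at 1 - alpha = 0.9 leaves the third observation unclassified, whereas the averaged criterion classifies it, because the expected error among the three classified observations (about 0.077) is still below 0.1. This is one way a per-observation threshold can be more conservative than a rule that controls an average error rate.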
Funding source: Agence Nationale de la Recherche
Award Identifier / Grant number: ANR-16-CE40-0019
Award Identifier / Grant number: ANR-17-EUR-0007
Award Identifier / Grant number: ANR-19-CHIA-0021-01
- Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: G. Blanchard acknowledges support from Agence Nationale de la Recherche (ANR) via the project ANR-19-CHIA-0021-01 (BiSCottE), and the project ANR-16-CE40-0019 (SansSouci); and from the Franco-German University through the binational Doktorandenkolleg CDFA 01-18. GQE and IPS2 benefit from the support of the LabEx Saclay Plant Sciences-SPS (ANR-17-EUR-0007).
- Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Supplementary material
The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2020-0105).
© 2021 Walter de Gruyter GmbH, Berlin/Boston