A PAUC-based Estimation Technique for Disease Classification and Biomarker Selection
Matthias Schmid, Torsten Hothorn, Friedemann Krause and Christina Rabe
Abstract
The partial area under the receiver operating characteristic curve (PAUC) is a well-established performance measure for evaluating biomarker combinations for disease classification. Because the PAUC is defined as the area under the ROC curve within a restricted interval of false positive rates, it enables practitioners to quantify sensitivity rates within pre-specified specificity ranges. This property is of considerable importance for the development of medical screening tests. Although many authors have highlighted the importance of the PAUC, only a few methods use the PAUC as an objective function for finding optimal combinations of biomarkers. In this paper, we introduce a boosting method for deriving marker combinations that is explicitly based on the PAUC criterion. The proposed method can be applied in high-dimensional settings where the number of biomarkers exceeds the number of observations. Additionally, the proposed method incorporates a recently proposed variable selection technique (stability selection) that results in sparse prediction rules incorporating only those biomarkers that make relevant contributions to predicting the outcome of interest. Using both simulated data and real data, we demonstrate that our method performs well with respect to both variable selection and prediction accuracy. Specifically, if the focus is on a limited range of specificity values, the new method results in better predictions than other established techniques for disease classification.
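To make the PAUC definition concrete, the following is a minimal Python sketch of an empirical PAUC. It builds the empirical ROC curve from marker scores and integrates it with the trapezoidal rule over false positive rates in [0, fpr_max]. The function names, the choice of an interval starting at 0, and the linear interpolation at the cutoff are illustrative assumptions. The sketch only evaluates a given score and is not the boosting method proposed in the paper.

```python
import numpy as np

def roc_points(scores, labels):
    """Empirical ROC curve as (FPR, TPR) arrays, starting at (0, 0).

    Assumes labels are 0 (healthy) / 1 (diseased) and that higher
    scores indicate disease. Both classes must be present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    # Sweep thresholds from the highest distinct score downwards.
    for t in np.unique(scores)[::-1]:
        predicted_pos = scores >= t
        tpr.append(np.sum(predicted_pos & (labels == 1)) / n_pos)
        fpr.append(np.sum(predicted_pos & (labels == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)

def partial_auc(scores, labels, fpr_max):
    """Area under the empirical ROC curve for FPR in [0, fpr_max].

    Hypothetical helper: the interval [0, fpr_max] and the linear
    interpolation at the cutoff are assumptions of this sketch.
    """
    fpr, tpr = roc_points(scores, labels)
    # TPR at the cutoff, linearly interpolated between ROC points.
    tpr_at_max = np.interp(fpr_max, fpr, tpr)
    keep = fpr <= fpr_max
    x = np.append(fpr[keep], fpr_max)
    y = np.append(tpr[keep], tpr_at_max)
    # Trapezoidal rule; vertical ROC segments contribute zero area.
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Toy data (invented for illustration, not from the paper).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_pauc = partial_auc(toy_scores, toy_labels, fpr_max=0.25)
toy_auc = partial_auc(toy_scores, toy_labels, fpr_max=1.0)
```

With fpr_max set to 1.0 the function reduces to the ordinary AUC. A small fpr_max restricts the evaluation to the high-specificity region that matters for screening tests.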
©2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Article
- Large-scale Parentage Inference with SNPs: an Efficient Algorithm for Statistical Confidence of Parent Pair Allocations
- ExactDAS: An Exact Test Procedure for the Detection of Differential Alternative Splicing in Microarray Experiments
- Incorporating Genomic Annotation into a Hidden Markov Model for DNA Methylation Tiling Array Data
- Variational Bayes Procedure for Effective Classification of Tumor Type with Microarray Gene Expression Data
- Detecting Differential Expression in RNA-sequence Data Using Quasi-likelihood with Shrunken Dispersion Estimates
- Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data
- Analyzing Genetic Association Studies with an Extended Propensity Score Approach
- Genotype Copy Number Variations using Gaussian Mixture Models: Theory and Algorithms
- Estimators of the local false discovery rate designed for small numbers of tests
- A PAUC-based Estimation Technique for Disease Classification and Biomarker Selection
- Comparison of Targeted Maximum Likelihood and Shrinkage Estimators of Parameters in Gene Networks
- DNA Pooling and Statistical Tests for the Detection of Single Nucleotide Polymorphisms