A Cross-Validation Study to Select a Classification Procedure for Clinical Diagnosis Based on Proteomic Mass Spectrometry
Dirk Valkenborg
We present an approach to construct a classification rule based on the mass spectrometry data provided by the organizers of the "Classification Competition on Clinical Mass Spectrometry Proteomic Diagnosis Data." Before constructing a classification rule, we attempted to pre-process the data and to select features of the spectra that were likely due to true biological signals (i.e., peptides/proteins). As a result, we selected a set of 92 features. To construct the classification rule, we considered eight methods for selecting a subset of the features, combined with seven classification methods. The performance of the resulting 56 combinations was evaluated by using a cross-validation procedure with 1000 re-sampled data sets. The best result, as indicated by the lowest overall misclassification rate, was obtained by using the whole set of 92 features as the input for a support-vector machine (SVM) with a linear kernel. This method was therefore used to construct the classification rule. For the training data set, the total error rate for the classification rule, as estimated by using leave-one-out cross-validation, was equal to 0.16, with the sensitivity and specificity equal to 0.87 and 0.82, respectively.
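The leave-one-out estimate of the error rate, sensitivity, and specificity mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for the linear-kernel SVM so that the example needs only the Python standard library, the function names (`fit_centroids`, `predict_centroid`, `leave_one_out`) are hypothetical, and the two-feature toy data are invented purely for illustration (label 1 = case, label 0 = control).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (control centroid, case centroid)


def fit_centroids(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Stand-in classifier (NOT the paper's SVM): one mean vector per class."""
    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    controls = [x for x, label in zip(X, y) if label == 0]
    cases = [x for x, label in zip(X, y) if label == 1]
    return mean(controls), mean(cases)


def predict_centroid(model: Model, x: Vector) -> int:
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    control_centroid, case_centroid = model
    return 1 if sqdist(x, case_centroid) < sqdist(x, control_centroid) else 0


def leave_one_out(
    X: Sequence[Vector],
    y: Sequence[int],
    fit: Callable = fit_centroids,
    predict: Callable = predict_centroid,
) -> Dict[str, float]:
    """Hold out each sample once, refit on the rest, and tally the outcomes."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        X_train = [x for j, x in enumerate(X) if j != i]
        y_train = [label for j, label in enumerate(y) if j != i]
        pred = predict(fit(X_train, y_train), X[i])
        if y[i] == 1 and pred == 1:
            tp += 1
        elif y[i] == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return {
        "error_rate": (fp + fn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # fraction of cases correctly flagged
        "specificity": tn / (tn + fp),  # fraction of controls correctly cleared
    }


# Invented toy data: four controls (one lying close to the cases) and four cases.
X_toy = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.9, 0.9],
         [1.0, 1.0], [0.9, 1.1], [1.2, 0.8], [1.1, 1.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

result = leave_one_out(X_toy, y_toy)
print(result)
```

Because `fit` and `predict` are passed in as arguments, the same loop could in principle wrap any feature-selection and classifier pair. In that case the feature selection would need to be repeated inside each training fold so that the held-out sample never influences which features are chosen.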
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Editorial
- International Competition on Mass Spectrometry Proteomic Diagnosis
- Introduction Paper
- Case-Control Breast Cancer Study of MALDI-TOF Proteomic Mass Spectrometry Data on Serum Samples
- Organizing a Competition on Clinical Mass Spectrometry Based Proteomic Diagnosis
- Competition Paper
- Application of the Random Forest Classification Method to Peaks Detected from Mass Spectrometric Proteomic Profiles of Cancer Patients and Controls
- Developing a Discrimination Rule between Breast Cancer Patients and Controls Using Proteomics Mass Spectrometric Data: A Three-Step Approach
- Principal Component Discriminant Analysis
- Classification of Breast Cancer versus Normal Samples from Mass Spectrometry Profiles Using Linear Discriminant Analysis of Important Features Selected by Random Forest
- A Classification Model for the Leiden Proteomics Competition
- Empirical Bayes Logistic Regression
- Autocorrelated Logistic Ridge Regression for Prediction Based on Proteomics Spectra
- Support Vector Machine Approach to Separate Control and Breast Cancer Serum Samples
- A Cross-Validation Study to Select a Classification Procedure for Clinical Diagnosis Based on Proteomic Mass Spectrometry
- Clinical Mass Spectrometry Proteomic Diagnosis by Conformal Predictors
- Article
- Assessing the Validity Domains of Graphical Gaussian Models in Order to Infer Relationships among Components of Complex Biological Systems
- Assessment
- Breast Cancer Diagnosis from Proteomic Mass Spectrometry Data: A Comparative Evaluation