A Cross-Validation Study to Select a Classification Procedure for Clinical Diagnosis Based on Proteomic Mass Spectrometry

Dirk Valkenborg; Suzy Van Sanden; Dan Lin; Adetayo Kasim; Qi Zhu; Philippe Haldermans; Ivy Jansen; Ziv Shkedy; Tomasz Burzykowski

doi:10.2202/1544-6115.1363

Published/Copyright: March 24, 2008

From the journal Statistical Applications in Genetics and Molecular Biology Volume 7 Issue 2

We present an approach to construct a classification rule based on the mass spectrometry data provided by the organizers of the "Classification Competition on Clinical Mass Spectrometry Proteomic Diagnosis Data." Before constructing a classification rule, we attempted to pre-process the data and to select features of the spectra that were likely due to true biological signals (i.e., peptides/proteins). As a result, we selected a set of 92 features. To construct the classification rule, we considered eight methods for selecting a subset of the features, combined with seven classification methods. The performance of the resulting 56 combinations was evaluated by using a cross-validation procedure with 1000 re-sampled data sets. The best result, as indicated by the lowest overall misclassification rate, was obtained by using the whole set of 92 features as the input for a support-vector machine (SVM) with a linear kernel. This method was therefore used to construct the classification rule. For the training data set, the total error rate for the classification rule, as estimated by using leave-one-out cross-validation, was equal to 0.16, with the sensitivity and specificity equal to 0.87 and 0.82, respectively.
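The leave-one-out estimate of the error rate, sensitivity, and specificity described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, a nearest-centroid classifier stands in for the linear-kernel SVM so that the example needs only the Python standard library, and all function and variable names (`loocv_metrics`, `fit_centroids`, `predict`) are hypothetical. Label 1 is assumed to denote the diseased class.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Return one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def loocv_metrics(X, y, positive=1):
    """Leave-one-out CV: refit on n-1 samples, predict the held-out one."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        # Training set excludes sample i, so the held-out sample
        # never influences the model that classifies it.
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = predict(model, X[i])
        if y[i] == positive:
            tp += pred == positive
            fn += pred != positive
        else:
            tn += pred != positive
            fp += pred == positive
    n = tp + tn + fp + fn
    return {
        "error_rate": (fp + fn) / n,
        "sensitivity": tp / (tp + fn),   # fraction of positives detected
        "specificity": tn / (tn + fp),   # fraction of negatives detected
    }


# Synthetic two-class data (an assumption for illustration only).
random.seed(0)
n_per_class, n_features = 20, 5
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)]
     for _ in range(n_per_class)]
X += [[random.gauss(1.5, 1.0) for _ in range(n_features)]
      for _ in range(n_per_class)]
y = [0] * n_per_class + [1] * n_per_class

metrics = loocv_metrics(X, y)
print(metrics)
```

The overall error rate is a class-size-weighted average of 1 - sensitivity and 1 - specificity, which is why the reported total error rate of 0.16 lies between 0.13 and 0.18. If a feature-selection step were added to this sketch, it would need to be repeated inside each leave-one-out iteration, on the training samples only, to keep the estimate free of selection bias.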

Keywords: proteomic MALDI-TOFMS preprocessing; feature selection; two-stage cross-validation; classification for clinical diagnosis
