A Compendium to Ensure Computational Reproducibility in High-Dimensional Classification Tasks

Markus Ruschhaupt; Wolfgang Huber; Annemarie Poustka; Ulrich Mansmann

doi:10.2202/1544-6115.1078


Published/Copyright: December 19, 2004


From the journal Statistical Applications in Genetics and Molecular Biology Volume 3 Issue 1


We demonstrate the concept and an implementation of a compendium for the classification of high-dimensional data from microarray gene expression profiles. A compendium is an interactive document that bundles primary data, statistical processing methods, figures, and derived data together with the textual documentation and conclusions. Interactivity allows the reader to modify and extend these components. We address the following questions: How much does the discriminatory power of a classifier depend on the choice of the algorithm that was used to identify it? What alternative classifiers could be used just as well? How robust is the result? The answers to these questions are essential prerequisites for validation and biological interpretation of the classifiers. We show how to use this approach by examining these questions for a specific breast cancer microarray data set that was first studied by Huang et al. (2003).
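The three questions in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' compendium and does not use the Huang et al. data. It assumes synthetic Gaussian "expression" values, two simple stand-in classifiers (nearest centroid and 1-nearest-neighbour), and a mean-difference gene filter, all chosen only for illustration. Every function name and parameter below is hypothetical. Gene selection is repeated inside each cross-validation fold, and all randomness is seeded, so a rerun gives the same error rates.

```python
import random


def mean(values):
    return sum(values) / len(values)


def make_data(n_samples=40, n_genes=200, n_informative=10, shift=1.5, seed=0):
    """Synthetic stand-in for expression profiles: few samples, many genes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate the two classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if label == 1:
            for j in range(n_informative):
                row[j] += shift  # only the first genes carry signal
        X.append(row)
        y.append(label)
    return X, y


def select_genes(X, y, k):
    """Rank genes by absolute difference of class means; return the top k."""
    def score(j):
        m0 = mean([row[j] for row, lab in zip(X, y) if lab == 0])
        m1 = mean([row[j] for row, lab in zip(X, y) if lab == 1])
        return abs(m1 - m0)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(X_train, y_train, x):
    """Predict the class whose mean profile is closest to x."""
    centroids = {}
    for label in set(y_train):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))


def one_nearest_neighbour(X_train, y_train, x):
    """Predict the class of the single closest training sample."""
    nearest = min(range(len(X_train)), key=lambda i: sq_dist(x, X_train[i]))
    return y_train[nearest]


def cv_error(X, y, classify, n_folds=5, n_selected=10, seed=0):
    """K-fold cross-validated error; gene selection is redone in every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # seeded, hence reproducible, split
    errors = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        genes = select_genes(X_tr, y_tr, n_selected)  # training data only
        X_tr_sel = [[row[j] for j in genes] for row in X_tr]
        for i in test:
            x = [X[i][j] for j in genes]
            errors += classify(X_tr_sel, y_tr, x) != y[i]
    return errors / len(X)


if __name__ == "__main__":
    X, y = make_data()
    classifiers = [("nearest centroid", nearest_centroid),
                   ("1-nearest-neighbour", one_nearest_neighbour)]
    for name, clf in classifiers:
        # Different seeds give different fold splits: a crude robustness check.
        rates = [cv_error(X, y, clf, seed=s) for s in range(5)]
        print(name, rates)
```

Comparing the error rates across the two classifiers speaks to the first two questions, and the spread across fold splits gives a rough sense of the third. Selecting genes on the full data set before cross-validation would tend to make the error estimates optimistic, which is why the sketch keeps selection inside the loop.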

Keywords: compendium; machine learning; classification; microarray; cancer

Published Online: 2004-12-19



