
Evaluation of Multiple Models to Distinguish Closely Related Forms of Disease Using DNA Microarray Data: an Application to Multiple Myeloma

  • Johanna Hardin, Michael Waddell, C. David Page, Fenghuang Zhan, Bart Barlogie, John Shaughnessy and John J Crowley
Published/Copyright: June 8, 2004

Motivation: Standard laboratory classification of the plasma cell dyscrasia monoclonal gammopathy of undetermined significance (MGUS) and the overt plasma cell neoplasm multiple myeloma (MM) is quite accurate, yet, for the most part, biologically uninformative. Most, if not all, cancers are caused by inherited or acquired genetic mutations that manifest themselves in altered gene expression patterns in the clonally related cancer cells. Microarray technology allows for qualitative and quantitative measurements of the expression levels of thousands of genes simultaneously, and it has now been used both to classify cancers that are morphologically indistinguishable and to predict response to therapy. It is anticipated that this information can also be used to develop molecular diagnostic models and to provide insight into mechanisms of disease progression, e.g., transition from healthy to benign hyperplasia or conversion of a benign hyperplasia to overt malignancy. However, standard data analysis techniques are not trivial to employ on these large data sets. Methodology designed to handle large data sets (or modified to do so) is needed to access the vital information contained in the genetic samples, which in turn can be used to develop more robust and accurate methods of clinical diagnostics and prognostics.

Results: Here we report on the application of a panel of statistical and data mining methodologies to classify groups of samples based on expression of 12,000 genes derived from a high density oligonucleotide microarray analysis of highly purified plasma cells from newly diagnosed MM, MGUS, and normal healthy donors. The three groups of samples are each tested against each other. The methods are found to be similar in their ability to predict group membership; all do quite well at predicting MM vs. normal and MGUS vs. normal. However, no method appears to be able to distinguish explicitly the genetic mechanisms between MM and MGUS. We believe this might be due to the lack of genetic differences between these two conditions, and may not be due to the failure of the models. We report the prediction errors for each of the models and each of the methods. Additionally, we report ROC curves for the results on group prediction.

Availability: Logistic regression: standard software, available, for example in SAS. Decision trees and boosted trees: C5.0 from www.rulequest.com. SVM: SVM-light is publicly available from svmlight.joachims.org. Naïve Bayes and ensemble of voters are publicly available from www.biostat.wisc.edu/~mwaddell/eov.html. Nearest Shrunken Centroids is publicly available from http://www-stat.stanford.edu/~tibs/PAM.
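
The abstract mentions reporting ROC curves for each pairwise group prediction. As an illustration only, the following minimal Python sketch shows one common way to derive a ROC curve and its area (AUC) from classifier scores. It is not the authors' code; the function names `roc_points` and `auc` and the toy scores and labels are hypothetical and are not data from the study.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs.

    labels use 1 for the positive class (e.g. MM) and 0 for the
    negative class (e.g. normal); higher scores mean "more positive".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Lower the decision threshold step by step, from the highest score down.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier outputs for six samples (not from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_curve = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_curve)
print(f"AUC on toy data: {toy_auc:.3f}")
```

An AUC near 1 corresponds to groups that a classifier separates well, while an AUC near 0.5 corresponds to scores that carry little information about group membership.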

Published Online: 2004-6-8

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston

Articles in this issue

  1. Article
  2. Using Alpha Wisely: Improving Power to Detect Multiple QTL
  3. Relating HIV-1 Sequence Variation to Replication Capacity via Trees and Forests
  4. Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments
  5. Asymptotic Optimality of Likelihood-Based Cross-Validation
  6. Using Importance Sampling to Improve Simulation in Linkage Analysis
  7. Model-Based Assignment and Inference of Protein Backbone Nuclear Magnetic Resonances
  8. Error-Rate and Decision-Theoretic Methods of Multiple Testing: Which Genes Have High Objective Probabilities of Differential Expression?
  9. Evaluation of Multiple Models to Distinguish Closely Related Forms of Disease Using DNA Microarray Data: an Application to Multiple Myeloma
  10. Saturation and Quantization Reduction in Microarray Experiments using Two Scans at Different Sensitivities
  11. Combining Nearest Neighbor Classifiers Versus Cross-Validation Selection
  12. Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates
  13. Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate
  14. Augmentation Procedures for Control of the Generalized Family-Wise Error Rate and Tail Probabilities for the Proportion of False Positives
  15. Calculating the Statistical Significance of Changes in Pathway Activity From Gene Expression Data
  16. A Family-Based Association Test for Repeatedly Measured Quantitative Traits Adjusting for Unknown Environmental and/or Polygenic Effects
  17. Deletion/Substitution/Addition Algorithm in Learning with Applications in Genomics
  18. Classifying Gene Expression Profiles from Pairwise mRNA Comparisons
  19. Hierarchical Bayesian Neural Network for Gene Expression Temporal Patterns
  20. A Mixed Model Approach to Identify Yeast Transcriptional Regulatory Motifs via Microarray Experiments
  21. Mammalian Genomes Ease Location of Human DNA Functional Segments but Not Their Description
  22. On the Dependence Structure of Sequence Alignment Scores Calculated with Multiple Scoring Matrices
  23. Increasing Power for Tests of Genetic Association in the Presence of Phenotype and/or Genotype Error by Use of Double-Sampling
  24. A Method for Evaluating the Impact of Individual Haplotypes on Disease Incidence in Molecular Epidemiology Studies
  25. Statistical Methods for Identifying Conserved Residues in Multiple Sequence Alignment
  26. MergeMaid: R Tools for Merging and Cross-Study Validation of Gene Expression Data
  27. Sparse Inverse of Covariance Matrix of QTL Effects with Incomplete Marker Data
  28. Maximum Likelihood for Genome Phylogeny on Gene Content
  29. Confidence Levels for the Comparison of Microarray Experiments
  30. PLS Dimension Reduction for Classification with Microarray Data
  31. Statistical Analysis of Genomic Tag Data
  32. Statistical Analysis of Adsorption Models for Oligonucleotide Microarrays
  33. Statistical Significance Threshold Criteria For Analysis of Microarray Gene Expression Data
  34. A Compendium to Ensure Computational Reproducibility in High-Dimensional Classification Tasks
  35. Validation and Discovery in Markov Models of Genetics Data
  36. Making Sense of High-Throughput Protein-Protein Interaction Data
  37. Reader's Reaction
  38. Reader Reaction
  39. Response to Foulkes and De Gruttola
  40. Software Communication
  41. BayesMendel: an R Environment for Mendelian Risk Prediction
  42. Letter to the Editor
  43. Concerns About Unreliable Data from Spotted cDNA Microarrays Due to Cross-Hybridization and Sequence Errors
Downloaded on 16.9.2025 from https://www.degruyterbrill.com/document/doi/10.2202/1544-6115.1018/html