Article

Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing

  • Wenge Guo and Shyamal Peddada
Published/Copyright: March 24, 2008

It is common practice to use resampling methods such as the bootstrap to calculate the p-value for each test when performing large scale multiple testing. The precision of the bootstrap p-values, and hence of the false discovery rate (FDR), depends on the number of bootstrap samples used for testing each hypothesis. Clearly, the larger the number of bootstraps, the better the precision. However, the required number of bootstraps can be computationally burdensome, since the total computation grows as the number of bootstraps multiplied by the number of tests to be performed. Further adding to the computational challenge, in some applications the calculation of the test statistic itself may require considerable computation time. As technology improves, one can expect the dimension of the problem to increase as well. For instance, during the early days of microarray technology, the number of probes on a cDNA chip was less than 10,000; Affymetrix chips now come with over 50,000 probes per chip. Motivated by this important need, we developed a simple adaptive bootstrap methodology for large scale multiple testing, which reduces the total number of bootstrap calculations while ensuring control of the FDR. The proposed algorithm results in a substantial reduction in the number of bootstrap samples. In a simulation study we found that, relative to the number of bootstraps required by the Benjamini-Hochberg (BH) procedure, the standard FDR methodology, the proposed methodology achieved a very substantial reduction in the number of bootstraps. In some cases the new algorithm required as little as 1/6th the number of bootstraps of the conventional BH procedure. Thus, if the conventional BH procedure used 1,000 bootstraps, then the proposed method required only 160 bootstraps. This methodology has been implemented for time-course/dose-response data in our software, ORIOGEN, which is available from the authors upon request.
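To make the adaptive idea concrete, the following Python sketch illustrates one possible early-stopping rule of the kind the abstract describes; it is not the algorithm of the paper. Bootstrap replicates are drawn in small batches, and a hypothesis stops consuming bootstrap samples once its running p-value estimate, allowing for Monte Carlo error, is already too large to ever be rejected by the Benjamini-Hochberg procedure at the chosen FDR level. The callback `resample_stat` and all parameter names are hypothetical placeholders.

```python
import numpy as np

def adaptive_bootstrap_pvalues(test_stats, resample_stat, alpha=0.05,
                               batch_size=50, max_boot=1000):
    """Illustrative adaptive bootstrap p-value estimation (sketch only).

    test_stats    : observed test statistics, one per hypothesis
                    (large values are assumed to be evidence against the null)
    resample_stat : hypothetical user-supplied callback; resample_stat(i, b)
                    returns b bootstrap replicates of statistic i under the null
    alpha         : target FDR level for the Benjamini-Hochberg procedure
    """
    m = len(test_stats)
    exceed = np.zeros(m)              # bootstrap statistics >= observed statistic
    draws = np.zeros(m, dtype=int)    # bootstrap samples spent on each hypothesis
    active = np.ones(m, dtype=bool)   # hypotheses still being resampled

    while active.any():
        for i in np.where(active)[0]:
            boot = resample_stat(i, batch_size)
            exceed[i] += np.sum(boot >= test_stats[i])
            draws[i] += batch_size

        pvals = (exceed + 1.0) / (draws + 1.0)   # running Monte Carlo p-values
        # The largest BH cutoff any hypothesis can face is alpha (rank m of m),
        # so a hypothesis whose p-value stays above alpha even after allowing
        # for Monte Carlo error can never be rejected: stop resampling it.
        mc_error = 2.0 * np.sqrt(pvals * (1.0 - pvals) / np.maximum(draws, 1))
        active &= (pvals - mc_error <= alpha) & (draws < max_boot)

    return pvals, draws
```

The returned p-values can then be passed to any standard BH implementation; the total bootstrap effort is `draws.sum()` rather than the fixed `m * max_boot` that a non-adaptive scheme would spend.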

Published Online: 2008-3-24

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston

Articles in the same Issue

  1. Article
  2. Self-Organizing Maps with Statistical Phase Synchronization (SOMPS) for Analyzing Cell Cycle-Specific Gene Expression Data
  3. Coalescent Time Distributions in Trees of Arbitrary Size
  4. Quantifying the Association between Gene Expressions and DNA-Markers by Penalized Canonical Correlation Analysis
  5. Nonparametric Functional Mapping of Quantitative Trait Loci Underlying Programmed Cell Death
  6. Accommodating Uncertainty in a Tree Set for Function Estimation
  7. Drifting Markov Models with Polynomial Drift and Applications to DNA Sequences
  8. Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods
  9. Calculating Confidence Intervals for Prediction Error in Microarray Classification Using Resampling
  10. Structure Learning in Nested Effects Models
  11. Correcting the Estimated Level of Differential Expression for Gene Selection Bias: Application to a Microarray Study
  12. Adapting Prediction Error Estimates for Biased Complexity Selection in High-Dimensional Bootstrap Samples
  13. Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing
  14. Re-Cracking the Nucleosome Positioning Code
  15. Semi-Parametric Differential Expression Analysis via Partial Mixture Estimation
  16. A SNP Streak Model for the Identification of Genetic Regions Identical-by-descent
  17. Detecting Two-Locus Gene-Gene Effects Using Monotonisation of the Penetrance Matrix
  18. Modeling DNA Methylation in a Population of Cancer Cells
  19. Phenotyping Genetic Diseases Using an Extension of µ-Scores for Multivariate Data
  20. The Estimator of the Optimal Measure of Allelic Association: Mean, Variance and Probability Distribution When the Sample Size Tends to Infinity
  21. Predicting Protein Concentrations with ELISA Microarray Assays, Monotonic Splines and Monte Carlo Simulation
  22. A Comparison of Normalization Techniques for MicroRNA Microarray Data
  23. Collapsing SNP Genotypes in Case-Control Genome-Wide Association Studies Increases the Type I Error Rate and Power
  24. Estimating Number of Clusters Based on a General Similarity Matrix with Application to Microarray Data
  25. Data Distribution of Short Oligonucleotide Expression Arrays and Its Application to the Construction of a Generalized Intellectual Framework
  26. Approximately Sufficient Statistics and Bayesian Computation
  27. A Composite-Conditional-Likelihood Approach for Gene Mapping Based on Linkage Disequilibrium in Windows of Marker Loci
  28. Statistical Methods in Integrative Analysis for Gene Regulatory Modules
  29. Reducing Spatial Flaws in Oligonucleotide Arrays by Using Neighborhood Information
  30. Pattern Classification of Phylogeny Signals
  31. A Unification of Multivariate Methods for Meta-Analysis of Genetic Association Studies
  32. Importance Sampling for the Infinite Sites Model
  33. Supervised Distance Matrices
  34. Addressing the Shortcomings of Three Recent Bayesian Methods for Detecting Interspecific Recombination in DNA Sequence Alignments
  35. A Sparse PLS for Variable Selection when Integrating Omics Data
  36. Software Communication
  37. TRAB: Testing Whether Mutation Frequencies Are Above an Unknown Background