Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates
Sandrine Dudoit, Mark J. van der Laan and Katherine S. Pollard
The present article proposes general single-step multiple testing procedures for controlling Type I error rates defined as arbitrary parameters of the distribution of the number of Type I errors, such as the generalized family-wise error rate. A key feature of our approach is the test statistics null distribution (rather than data generating null distribution) used to derive cut-offs (i.e., rejection regions) for these test statistics and the resulting adjusted p-values. For general null hypotheses, corresponding to submodels for the data generating distribution, we identify an asymptotic domination condition for a null distribution under which single-step common-quantile and common-cut-off procedures asymptotically control the Type I error rate, for arbitrary data generating distributions, without the need for conditions such as subset pivotality. Inspired by this general characterization of a null distribution, we then propose as an explicit null distribution the asymptotic distribution of the vector of null value shifted and scaled test statistics. In the special case of family-wise error rate (FWER) control, our method yields the single-step minP and maxT procedures, based on minima of unadjusted p-values and maxima of test statistics, respectively, with the important distinction in the choice of null distribution. Single-step procedures based on consistent estimators of the null distribution are shown to also provide asymptotic control of the Type I error rate. A general bootstrap algorithm is supplied to conveniently obtain consistent estimators of the null distribution. The special cases of t- and F-statistics are discussed in detail. 
The companion articles focus on step-down multiple testing procedures for control of the FWER (van der Laan et al., 2004b) and on augmentations of FWER-controlling methods to control error rates such as tail probabilities for the number of false positives and for the proportion of false positives among the rejected hypotheses (van der Laan et al., 2004a). The proposed bootstrap multiple testing procedures are evaluated by a simulation study and applied to genomic data in the fourth article of the series (Pollard et al., 2004).
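To make the single-step maxT idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes one-sample t-statistics for the null hypotheses that each column mean is zero, a nonparametric bootstrap of the rows, and a null distribution obtained by centering each bootstrap statistic at an assumed null value of 0 and capping its variance at an assumed value of 1. The function names `t_statistics` and `single_step_maxT` and all parameters are hypothetical, introduced only for illustration.

```python
import numpy as np


def t_statistics(x):
    """One-sample t-statistics for H0: mean = 0, one per column of x (n rows, m columns)."""
    n = x.shape[0]
    return np.sqrt(n) * x.mean(axis=0) / x.std(axis=0, ddof=1)


def single_step_maxT(x, n_boot=1000, seed=0):
    """Two-sided single-step maxT adjusted p-values from a bootstrap null distribution.

    Assumptions of this sketch (not taken from the article text):
    null mean 0 and variance cap 1 for the shifted and scaled statistics.
    """
    rng = np.random.default_rng(seed)
    n, m = x.shape
    t_obs = t_statistics(x)

    # Nonparametric bootstrap: resample observations (rows) with replacement.
    t_boot = np.empty((n_boot, m))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        t_boot[b] = t_statistics(x[idx])

    # Null shift: center each column of bootstrap statistics at the null value 0.
    centered = t_boot - t_boot.mean(axis=0)
    # Null scale: shrink columns whose bootstrap variance exceeds 1; leave others as is.
    var = t_boot.var(axis=0, ddof=1)
    z = centered * np.sqrt(np.minimum(1.0, 1.0 / var))

    # Single-step maxT: compare each observed |t| with the bootstrap
    # distribution of the maximum |Z| over all hypotheses.
    max_abs = np.abs(z).max(axis=1)
    adj_p = (max_abs[:, None] >= np.abs(t_obs)[None, :]).mean(axis=0)
    return t_obs, adj_p


if __name__ == "__main__":
    # Simulated data: 40 observations, 5 variables, only the first has a nonzero mean.
    demo_rng = np.random.default_rng(1)
    data = demo_rng.normal(size=(40, 5))
    data[:, 0] += 2.0
    t_obs, adj_p = single_step_maxT(data, n_boot=500)
    print("t-statistics:     ", np.round(t_obs, 2))
    print("adjusted p-values:", np.round(adj_p, 3))
```

Because every hypothesis is compared with the same distribution of maxima, a larger observed absolute statistic can never receive a larger adjusted p-value than a smaller one. Rejecting the hypotheses whose adjusted p-value is at most the nominal level then targets FWER control under the assumptions stated above.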
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Article
- Using Alpha Wisely: Improving Power to Detect Multiple QTL
- Relating HIV-1 Sequence Variation to Replication Capacity via Trees and Forests
- Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments
- Asymptotic Optimality of Likelihood-Based Cross-Validation
- Using Importance Sampling to Improve Simulation in Linkage Analysis
- Model-Based Assignment and Inference of Protein Backbone Nuclear Magnetic Resonances
- Error-Rate and Decision-Theoretic Methods of Multiple Testing: Which Genes Have High Objective Probabilities of Differential Expression?
- Evaluation of Multiple Models to Distinguish Closely Related Forms of Disease Using DNA Microarray Data: an Application to Multiple Myeloma
- Saturation and Quantization Reduction in Microarray Experiments using Two Scans at Different Sensitivities
- Combining Nearest Neighbor Classifiers Versus Cross-Validation Selection
- Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates
- Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate
- Augmentation Procedures for Control of the Generalized Family-Wise Error Rate and Tail Probabilities for the Proportion of False Positives
- Calculating the Statistical Significance of Changes in Pathway Activity From Gene Expression Data
- A Family-Based Association Test for Repeatedly Measured Quantitative Traits Adjusting for Unknown Environmental and/or Polygenic Effects
- Deletion/Substitution/Addition Algorithm in Learning with Applications in Genomics
- Classifying Gene Expression Profiles from Pairwise mRNA Comparisons
- Hierarchical Bayesian Neural Network for Gene Expression Temporal Patterns
- A Mixed Model Approach to Identify Yeast Transcriptional Regulatory Motifs via Microarray Experiments
- Mammalian Genomes Ease Location of Human DNA Functional Segments but Not Their Description
- On the Dependence Structure of Sequence Alignment Scores Calculated with Multiple Scoring Matrices
- Increasing Power for Tests of Genetic Association in the Presence of Phenotype and/or Genotype Error by Use of Double-Sampling
- A Method for Evaluating the Impact of Individual Haplotypes on Disease Incidence in Molecular Epidemiology Studies
- Statistical Methods for Identifying Conserved Residues in Multiple Sequence Alignment
- MergeMaid: R Tools for Merging and Cross-Study Validation of Gene Expression Data
- Sparse Inverse of Covariance Matrix of QTL Effects with Incomplete Marker Data
- Maximum Likelihood for Genome Phylogeny on Gene Content
- Confidence Levels for the Comparison of Microarray Experiments
- PLS Dimension Reduction for Classification with Microarray Data
- Statistical Analysis of Genomic Tag Data
- Statistical Analysis of Adsorption Models for Oligonucleotide Microarrays
- Statistical Significance Threshold Criteria For Analysis of Microarray Gene Expression Data
- A Compendium to Ensure Computational Reproducibility in High-Dimensional Classification Tasks
- Validation and Discovery in Markov Models of Genetics Data
- Making Sense of High-Throughput Protein-Protein Interaction Data
- Reader's Reaction
- Reader Reaction
- Response to Foulkes and De Gruttola
- Software Communication
- BayesMendel: an R Environment for Mendelian Risk Prediction
- Letter to the Editor
- Concerns About Unreliable Data from Spotted cDNA Microarrays Due to Cross-Hybridization and Sequence Errors