A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting
Daniel Rubin
Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the expected number of false positives does not exceed a user-supplied threshold. Among such multiple testing procedures, we derive the most powerful method, meaning the test statistic cutoffs that maximize the expected number of true positives. Unfortunately, these optimal cutoffs depend on the true unknown data generating distribution, so they could never be used in a practical setting. We instead consider splitting the sample so that the optimal cutoffs are estimated from a portion of the data, and then testing on the remaining data using these estimated cutoffs. When the null distributions for all test statistics are the same, the obvious way to control the expected number of false positives would be to use a common cutoff for all tests. In this work, we consider the common cutoff method as a benchmark multiple testing procedure. We show that in certain circumstances the use of estimated optimal cutoffs via sample splitting can dramatically outperform this benchmark method, resulting in increased true discoveries, while retaining Type-I error control. This paper is an updated version of the work presented in Rubin et al. (2005), later expanded upon by Wasserman and Roeder (2006).
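The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes, purely for illustration, that each test statistic is normal with unit variance, mean 0 under the null and an unknown shift theta_j otherwise. Under that assumption, equating the likelihood ratio exp(theta_j c - theta_j^2 / 2) to a common constant lambda gives cutoffs c_j = theta_j / 2 + log(lambda) / theta_j, and lambda is tuned so that the null tail probabilities sum to the user-supplied threshold k. All function names, the bisection bounds, and the example shifts below are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed common null distribution N(0, 1)


def upper_tail(c):
    """P(Z > c) for Z ~ N(0, 1); an infinite cutoff is never exceeded."""
    return 0.0 if c == math.inf else STD_NORMAL.cdf(-c)


def common_cutoff(m, k):
    """Benchmark: one cutoff c for all m tests, with m * P(Z > c) = k."""
    return STD_NORMAL.inv_cdf(1.0 - k / m)


def cutoffs_at(shifts, log_lam):
    """Cutoffs at which the normal likelihood ratio equals lambda.

    Tests with a non-positive shift get an infinite cutoff (never
    rejected); this is a simplification made for the sketch.
    """
    return [th / 2.0 + log_lam / th if th > 0 else math.inf for th in shifts]


def expected_false_positives(cutoffs):
    """Sum of null tail probabilities, the quantity to be bounded by k."""
    return sum(upper_tail(c) for c in cutoffs)


def expected_true_positives(shifts, cutoffs):
    """Sum of rejection probabilities over tests with a positive shift."""
    return sum(upper_tail(c - th) for th, c in zip(shifts, cutoffs) if th > 0)


def solve_cutoffs(shifts, k, lo=-50.0, hi=50.0, iters=200):
    """Bisect on log(lambda) until the null tail probabilities sum to k."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_false_positives(cutoffs_at(shifts, mid)) > k:
            lo = mid  # budget exceeded: raise the cutoffs
        else:
            hi = mid  # within budget: try lower cutoffs
    # hi is only ever moved to points within budget (assuming the
    # initial hi is itself within budget, as it is for moderate k).
    return cutoffs_at(shifts, hi)


def split_sample_cutoffs(train_means, n_test, k):
    """Estimate shifts from the training portion only, then solve.

    train_means[j] is the mean of test j's training observations; the
    held-out statistic sqrt(n_test) * (held-out mean) would have shift
    sqrt(n_test) * mu_j, so the training mean is plugged in for mu_j.
    """
    est_shifts = [math.sqrt(n_test) * xbar for xbar in train_means]
    return solve_cutoffs(est_shifts, k)


# Hypothetical example: 100 tests, 5 strong and 5 weak signals, k = 1.
true_shifts = [4.0] * 5 + [1.0] * 5 + [0.0] * 90
k = 1.0
benchmark = [common_cutoff(len(true_shifts), k)] * len(true_shifts)
oracle = solve_cutoffs(true_shifts, k)
```

In use, the cutoffs returned by `split_sample_cutoffs` would be compared with statistics computed from the held-out portion alone. Because the cutoffs depend only on the training portion, the sum of null tail probabilities stays at or below k whatever the quality of the estimated shifts, which is the sense in which Type-I error control is retained in this sketch. Comparing `expected_true_positives(true_shifts, oracle)` with the same quantity for `benchmark` shows how much power the known-shift cutoffs gain over the common cutoff in this toy configuration.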
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in this issue
- Article
- Low-Order Conditional Independence Graphs for Inferring Genetic Networks
- A Generalized Clustering Problem, with Application to DNA Microarrays
- A Bayes Regression Approach to Array-CGH Data
- Statistical Selection of Maintenance Genes for Normalization of Gene Expressions
- Predicting the Strongest Domain-Domain Contact in Interacting Protein Pairs
- Dimension Reduction for Classification with Gene Expression Microarray Data
- A New Type of Stochastic Dependence Revealed in Gene Expression Data
- A New Order Estimator for Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity
- Quality Optimised Analysis of General Paired Microarray Experiments
- Issues of Processing and Multiple Testing of SELDI-TOF MS Proteomic Data
- Cross-Validated Bagged Prediction of Survival
- Treatment of Uninformative Families in Mean Allele Sharing Tests for Linkage
- Quantile-Function Based Null Distribution in Resampling Based Multiple Testing
- Combining Results of Microarray Experiments: A Rank Aggregation Approach
- Model Selection for Mixtures of Mutagenetic Trees
- Pseudo-likelihood for Non-reversible Nucleotide Substitution Models with Neighbour Dependent Rates
- A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting
- Bayesian Hierarchical Model for Correcting Signal Saturation in Microarrays Using Pixel Intensities
- Using Complexity for the Estimation of Bayesian Networks
- Detecting Local High-Scoring Segments: a First-Stage Approach for Genome-Wide Association Studies
- Examining Protein Structure and Similarities by Spectral Analysis Technique
- Parameter Estimation for the Exponential-Normal Convolution Model for Background Correction of Affymetrix GeneChip Data
- Approximate Sample Size Calculations with Microarray Data: An Illustration
- Numerical Solutions for Patterns Statistics on Markov Chains
- A Heuristic Bayesian Method for Segmenting DNA Sequence Alignments and Detecting Evidence for Recombination and Gene Conversion
- A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments
- Validation in Genomics: CpG Island Methylation Revisited
- An Improved Nonparametric Approach for Detecting Differentially Expressed Genes with Replicated Microarray Data
- Letter to the Editor
- Treating Expression Levels of Different Genes as a Sample in Microarray Data Analysis: Is it Worth a Risk?
- Reader's Reaction
- Reader's Reaction to "Dimension Reduction for Classification with Gene Expression Microarray Data" by Dai et al (2006)