Semi-Parametric Differential Expression Analysis via Partial Mixture Estimation
David Rossell, Rudy Guerra and Clayton Scott
We develop an approach for microarray differential expression analysis, i.e., identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on full parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situations, however, a full parametric model cannot be justified, or the sample size per group is too small for permutation methods to be valid. We propose a semi-parametric framework based on partial mixture estimation, which requires a parametric assumption only for the null (equally expressed, EE) distribution and can handle small sample sizes where permutation methods break down. We develop two novel improvements to Scott's minimum integrated square error criterion for partial mixture estimation [Scott, 2004a,b]. As a side benefit, we obtain interpretable, closed-form estimates of the proportion of EE genes. Pseudo-Bayesian and frequentist procedures for controlling the false discovery rate (FDR) are given. Results from simulations and real datasets indicate that, for small sample sizes, our approach can provide substantial advantages over the SAM method of Tusher et al. [2001], the empirical Bayes procedure of Efron and Tibshirani [2002], the mixture of normals of Pan et al. [2003], and a t-test with p-value adjustment [Dudoit et al., 2003] to control the FDR [Benjamini and Hochberg, 1995].
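To make the idea of partial mixture estimation concrete, the following is a minimal sketch of the baseline minimum integrated square error (L2E) criterion that the abstract attributes to Scott, not of the two improvements proposed in the paper. It assumes, purely for illustration, that the test statistics of EE genes follow a fully known N(0, 1) null; the function names, the simulated data and the 90/10 mixing proportions are hypothetical. For a partial model w·φ(x), the criterion is w² ∫φ² − (2w/n) Σ φ(xᵢ), which is quadratic in w and therefore has a closed-form minimizer.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def l2e_criterion(w, xs, mu=0.0, sigma=1.0):
    """L2E criterion for the partial model w * N(mu, sigma^2).

    Equals w^2 * integral(phi^2) - (2 w / n) * sum_i phi(x_i),
    where integral(phi^2) = 1 / (2 * sigma * sqrt(pi)) for a normal density.
    """
    int_sq = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    mean_density = sum(normal_pdf(x, mu, sigma) for x in xs) / len(xs)
    return w * w * int_sq - 2.0 * w * mean_density


def l2e_null_weight(xs, mu=0.0, sigma=1.0):
    """Closed-form minimizer of the criterion in w, truncated to [0, 1]."""
    mean_density = sum(normal_pdf(x, mu, sigma) for x in xs) / len(xs)
    w = 2.0 * sigma * math.sqrt(math.pi) * mean_density
    return min(1.0, max(0.0, w))


# Hypothetical data: 90% of scores from the assumed N(0, 1) null (EE genes),
# 10% from a shifted N(4, 1) component standing in for DE genes.
random.seed(1)
n_genes = 20000
scores = [
    random.gauss(0.0, 1.0) if random.random() < 0.9 else random.gauss(4.0, 1.0)
    for _ in range(n_genes)
]

w_hat = l2e_null_weight(scores)
print(f"estimated proportion of EE genes: {w_hat:.3f}")
```

Only the null component is modeled here; the distribution of the differentially expressed scores is left unspecified, which is what makes the fit a partial mixture. In practice the null location and scale would typically be estimated as well rather than fixed in advance.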
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in this issue
- Article
- Self-Organizing Maps with Statistical Phase Synchronization (SOMPS) for Analyzing Cell Cycle-Specific Gene Expression Data
- Coalescent Time Distributions in Trees of Arbitrary Size
- Quantifying the Association between Gene Expressions and DNA-Markers by Penalized Canonical Correlation Analysis
- Nonparametric Functional Mapping of Quantitative Trait Loci Underlying Programmed Cell Death
- Accommodating Uncertainty in a Tree Set for Function Estimation
- Drifting Markov Models with Polynomial Drift and Applications to DNA Sequences
- Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods
- Calculating Confidence Intervals for Prediction Error in Microarray Classification Using Resampling
- Structure Learning in Nested Effects Models
- Correcting the Estimated Level of Differential Expression for Gene Selection Bias: Application to a Microarray Study
- Adapting Prediction Error Estimates for Biased Complexity Selection in High-Dimensional Bootstrap Samples
- Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing
- Re-Cracking the Nucleosome Positioning Code
- Semi-Parametric Differential Expression Analysis via Partial Mixture Estimation
- A SNP Streak Model for the Identification of Genetic Regions Identical-by-descent
- Detecting Two-Locus Gene-Gene Effects Using Monotonisation of the Penetrance Matrix
- Modeling DNA Methylation in a Population of Cancer Cells
- Phenotyping Genetic Diseases Using an Extension of µ-Scores for Multivariate Data
- The Estimator of the Optimal Measure of Allelic Association: Mean, Variance and Probability Distribution When the Sample Size Tends to Infinity
- Predicting Protein Concentrations with ELISA Microarray Assays, Monotonic Splines and Monte Carlo Simulation
- A Comparison of Normalization Techniques for MicroRNA Microarray Data
- Collapsing SNP Genotypes in Case-Control Genome-Wide Association Studies Increases the Type I Error Rate and Power
- Estimating Number of Clusters Based on a General Similarity Matrix with Application to Microarray Data
- Data Distribution of Short Oligonucleotide Expression Arrays and Its Application to the Construction of a Generalized Intellectual Framework
- Approximately Sufficient Statistics and Bayesian Computation
- A Composite-Conditional-Likelihood Approach for Gene Mapping Based on Linkage Disequilibrium in Windows of Marker Loci
- Statistical Methods in Integrative Analysis for Gene Regulatory Modules
- Reducing Spatial Flaws in Oligonucleotide Arrays by Using Neighborhood Information
- Pattern Classification of Phylogeny Signals
- A Unification of Multivariate Methods for Meta-Analysis of Genetic Association Studies
- Importance Sampling for the Infinite Sites Model
- Supervised Distance Matrices
- Addressing the Shortcomings of Three Recent Bayesian Methods for Detecting Interspecific Recombination in DNA Sequence Alignments
- A Sparse PLS for Variable Selection when Integrating Omics Data
- Software Communication
- TRAB: Testing Whether Mutation Frequencies Are Above an Unknown Background