A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics
Juliane Schäfer and Korbinian Strimmer
Inferring large-scale covariance matrices from sparse genomic data is a ubiquitous problem in bioinformatics. Clearly, the widely used standard covariance and correlation estimators are ill-suited for this purpose. As a statistically efficient and computationally fast alternative, we propose a novel shrinkage covariance estimator that exploits the Ledoit-Wolf (2003) lemma for analytic calculation of the optimal shrinkage intensity. Subsequently, we apply this improved covariance estimator (which has guaranteed minimum mean squared error, is well-conditioned, and is always positive definite even for small sample sizes) to the problem of inferring large-scale gene association networks. We show that it performs very favorably compared to competing approaches, both in simulations and in application to real expression data.
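To make the idea of an analytically chosen shrinkage intensity concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which shrinkage target is used, so the sketch assumes a diagonal target that keeps the sample variances and shrinks only the off-diagonal covariances towards zero. The function name `shrink_covariance` is hypothetical, and the intensity is estimated as the summed estimated variances of the off-diagonal sample covariances divided by the sum of their squares, clipped to [0, 1].

```python
import numpy as np


def shrink_covariance(X):
    """Shrink the sample covariance of X (n samples x p variables)
    towards its own diagonal (assumed target, see lead-in).

    Returns the shrunken covariance matrix and the estimated
    shrinkage intensity lam in [0, 1].
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # centre each variable

    # w[k, i, j] = xc[k, i] * xc[k, j]; this uses O(n * p^2) memory,
    # which is fine for a small illustration but not for large p.
    w = Xc[:, :, None] * Xc[:, None, :]
    w_bar = w.mean(axis=0)

    # Unbiased sample covariance and the estimated variance of each entry.
    S = n / (n - 1) * w_bar
    var_S = n / (n - 1) ** 3 * ((w - w_bar) ** 2).sum(axis=0)

    # Only off-diagonal entries are shrunk under the assumed target.
    off = ~np.eye(p, dtype=bool)
    denom = (S[off] ** 2).sum()
    lam = 0.0 if denom == 0 else var_S[off].sum() / denom
    lam = float(np.clip(lam, 0.0, 1.0))

    target = np.diag(np.diag(S))
    return lam * target + (1.0 - lam) * S, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 30))  # fewer samples than variables
    S_star, lam = shrink_covariance(X)
    print("shrinkage intensity:", lam)
    print("smallest eigenvalue:", np.linalg.eigvalsh(S_star).min())
```

With fewer samples than variables the unshrunken sample covariance is singular, whereas the convex combination with a positive definite diagonal target stays positive definite whenever the intensity is greater than zero, which is the property the abstract emphasizes.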
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Estimating Motifs Under Order Restrictions
- Reproducible Research: A Bioinformatics Case Study
- Generalized Rank Tests for Replicated Microarray Data
- Stepwise Normalization of Two-Channel Spotted Microarrays
- Comparing Automatic and Manual Image Processing in FLARE Assay Analysis for Colon Carcinogenesis
- Pixel-level Signal Modelling with Spatial Correlation for Two-Colour Microarrays
- Empirical Bayes Microarray ANOVA and Grouping Cell Lines by Equal Expression Levels
- Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data.
- Early Diagnostic Marker Panel Determination for Microarray Based Clinical Studies
- Prediction of Missing Values in Microarray and Use of Mixed Models to Evaluate the Predictors
- Combined Association and Linkage Analysis for General Pedigrees and Genetic Models
- Incorporating Biological Information as a Prior in an Empirical Bayes Approach to Analyzing Microarray Data
- The Relative Inefficiency of Sequence Weights Approaches in Determining a Nucleotide Position Weight Matrix
- A Simple Loglinear Model for Haplotype Effects in a Case-Control Study Involving Two Unphased Genotypes
- Extension of the SIMLA Package for Generating Pedigrees with Complex Inheritance Patterns: Environmental Covariates, Gene-Gene and Gene-Environment Interaction
- Error Distribution for Gene Expression Data
- A General Framework for Weighted Gene Co-Expression Network Analysis
- Statistical Inference in Evolutionary Models of DNA Sequences via the EM Algorithm
- Comparing Bacterial DNA Microarray Fingerprints
- Continuous Covariates in Genetic Association Studies of Case-Parent Triads: Gene and Gene-Environment Interaction Effects, Population Stratification, and Power Analysis
- Robust Remote Homology Detection by Feature Based Profile Hidden Markov Models
- Empirical Bayes Estimation of a Sparse Vector of Gene Expression Changes
- Hierarchical Inverse Gaussian Models and Multiple Testing: Application to Gene Expression Data
- FADO: A Statistical Method to Detect Favored or Avoided Distances between Occurrences of Motifs using the Hawkes' Model
- Prediction of Genomewide Conserved Epitope Profiles of HIV-1: Classifier Choice and Peptide Representation
- Fold-Change Estimation of Differentially Expressed Genes using Mixture Mixed-Model
- Test on the Structure of Biological Sequences via Chaos Game Representation
- Reverse Engineering Galactose Regulation in Yeast through Model Selection
- Empirical Bayes and Resampling Based Multiple Testing Procedure Controlling Tail Probability of the Proportion of False Positives.
- Weighted Analysis of Paired Microarray Experiments
- A Probabilistic Approach to Large-Scale Association Scans: A Semi-Bayesian Method to Detect Disease-Predisposing Alleles
- A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics
- Structured Antedependence Models for Functional Mapping of Multiple Longitudinal Traits
- Correlation Between Gene Expression Levels and Limitations of the Empirical Bayes Methodology for Finding Differentially Expressed Genes
- Bayesian Statistical Studies of the Ramachandran Distribution
- On Reference Designs For Microarray Experiments
- Computing Asymptotic Power and Sample Size for Case-Control Genetic Association Studies in the Presence of Phenotype and/or Genotype Misclassification Errors