Normalization, bias correction, and peak calling for ChIP-seq
Aaron Diaz
Next-generation sequencing is rapidly transforming our ability to profile the transcriptional, genetic, and epigenetic states of a cell. In particular, sequencing DNA from immunoprecipitated protein-DNA complexes (ChIP-seq) and methylated DNA (MeDIP-seq) can reveal the locations of protein binding sites and epigenetic modifications. These approaches are subject to numerous biases that may significantly influence the interpretation of the resulting data. Rigorous computational methods for detecting and removing such biases are still lacking, and multi-sample normalization remains an important open problem. This theoretical paper systematically characterizes the biases and properties of ChIP-seq data by comparing 62 separate publicly available datasets, using rigorous statistical models and signal processing techniques. It presents statistical methods for separating ChIP-seq signal from background noise and for correcting enrichment test statistics for sequence-dependent and sonication biases. Our method separates reads into signal and background components prior to normalization, improving the signal-to-noise ratio. Moreover, most current peak callers use a generic null model, which suffers from low specificity at the sensitivity level required to detect subtle but true ChIP enrichment. The proposed method of determining a cell type-specific null model, which accounts for cell type-specific biases, is shown to achieve a lower false discovery rate at a given significance threshold than current methods.
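The abstract does not give the paper's formulas, so the following is only a generic illustration of two ideas it mentions, not the author's method. The first is normalizing a ChIP sample against a control using background bins only, rather than total read counts (which are inflated by true ChIP signal). The second is judging a significance threshold by its empirical false discovery rate against a null score distribution. The helper names `background_scaling_factor` and `empirical_fdr`, the `bg_quantile` parameter, and the assumption that bins at or below the median ChIP count are background are all hypothetical choices made for this sketch.

```python
from typing import List


def background_scaling_factor(chip: List[int], control: List[int],
                              bg_quantile: float = 0.5) -> float:
    """Hypothetical helper: ChIP/control scaling factor from background bins.

    Assumption: bins whose ChIP count is at or below the given quantile of
    ChIP counts are treated as background (no true enrichment).
    """
    if len(chip) != len(control) or not chip:
        raise ValueError("chip and control must be non-empty and equal length")
    # Count value at the chosen quantile of the sorted ChIP bin counts.
    cutoff = sorted(chip)[int(bg_quantile * (len(chip) - 1))]
    background = [i for i, c in enumerate(chip) if c <= cutoff]
    chip_bg = sum(chip[i] for i in background)
    ctrl_bg = sum(control[i] for i in background)
    if ctrl_bg == 0:
        raise ValueError("no control reads in background bins")
    # Ratio of ChIP to control reads, computed over background bins only.
    return chip_bg / ctrl_bg


def empirical_fdr(chip_scores: List[float], null_scores: List[float],
                  threshold: float) -> float:
    """Hypothetical helper: empirical FDR at a score threshold.

    Estimated as (expected null calls) / (observed ChIP calls), where the
    null exceedance fraction is rescaled to the number of ChIP tests.
    """
    observed = sum(s >= threshold for s in chip_scores)
    if observed == 0:
        return 0.0
    null_fraction = sum(s >= threshold for s in null_scores) / len(null_scores)
    expected_false = null_fraction * len(chip_scores)
    return min(1.0, expected_false / observed)
```

For example, with ChIP bin counts `[5, 6, 4, 5, 40, 55, 5, 6]` and control counts `[10, 12, 8, 10, 11, 10, 10, 12]`, `background_scaling_factor` returns `0.5`, whereas the total-count ratio is 126/83, roughly 1.52, because the two enriched bins inflate the ChIP total. A null score distribution in this sketch could come from any comparison assumed to contain no true enrichment; a cell type-specific null, as the abstract proposes, would change only where `null_scores` comes from, not how the FDR is computed.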
©2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Article
- Exploring Multicollinearity Using a Random Matrix Theory Approach
- The Beta-Binomial SGoF method for multiple dependent tests
- Detecting Sample Misidentifications in Genetic Association Studies
- Borrowing Information Across Genes and Experiments for Improved Error Variance Estimation in Microarray Data Analysis
- Hierarchical Bayes Model for Predicting Effectiveness of HIV Combination Therapies
- The practical effect of batch on genomic prediction
- Normalization, bias correction, and peak calling for ChIP-seq
- Combining Multiple Laser Scans of Spotted Microarrays by Means of a Two-Way ANOVA Model
- Empirical Bayes Interval Estimates that are Conditionally Equal to Unadjusted Confidence Intervals or to Default Prior Credibility Intervals
- Detection of Differentially Expressed Gene Sets in a Partially Paired Microarray Data Set
- Non-Iterative, Regression-Based Estimation of Haplotype Associations with Censored Survival Outcomes
- Graph Selection with GGMselect
- Sample Size Calculations for Designing Clinical Proteomic Profiling Studies Using Mass Spectrometry
- A New Approach for the Joint Analysis of Multiple Chip-Seq Libraries with Application to Histone Modification
- Software Communication
- GENOVA: Gene Overlap Analysis of GWAS Results