A Heuristic Bayesian Method for Segmenting DNA Sequence Alignments and Detecting Evidence for Recombination and Gene Conversion
Anna Kedzierska
We propose a heuristic approach to the detection of evidence for recombination and gene conversion in multiple DNA sequence alignments. The proposed method consists of two stages. In the first stage, a sliding window is moved along the DNA sequence alignment, and phylogenetic trees are sampled from the conditional posterior distribution with MCMC. To reduce the noise intrinsic to inference from the limited amount of data available in the typically short sliding window, a clustering algorithm based on the Robinson-Foulds distance is applied to the trees thus sampled, and the posterior distribution over tree clusters is obtained for each window position. While changes in this posterior distribution are indicative of recombination or gene conversion events, it is difficult to decide when such a change is statistically significant. This problem is addressed in the second stage of the proposed algorithm, where the distributions obtained in the first stage are post-processed with a Bayesian hidden Markov model (HMM). The emission states of the HMM are associated with posterior distributions over phylogenetic tree topology clusters. The hidden states of the HMM indicate putative recombinant segments. Inference is done in a Bayesian sense, sampling parameters from the posterior distribution with MCMC. Of particular interest is the determination of the number of hidden states as an indication of the number of putative recombinant regions. To this end, we apply reversible jump MCMC, and sample the number of hidden states from the respective posterior distribution.
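The first stage described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the MCMC tree samples for each window are already available, that trees are written as nested tuples of taxon names, and that a simple greedy "leader" clustering with a Robinson-Foulds threshold stands in for the clustering algorithm, which the abstract does not specify. The names `splits`, `rf_distance`, `cluster_posteriors` and `threshold` are hypothetical.

```python
from collections import Counter


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree given as nested tuples.

    Each bipartition is stored as the side that does not contain the
    alphabetically first taxon, so equal splits compare equal.
    """
    found = set()

    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(child) for child in node))

    taxa = leaf_set(tree)
    ref = min(taxa)

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = taxa - clade if ref in clade else clade
        # Skip trivial splits (a single taxon against the rest, or empty).
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits found in only one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


def cluster_posteriors(window_samples, threshold=0):
    """Estimate, for each window, the distribution over tree clusters.

    window_samples: one non-empty list of sampled trees per window position.
    threshold: assumed maximum RF distance to a cluster representative.
    Greedy leader clustering is an illustrative assumption only.
    """
    representatives = []  # split set of the first tree seen in each cluster
    counts = []
    for sample in window_samples:
        window_counts = Counter()
        for tree in sample:
            tree_splits = splits(tree)
            for k, rep in enumerate(representatives):
                if len(tree_splits ^ rep) <= threshold:
                    break
            else:
                representatives.append(tree_splits)
                k = len(representatives) - 1
            window_counts[k] += 1
        counts.append(window_counts)
    n_clusters = len(representatives)
    return [
        [c[k] / len(sample) for k in range(n_clusters)]
        for c, sample in zip(counts, window_samples)
    ]


# Toy data: two four-taxon topologies whose frequencies shift between windows.
T1 = (("A", "B"), ("C", "D"))
T2 = (("A", "C"), ("B", "D"))
toy_windows = [[T1, T1, T1, T2], [T1, T2, T2, T2]]
toy_posteriors = cluster_posteriors(toy_windows)
print(toy_posteriors)  # [[0.75, 0.25], [0.25, 0.75]]
```

In this toy example the mass moves from the first cluster to the second between the two windows. This is the kind of change in the per-window distribution that the second stage, the Bayesian HMM, is meant to assess; that stage is not sketched here.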
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Article
- Low-Order Conditional Independence Graphs for Inferring Genetic Networks
- A Generalized Clustering Problem, with Application to DNA Microarrays
- A Bayes Regression Approach to Array-CGH Data
- Statistical Selection of Maintenance Genes for Normalization of Gene Expressions
- Predicting the Strongest Domain-Domain Contact in Interacting Protein Pairs
- Dimension Reduction for Classification with Gene Expression Microarray Data
- A New Type of Stochastic Dependence Revealed in Gene Expression Data
- A New Order Estimator for Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity
- Quality Optimised Analysis of General Paired Microarray Experiments
- Issues of Processing and Multiple Testing of SELDI-TOF MS Proteomic Data
- Cross-Validated Bagged Prediction of Survival
- Treatment of Uninformative Families in Mean Allele Sharing Tests for Linkage
- Quantile-Function Based Null Distribution in Resampling Based Multiple Testing
- Combining Results of Microarray Experiments: A Rank Aggregation Approach
- Model Selection for Mixtures of Mutagenetic Trees
- Pseudo-likelihood for Non-reversible Nucleotide Substitution Models with Neighbour Dependent Rates
- A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting
- Bayesian Hierarchical Model for Correcting Signal Saturation in Microarrays Using Pixel Intensities
- Using Complexity for the Estimation of Bayesian Networks
- Detecting Local High-Scoring Segments: a First-Stage Approach for Genome-Wide Association Studies
- Examining Protein Structure and Similarities by Spectral Analysis Technique
- Parameter Estimation for the Exponential-Normal Convolution Model for Background Correction of Affymetrix GeneChip Data
- Approximate Sample Size Calculations with Microarray Data: An Illustration
- Numerical Solutions for Patterns Statistics on Markov Chains
- A Heuristic Bayesian Method for Segmenting DNA Sequence Alignments and Detecting Evidence for Recombination and Gene Conversion
- A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments
- Validation in Genomics: CpG Island Methylation Revisited
- An Improved Nonparametric Approach for Detecting Differentially Expressed Genes with Replicated Microarray Data
- Letter to the Editor
- Treating Expression Levels of Different Genes as a Sample in Microarray Data Analysis: Is it Worth a Risk?
- Reader's Reaction
- Reader's Reaction to "Dimension Reduction for Classification with Gene Expression Microarray Data" by Dai et al (2006)