Computing Posterior Probabilities for Score-based Alignments Using ppALIGN
Stefan Wolfsheimer
Score-based pairwise alignments are widely used in bioinformatics, in particular in molecular database search tools such as the BLAST family. Due to sophisticated heuristics, such algorithms are usually fast, but the underlying scoring model unfortunately lacks a statistical description of the reliability of the reported alignments. In particular, close to gaps and in low-score or low-complexity regions, a huge number of alternative alignments arise, which decreases the certainty of the alignment. ppALIGN is a software package that uses hidden Markov model techniques to compute the position-wise reliability of score-based pairwise alignments of DNA or protein sequences. The design of the model allows for a direct connection between the scoring function and the parameters of the probabilistic model. For this reason it is suitable for analyzing the outcomes of popular score-based aligners and search tools without having to choose a complicated set of parameters: our program requires only the classical score parameters (the scoring function and gap costs). The package comprises a library written in C++, a standalone program for user-defined alignments (ppALIGN), and another program (ppBLAST) that can process a complete result set of BLAST. The main algorithms essentially exhibit linear time complexity (in the alignment lengths) and are hence suitable for on-line computations. We have also included alternative decoding algorithms to provide alternative alignments. ppALIGN is a fast program/library that helps detect and quantify questionable regions in pairwise alignments. Owing to the structure of its input/output interface, it can be connected to other post-processing tools. Empirically, we illustrate its usefulness in terms of correctly predicted reliable regions for sequences generated using the ROSE model for sequence evolution, and we identify sensor-specific regions in the denitrifying betaproteobacterium Aromatoleum aromaticum.
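To make the idea of position-wise reliability concrete, the following is a minimal Python sketch of one generic way to turn alignment scores into posterior probabilities: every global alignment path is weighted by exp(score / T), and a forward and a backward pass over the dynamic-programming table give the probability that two residues are aligned. This is not the ppALIGN model or its API. The function name `match_posteriors`, the match/mismatch scores, the linear gap cost, and the `temperature` parameter are all assumptions chosen for illustration, and the sketch fills the full quadratic table rather than reproducing the linear-time behavior described above.

```python
import math


def match_posteriors(a, b, match=1.0, mismatch=-1.0, gap=-2.0, temperature=1.0):
    """Return (post, z) for Boltzmann-weighted global alignment paths.

    post[i][j] is the probability that a[i] is aligned to b[j];
    z is the total weight of all paths (the partition function).
    Scores, linear gap cost and temperature are illustrative assumptions.
    """
    n, m = len(a), len(b)
    w_gap = math.exp(gap / temperature)

    def w_pair(i, j):
        # Boltzmann weight of aligning a[i] with b[j]
        s = match if a[i] == b[j] else mismatch
        return math.exp(s / temperature)

    # fwd[i][j]: total weight of all paths aligning a[:i] with b[:j]
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += fwd[i - 1][j - 1] * w_pair(i - 1, j - 1)
            if i > 0:
                total += fwd[i - 1][j] * w_gap
            if j > 0:
                total += fwd[i][j - 1] * w_gap
            fwd[i][j] = total

    # bwd[i][j]: total weight of all paths aligning a[i:] with b[j:]
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            total = 0.0
            if i < n and j < m:
                total += w_pair(i, j) * bwd[i + 1][j + 1]
            if i < n:
                total += w_gap * bwd[i + 1][j]
            if j < m:
                total += w_gap * bwd[i][j + 1]
            bwd[i][j] = total

    z = fwd[n][m]
    # Weight of all paths passing through the pair (i, j), divided by z
    post = [
        [fwd[i][j] * w_pair(i, j) * bwd[i + 1][j + 1] / z for j in range(m)]
        for i in range(n)
    ]
    return post, z


if __name__ == "__main__":
    post, z = match_posteriors("ACGT", "AGT")
    for row in post:
        print(" ".join(f"{p:.3f}" for p in row))
```

Each row of the printed matrix belongs to one residue of the first sequence; the row sum is the probability that the residue is aligned at all, so a low row sum or a row spread over several columns marks a position where many alternative alignments compete.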
©2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Article
- A New Explained-Variance Based Genetic Risk Score for Predictive Modeling of Disease Risk
- Hessian Calculation for Phylogenetic Likelihood based on the Pruning Algorithm and its Applications
- Cluster-Localized Sparse Logistic Regression for SNP Data
- How to analyze many contingency tables simultaneously in genetic association studies
- Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure
- Estimating the Number of One-step Beneficial Mutations
- Testing clonality of three and more tumors using their loss of heterozygosity profiles
- Correction for Founder Effects in Host-Viral Association Studies via Principal Components
- A Non-Homogeneous Dynamic Bayesian Network with Sequentially Coupled Interaction Parameters for Applications in Systems and Synthetic Biology
- An Integrated Hierarchical Bayesian Model for Multivariate eQTL Mapping
- A Novel and Fast Normalization Method for High-Density Arrays
- Performance of MAX Test and Degree of Dominance Index in Predicting the Mode of Inheritance
- A Bayesian autoregressive three-state hidden Markov model for identifying switching monotonic regimes in Microarray time course data
- QTL Mapping Using a Memetic Algorithm with Modifications of BIC as Fitness Function
- Computing Posterior Probabilities for Score-based Alignments Using ppALIGN