Optimisation of HMM Topologies Enhances DNA and Protein Sequence Modelling

  • Torben Friedrich, Christian Koetschan and Tobias Müller
Published/Copyright: 6 January 2010

Hidden Markov models (HMMs) play a major role in applications that unravel biomolecular functionality. Although HMMs are technically mature and widely applied in computational biology, there is potential for methodological optimisation in how they model biological data sources with varying sequence lengths. The single building blocks of these models, the states, are each associated with a certain holding time, which links them to the length distribution of the represented sequence motifs. Regular HMM topologies are adapted to bell-shaped sequence length distributions by serially chain-linking hidden states, while remaining within the class of conventional hidden Markov models. The number of state repetitions (r) and the state-specific duration-of-stay parameter (p) are determined by fitting the distribution of sequence lengths with the method of moments (MM) and maximum likelihood (ML). Performance evaluations of differently adjusted HMM topologies underline the impact of optimising HMMs based on sequence lengths. Secondary structure prediction on internal transcribed spacer 2 sequences exemplifies the general impact of topological optimisations. In summary, we propose a general methodology to improve the modelling behaviour of HMMs by topological optimisation with ML and a fast and easily implementable moment estimator.
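
To illustrate the kind of moment fit the abstract describes, here is a minimal Python sketch. It is not the authors' estimator: the abstract gives no formulas, so the parametrisation is an assumption. We assume a chain of r identical states, each with self-transition probability p, so that each state's holding time is geometric on 1, 2, ... with mean 1/(1-p) and variance p/(1-p)^2. The total length then has mean r/(1-p) and variance rp/(1-p)^2, and matching these to the sample mean m and variance v gives p = v/(m+v) and r = m^2/(m+v). The function names `moment_estimate` and `fit_chain` are hypothetical.

```python
import math
from statistics import fmean, pvariance


def moment_estimate(mean, var):
    """Real-valued moment estimates (r, p).

    Assumed model (not taken from the paper): r chained states, each with
    self-transition probability p, so the total length has
    mean r / (1 - p) and variance r * p / (1 - p) ** 2.
    """
    if mean <= 0 or var < 0:
        raise ValueError("mean must be positive and variance non-negative")
    p = var / (mean + var)
    r = mean * mean / (mean + var)
    return r, p


def fit_chain(lengths):
    """Integer number of states r and stay probability p for observed lengths."""
    if not lengths or min(lengths) < 1:
        raise ValueError("lengths must be a non-empty sequence of values >= 1")
    mean = fmean(lengths)
    var = pvariance(lengths)
    r_real, _ = moment_estimate(mean, var)
    # A topology needs a whole number of states, and a chain of r states
    # cannot emit fewer than r symbols, so r is capped at floor(mean).
    r = max(1, min(round(r_real), math.floor(mean)))
    # Re-fit p after rounding so that the chain reproduces the sample mean.
    p = 1.0 - r / mean
    return r, p


if __name__ == "__main__":
    r, p = fit_chain([18, 22, 20, 20])
    print(f"r = {r}, p = {p:.3f}")
```

Rounding r and then re-fitting p keeps the mean exact at the cost of a slightly mismatched variance; a maximum likelihood fit, as also mentioned in the abstract, would instead optimise both parameters against the full length distribution.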

Published Online: 2010-1-6

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston

Articles in this issue

  1. Article
  2. Epistatic Interactions
  3. Testing for Gene-Gene Interaction with AMMI Models
  4. A Bayesian Hierarchical Model for Quantitative Real-Time PCR Data
  5. Informative or Noninformative Calls for Gene Expression: A Latent Variable Approach
  6. Detecting Genotyping Error Using Measures of Degree of Hardy-Weinberg Disequilibrium
  7. Optimisation of HMM Topologies Enhances DNA and Protein Sequence Modelling
  8. The Apportionment of Total Genetic Variation by Categorical Analysis of Variance
  9. Dealing with Heterogeneity between Cohorts in Genomewide SNP Association Studies
  10. An Empirical Bayesian Method for Estimating Biological Networks from Temporal Microarray Data
  11. Parameter Estimation in Multiple-Hidden I.I.D. Models from Biological Multiple Alignment
  12. Asymptotic Distribution of the "Orthogonal" Quantitative Transmission Disequilibrium Test in a Structured Population: Exact Formula
  13. Comparing Spatial Maps of Human Population-Genetic Variation Using Procrustes Analysis
  14. An Internal Calibration Method for Protein-Array Studies
  15. Weighted-LASSO for Structured Network Inference from Time Course Data
  16. Trilocus Disequilibrium Analysis of Multiallelic Markers in Outcrossing Populations
  17. Sparse Partial Least Squares Classification for High Dimensional Data
  18. Reconstructability Analysis as a Tool for Identifying Gene-Gene Interactions in Studies of Human Diseases
  19. Sub-Modular Resolution Analysis by Network Mixture Models
  20. Space Oriented Rank-Based Data Integration
  21. The Generalized Odds Ratio as a Measure of Genetic Risk Effect in the Analysis and Meta-Analysis of Association Studies
  22. Network Enrichment Analysis in Complex Experiments
  23. Shrinkage Estimation of Effect Sizes as an Alternative to Hypothesis Testing Followed by Estimation in High-Dimensional Biology: Applications to Differential Gene Expression
  24. Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data
  25. A Random Coefficients Model for Regional Co-Expression Associated with DNA Copy Number
  26. Locating Multiple Interacting Quantitative Trait Loci with the Zero-Inflated Generalized Poisson Regression
  27. Classification of Genomic Sequences via Wavelet Variance and a Self-Organizing Map with an Application to Mitochondrial DNA
  28. Confidently Estimating the Number of DNA Replication Origins
  29. Generalizing Moving Averages for Tiling Arrays Using Combined P-Value Statistics
  30. Lasso Logistic Regression, GSoft and the Cyclic Coordinate Descent Algorithm: Application to Gene Expression Data
  31. Granger Causality Analysis of Human Cell-Cycle Gene Expression Profiles
  32. Mapping Quantitative Trait Loci in a Non-Equilibrium Population
  33. On the Optimal Design of Genetic Variant Discovery Studies
  34. On Optimal Selection of Summary Statistics for Approximate Bayesian Computation
  35. Assessment of LD Matrix Measures for the Analysis of Biological Pathway Association
  36. Optimal Tests Shrinking Both Means and Variances Applicable to Microarray Data Analysis
  37. The Detection of Blur in Affymetrix GeneChips
  38. Regression-Based Multi-Trait QTL Mapping Using a Structural Equation Model
  39. Spatial Clustering of Array CGH Features in Combination with Hierarchical Multiple Testing
  40. Predicting Patient Survival from Longitudinal Gene Expression
  41. Including Probe-Level Measurement Error in Robust Mixture Clustering of Replicated Microarray Gene Expression
  42. Reader's Reaction
  43. An Alternative Model of Type A Dependence in a Gene Set of Correlated Genes
  44. Letter to the Editor
  45. Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn
Downloaded on 13 September 2025 from https://www.degruyterbrill.com/document/doi/10.2202/1544-6115.1480/html