Improving Hidden Markov Models for Classification of Human Immunodeficiency Virus-1 Subtypes through Linear Classifier Learning
Ingo Bulla, Anne-Kathrin Schultz and Peter Meinicke
Profile Hidden Markov Models (pHMMs) are widely used to model nucleotide or protein sequence families. In many applications, a sequence family classified into several subfamilies is given and each subfamily is modeled separately by one pHMM. A major drawback of this approach is the difficulty of coping with subfamilies composed of very few sequences. Correct subtyping of human immunodeficiency virus-1 (HIV-1) sequences is one of the most crucial bioinformatic tasks affected by this problem of small subfamilies, i.e., HIV-1 subtypes with a small number of known sequences. To deal with small samples for particular subfamilies of HIV-1, we employ a machine learning approach. More precisely, we make use of an existing HMM architecture and its associated inference engine, while replacing the unsupervised estimation of emission probabilities by a supervised method. For that purpose, we use regularized linear discriminant learning together with a balancing scheme to account for the widely varying sample sizes. After training the multiclass linear discriminants, the corresponding weights are transformed to valid probabilities using a softmax function. We apply this modified algorithm to classify HIV-1 sequence data (in the form of partial-length HIV-1 sequences and semi-artificial recombinants) and show that the performance of pHMMs can be significantly improved by the proposed technique.
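The last two steps described above, balancing classes of unequal size and mapping discriminant weights to probabilities with a softmax, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the balancing scheme, so inverse-frequency weighting is assumed here, and the function names, example weights, and sequence counts are all hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def softmax(weights):
    """Map real-valued weights to a probability vector (numerically stable)."""
    shift = max(weights)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(w - shift) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def weights_to_emissions(weight_rows):
    """Turn one weight vector per model position into an emission distribution."""
    return [dict(zip(NUCLEOTIDES, softmax(row))) for row in weight_rows]


def balanced_class_weights(class_counts):
    """Assumed balancing scheme: inverse-frequency weights, so that every
    class contributes the same total weight regardless of its sample size."""
    n_classes = len(class_counts)
    total = sum(class_counts.values())
    return {c: total / (n_classes * n) for c, n in class_counts.items()}


# Hypothetical discriminant weights for two model positions.
example_weights = [[2.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
emissions = weights_to_emissions(example_weights)

# Hypothetical sequence counts for a large and a small subtype.
class_weights = balanced_class_weights({"B": 90, "J": 10})
```

Because the softmax output is positive and sums to one, each row can be used directly as an emission distribution in an existing pHMM inference engine, which is what allows the supervised weights to replace the unsupervised estimates without changing the model architecture.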
©2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- The Inheritance Procedure: Multiple Testing of Tree-structured Hypotheses
- Optimality Criteria for the Design of 2-Color Microarray Studies
- Stopping-Time Resampling and Population Genetic Inference under Coalescent Models
- A Mixture-Model Approach for Parallel Testing for Unequal Variances
- Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps
- MicroRNA Transcription Start Site Prediction with Multi-objective Feature Selection
- A Context Dependent Pair Hidden Markov Model for Statistical Alignment
- Fast Wavelet Based Functional Models for Transcriptome Analysis with Tiling Arrays
- Alignment-free Sequence Comparison for Biologically Realistic Sequences of Moderate Length
- Transcriptional Network Inference from Functional Similarity and Expression Data: A Global Supervised Approach
- Improving Hidden Markov Models for Classification of Human Immunodeficiency Virus-1 Subtypes through Linear Classifier Learning