
Prediction of Genomewide Conserved Epitope Profiles of HIV-1: Classifier Choice and Peptide Representation

  • Yuanyuan Xiao and Mark R Segal
Published/Copyright: September 16, 2005

Identification of peptides binding to Major Histocompatibility Complex (MHC) molecules is important for accelerating vaccine development and improving immunotherapy. Accordingly, a wide variety of prediction methods have been applied in this context. In this paper, we introduce (tree-based) ensemble classifiers for such problems and contrast their predictive performance with leading existing methods for both MHC class I and class II molecules. In addition, we investigate the impact of differing peptide representation schemes on performance. Finally, classifier predictions are used to conduct genomewide scans of a diverse collection of HIV-1 strains, enabling assessment of epitope conservation. We investigated all combinations of six classification methods (classification trees, artificial neural networks, and support vector machines, as well as the more recently devised ensemble methods: bagging, random forests, and boosting) with four peptide representation schemes (amino acid sequence, select biophysical properties, select quantitative structure-activity relationship (QSAR) descriptors, and the combination of the latter two) in predicting peptide binding to an MHC class I molecule (HLA-A2) and an MHC class II molecule (HLA-DR4). Our results show that the ensemble methods are consistently more accurate than the other three alternatives. Furthermore, they are robust with respect to parameter tuning. Among the four representation schemes, the amino acid sequence representation gave the best results consistently across classifiers. This finding obviates the need for the feature selection strategies that the use of biophysical and/or QSAR properties would otherwise require. We obtained and aligned a diverse set of 32 HIV-1 genomes and pursued genomewide HLA-DR4 epitope profiling by querying with respect to classifier predictions, as obtained under each of the four peptide representation schemes. We validated those epitopes conserved across strains against known T-cell epitopes. Once again, the amino acid sequence representation was at least as effective as using properties. Assessment of novel epitope predictions awaits experimental verification.
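
The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it describes: an amino acid sequence representation of a peptide, and a sliding-window scan of aligned strains that keeps windows predicted as binders in a sufficient fraction of strains. The indicator (one-hot) encoding, the window length `k`, the `min_fraction` threshold, the gap character `-`, and all function names are assumptions made for illustration. `toy_predict` is a hypothetical stand-in for a trained classifier, not the authors' model.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(peptide):
    """Encode a peptide as 20 indicator features per position (assumed scheme)."""
    vec = []
    for residue in peptide:
        vec.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vec


def conserved_predicted_binders(aligned_strains, predict, k=9, min_fraction=1.0):
    """Scan aligned sequences with a length-k window.

    For each alignment offset, count the strains whose window is gap-free
    and predicted to bind. Return (offset, fraction, most common binder)
    for offsets where that fraction reaches min_fraction.
    """
    n_strains = len(aligned_strains)
    length = len(aligned_strains[0])
    hits = []
    for start in range(length - k + 1):
        peptides = [s[start:start + k] for s in aligned_strains]
        binders = [
            p for p in peptides
            if "-" not in p and predict(one_hot_encode(p))
        ]
        fraction = len(binders) / n_strains
        if binders and fraction >= min_fraction:
            top_peptide = Counter(binders).most_common(1)[0][0]
            hits.append((start, fraction, top_peptide))
    return hits


def toy_predict(vec):
    """Hypothetical stand-in for a trained classifier: a fixed rule on
    positions 2 and 9 of a 9-mer, used only to make the sketch runnable."""
    def has(pos, aa):
        return vec[pos * 20 + AMINO_ACIDS.index(aa)] == 1
    return (has(1, "L") or has(1, "M")) and (has(8, "V") or has(8, "L"))


# Three made-up aligned "strains"; the third has an alignment gap.
strains = ["ALAAAAAAVGG", "ALAAAAAAVGG", "ALAAAAAAV-G"]
hits = conserved_predicted_binders(strains, toy_predict, k=9, min_fraction=1.0)
```

In practice `predict` would wrap whichever fitted classifier is being evaluated, and lowering `min_fraction` below 1.0 would admit windows that are predicted binders in most, but not all, strains.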

Published Online: 2005-9-16

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston

Downloaded on 31.12.2025 from https://www.degruyterbrill.com/document/doi/10.2202/1544-6115.1158/html