Composite Likelihood Modeling of Neighboring Site Correlations of DNA Sequence Substitution Rates

Ling Deng; Dirk F. Moore

doi:10.2202/1544-6115.1391

Published/Copyright: January 28, 2009

From the journal Statistical Applications in Genetics and Molecular Biology Volume 8 Issue 1

Sequence data from a series of homologous DNA segments from related organisms are typically polymorphic at many sites, and these polymorphisms are the result of evolutionary processes. Such data may be used to estimate substitution rates as well as the variability of these rates. Careful characterization of this variation is essential for accurately estimating evolutionary distances and reconstructing the phylogeny of these sequences. Many researchers have recognized the importance of variation in substitution rates across sites, which most have modeled with a discrete gamma distribution. Some have extended these methods to account explicitly for the correlation of substitution rates among sites using hidden Markov models; others have proposed context-dependent substitution rate schemes. We accommodate these correlations using a composite likelihood method based on a bivariate gamma distribution, which is more flexible than hidden Markov models in terms of correlation structure and more computationally tractable than context-dependent schemes. We show that the estimates have good theoretical properties. We also use simulations to compare the maximum composite likelihood estimates with those obtained from maximum likelihood under the assumption of independence among sites. Using mitochondrial DNA from ten primates, we obtain maximum composite likelihood estimates of the mean substitution rate, overdispersion, and correlation parameters, and use these estimates in a parametric phylogenetic bootstrap to assess the impact of serial correlation on the estimates of substitution rates and branch lengths.
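The abstract does not state which bivariate gamma distribution is used, so the following is only a minimal sketch of how a pairwise composite likelihood over neighboring sites can be assembled, not the authors' method. It assumes one standard construction (trivariate reduction: two neighboring rates share a common gamma component), and it assumes that per-site substitution counts along a single branch of length t are observed directly; in the paper the data arise on a phylogeny, which this sketch ignores. Mixing Poisson counts over these correlated gamma rates yields a bivariate negative binomial for each adjacent pair, and the composite log-likelihood is the sum of those pairwise log-probabilities. All function names and parameters (s0, s1, beta, t) are hypothetical.

```python
import math


def log_nb(x, shape, beta, t):
    """log P(X = x) when X | G ~ Poisson(t * G) and G ~ Gamma(shape, rate=beta).

    This gamma-Poisson mixture is the negative binomial distribution.
    """
    p = beta / (beta + t)
    return (math.lgamma(shape + x) - math.lgamma(shape) - math.lgamma(x + 1)
            + shape * math.log(p) + x * math.log(t / (beta + t)))


def log_shared(a, b, shape, beta, t):
    """log P(A = a, B = b) when A, B | G are iid Poisson(t * G), G ~ Gamma(shape, beta)."""
    n = a + b
    return (math.lgamma(shape + n) - math.lgamma(shape)
            - math.lgamma(a + 1) - math.lgamma(b + 1)
            + shape * math.log(beta) + n * math.log(t)
            - (shape + n) * math.log(beta + 2 * t))


def log_pair_pmf(x1, x2, s0, s1, beta, t):
    """Bivariate negative binomial log-pmf for counts at two neighboring sites.

    Assumed model (trivariate reduction, hypothetical here): rate1 = G0 + G1 and
    rate2 = G0 + G2, with G0 ~ Gamma(s0, beta) shared and G1, G2 ~ Gamma(s1, beta)
    independent; X_j | rate_j ~ Poisson(t * rate_j) independently.
    Then Corr(rate1, rate2) = s0 / (s0 + s1).
    """
    # Each count splits into a Poisson part driven by the shared G0 (a or b)
    # and a Poisson part driven by its own G_j (the remainder).
    terms = [log_shared(a, b, s0, beta, t)
             + log_nb(x1 - a, s1, beta, t)
             + log_nb(x2 - b, s1, beta, t)
             for a in range(x1 + 1) for b in range(x2 + 1)]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in terms))


def pairwise_composite_loglik(counts, s0, s1, beta, t=1.0):
    """Composite log-likelihood: sum of bivariate log-pmfs over adjacent pairs (i, i + 1)."""
    return sum(log_pair_pmf(x1, x2, s0, s1, beta, t)
               for x1, x2 in zip(counts, counts[1:]))
```

For example, `pairwise_composite_loglik([0, 1, 0, 3, 2, 0], s0=0.5, s1=0.5, beta=1.0)` scores one short run of counts. Maximizing it over (s0, s1, beta), for instance by passing its negative to `scipy.optimize.minimize`, would give composite likelihood estimates of the mean rate (s0 + s1)/beta, an overdispersion parameter 1/(s0 + s1) (each marginal count is negative binomial with shape s0 + s1), and the neighbor correlation s0/(s0 + s1). Because only bivariate margins enter, the cost of one evaluation grows linearly with the number of sites, which is one reason pairwise composite likelihoods tend to be tractable.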

Keywords: bivariate negative binomial distribution; composite likelihood; substitution rate; phylogeny; parametric bootstrap