The Relative Inefficiency of Sequence Weights Approaches in Determining a Nucleotide Position Weight Matrix

Lee A Newberg; Lee Ann McCue; Charles E Lawrence

doi:10.2202/1544-6115.1135

Published/Copyright: June 1, 2005

From the journal Statistical Applications in Genetics and Molecular Biology Volume 4 Issue 1

Approaches based upon sequence weights, to construct a position weight matrix of nucleotides from aligned inputs, are popular, but little effort has been expended to measure their quality. We derive optimal sequence weights that minimize the sum of the variances of the estimators of base frequency parameters for sequences related by a phylogenetic tree. Using these, we find that approaches based upon sequence weights can perform very poorly in comparison to approaches based upon a theoretically optimal maximum-likelihood method in the inference of the parameters of a position-weight matrix. Specifically, we find that among a collection of primate sequences, even an optimal sequence-weights approach is only 51% as efficient as the maximum-likelihood approach in inferences of base frequency parameters. We also show how to employ the variance estimators to obtain a greedy ordering of species for sequencing. Application of this ordering for the weighted estimators to a primate collection yields a curve with a long plateau that is not observed with maximum-likelihood estimators. This plateau indicates that the use of weighted estimators on these data seriously limits the utility of obtaining the sequences of more than two or three additional species.

Keywords: Sequence Weights; Maximum Likelihood; Motifs; Phylogeny; Sequencing; Consensus Distribution; Position-Weight Matrices
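
The two ideas in the abstract, variance-minimizing sequence weights and a greedy ordering of species, can be illustrated with a minimal sketch. It is not the authors' derivation, which the abstract does not give. It assumes that the estimator covariance between two sequences is proportional to a correlation matrix R, in which case the weights that minimize the variance of a weighted average (subject to summing to 1) are proportional to R⁻¹1. The correlation model `exp(-distance)`, the distance matrix `DIST`, and all function names are hypothetical choices made only for illustration.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def optimal_weights(corr):
    """Weights minimizing w' R w subject to sum(w) = 1, and the resulting
    variance relative to that of a single sequence."""
    y = solve(corr, [1.0] * len(corr))
    total = sum(y)
    return [v / total for v in y], 1.0 / total

def greedy_order(corr):
    """Add species one at a time, each time picking the one that gives the
    lowest relative variance for the optimally weighted estimator."""
    chosen, remaining, curve = [], list(range(len(corr))), []
    while remaining:
        def var_with(k):
            idx = chosen + [k]
            sub = [[corr[i][j] for j in idx] for i in idx]
            return optimal_weights(sub)[1]
        best = min(remaining, key=var_with)
        curve.append(var_with(best))
        chosen.append(best)
        remaining.remove(best)
    return chosen, curve

# Hypothetical pairwise tree distances: species 0 and 1 are close relatives,
# species 2 is distant from both. Assumed correlation model: exp(-distance).
DIST = [[0.0, 0.1, 1.0],
        [0.1, 0.0, 1.0],
        [1.0, 1.0, 0.0]]
CORR = [[math.exp(-d) for d in row] for row in DIST]

weights, rel_var = optimal_weights(CORR)
order, curve = greedy_order(CORR)
print("weights:", weights)
print("greedy order:", order)
print("relative variance after each addition:", curve)
```

In this toy setting the distant species receives the largest weight, and adding the close relative last reduces the variance only slightly, which is the kind of plateau the abstract describes for weighted estimators. The sketch does not model the maximum-likelihood estimator used for comparison in the article.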

Published Online: 2005-6-1

https://doi.org/10.2202/1544-6115.1135
