Estimation of the covariance structure from SNP allele frequencies

  • Jan van Waaij, Zilong Li and Carsten Wiuf
Published/Copyright: May 26, 2022

Abstract

We propose two new statistics, $\hat{V}$ and $\hat{S}$, to disentangle the population history of related populations from SNP frequency data. If the populations are related by a tree, we show by theoretical means as well as by simulation that the new statistics are able to identify the root of a tree correctly, in contrast to standard statistics, such as the observed matrix of $F_2$-statistics (distances between pairs of populations). The statistic $\hat{V}$ is obtained by averaging over all SNPs (similar to standard statistics). Its expectation is the true covariance matrix of the observed population SNP frequencies, offset by a matrix with identical entries. In contrast, the statistic $\hat{S}$ is put in a Bayesian context and is obtained by averaging over pairs of SNPs, such that each SNP is only used once. It thus makes use of the joint distribution of pairs of SNPs. In addition, we provide a number of novel mathematical results about old and new statistics, and their mutual relationship.
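To make the contrast with $F_2$-statistics concrete, the following is a minimal sketch on simulated toy frequencies. It does not implement the paper's $\hat{V}$ or $\hat{S}$, whose definitions are not given in the abstract. The function names, the uniform toy data, and the choice of reference frequencies are all assumptions made for illustration. The sketch only shows a standard algebraic fact: the matrix of squared frequency differences can be recovered from any second-moment matrix of the frequencies, and it does not change when a matrix with identical entries is added, so a pairwise-distance matrix alone cannot see such an offset.

```python
import numpy as np

def f2_matrix(freqs):
    """Naive F2-style distances: mean over SNPs of (p_i - p_j)^2.
    freqs has shape (n_populations, n_snps)."""
    diff = freqs[:, None, :] - freqs[None, :, :]
    return np.mean(diff ** 2, axis=2)

def second_moment_matrix(freqs, reference):
    """Mean over SNPs of (p_i - r)(p_j - r) for a per-SNP reference r.
    The reference is a hypothetical choice, not the paper's estimator."""
    centered = freqs - reference[None, :]
    return centered @ centered.T / freqs.shape[1]

def f2_from_moments(m):
    """Recover pairwise distances via m_ii + m_jj - 2 m_ij."""
    d = np.diag(m)
    return d[:, None] + d[None, :] - 2.0 * m

rng = np.random.default_rng(0)
# Simulated toy frequencies for 4 populations and 1000 SNPs (not real data).
freqs = rng.uniform(0.05, 0.95, size=(4, 1000))

f2 = f2_matrix(freqs)
m_zero = second_moment_matrix(freqs, np.zeros(freqs.shape[1]))
m_mean = second_moment_matrix(freqs, freqs.mean(axis=0))
# Add an arbitrary matrix with identical entries.
m_offset = m_zero + 0.37 * np.ones((4, 4))

# All three second-moment matrices differ, yet yield the same distances.
print(np.allclose(f2_from_moments(m_zero), f2))
print(np.allclose(f2_from_moments(m_mean), f2))
print(np.allclose(f2_from_moments(m_offset), f2))
```

The three matrices differ from one another but map to the same distance matrix, which is one way to see why a statistic carrying covariance information beyond pairwise distances might be needed to locate a root.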


Corresponding author: Carsten Wiuf, Department of Mathematical Science, University of Copenhagen, Copenhagen 2100, Denmark

Acknowledgments

CW and JvW are supported by the Independent Research Fund Denmark (grant number: 8021-00360B) and the University of Copenhagen through the Data+ initiative. ZL is supported by the Novo Nordisk Foundation, Denmark (grant number: NNF20OC0061343).


Received: 2022-01-21
Accepted: 2022-05-02
Published Online: 2022-05-26

© 2022 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 8.10.2025 from https://www.degruyterbrill.com/document/doi/10.1515/sagmb-2022-0005/html