
Inference for one-step beneficial mutations using next generation sequencing

Andrzej J. Wojtowicz, Craig R. Miller and Paul Joyce
Published/Copyright: January 30, 2015

Abstract

Experimental evolution is an important research method for studying evolutionary processes in microorganisms. Here we present a novel approach to experimental evolution based on next generation sequencing. Under this approach, population-level sequencing is applied to an evolving population in which multiple first-step beneficial mutations occur concurrently. As a result, the frequencies of multiple beneficial mutations are observed in each replicate of an experiment. We develop methods of statistical inference for this new type of data. In particular, we propose a method for imputing the selection coefficients of first-step beneficial mutations. The imputed selection coefficients are then used to test the distribution of selection coefficients of first-step beneficial mutations and to estimate the mean selection coefficient. When selection coefficients are uniformly distributed, the collected data may also be used to estimate the total number of available first-step beneficial mutations.


Corresponding author: Andrzej J. Wojtowicz, Department of Animal Sciences, Washington State University, Pullman, WA 99164, USA, e-mail: ; and Department of Mathematics, University of Idaho, Moscow, ID 83844, USA; Department of Statistics, University of Idaho, Moscow, ID 83844, USA; and Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID 83844, USA

Acknowledgments

The authors would like to thank Holly A. Wichman for her comments on this research. This work was supported by a grant from the National Institutes of Health R01 GM076040-01.

Appendix

A Simulation of population dynamics for a single first-step beneficial mutation

Below we present a procedure for simulating the dynamics of a population in which a single beneficial mutation competes against the wild type. The relative frequency of the mutant is recorded in each generation.

  1. Let $N_{wt}(t)$ denote the count of the wild type and $N_{mut}(t)$ the count of the mutant in generation $t$. Set $N_{wt}(0)=N$ and $N_{mut}(0)=0$.

  2. The wild-type individuals produce $G_{wt}(t)=N_{wt}(t)\,2^{w_{wt}}$ progeny and the mutants produce $G_{mut}(t)=N_{mut}(t)\,2^{w_{wt}(1+s_i)}$ progeny.

  3. Each offspring of the wild type is a mutant with probability $\mu$. Let $M(t)$ denote the number of mutants among the progeny of the wild type; then $M(t)\sim\mathrm{Bin}(G_{wt}(t),\mu)$.

  4. The population size is constant and equal to N. Therefore, at the end of each generation the number of progeny needs to be reduced to N by binomial sampling. Let

    $$N_{mut}(t+1)\sim\mathrm{Bin}\left(N,\ \frac{G_{mut}(t)+M(t)}{G_{wt}(t)+G_{mut}(t)}\right)$$

    and

    $$N_{wt}(t+1)=N-N_{mut}(t+1).$$

  5. Record the current relative frequency of the mutant, $N_{mut}(t+1)/N$, and go back to step 2.
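The five steps above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function name `simulate_single_mutation` is hypothetical, NumPy's `Generator.binomial` supplies the binomial draws, and the progeny counts $N\,2^{w}$ are rounded to integers so they can serve as binomial trial counts, an assumption because the procedure does not say how non-integer progeny numbers are handled.

```python
import numpy as np


def simulate_single_mutation(N, w_wt, s, mu, generations, seed=None):
    """Appendix A procedure: one beneficial mutation (selection coefficient s)
    competing against the wild type in a population of constant size N.

    Returns the relative mutant frequency recorded at t = 0, 1, ..., generations.
    Assumption (not in the procedure): progeny counts are rounded to integers.
    """
    rng = np.random.default_rng(seed)
    n_wt, n_mut = N, 0                    # step 1: wild type only at t = 0
    freqs = [n_mut / N]
    for _ in range(generations):
        # step 2: wild type multiplies by 2**w_wt, mutants by 2**(w_wt * (1 + s))
        g_wt = int(round(n_wt * 2.0 ** w_wt))
        g_mut = int(round(n_mut * 2.0 ** (w_wt * (1.0 + s))))
        # step 3: new mutants among the wild-type progeny, M(t) ~ Bin(G_wt(t), mu)
        m = int(rng.binomial(g_wt, mu))
        # step 4: binomial sampling back down to N individuals
        p_mut = (g_mut + m) / (g_wt + g_mut)
        n_mut = int(rng.binomial(N, p_mut))
        n_wt = N - n_mut
        # step 5: record the relative frequency of the mutant
        freqs.append(n_mut / N)
    return np.array(freqs)


freqs = simulate_single_mutation(N=10_000, w_wt=3.0, s=0.5, mu=1e-3, generations=20, seed=1)
print(freqs.round(3))
```

With `mu=0` no mutant can ever arise, so the trajectory stays at zero; with positive `s` and `mu` the recorded frequency should rise toward fixation.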

B Solution to the difference equation for the mean proportion of a mutation

Here we solve the following difference equation for the mean proportion $\pi(t)$ of the mutation:

$$\pi(t+1)=\frac{\pi(t)\,(q^s-\mu)+\mu}{\pi(t)\,(q^s-1)+1}.$$

Let $a=q^s-\mu$, $b=\mu$, $c=q^s-1$ and $d=1$. Since

$$(d-a)^2+4bc=\big((1-\mu)-q^s\big)^2$$

is never negative, the map has real fixed points and we can apply the following method. Let

$$\rho=\sqrt{(d-a)^2+4bc}=\sqrt{(1-q^s+\mu)^2+4\mu(q^s-1)}=\sqrt{\big((1-\mu)-q^s\big)^2}=q^s+\mu-1.$$

Note that in our case $\rho$ is always positive, since $q^s>1-\mu$ because $s\ge 0$ and $w_{wt}\ge 0$ (so that $q^s\ge 1$) and $0<\mu\le 1$. Let

$$\eta=\frac{d-a+\rho}{2c}=\frac{1-q^s+\mu+q^s-1+\mu}{2(q^s-1)}=\frac{\mu}{q^s-1}.$$

Then

$$\pi(t)=\frac{1}{x(t)}-\eta. \tag{4}$$

To calculate $x(t)$, we solve the following first-order linear difference equation:

$$x(t+1)=\frac{d-\eta c}{\eta c+a}\,x(t)+\frac{c}{\eta c+a}.$$
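Why the substitution (4) turns the nonlinear recursion into a linear one can be checked directly from the definitions of $a$, $b$, $c$, $d$, $\rho$ and $\eta$: $-\eta$ is one of the two fixed points of the map $\pi\mapsto(a\pi+b)/(c\pi+d)$ (the other is $1$, fixation of the mutant), and shifting by a fixed point and then inverting yields a linear recursion. A short derivation:

```latex
% Fixed points solve c\pi^2 + (d-a)\pi - b = 0, so
\pi^\ast = \frac{-(d-a)\pm\rho}{2c}
         = \frac{(q^s-1-\mu)\pm(q^s+\mu-1)}{2(q^s-1)}
         \in \Big\{\,1,\; -\frac{\mu}{q^s-1}\,\Big\} = \{1,\,-\eta\}.

% With y(t) = \pi(t) + \eta = 1/x(t) and the fixed-point identity
% a(-\eta) + b = -\eta\,\big(c(-\eta) + d\big):
y(t+1) = \frac{(a+c\eta)\,y(t)}{c\,y(t)+d-c\eta}
\quad\Longrightarrow\quad
x(t+1) = \frac{d-\eta c}{\eta c+a}\,x(t)+\frac{c}{\eta c+a}.
```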

Let

$$\alpha=\frac{d-\eta c}{\eta c+a},\qquad \beta=\frac{c}{\eta c+a},$$

then

$$x(t)=\alpha^t x(0)+\frac{\beta\,(1-\alpha^t)}{1-\alpha},$$

where

$$x(0)=\frac{1}{\eta+\pi(0)}.$$

Therefore,

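$$x(t)=\frac{\alpha^t}{\eta+\pi(0)}+\frac{\beta\,(1-\alpha^t)}{1-\alpha}=\frac{\big(\eta c+a-d-c\,\pi(0)\big)(d-\eta c)^t+c\,\big(\eta+\pi(0)\big)(\eta c+a)^t}{(\eta c+a)^t\,\big(\eta+\pi(0)\big)\,(2\eta c+a-d)}.$$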

Finally, substituting $x(t)$ into equation (4) gives the closed-form expression for $\pi(t)$:

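$$\pi(t)=\frac{(\eta c+a)^t\,\big(\eta+\pi(0)\big)\,(2\eta c+a-d)}{\big(\eta c+a-d-c\,\pi(0)\big)(d-\eta c)^t+c\,\big(\eta+\pi(0)\big)(\eta c+a)^t}-\eta=\frac{1}{q^s-1}\left(\frac{q^{st}\,(q^s+\mu-1)\,\big(\mu+\pi(0)(q^s-1)\big)}{(q^s-1)\big(1-\pi(0)\big)(1-\mu)^t+q^{st}\big(\mu+\pi(0)(q^s-1)\big)}-\mu\right).$$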
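A quick numerical check that the closed form agrees with direct iteration of the difference equation for $\pi(t+1)$. This is a minimal sketch: the helper names are hypothetical, and `qs` is the numerical value of $q^s$ itself, since $q$ is defined outside this appendix. The formula requires $q^s\neq 1$, because $c=q^s-1$ appears in a denominator.

```python
def pi_iterate(qs, mu, pi0, t):
    """Iterate pi(t+1) = (pi(t)(q^s - mu) + mu) / (pi(t)(q^s - 1) + 1) for t generations."""
    p = pi0
    for _ in range(t):
        p = (p * (qs - mu) + mu) / (p * (qs - 1.0) + 1.0)
    return p


def pi_closed_form(qs, mu, pi0, t):
    """Closed-form pi(t) from Appendix B; requires q^s != 1."""
    A = mu + pi0 * (qs - 1.0)
    num = qs**t * (qs + mu - 1.0) * A
    den = (qs - 1.0) * (1.0 - pi0) * (1.0 - mu) ** t + qs**t * A
    return (num / den - mu) / (qs - 1.0)


for t in (0, 1, 5, 20):
    print(t, pi_iterate(2.0, 0.1, 0.0, t), pi_closed_form(2.0, 0.1, 0.0, t))
```

At $t=0$ the closed form reduces to $\pi(0)$, and for large $t$ both versions approach the fixed point $1$.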

C Likelihood function of r and δ

Result: The log likelihood function of r and δ is given by:

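$$\ln L(r,\delta)=\ln\binom{r}{Y}+\sum_{a=1}^{Y}\sum_{b=1}^{n}\ln\binom{k_a}{x_{a,b}}+(r-Y)\ln\int_0^\infty\big(1-g(s)\big)^{nk}f(s)\,ds+\sum_{i=1}^{Y}\ln\int_0^\infty g(s_i)^{\sum_{j=1}^{n}x_{i,j}}\big(1-g(s_i)\big)^{nk_i-\sum_{j=1}^{n}x_{i,j}}f(s_i)\,ds_i.$$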

Proof: Since all entries in the matrix $X$ are assumed to be independent binomially distributed random variables, the probability of observing a particular matrix $X$ equals the product of the binomial probabilities of the cells in the matrix, multiplied by the combinatorial term $\binom{r}{Y}$ that accounts for the different possible orders of observations. This probability is given by

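$$\begin{aligned}
&P(X_{1,1}=x_{1,1},X_{1,2}=x_{1,2},\ldots,X_{Y,n}=x_{Y,n},X_{Y+1,1}=0,\ldots,X_{r,n}=0\mid s_1,\ldots,s_r)\\
&\quad=\binom{r}{Y}P(X_{1,1}=x_{1,1})\,P(X_{1,2}=x_{1,2})\cdots P(X_{Y,n}=x_{Y,n})\,P(X_{Y+1,1}=0)\cdots P(X_{r,n}=0)\\
&\quad=\binom{r}{Y}\prod_{a=1}^{Y}\prod_{b=1}^{n}\binom{k_a}{x_{a,b}}\prod_{i=1}^{Y}g(s_i)^{\sum_{j=1}^{n}x_{i,j}}\big(1-g(s_i)\big)^{nk_i-\sum_{j=1}^{n}x_{i,j}}\prod_{l=Y+1}^{r}\big(1-g(s_l)\big)^{nk},
\end{aligned}$$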

where $g(s_i)$ is given by equation (1). Here we assume that the selection coefficients $s_1, s_2, \ldots, s_r$ are unobserved random variables with probability density function $f(s_i)$. Therefore, we define the joint probability of a matrix $X$ and a vector of selection coefficients as

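$$\begin{aligned}
&P(X_{1,1}=x_{1,1},X_{1,2}=x_{1,2},\ldots,X_{Y,n}=x_{Y,n},X_{Y+1,1}=0,\ldots,X_{r,n}=0,s_1,\ldots,s_r)\\
&\quad=\binom{r}{Y}\prod_{a=1}^{Y}\prod_{b=1}^{n}\binom{k_a}{x_{a,b}}\prod_{i=1}^{Y}g(s_i)^{\sum_{j=1}^{n}x_{i,j}}\big(1-g(s_i)\big)^{nk_i-\sum_{j=1}^{n}x_{i,j}}f(s_i)\prod_{l=Y+1}^{r}\big(1-g(s_l)\big)^{nk}f(s_l).
\end{aligned}$$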

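To obtain the likelihood function of $r$ and $\delta$, we integrate out the selection coefficients. Because the joint probability factorizes over mutations, each $s_i$ can be integrated separately, and the $r-Y$ unobserved mutations contribute identical factors. The resulting log likelihood function is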

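$$\ln L(r,\delta)=\ln\binom{r}{Y}+\sum_{a=1}^{Y}\sum_{b=1}^{n}\ln\binom{k_a}{x_{a,b}}+(r-Y)\ln\int_0^\infty\big(1-g(s)\big)^{nk}f(s)\,ds+\sum_{i=1}^{Y}\ln\int_0^\infty g(s_i)^{\sum_{j=1}^{n}x_{i,j}}\big(1-g(s_i)\big)^{nk_i-\sum_{j=1}^{n}x_{i,j}}f(s_i)\,ds_i.$$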
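The log likelihood can be evaluated numerically once $g$ and $f$ are fixed. The sketch below is illustrative only and rests on several assumptions: $g$ (equation (1), not reproduced here) and the density $f$ are passed in as callables, and $\delta$ is assumed to enter through $f$; the integrals are approximated by the trapezoidal rule on a finite grid $[0, s_{max}]$ that must cover the support of $f$; and the subscripted $k_a$ ($k_i$) is read as the per-replicate read depth at the site of observed mutation $a$, while $k$ is the depth used for unobserved mutations. All function and argument names are hypothetical.

```python
import math
import numpy as np


def log_binom(n, k):
    """ln C(n, k) via the log-gamma function."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))


def log_likelihood(r, X, k_obs, k_unobs, g, f, s_grid):
    """ln L(r, delta) from Appendix C, with the integrals over s evaluated on s_grid.

    X        Y x n matrix; X[i, j] = reads of observed mutation i in replicate j
    k_obs    k_i, read depth per replicate at the site of observed mutation i
    k_unobs  k, read depth used for the r - Y unobserved mutations
    g, f     vectorized callables for g(s) and the density f(s) (delta assumed in f)
    s_grid   grid on [0, s_max]; s_max must cover the support of f
    """
    X = np.asarray(X)
    Y, n = X.shape
    if r < Y:
        return -math.inf                  # binom(r, Y) = 0, so the likelihood is zero
    gs, fs = g(s_grid), f(s_grid)
    ll = log_binom(r, Y)
    ll += sum(log_binom(k_obs[a], X[a, b]) for a in range(Y) for b in range(n))
    # the r - Y unobserved mutations each contribute the same integral
    ll += (r - Y) * math.log(trapezoid((1.0 - gs) ** (n * k_unobs) * fs, s_grid))
    # each observed mutation contributes its own marginal integral
    for i in range(Y):
        x_tot = int(X[i].sum())
        integrand = gs ** x_tot * (1.0 - gs) ** (n * k_obs[i] - x_tot) * fs
        ll += math.log(trapezoid(integrand, s_grid))
    return ll


delta = 0.5
s_grid = np.linspace(0.0, delta, 2001)
g = lambda s: 0.01 + 0.1 * s                          # hypothetical stand-in for equation (1)
f = lambda s: np.where(s <= delta, 1.0 / delta, 0.0)  # uniform density on [0, delta]
X = np.array([[3, 1], [0, 2]])
print(log_likelihood(r=10, X=X, k_obs=[100, 120], k_unobs=100, g=g, f=f, s_grid=s_grid))
```

Maximizing this over $r\ge Y$ and the parameter of $f$ would give maximum likelihood estimates; the grid resolution controls the accuracy of the integrals.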

D Expected number of singletons

Result: The expected number of singletons is given by the following formula

$$E(Z)=r\,nk\int_0^\infty g(s)\,\big(1-g(s)\big)^{nk-1}f(s)\,ds.$$

Proof: Denote the number of singletons by $Z$; then

$$Z=\sum_{i=1}^{r}I_i,$$

where $I_i$ is an indicator function such that

$$I_i=\begin{cases}1 & \text{if the } i\text{th mutant was observed only once, in one replicate},\\ 0 & \text{otherwise}.\end{cases}$$

Then

$$E(I_i\mid s_i)=P(I_i=1\mid s_i)=P\Big(\sum_{j=1}^{n}X_{i,j}=1\;\Big|\;s_i\Big).$$

Since

$$\sum_{j=1}^{n}X_{i,j}\sim\mathrm{Bin}\big(nk,\,g(s_i)\big),$$

where $g(s_i)$ is given by formula (1),

$$E(I_i\mid s_i)=nk\,g(s_i)\,\big(1-g(s_i)\big)^{nk-1}.$$

To obtain the expected value of $I_i$, we integrate the selection coefficient $s_i$ out of the above formula. Therefore,

$$E(I_i)=nk\int_0^\infty g(s_i)\,\big(1-g(s_i)\big)^{nk-1}f(s_i)\,ds_i,$$

where $f(s_i)$ is the probability density function of $s_i$. By linearity of expectation, the expected number of singletons is

$$E(Z)=\sum_{i=1}^{r}E(I_i)=r\,nk\int_0^\infty g(s)\,\big(1-g(s)\big)^{nk-1}f(s)\,ds.$$
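A minimal numerical sketch of the singleton formula under loudly hypothetical choices: $g(s)=0.002+0.01s$ stands in for equation (1), which is not reproduced here, and $f$ is uniform on $[0,1]$. The integral is evaluated with the trapezoidal rule and compared with a Monte Carlo average of $Z$ that simulates the model directly (draw $s_i$ from $f$, then $\sum_j X_{i,j}\sim\mathrm{Bin}(nk,g(s_i))$). The function names are illustrative only.

```python
import numpy as np


def expected_singletons(r, n, k, g, f, s_grid):
    """E(Z) = r n k * integral of g(s) (1 - g(s))^(nk - 1) f(s) ds (trapezoidal rule)."""
    gs = g(s_grid)
    y = gs * (1.0 - gs) ** (n * k - 1) * f(s_grid)
    integral = np.sum((y[1:] + y[:-1]) * np.diff(s_grid) / 2.0)
    return r * n * k * float(integral)


def simulate_singletons(r, n, k, g, sample_s, n_sims, rng):
    """Monte Carlo average of Z: draw s_i from f, then the total count of
    mutation i over n replicates is Bin(nk, g(s_i)); Z counts totals equal to 1."""
    s = sample_s(rng, (n_sims, r))
    totals = rng.binomial(n * k, g(s))
    return float(np.mean(np.sum(totals == 1, axis=1)))


g = lambda s: 0.002 + 0.01 * s          # hypothetical stand-in for equation (1)
f = lambda s: np.ones_like(s)           # uniform density on [0, 1]
s_grid = np.linspace(0.0, 1.0, 2001)
rng = np.random.default_rng(0)
print(expected_singletons(50, 3, 100, g, f, s_grid))
print(simulate_singletons(50, 3, 100, g, lambda rng, size: rng.uniform(0.0, 1.0, size), 20000, rng))
```

The two printed values should agree up to Monte Carlo error, which shrinks as `n_sims` grows.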

E Expected number of neutral mutations observed more than once

Result: The expected number of neutral mutations observed more than once is given by the following formula

$$E(V)=r_{neu}\Big(1-\big(1-t\mu-\pi(0)\big)^{nk}-nk\,\big(t\mu+\pi(0)\big)\big(1-t\mu-\pi(0)\big)^{nk-1}\Big).$$

Proof: We assume that a neutral mutation accumulates in a population at the rate $\mu$ per generation. Therefore, under the deterministic model, after $t$ generations the proportion of that mutation in the population equals $t\mu+\pi(0)$, where $\pi(0)$ is the proportion of that mutation in the initial inoculum. Since the site where a particular neutral mutation can occur is sampled $nk$ times and the probability of sampling this mutation in one trial equals $t\mu+\pi(0)$, the number of times it is observed in $n$ replicates is a binomially distributed random variable, which we denote by $W$:

$$W\sim\mathrm{Bin}\big(nk,\,t\mu+\pi(0)\big).$$

Then, the probability of sampling that mutation more than once (denoted by γ) is given by

$$\gamma=P(W>1)=1-\big[P(W=0)+P(W=1)\big]=1-\big(1-t\mu-\pi(0)\big)^{nk}-nk\,\big(t\mu+\pi(0)\big)\big(1-t\mu-\pi(0)\big)^{nk-1}.$$

Assuming that there are $r_{neu}$ available neutral mutations, the number of neutral mutations observed more than once (denoted by $V$) has a binomial distribution

$$V\sim\mathrm{Bin}(r_{neu},\gamma).$$

Therefore, the expected number of neutral mutations observed more than once is given by

$$E(V)=r_{neu}\,\gamma=r_{neu}\Big(1-\big(1-t\mu-\pi(0)\big)^{nk}-nk\,\big(t\mu+\pi(0)\big)\big(1-t\mu-\pi(0)\big)^{nk-1}\Big).$$
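The closed form for $E(V)$ can be cross-checked against a direct sum of the binomial probabilities $P(W=w)$ for $w\ge 2$. This is a minimal sketch with illustrative (hypothetical) function names and parameter values; it assumes $t\mu+\pi(0)\le 1$ so that the success probability is valid.

```python
import math


def expected_neutral_multiples(r_neu, n, k, t, mu, pi0):
    """E(V) = r_neu * P(W > 1), W ~ Bin(nk, t*mu + pi0), via the closed form."""
    p = t * mu + pi0            # deterministic neutral proportion after t generations
    m = n * k                   # number of times the site is sampled across replicates
    gamma = 1.0 - (1.0 - p) ** m - m * p * (1.0 - p) ** (m - 1)
    return r_neu * gamma


def expected_neutral_multiples_direct(r_neu, n, k, t, mu, pi0):
    """Same quantity by summing the binomial pmf over w = 2, ..., nk."""
    p, m = t * mu + pi0, n * k
    return r_neu * sum(math.comb(m, w) * p**w * (1.0 - p) ** (m - w) for w in range(2, m + 1))


print(expected_neutral_multiples(1000, 3, 50, 40, 1e-4, 0.001))
print(expected_neutral_multiples_direct(1000, 3, 50, 40, 1e-4, 0.001))
```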


Published Online: 2015-1-30
Published in Print: 2015-2-1

©2015 by De Gruyter

https://www.degruyterbrill.com/document/doi/10.1515/sagmb-2014-0030/html