Abstract
Objectives
EpiEstim is a popular statistical framework designed to produce real-time estimates of the time-varying reproductive number,
Methods
Following a real-world example of a COVID-19 outbreak in a small university town, we generate simulated case report data from a two-population mechanistic model with an explicit generation interval distribution and expression to compute true
Results
When population structure is present but not accounted for
Conclusions
Introduction
The time-varying reproductive number,
Many methods to estimate
Population structure has long been recognized to play an important role in the spread of infectious disease [13], [14]. Clearly, not all people mix homogeneously within larger populations. Yet our empirical and theoretical methods often ignore heterogeneous mixing because greater model realism demands more data and more complex models [13], [14], [15], [16], [17], [18]. However, when individual variation in infectiousness is not accounted for, population-level estimates such as
Here, we evaluate the temporal performance of
Methods
Real data with population structure
We motivate our two-population model and synthetic epidemics with daily COVID-19 incidence reported to Whitman County, Washington, USA, during fall 2020. Located in an agricultural area of southeastern Washington, Whitman County is home to Washington State University (WSU), a public land-grant research university, and the city of Pullman. Although the Whitman County community and the WSU student population overlap geographically, the two populations do not mix randomly. Students are largely concentrated in housing on or near the WSU campus, while community members live in Pullman or are dispersed throughout Whitman County. During the fall semester, all courses were fully remote; however, many students still returned to the Pullman campus. COVID-19 testing was available to both community residents and WSU students, and lockdown measures and masking requirements were already in place on the WSU campus, in Pullman city public spaces, and in many private businesses when case reporting began. Thus, transmission rates were very likely near constant throughout the outbreak. Even with these interventions in place, Whitman County reported a COVID-19 outbreak first in the student population and subsequently in the community, with one of the highest rates reported in Washington State at the time [22]. The epidemic time series presented here begins on August 17, 2020, the week before the semester began, when most students returned to campus, and runs until December 27, 2020 (days 230–362 of the year). We apply EpiEstim to the daily case report data, to weekly aggregated data, and separately to the WSU student and Whitman County community subpopulations.
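As a quick check on the study window and the weekly aggregation described above, the Python sketch below converts the stated start and end dates to day-of-year numbers and sums daily counts into consecutive 7-day blocks. The helper names are hypothetical, and starting weeks on August 17 and dropping any trailing partial week are our assumptions; the authors' R code may align weeks differently.

```python
from datetime import date

# Study window stated in the text.
START = date(2020, 8, 17)
END = date(2020, 12, 27)


def day_of_year(d):
    """Ordinal day of the year, with January 1 as day 1."""
    return d.timetuple().tm_yday


def aggregate_weekly(daily_counts):
    """Sum consecutive 7-day blocks starting on the first day; a trailing partial week is dropped."""
    n_weeks = len(daily_counts) // 7
    return [sum(daily_counts[7 * w:7 * w + 7]) for w in range(n_weeks)]


n_days = (END - START).days + 1  # both endpoints included
print(day_of_year(START), day_of_year(END), n_days)  # prints 230 362 133
```

The window spans 133 days, exactly 19 weeks, so no partial week arises for this particular date range.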
Two-population model
To study the effects of population structure on
We assume the latency rate (σ) and recovery rate (γ) are constant and equal in both populations. All transmission parameters are constant throughout the epidemic, and cross-transmission is symmetric (transmission from i to j occurs at the same rate as from j to i), so that β_ij = β_ji, with β_j ≥ β_i ≥ β_ij. Population sizes remain constant throughout the epidemic, with no substantial loss to death or migration. We also assume that the subpopulations are unequal in size, but that each is large enough to contribute substantially to the total population size. We chose the relative subpopulation sizes based on the sizes of the WSU student and Whitman County community subpopulations. Definitions of parameters and their symbols are given in Table 1.
Table 1: Symbols, definitions, parameter values, and sources (where applicable) used in simulations to vary the effect of population structure.
| Symbol | Definition | Value/range | Source |
|---|---|---|---|
| N | Total population size | 35,039 | [28], [29] |
| ϕ_i | Proportion of population in i | 0.6 | [28] |
| ϕ_j | Proportion of population in j | 0.4 | [29] |
| β_i | Transmission rate within i | 4 × 10−5 | NA |
| β_j | Transmission rate within j | [6 × 10−5, 12 × 10−5] | NA |
| β_ij | Cross-transmission rate | [1 × 10−8, 1 × 10−5] | NA |
| σ | Latency rate (per day) | 1/3.59 | [30] |
| γ | Recovery rate (per day) | 1/3.56 | [30] |
| ρ | Testing rate | 0.1 | [30] |
| k | Dispersion parameter | 2 | NA |
| E_i0, E_j0 | Initial exposed | 20 | [30] |
| I_i0, I_j0 | Initial infected | 18 | [30] |
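The equations of the two-population model are not reproduced in this excerpt. The Python sketch below shows one plausible form, assuming density-dependent (mass-action) transmission, in which susceptibles in subpopulation i are infected at rate β_i I_i + β_ij I_j, and counting daily incidence as E → I transitions (symptom onsets). Both choices are our assumptions, as is the illustrative pick β_j = 12 × 10−5 and β_ij = 1 × 10−8 from the Table 1 ranges; the authors' simulations were run in R, not with this code.

```python
# Minimal two-population SEIR sketch (assumed mass-action form, not the authors' code).

# Table 1 values; beta_j and beta_ij are one illustrative pick from their ranges.
N = 35_039
PHI_I, PHI_J = 0.6, 0.4
BETA_I, BETA_J, BETA_IJ = 4e-5, 12e-5, 1e-8
SIGMA, GAMMA = 1 / 3.59, 1 / 3.56  # latency and recovery rates, per day


def rhs(y):
    """Derivatives of (S_i, E_i, I_i, R_i, S_j, E_j, I_j, R_j, cum_onset_i, cum_onset_j)."""
    s_i, e_i, i_i, _r_i, s_j, e_j, i_j, _r_j, _, _ = y
    foi_i = BETA_I * i_i + BETA_IJ * i_j  # force of infection on subpopulation i
    foi_j = BETA_J * i_j + BETA_IJ * i_i  # force of infection on subpopulation j
    return [
        -s_i * foi_i, s_i * foi_i - SIGMA * e_i, SIGMA * e_i - GAMMA * i_i, GAMMA * i_i,
        -s_j * foi_j, s_j * foi_j - SIGMA * e_j, SIGMA * e_j - GAMMA * i_j, GAMMA * i_j,
        SIGMA * e_i, SIGMA * e_j,  # cumulative E -> I transitions (assumed onsets)
    ]


def rk4_step(y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(y)
    k2 = rhs([a + dt / 2 * b for a, b in zip(y, k1)])
    k3 = rhs([a + dt / 2 * b for a, b in zip(y, k2)])
    k4 = rhs([a + dt * b for a, b in zip(y, k3)])
    return [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]


def simulate(days=133, steps_per_day=10, e0=20, i0=18):
    """Integrate the model; return the final state and daily onsets per subpopulation."""
    n_i, n_j = PHI_I * N, PHI_J * N
    y = [n_i - e0 - i0, e0, i0, 0.0, n_j - e0 - i0, e0, i0, 0.0, 0.0, 0.0]
    onsets_i, onsets_j = [], []
    for _ in range(days):
        before_i, before_j = y[8], y[9]
        for _ in range(steps_per_day):
            y = rk4_step(y, 1 / steps_per_day)
        onsets_i.append(y[8] - before_i)
        onsets_j.append(y[9] - before_j)
    return y, onsets_i, onsets_j


final_state, onsets_i, onsets_j = simulate()
print(f"total onsets: i = {sum(onsets_i):.0f}, j = {sum(onsets_j):.0f}")
```

A fixed-step integrator keeps the sketch dependency-free; a library ODE solver would serve equally well.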
Specifying true R_t
We derive the time-varying reproductive number directly from our ODE model (Equations (1a)–(1h)) by finding
Equation (2) gives us an exact calculation for
Time-varying reproductive numbers
the vector {S_i(t), S_j(t)}′ and constant duration of infection 1/γ, the exact subpopulation time-varying reproductive numbers can be written as a system of equations that rely on the depletion of susceptible individuals across populations:
We also compare the conditional subpopulation time-varying reproductive numbers from Equation (4) to the subpopulation estimates from EpiEstim, likewise denoted
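Equations (2)–(4) are not shown in this excerpt. As one common way to obtain an exact R_t from a compartmental model, the Python sketch below computes R_t as the dominant eigenvalue (spectral radius) of a next-generation matrix in the sense of Diekmann et al. [23], evaluated at the current susceptible counts S_i(t) and S_j(t). The matrix entries assume mass-action transmission terms of the form β_i S_i I_i and β_ij S_i I_j and a mean infectious period of 1/γ; this is an illustrative assumption, not a transcription of the authors' Equation (2), and the function names are hypothetical.

```python
import math


def dominant_eigenvalue_2x2(a, b, c, d):
    """Largest eigenvalue of [[a, b], [c, d]]; real whenever b * c >= 0."""
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * c)


def true_rt(s_i, s_j, beta_i, beta_j, beta_ij, gamma):
    """Spectral radius of an assumed next-generation matrix K(t).

    K[r][c] = expected new infections in subpopulation r caused by one
    infectious individual in subpopulation c over an infectious period 1/gamma.
    """
    k_ii = beta_i * s_i / gamma
    k_ij = beta_ij * s_i / gamma
    k_ji = beta_ij * s_j / gamma
    k_jj = beta_j * s_j / gamma
    return dominant_eigenvalue_2x2(k_ii, k_ij, k_ji, k_jj)


# Fully susceptible populations with Table 1 values (beta_j, beta_ij picked from ranges).
N = 35_039
r0 = true_rt(0.6 * N, 0.4 * N, 4e-5, 12e-5, 1e-8, 1 / 3.56)
print(f"R0 = {r0:.3f}")
```

At t = 0 this returns R_0; as S_i(t) and S_j(t) are depleted over the course of the epidemic, the returned value decreases.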
Specifying the generation interval distribution
To estimate
Champredon et al. [25] present a notable result in mathematical epidemiology, explicitly deriving the link between the intrinsic generation interval distribution, g, used in renewal equation models and SEIR compartmental models with Erlang-distributed latent and infectious periods. We follow their logic for our two-population compartmental model, which has a single latent stage and a single infectious stage in each subpopulation {i, j}. Let F(τ) be the probability that an individual drawn at random, τ units of time after being infected, is in either infectious compartment {I_i, I_j}. Using the expressions for F_k from the system of ODEs in Appendix A.2 of Champredon et al. [25], we formulate an expression for F (i.e., F_k with k = 1) specific to our two-population model:
where {π_i, π_j} are the proportions each subpopulation contributes to the total population size. Since we explicitly assume σ_i = σ_j = σ and γ_i = γ_j = γ, Equation (6) reduces to
as given in Champredon et al. [25] when k=1. Further, the intrinsic infectiousness of individuals who have been infected for length of time τ is βF(τ), and we obtain the intrinsic generation interval distribution following [25] by normalizing with
In our model, β in Equation (7) is the composite transmission rate within and across subpopulations. We obtain this composite β by finding the dominant eigenvalue of the inter-community mixing matrix (see [13]) in Equation (3) to get
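The mixing matrix of Equation (3) is not shown here. Assuming it is simply the symmetric matrix of transmission rates [[β_i, β_ij], [β_ij, β_j]], with no weighting by subpopulation size (our assumption), its dominant eigenvalue has a closed form, sketched below in Python with a hypothetical function name. It makes explicit that when β_ij is small relative to β_j − β_i, the composite β is close to β_j, the larger within-subpopulation rate.

```python
import math


def composite_beta(beta_i, beta_j, beta_ij):
    """Dominant eigenvalue of the assumed symmetric matrix [[beta_i, beta_ij], [beta_ij, beta_j]]."""
    mean = (beta_i + beta_j) / 2
    half_gap = (beta_i - beta_j) / 2
    return mean + math.hypot(half_gap, beta_ij)  # (a+d)/2 + sqrt(((a-d)/2)^2 + b^2)


# Corners of the Table 1 ranges for beta_j and beta_ij, with beta_i = 4e-5.
for beta_j in (6e-5, 12e-5):
    for beta_ij in (1e-8, 1e-5):
        beta = composite_beta(4e-5, beta_j, beta_ij)
        print(f"beta_j={beta_j:.0e}, beta_ij={beta_ij:.0e} -> beta={beta:.4e}")
```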
Now we can use Equation (7) to find the intrinsic generation interval distribution. We use
with mean
and variance
This result for a two-population SEIR compartment model is the same expression given in Table 1 in Champredon et al. [25] when the latency and recovery rates are the same for the subpopulations. All mathematical analyses were performed in Wolfram Mathematica 13.1 [26].
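As a numerical check, assume the standard result for one exponentially distributed latent stage (rate σ) followed by one exponentially distributed infectious stage (rate γ) with constant infectiousness: the intrinsic generation interval is then the sum of an Exp(σ) and an Exp(γ) delay, with density g(τ) = σγ/(σ − γ)(e^(−γτ) − e^(−στ)), mean 1/σ + 1/γ and variance 1/σ² + 1/γ². This is our own reading of the k = 1 case in Champredon et al. [25], not a transcription of the authors' equations. The Python sketch integrates g with Simpson's rule and compares the result with these expressions using the Table 1 rates.

```python
import math

SIGMA, GAMMA = 1 / 3.59, 1 / 3.56  # Table 1 rates, per day


def gi_density(tau, sigma=SIGMA, gamma=GAMMA):
    """Density of Exp(sigma) + Exp(gamma) evaluated at tau >= 0."""
    if math.isclose(sigma, gamma):
        return sigma ** 2 * tau * math.exp(-sigma * tau)  # equal-rate (Erlang-2) limit
    return sigma * gamma / (sigma - gamma) * (math.exp(-gamma * tau) - math.exp(-sigma * tau))


def simpson_moments(upper=200.0, n=20_000):
    """Mass, mean and variance of gi_density on [0, upper] by composite Simpson's rule (n even)."""
    h = upper / n
    m0 = m1 = m2 = 0.0
    for k in range(n + 1):
        tau = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        f = weight * gi_density(tau)
        m0 += f
        m1 += f * tau
        m2 += f * tau * tau
    m0, m1, m2 = (h / 3) * m0, (h / 3) * m1, (h / 3) * m2
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean ** 2


mass, mean, var = simpson_moments()
print(f"numerical: mass={mass:.6f}, mean={mean:.4f}, variance={var:.4f}")
print(f"closed form: mean={1 / SIGMA + 1 / GAMMA:.4f}, variance={1 / SIGMA**2 + 1 / GAMMA**2:.4f}")
```

With σ = 1/3.59 and γ = 1/3.56, the closed forms give a mean of 7.15 days and a variance of 25.5617 days².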
Synthetic data with population structure
To quantify the effect of hidden population structure on the timeliness of
Here, ρ is the testing rate, {C_i, C_j} are the true daily incidences in each subpopulation at time t, which determine the mean, and k is the dispersion parameter. We use the negative binomial distribution because its dispersion parameter allows the variance to exceed the mean, which suits case reports from small populations. We assume that the testing rate is constant and that cases are reported at symptom onset, to give a realistic delay in reporting from disease onset. Table 1 gives the symbols, definitions, and values (or ranges of values) used in all simulations. True incidence and case reports for the total population are the sums of the incidence and of the case reports, respectively, from the two subpopulations. We generate 100 synthetic epidemics for each parameter combination in Table 1 to demonstrate the bias in the temporal accuracy of EpiEstim estimates when population structure is not considered. All simulations were performed in R [11].
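The observation model equation is not reproduced in this excerpt. Assuming it takes the usual mean/dispersion form, reports ~ NegBin(mean = ρC, dispersion k), so that the variance is ρC + (ρC)²/k, the Python sketch below draws reports as a gamma-Poisson mixture, one standard way to sample this distribution with only the standard library. The mean ρC, this parameterisation, and the function names are our assumptions; the authors worked in R.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method, applied in chunks of at most
    500 so that exp(-chunk) never underflows (a sum of Poissons is Poisson)."""
    count = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        limit, k, p = math.exp(-chunk), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        count += k
        lam -= chunk
    return count


def negbin_report(true_incidence, rho, k, rng):
    """Assumed observation model: reports ~ NegBin(mean = rho * C, dispersion k),
    sampled as a gamma-Poisson mixture. Variance = mean + mean**2 / k."""
    mean = rho * true_incidence
    if mean <= 0:
        return 0
    lam = rng.gammavariate(k, mean / k)  # shape k, scale mean / k, so E[lam] = mean
    return poisson(lam, rng)


rng = random.Random(2020)
# Hypothetical true incidences for one day; total reports are the sum over subpopulations.
c_i, c_j = 120.0, 340.0
total_reports = negbin_report(c_i, 0.1, 2, rng) + negbin_report(c_j, 0.1, 2, rng)
```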
R_t estimation with EpiEstim
The general method implemented in the EpiEstim R package relies on the renewal equation to estimate
where i(t) is the total number of incident infections on day t [1], [3], [12]. EpiEstim estimates the instantaneous reproduction number, which corresponds to the time-varying reproductive number; both are denoted as
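For reference, the estimator of Cori et al. [1] assumes R is constant over a window of τ days ending on day t and places a Gamma prior (shape a, scale b) on it, which gives a Gamma posterior with shape a + Σ I_s and rate 1/b + Σ Λ_s, where Λ_s = Σ_{k≥1} I_{s−k} w_k is the total infectiousness and w_k the discretised generation interval. The Python sketch below re-implements only the posterior mean, using what we understand to be EpiEstim's default prior (mean 5, standard deviation 5, i.e., a = 1, b = 5) and 7-day windows. It is an illustration with a hypothetical function name, not the EpiEstim package, and it omits imported cases, credible intervals, and serial-interval uncertainty.

```python
def cori_posterior_mean(incidence, w, window=7, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of R for sliding windows ending on each day t_end >= window.

    incidence[t] is incidence on day t (t = 0, 1, ...), and w[s - 1] is the
    probability that the generation interval equals s days (s = 1, 2, ...).
    Windows start on day 1 or later, so day 0 (where Lambda is 0) is skipped.
    """
    # Total infectiousness: Lambda_t = sum_{s >= 1} I_{t - s} * w_s.
    lam = [sum(incidence[t - s] * w[s - 1] for s in range(1, len(w) + 1) if t - s >= 0)
           for t in range(len(incidence))]
    estimates = {}
    for t_end in range(window, len(incidence)):
        t_start = t_end - window + 1
        shape = prior_shape + sum(incidence[t_start:t_end + 1])
        rate = 1 / prior_scale + sum(lam[t_start:t_end + 1])
        estimates[t_end] = shape / rate  # mean of Gamma(shape, rate)
    return estimates


# Toy check: constant incidence with a two-day generation interval gives R close to 1.
toy = cori_posterior_mean([100] * 30, [0.5, 0.5])
print(round(toy[20], 4))  # prints 1.0011
```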
Another caveat of the method implemented by EpiEstim (Equation (13)) is that incidence must be reported in the same time units as the generation or serial interval distribution [3]. Nash et al. [3] extend EpiEstim’s applicability to coarsely aggregated data by implementing an expectation-maximisation algorithm to reconstruct daily incidence from temporally aggregated data and then estimate
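To illustrate the unit-matching requirement, the Python sketch below discretises a continuous generation interval into daily and weekly bins by differencing its cumulative distribution function. It assumes the interval is the sum of an exponential latent delay (rate σ) and an exponential infectious delay (rate γ) with the Table 1 rates. The bins start at a lag of one time unit, so a zero-lag weight, which EpiEstim expects to be zero, would be prepended before use. This simple CDF-differencing scheme is our choice and differs from the discretisation EpiEstim applies to gamma-distributed serial intervals; it does not reproduce the expectation-maximisation reconstruction of Nash et al. [3].

```python
import math

SIGMA, GAMMA = 1 / 3.59, 1 / 3.56  # Table 1 rates, per day


def gi_cdf(tau, sigma=SIGMA, gamma=GAMMA):
    """CDF of Exp(sigma) + Exp(gamma) at tau (requires sigma != gamma)."""
    if tau <= 0:
        return 0.0
    return 1 - (sigma * math.exp(-gamma * tau) - gamma * math.exp(-sigma * tau)) / (sigma - gamma)


def discretize(bin_width_days, n_bins):
    """Mass in ((s - 1) * width, s * width] for s = 1..n_bins, renormalised to sum to 1."""
    mass = [gi_cdf(s * bin_width_days) - gi_cdf((s - 1) * bin_width_days)
            for s in range(1, n_bins + 1)]
    total = sum(mass)
    return [m / total for m in mass]


daily = discretize(1, 40)   # weights for lags of 1..40 days
weekly = discretize(7, 6)   # weights for lags of 1..6 weeks
print([round(p, 3) for p in weekly])
```

Applying the daily weights to weekly counts (or the reverse) would implicitly rescale the generation interval by a factor of seven, which is why the time units must agree.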
Last, we estimate the time-varying reproductive numbers
Results
We begin our analyses with a demonstration of time-varying reproductive number estimation with EpiEstim using COVID-19 incidence reported to Whitman County, Washington, USA. Figure 1 shows daily and weekly COVID-19 case reports and corresponding

Whitman Co. fall 2020 COVID-19 case reports and
Because we are interested in quantifying the timing bias from EpiEstim, we present results from simulated data where we know the true time-varying reproductive number,

Example of a synthetic COVID-19 outbreak in a structured population with parameter values given in Table 1, where β_j = 12 × 10−5 and β_ij = 1 × 10−8. (A) Synthetic epidemic and true
To evaluate the performance of EpiEstim when applied in small, structured populations, we first quantify the bias in the timeliness of

Results from simulated data computing true
Next, we determine the timing accuracy of

Results from simulated data estimating
When population structure is present, the subpopulation transmission dynamics influence

True
These observations lead to a practical way to recover timing accuracy when population structure is present. When applying EpiEstim to populations with non-random mixing between subgroups, we demonstrate that the subpopulation with the lagging outbreak crosses the
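Because the excerpt does not show the exact timing metric, the Python sketch below assumes timeliness is measured as the first day a series drops from at or above 1 to below 1, and reports how many days the estimated crossing trails the true one. The function names and the threshold of 1 are assumptions for illustration, and both series are assumed to be aligned on the same day index.

```python
def first_crossing_below(series, threshold=1.0):
    """First index t where series[t - 1] >= threshold > series[t]; None if never."""
    for t in range(1, len(series)):
        if series[t - 1] >= threshold > series[t]:
            return t
    return None


def timing_lag(true_rt, est_rt, threshold=1.0):
    """Days by which the estimated crossing trails the true one (positive = late)."""
    t_true = first_crossing_below(true_rt, threshold)
    t_est = first_crossing_below(est_rt, threshold)
    if t_true is None or t_est is None:
        return None
    return t_est - t_true
```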

Results from simulated data computing true
Discussion
We have developed a two-population model with a specified generation interval distribution to demonstrate the impact of population structure on
Our real-world example of COVID-19 incidence reported to Whitman County illustrates a setting in which our findings and recommendations have practical application. The population in Whitman County has two well-defined subpopulations, the WSU students and the surrounding community, and this information was reported alongside incidence. In this case, population structure should be considered when applying EpiEstim. Our results suggest that the time point when
The reproductive number,
Funding source: NIH
Award Identifier / Grant number: R35GM147013
Funding source: CDC Center for Forecasting and Outbreak Analytics
Award Identifier / Grant number: CDC-RFA-FT-23-0069
Acknowledgments
The authors would like to acknowledge Drs. Anne Cori and Matthew Mietchen for their thoughts and advice in the conceptualization of this work, and the two anonymous Reviewers who provided very helpful comments and feedback.
- Research ethics: Not applicable.
- Informed consent: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission. ETL conceptualized the project. EC developed the model and performed all mathematical and simulation-based analyses. EC wrote the manuscript, which was reviewed by ETL.
- Use of Large Language Models, AI and Machine Learning Tools: None declared.
- Conflict of interest: The authors state no conflict of interest.
- Research funding: This publication was made possible by cooperative agreement CDC-RFA-FT-23-0069 from the CDC’s Center for Forecasting and Outbreak Analytics. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention. ETL was also funded by NIH R35GM147013.
- Data availability: All data, Mathematica notebooks, and R code can be found at github.com/erinclancey/EpiEstim-Timing.
References
1. Cori, A, Ferguson, NM, Fraser, C, Cauchemez, S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol 2013;178:1505–12. https://doi.org/10.1093/aje/kwt133.
2. Nash, RK, Nouvellet, P, Cori, A. Real-time estimation of the epidemic reproduction number: scoping review of the applications and challenges. PLoS Digit Health 2022;1:e0000052. https://doi.org/10.1371/journal.pdig.0000052.
3. Nash, RK, Bhatt, S, Cori, A, Nouvellet, P. Estimating the epidemic reproduction number from temporally aggregated incidence data: a statistical modelling approach and software tool. PLoS Comput Biol 2023;19:e1011439. https://doi.org/10.1371/journal.pcbi.1011439.
4. Nouvellet, P, Cori, A, Garske, T, Blake, IM, Dorigatti, I, Hinsley, W, et al. A simple approach to measure transmissibility and forecast incidence. Epidemics 2018;22:29–35. https://doi.org/10.1016/j.epidem.2017.02.012.
5. Wallinga, J, Teunis, P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Am J Epidemiol 2004;160:509–16. https://doi.org/10.1093/aje/kwh255.
6. Bettencourt, LM, Ribeiro, RM. Real time Bayesian estimation of the epidemic potential of emerging infectious diseases. PLoS One 2008;3:e2185. https://doi.org/10.1371/journal.pone.0002185.
7. Lytras, T. Estimate epidemic effective reproduction number in a Bayesian framework [R package bayEStim version 0.0.1] [Internet]. Available from: https://github.com/thlytras/bayEStim.
8. Scire, J, Huisman, JS, Grosu, A, Angst, DC, Lison, A, Li, J, et al. estimateR: an R package to estimate and monitor the effective reproductive number. BMC Bioinf 2023;24:310. https://doi.org/10.1186/s12859-023-05428-4.
9. Abbott, S, Hellewell, J, Thompson, RN, Sherratt, K, Gibbs, HP, Bosse, NI, et al. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts. Wellcome Open Res 2020;5:112. https://doi.org/10.12688/wellcomeopenres.16006.1.
10. Cori, A, Cauchemez, S, Ferguson, NM, Fraser, C, Dahlqwist, E, Demarsh, PA, et al. Package ‘EpiEstim’. Vienna, Austria: CRAN; 2020, vol 13.
11. R Core Team. R: a language and environment for statistical computing. Vienna, Austria; 2023. Available from: https://www.R-project.org/.
12. Gostic, KM, McGough, L, Baskerville, EB, Abbott, S, Joshi, K, Tedijanto, C, et al. Practical considerations for measuring the effective reproductive number, Rt. PLoS Comput Biol 2020;16:e1008409. https://doi.org/10.1371/journal.pcbi.1008409.
13. Watts, DJ, Muhamad, R, Medina, DC, Dodds, PS. Multiscale, resurgent epidemics in a hierarchical metapopulation model. Proc Natl Acad Sci USA 2005;102:11157–62. https://doi.org/10.1073/pnas.0501226102.
14. Lloyd-Smith, JO, Schreiber, SJ, Kopp, PE, Getz, WM. Superspreading and the effect of individual variation on disease emergence. Nature 2005;438:355–9. https://doi.org/10.1038/nature04153.
15. White, LF, Archer, B, Pagano, M. Determining the dynamics of influenza transmission by age. Emerg Themes Epidemiol 2014;11:1–10. https://doi.org/10.1186/1742-7622-11-4.
16. Vazquez, A. Epidemic outbreaks on structured populations. J Theor Biol 2007;245:125–9. https://doi.org/10.1016/j.jtbi.2006.09.018.
17. Fraser, C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS One 2007;2:e758. https://doi.org/10.1371/journal.pone.0000758.
18. Klinkenberg, D, Fraser, C, Heesterbeek, H. The effectiveness of contact tracing in emerging epidemics. PLoS One 2006;1:e12. https://doi.org/10.1371/journal.pone.0000012.
19. Delamater, PL, Street, EJ, Leslie, TF, Yang, YT, Jacobsen, KH. Complexity of the basic reproduction number (R0). Emerg Infect Dis 2019;25:1. https://doi.org/10.3201/eid2501.171901.
20. Davies, NG, Klepac, P, Liu, Y, Prem, K, Jit, M, Eggo, RM. Age-dependent effects in the transmission and control of COVID-19 epidemics. Nat Med 2020;26:1205–11. https://doi.org/10.1038/s41591-020-0962-9.
21. Bharti, N, Lambert, B, Exten, C, Faust, C, Ferrari, M, Robinson, A. Large university with high COVID-19 incidence is not associated with excess cases in non-student population. Sci Rep 2022;12:3313. https://doi.org/10.1038/s41598-022-07155-x.
22. Painter, I, Huynh, G, Lavista Ferres, JM, Etzioni, R, Richardson, BA, Thakkar, N, et al. SitRep 15: COVID-19 transmission across Washington state; 2020. Available from: https://iazpvnewgrp01.blob.core.windows.net/source/archived/WA_Situation_Report_15_COVID-19_transmission_across_Washington_State.pdf.
23. Diekmann, O, Heesterbeek, J, Roberts, MG. The construction of next-generation matrices for compartmental epidemic models. J R Soc Interface 2010;7:873–85. https://doi.org/10.1098/rsif.2009.0386.
24. Britton, T, Scalia Tomba, G. Estimation in emerging epidemics: biases and remedies. J R Soc Interface 2019;16:20180670. https://doi.org/10.1098/rsif.2018.0670.
25. Champredon, D, Dushoff, J, Earn, DJ. Equivalence of the Erlang-distributed SEIR epidemic model and the renewal equation. SIAM J Appl Math 2018;78:3258–78. https://doi.org/10.1137/18m1186411.
26. Wolfram Research, Inc. Mathematica. Champaign, Illinois: Wolfram Research, Inc.; 2021. Available from: https://www.wolfram.com/mathematica.
27. King, AA, Nguyen, D, Ionides, EL. Statistical inference for partially observed Markov processes via the R package pomp. J Stat Softw 2016;69:1–43. https://doi.org/10.18637/jss.v069.i12.
28. Washington State Office of Financial Management. State of Washington 2021 population trends; 2021. Available from: https://ofm.wa.gov/sites/default/files/public/dataresearch/pop/april1/ofm_april1__poptrends.pdf.
29. Office of Strategy, Planning, and Analysis. Total student enrollment – Washington State University; 2022. Available from: https://ir.wsu.edu/total-student-enrollment/.
30. Pei, S, Kandula, S, Shaman, J. Differential effects of intervention timing on COVID-19 spread in the United States. Sci Adv 2020;6:eabd6370. https://doi.org/10.1126/sciadv.abd6370.
31. Riley, S, Fraser, C, Donnelly, CA, Ghani, AC, Abu-Raddad, LJ, Hedley, AJ, et al. Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions. Science 2003;300:1961–6. https://doi.org/10.1126/science.1086478.
32. Thomas, R. Estimated population mixing by country and risk cohort for the HIV/AIDS epidemic in Western Europe. J Geogr Syst 2001;3:283–301. https://doi.org/10.1007/pl00011481.
33. Hué, S, Pillay, D, Clewley, JP, Pybus, OG. Genetic analysis reveals the complex structure of HIV-1 transmission within defined risk groups. Proc Natl Acad Sci USA 2005;102:4425–9. https://doi.org/10.1073/pnas.0407534102.
34. Achaiah, NC, Subbarajasetty, SB, Shetty, RM. R0 and re of COVID-19: can we predict when the pandemic outbreak will be contained? Indian J Crit Care Med Peer-Rev Off Publ Indian Soc Crit Care Med 2020;24:1125. https://doi.org/10.5005/jp-journals-10071-23649.
© 2025 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.