Causal inference under over-simplified longitudinal causal models

Lola Étiévant; Vivian Viallon

doi:10.1515/ijb-2020-0081

Published/Copyright: November 1, 2021

From the journal The International Journal of Biostatistics Volume 18 Issue 2

Abstract

Many causal models of interest in epidemiology involve longitudinal exposures, confounders and mediators. However, repeated measurements are not always available or used in practice, leading analysts to overlook the time-varying nature of exposures and work under over-simplified causal models. Our objective is to assess whether, and how, causal effects identified under such misspecified causal models relate to true causal effects of interest. We derive sufficient conditions ensuring that the quantities estimated in practice under over-simplified causal models can be expressed as weighted averages of longitudinal causal effects of interest. Unsurprisingly, these sufficient conditions are very restrictive, and our results state that the quantities estimated in practice should be interpreted with caution in general, as they usually do not relate to any longitudinal causal effect of interest. Our simulations further illustrate that the bias between the quantities estimated in practice and the weighted averages of longitudinal causal effects of interest can be substantial. Overall, our results confirm the need for repeated measurements to conduct proper analyses and/or the development of sensitivity analyses when they are not available.

Keywords: causal inference; identifiability; longitudinal model; structural causal model

Corresponding author: Lola Étiévant, Institut Camille Jordan, Villeurbanne 69622, France, E-mail: lola.etievant@gmail.com

Acknowledgments

The authors are grateful to Stijn Vansteelandt for insightful comments on preliminary versions of this article, and to the reviewers of the International Journal of Biostatistics for valuable comments and suggestions.

Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Disclaimers: Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer/World Health Organization.

Appendix A. Proof of Theorem 1

Consider a longitudinal model (L) and assume that the only available data regarding the exposure of interest consist of $X$, which is a deterministic function of $\underline{X}_{t_1}^{t_2}$, with $1 \le t_1 \le t_2 \le T$. Let $x$ and $x^*$, $x \neq x^*$, be two given possible values of $X$, and assume that there exists some $W \subset Z$, taking its values in some space $\Omega_W$, such that $0 < P(X = x \mid W = w) < 1$ and $0 < P(X = x^* \mid W = w) < 1$ for all $w$ such that $P(W = w) > 0$. Now, consider an over-simplified model (S) and assume that $(Y^{X=x} \perp\!\!\!\perp X \mid W)_S$. Then, following usual arguments of causal inference [8, 10, 11], the causal effect of interest, $\mathrm{ATE}_S(x; x^*) := E_S(Y^{X=x} - Y^{X=x^*})$, would be estimated under this over-simplified model (S) as

$$
\widetilde{\mathrm{ATE}}_S(x; x^*) = \sum_{w \in \Omega_W} \left\{ E(Y \mid W = w, X = x) - E(Y \mid W = w, X = x^*) \right\} \times P(W = w).
$$
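As an illustration only, the standardization formula above can be computed from data by replacing each conditional expectation and each stratum probability with its empirical counterpart. The sketch below is a minimal plug-in version under assumptions that are not part of the article: `W` is discrete, the data are given as `(w, x, y)` tuples, and the function name `standardized_ate` is hypothetical.

```python
from collections import defaultdict


def standardized_ate(records, x, x_star):
    """Plug-in version of sum_w {E[Y|W=w,X=x] - E[Y|W=w,X=x*]} P(W=w).

    `records` is an iterable of (w, x_obs, y) tuples with discrete w
    (an assumed data layout, chosen for this illustration).
    """
    n = 0
    n_w = defaultdict(int)        # stratum sizes, for P(W = w)
    sums = defaultdict(float)     # (w, x) -> sum of outcomes
    counts = defaultdict(int)     # (w, x) -> number of observations
    for w, x_obs, y in records:
        n += 1
        n_w[w] += 1
        sums[(w, x_obs)] += y
        counts[(w, x_obs)] += 1

    ate = 0.0
    for w, size in n_w.items():
        # Empirical analogue of the positivity condition in each stratum.
        if counts[(w, x)] == 0 or counts[(w, x_star)] == 0:
            raise ValueError(f"positivity violated in stratum W={w!r}")
        mean_x = sums[(w, x)] / counts[(w, x)]
        mean_x_star = sums[(w, x_star)] / counts[(w, x_star)]
        ate += (mean_x - mean_x_star) * size / n
    return ate


# Toy data: two strata of W with different stratum-specific contrasts.
toy = [(0, 1, 1.0), (0, 0, 0.0),
       (1, 1, 3.0), (1, 1, 3.0), (1, 0, 1.0), (1, 0, 1.0)]
estimate = standardized_ate(toy, x=1, x_star=0)
```

The function raises an error when a stratum contains no observation at one of the two exposure levels, which mirrors the positivity requirement stated for $W$.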

Now assume that there exist some $t_0 \in \{1, \ldots, t_1\}$ and some $t_3 \in \{t_2, \ldots, T\}$ such that $(Y^{\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3}} \perp\!\!\!\perp \underline{X}_{t_0}^{t_3} \mid W)_L$. Let $w$ be any given possible value of $W$ such that $P(W = w) > 0$, and let $\underline{x}_{t_0}^{t_3}$ and $\underline{x}_{t_0}^{t_3 *}$ in $\{0,1\}^{t_3 - t_0 + 1}$ be any two possible profiles of $\underline{X}_{t_0}^{t_3}$ leading to $X = x$ and $X = x^*$, respectively, and such that $P(\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \mid X = x, W = w) > 0$ and $P(\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3 *} \mid X = x^*, W = w) > 0$. It follows that $P(\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \mid W = w) > 0$ and $P(\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3 *} \mid W = w) > 0$. Next, usual arguments of causal inference [8, 10, 11] yield

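$$
\begin{aligned}
\mathrm{ATE}_{L \mid W = w}\left(\underline{x}_{t_0}^{t_3}; \underline{x}_{t_0}^{t_3 *}\right)
&= E_L\left( Y^{\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3}} - Y^{\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3 *}} \mid W = w \right) \\
&= E_L\left( Y^{\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3}} \mid W = w, \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \right) - E_L\left( Y^{\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3 *}} \mid W = w, \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3 *} \right) \\
&= E\left( Y \mid W = w, \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \right) - E\left( Y \mid W = w, \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3 *} \right).
\end{aligned}
$$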

Because $\underline{X}_{t_0}^{t_3}$ d-separates $X$ and $W$ under model (L) [9, 44], we have, for any $w$ in $\Omega_W$ and any $\underline{x}_{t_0}^{t_3}$ in $\{0,1\}^{t_3 - t_0 + 1}$ such that $P(\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \mid W = w) > 0$,

$$
E\left( Y \mid W = w, \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \right) = E\left( Y \mid W = w, X = x, \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \right),
$$

with $x$ corresponding to the value taken by $X$ when $\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3}$. On the other hand, we have

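$$
\begin{aligned}
E(Y \mid W = w, X = x)
&= \sum_{\underline{x}_{t_0}^{t_3} \in \{0,1\}^{t_3 - t_0 + 1}} E\left( Y \mid W = w, X = x, \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \right) \times P\left( \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \mid W = w, X = x \right) \\
&= \sum_{\underline{x}_{t_0}^{t_3} \in \{0,1\}^{t_3 - t_0 + 1}} E\left( Y \mid W = w, \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \right) \times P\left( \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \mid W = w, X = x \right) \\
&= \sum_{\underline{x}_{t_0}^{t_3} \in \{0,1\}^{t_3 - t_0 + 1}} E_L\left( Y^{\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3}} \mid W = w \right) \times P\left( \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \mid W = w, X = x \right),
\end{aligned}
$$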

where the second equality comes from the fact that $\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \Rightarrow X = x$, for any $\underline{x}_{t_0}^{t_3}$ such that $P(\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \mid W = w, X = x) \neq 0$. This finally yields

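$$
\begin{aligned}
\widetilde{\mathrm{ATE}}_S(x; x^*) = \sum_{w \in \Omega_W} \; \sum_{\underline{x}_{t_0}^{t_3},\, \underline{x}_{t_0}^{t_3 *} \in \{0,1\}^{t_3 - t_0 + 1}}
& \mathrm{ATE}_{L \mid W = w}\left( \underline{x}_{t_0}^{t_3}; \underline{x}_{t_0}^{t_3 *} \right) \times P\left( \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \mid X = x, W = w \right) \\
& \times P\left( \underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3 *} \mid X = x^*, W = w \right) \times P(W = w),
\end{aligned}
$$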

where the sums are over all $\underline{x}_{t_0}^{t_3}$ and $\underline{x}_{t_0}^{t_3 *}$ in $\{0,1\}^{t_3 - t_0 + 1}$ such that $P(\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3} \mid W = w, X = x)$ and $P(\underline{X}_{t_0}^{t_3} = \underline{x}_{t_0}^{t_3 *} \mid W = w, X = x^*)$, respectively, are nonzero.
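The passage from the decomposition of $E(Y \mid W = w, X = x)$ to the final double sum can be spelled out as follows. This is a sketch of the intermediate step, using the shorthand $\underline{x}$ and $\underline{x}^*$ for the two profiles over $t_0, \ldots, t_3$ and $\underline{X}$ for $\underline{X}_{t_0}^{t_3}$ (a shorthand introduced here for readability only). For a fixed $w$, each set of conditional profile probabilities sums to one, so each single sum can be turned into a double sum:

```latex
\begin{aligned}
E(Y \mid W = w, X = x) - E(Y \mid W = w, X = x^*)
&= \sum_{\underline{x}} E_L\!\left(Y^{\underline{X} = \underline{x}} \mid W = w\right)
   P(\underline{X} = \underline{x} \mid W = w, X = x)
 - \sum_{\underline{x}^*} E_L\!\left(Y^{\underline{X} = \underline{x}^*} \mid W = w\right)
   P(\underline{X} = \underline{x}^* \mid W = w, X = x^*) \\
&= \sum_{\underline{x}} \sum_{\underline{x}^*}
   \left\{ E_L\!\left(Y^{\underline{X} = \underline{x}} \mid W = w\right)
         - E_L\!\left(Y^{\underline{X} = \underline{x}^*} \mid W = w\right) \right\}
   P(\underline{X} = \underline{x} \mid W = w, X = x)\,
   P(\underline{X} = \underline{x}^* \mid W = w, X = x^*),
\end{aligned}
```

since $\sum_{\underline{x}} P(\underline{X} = \underline{x} \mid W = w, X = x) = 1$ and $\sum_{\underline{x}^*} P(\underline{X} = \underline{x}^* \mid W = w, X = x^*) = 1$. The term in braces is the stratum-specific longitudinal effect $\mathrm{ATE}_{L \mid W = w}(\underline{x}; \underline{x}^*)$; multiplying by $P(W = w)$ and summing over $w \in \Omega_W$ gives the stated expression.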

The proof of the result under condition (T.Uncond) follows from similar, but simpler, arguments and is therefore omitted.
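The weighted-average identity can also be checked numerically on a small example. The sketch below is not taken from the article: it assumes a hypothetical model (L) with $T = 2$, a binary baseline confounder `W`, binary exposures `X1` and `X2` with no time-varying confounder, and the summary exposure $X = \max(X_1, X_2)$ ("ever exposed"), so that $t_0 = t_1 = 1$ and $t_2 = t_3 = 2$. All probabilities and the outcome mean function are arbitrary illustrative values. Because $W$ is the only confounder in this toy model, $E_L(Y^{x_1, x_2} \mid W = w)$ coincides with `mean_y(w, x1, x2)`.

```python
from itertools import product

# Hypothetical toy model (L); every number below is an arbitrary assumption.
P_W = {0: 0.6, 1: 0.4}
PROFILES = list(product((0, 1), repeat=2))  # all (x1, x2) exposure profiles


def p_x1(x1, w):
    p = 0.3 + 0.4 * w                       # P(X1 = 1 | W = w)
    return p if x1 == 1 else 1.0 - p


def p_x2(x2, x1, w):
    p = 0.2 + 0.5 * x1 + 0.2 * w            # P(X2 = 1 | X1 = x1, W = w)
    return p if x2 == 1 else 1.0 - p


def mean_y(w, x1, x2):
    # E[Y | W, X1, X2]; equals E_L[Y^{x1,x2} | W = w] in this toy model.
    return 1.0 * w + 0.5 * x1 + 2.0 * x2 + 0.7 * x1 * x2


def summary(profile):
    return max(profile)                     # X = 1 if ever exposed


def p_profile_given_w(profile, w):
    x1, x2 = profile
    return p_x1(x1, w) * p_x2(x2, x1, w)


def p_profile_given_x_w(profile, x, w):
    # P(profile | X = x, W = w); zero when the profile does not lead to X = x.
    if summary(profile) != x:
        return 0.0
    denom = sum(p_profile_given_w(q, w) for q in PROFILES if summary(q) == x)
    return p_profile_given_w(profile, w) / denom


def e_y_given_x_w(x, w):
    # Observational E[Y | W = w, X = x], by the law of total expectation.
    return sum(mean_y(w, *q) * p_profile_given_x_w(q, x, w) for q in PROFILES)


def ate_simplified(x, x_star):
    # Quantity estimated under the over-simplified model (S).
    return sum((e_y_given_x_w(x, w) - e_y_given_x_w(x_star, w)) * pw
               for w, pw in P_W.items())


def ate_weighted(x, x_star):
    # Weighted average of stratum-specific longitudinal effects.
    total = 0.0
    for w, pw in P_W.items():
        for q, q_star in product(PROFILES, repeat=2):
            weight = (p_profile_given_x_w(q, x, w)
                      * p_profile_given_x_w(q_star, x_star, w))
            total += (mean_y(w, *q) - mean_y(w, *q_star)) * weight * pw
    return total


lhs = ate_simplified(1, 0)
rhs = ate_weighted(1, 0)
```

In this setting `lhs` and `rhs` agree up to floating-point error, as the theorem predicts when the ignorability condition holds under model (L). The check says nothing about settings where that condition fails, for instance in the presence of time-varying confounders.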

References

1. Agudo, A, Bonet, C, Travier, N, González, C, Vineis, P, Bueno-de Mesquita, H, et al. Impact of cigarette smoking on cancer risk in the european prospective investigation into cancer and nutrition study. J Clin Oncol 2012;30:4550–7. https://doi.org/10.1200/jco.2011.41.0183.

2. Bagnardi, V, Rota, M, Botteri, E, Tramacere, I, Islami, F, Fedirko, V, et al. Alcohol consumption and site-specific cancer risk: a comprehensive dose-response meta-analysis. Br J Cancer 2015;112:580–93. https://doi.org/10.1038/bjc.2014.579.

3. Lauby-Secretan, B, Scoccianti, C, Loomis, D, Grosse, Y, Bianchini, F, Straif, K. Body fatness and cancer - viewpoint of the iarc working group. N Engl J Med 2016;375:794–8. https://doi.org/10.1056/nejmsr1606602.

4. Bradbury, KE, Appleby, PN, Tipper, SJ, Travis, RC, Allen, NE, Kvaskoff, M, et al. Circulating insulin-like growth factor i in relation to melanoma risk in the european prospective investigation into cancer and nutrition. Int J Cancer 2019;144:957–66. https://doi.org/10.1002/ijc.31854.

5. Chan, AT, Ogino, S, Giovannucci, EL, Fuchs, CS. Inflammatory markers are associated with risk of colorectal cancer and chemopreventive response to anti-inflammatory drugs. Gastroenterology 2011;140:799–808. https://doi.org/10.1053/j.gastro.2010.11.041.

6. Dossus, L, Lukanova, A, Rinaldi, S, Allen, N, Cust, AE, Becker, S, et al. Hormonal, metabolic, and inflammatory profiles and endometrial cancer risk within the epic cohort: a factor analysis. Am J Epidemiol 2013;177:787–99. https://doi.org/10.1093/aje/kws309.

7. Hernan, MA, Robins, JM. Causal inference: what if. Boca Raton: Chapman & Hall/CRC; 2020 [forthcoming].

8. Pearl, J. Causal inference in statistics: an overview. Stat Surv 2009;3:96–146. https://doi.org/10.1214/09-ss057.

9. Pearl, J. Causality: models, reasoning, and inference. New York: Cambridge University Press; 2009. https://doi.org/10.1017/CBO9780511803161.

10. Robins, J. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Math Model 1986;7:1393–512. https://doi.org/10.1016/0270-0255(86)90088-6.

11. Rosenbaum, PR, Rubin, DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55. https://doi.org/10.1093/biomet/70.1.41.

12. Daniel, RM, Cousens, S, De Stavola, BL, Kenward, MG, Sterne, JA. Methods for dealing with time-dependent confounding. Stat Med 2012;32:1584–618. https://doi.org/10.1002/sim.5686.

13. VanderWeele, TJ. Explanation in causal inference - methods for mediation and interaction. Oxford: Oxford University Press; 2015. https://doi.org/10.1093/ije/dyw277.

14. VanderWeele, TJ, Tchetgen Tchetgen, E. Mediation analysis with time-varying exposures and mediators. J Roy Stat Soc B 2017;79:917–38. https://doi.org/10.1111/rssb.12194.

15. Sofrygin, O, Zhu, Z, Schmittdiel, JA, Adams, AS, Grant, RW, van der Laan, MJ, et al. Targeted learning with daily ehr data. Stat Med 2019;38:3073–90. https://doi.org/10.1002/sim.8164.

16. Aalen, O, Røysland, K, Gran, J, Kouyos, R, Lange, T. Can we believe the dags? A comment on the relationship between causal dags and mechanisms. Stat Methods Med Res 2016;25:2294–314. https://doi.org/10.1177/0962280213520436.

17. Maxwell, SE, Cole, DA. Bias in cross-sectional analyses of longitudinal mediation. Psychol Methods 2007;12:23–44. https://doi.org/10.1037/1082-989x.12.1.23.

18. Maxwell, SE, Cole, DA, Mitchell, MA. Bias in cross-sectional analyses of longitudinal mediation: partial and complete mediation under an autoregressive model. Multivariate Behav Res 2011;46:816–41. https://doi.org/10.1080/00273171.2011.606716.

19. Huang, Y, Valtorta, M. Identifiability in causal bayesian networks: a sound and complete algorithm. In: Proceedings of the twenty-first national conference on artificial intelligence (AAAI 2006). AAAI Press, Menlo Park, CA; 2006:1149–56 pp.

20. Shpitser, I, Pearl, J. Identification of joint interventional distributions in recursive semi-markovian causal models. In: Proceedings of the 21st national conference on artificial intelligence and the 18th innovative applications of artificial intelligence conference (AAAI 2006). AAAI Press, Menlo Park, CA; 2006:1219–26 pp.

21. Tian, J, Pearl, J. A general identification condition for causal effects. In: Proceedings of the eighteenth national conference on artificial intelligence. AAAI Press/The MIT Press, Menlo Park, CA; 2002:567–73 pp.

22. Tian, J, Pearl, J. On the identification of causal effects. Technical report, cognitive systems laboratory, Los Angeles: University of California; 2003, Technical report 290-L.

23. Arnold, M, Charvat, H, Freisling, H, Noh, H, Adami, H-O, Soerjomataram, I, et al. Adulthood overweight and survival from breast and colorectal cancer in Swedish women. Cancer Epidemiol Biomarker Prevention 2019;18:1518–24. https://doi.org/10.1158/1055-9965.EPI-19-0075.

24. Arnold, M, Freisling, H, Stolzenberg-Solomon, R, Kee, F, O’Doherty, M, Ordóñez Mena, JM, et al. Overweight duration in older adults and cancer risk: a study of cohorts in europe and the United States. Eur J Epidemiol 2016;31:893–904. https://doi.org/10.1007/s10654-016-0169-z.

25. De Rubeis, V, Cotterchio, M, Smith, BT, Griffith, LE, Borgida, A, Gallinger, S, et al. Trajectories of body mass index, from adolescence to older adulthood, and pancreatic cancer risk; a population-based case–control study in ontario, Canada. Cancer Causes Control 2019;30:955–66. https://doi.org/10.1007/s10552-019-01197-9.

26. Fan, AZ, Russell, M, Stranges, S, Dorn, J, Trevisan, M. Association of lifetime alcohol drinking trajectories with cardiometabolic risk. J Clin Endocrinol Metabol 2008;93:154–61. https://doi.org/10.1210/jc.2007-1395.

27. Kunzmann, AT, Coleman, HG, Huang, W-Y, Berndt, SI. The association of lifetime alcohol use with mortality and cancer risk in older adults: a cohort study. PLoS Med 2018;15:1–18. https://doi.org/10.1371/journal.pmed.1002585.

28. Platt, A, Sloan, F, Costanzo, P. Alcohol-consumption trajectories and associated characteristics among adults older than age 50. J Stud Alcohol Drugs 2010;71:169–79. https://doi.org/10.15288/jsad.2010.71.169.

29. Yang, Y, Dugué, P-A, Lynch, BM, Hodge, AM, Karahalios, A, MacInnis, RJ, et al. Trajectories of body mass index in adulthood and all-cause and cause-specific mortality in the melbourne collaborative cohort study. BMJ Open 2019;9. https://doi.org/10.1136/bmjopen-2019-030078.

30. Zheng, R, Du, M, Zhang, B, Xin, J, Chu, H, Ni, M, et al. Body mass index (bmi) trajectories and risk of colorectal cancer in the plco cohort. Br J Cancer 2018;119:130–2. https://doi.org/10.1038/s41416-018-0121-y.

31. Pearl, J. An introduction to causal inference. Int J Biostat 2010;6: Article 7. https://doi.org/10.2202/1557-4679.1203.

32. Hernan, MA, VanderWeele, TJ. Compound treatments and transportability of causal inference. Epidemiology 2011;22:368–77. https://doi.org/10.1097/ede.0b013e3182109296.

33. VanderWeele, TJ, Hernan, MA. Causal inference under multiple versions of treatment. J Causal Inference 2013;1:1–20. https://doi.org/10.1515/jci-2012-0002.

34. Greenland, S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology 2003;14:300–6. https://doi.org/10.1097/01.ede.0000042804.12056.6c.

35. Hernán, M. The hazards of hazard ratios. Epidemiology 2010;21:13–5. https://doi.org/10.1097/ede.0b013e3181c1ea43.

36. Hernán, M, Hernández-Díaz, S, Robins, J. A structural approach to selection bias. Epidemiology 2004;15:615–25. https://doi.org/10.1097/01.ede.0000135174.63482.43.

37. Peng, D, Luke, WM. To adjust or not to adjust? Sensitivity analysis of m-bias and butterfly-bias. J Causal Inference 2015;3:41–57. https://doi.org/10.1515/jci-2013-0021.

38. Adams, R, Saria, S, Rosenblum, M. The impact of time series length and discretization on longitudinal causal estimation methods; 2020. arXiv preprint arXiv:2011.15099.

39. Ferreira Guerra, S, Schnitzer, M, Amelie, F, Blais, L. Impact of discretization of the timeline for longitudinal causal inference methods. Stat Med 2020;39:4069–85. https://doi.org/10.1002/sim.8710.

40. Beesley, LJ, Salvatore, M, Fritsche, LG, Pandit, A, Rao, A, Brummett, C, et al. The emerging landscape of health research based on biobanks linked to electronic health records: existing resources, statistical challenges, and potential opportunities. Stat Med 2020;39:773–800. https://doi.org/10.1002/sim.8445.

41. Agniel, D, Kohane, IS, Weber, GM. Biases in electronic health record data due to processes within the healthcare system: retrospective observational study. BMJ 2018;361:1–9. https://doi.org/10.1136/bmj.k1479.

42. Beesley, L, Mukherjee, B. Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification. Biometrics 2020. https://doi.org/10.1111/biom.13400.

43. Beesley, LJ, Mukherjee, B. Bias reduction and inference for electronic health record data under selection and phenotype misclassification: three case studies. medRxiv 2020. https://doi.org/10.1101/2020.12.21.20248644.

44. Verma, T, Pearl, J. Causal networks: semantics and expressiveness. In: Proceedings of the fourth workshop on uncertainty in artificial intelligence; 1988:352–9 pp.

Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2020-0081).

Received: 2020-06-02

Revised: 2021-10-13

Accepted: 2021-10-14

Published Online: 2021-11-01