The Use of Logic Regression in Epidemiologic Studies to Investigate Multiple Binary Exposures: An Example of Occupation History and Amyotrophic Lateral Sclerosis
Andrea Bellavia, Ran S. Rotem, Aisha S. Dickerson, Johnni Hansen, Ole Gredal and Marc G. Weisskopf
Abstract
Investigating joint exposure to several risk factors is becoming a key component of epidemiologic studies. Individuals are exposed to multiple factors, often simultaneously, and evaluating patterns of exposure and high-dimensional interactions may allow for a better understanding of health risks at the individual level. When jointly evaluating high-dimensional exposures, common statistical methods should be integrated with machine learning techniques that may better account for such complex settings. Among these, Logic regression was developed to investigate a large number of binary exposures as they relate to a given outcome. This method may be of interest in several public health settings, yet it has never been presented to an epidemiologic audience. In this paper, we review and discuss Logic regression as a potential tool for epidemiologic studies, using an example of occupation history (68 binary exposures of primary occupations) and amyotrophic lateral sclerosis in a population-based Danish cohort. Logic regression identifies predictors that are Boolean combinations of the original binary exposures, while operating fully within the regression framework of interest (e.g., linear or logistic). Combinations of exposures are presented graphically as Logic trees, and techniques for selecting the best Logic model are available and of high importance. While highlighting several advantages of the method, we also discuss specific drawbacks and practical issues that should be considered when using Logic regression in population-based studies. With this paper, we encourage researchers to explore machine learning techniques when evaluating high-dimensional epidemiologic data, and we advocate the need for further methodological work in this area.
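The core idea, a Boolean combination of binary exposures entering a standard regression model, can be illustrated with a minimal Python sketch. In the logistic case the model takes the form logit P(Y = 1) = β0 + β1L1 + ... + βtLt, where each Lj is a logic tree built from AND, OR, and NOT over the exposures; the sketch uses a single tree (t = 1). Everything here is an illustrative assumption: the data are simulated (not the Danish cohort), and the names Leaf, Op, evaluate, fit_logistic, log_likelihood, and simulate are hypothetical. The code only evaluates fixed candidate trees and fits the logistic model by Newton-Raphson; it does not implement the search over trees that Logic regression performs (for example, by simulated annealing in the R package LogicReg).

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    """A single binary exposure X[index], optionally negated (NOT X[index])."""
    index: int
    negated: bool = False


@dataclass(frozen=True)
class Op:
    """An AND/OR node joining two subtrees of a logic tree."""
    op: str  # "and" or "or"
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Op]


def evaluate(tree: Tree, x: List[int]) -> int:
    """Return 1 if the Boolean expression is true for exposure vector x, else 0."""
    if isinstance(tree, Leaf):
        return int((x[tree.index] == 1) != tree.negated)
    left, right = evaluate(tree.left, x), evaluate(tree.right, x)
    return int(left and right) if tree.op == "and" else int(left or right)


def fit_logistic(z: List[int], y: List[int], iterations: int = 25) -> Tuple[float, float]:
    """Fit logit P(Y=1) = b0 + b1*z by Newton-Raphson (one binary predictor)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * zi     # score for the Boolean predictor
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def log_likelihood(z: List[int], y: List[int], b0: float, b1: float) -> float:
    """Bernoulli log-likelihood; higher is better when comparing candidate trees."""
    total = 0.0
    for zi, yi in zip(z, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
        total += math.log(p) if yi else math.log(1.0 - p)
    return total


def simulate(n: int = 2000, n_exposures: int = 5, seed: int = 1):
    """Hypothetical data: risk is raised when (X0 AND X1) OR (X2 AND NOT X3)."""
    rng = random.Random(seed)
    true_tree = Op("or", Op("and", Leaf(0), Leaf(1)),
                   Op("and", Leaf(2), Leaf(3, negated=True)))
    X, y = [], []
    for _ in range(n):
        x = [int(rng.random() < 0.3) for _ in range(n_exposures)]
        risk = 0.4 if evaluate(true_tree, x) else 0.1
        X.append(x)
        y.append(int(rng.random() < risk))
    return X, y, true_tree


if __name__ == "__main__":
    X, y, true_tree = simulate()
    for name, tree in [("true tree", true_tree), ("X4 alone", Leaf(4))]:
        z = [evaluate(tree, x) for x in X]
        b0, b1 = fit_logistic(z, y)
        print(f"{name}: OR = {math.exp(b1):.2f}, logLik = {log_likelihood(z, y, b0, b1):.1f}")
```

Running the script prints an odds ratio and a log-likelihood for the generating tree and for an unrelated exposure. In an actual Logic regression fit, a search procedure would repeatedly modify the tree (for instance, swapping a leaf or changing an AND to an OR) and keep changes that improve a score such as the log-likelihood, with cross-validation or permutation tests used to choose the model size.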
Funding statement: This work was supported by the National Institute of Neurological Disorders and Stroke (Funder Id: http://dx.doi.org/10.13039/100000065, R21NS099910) and by the National Institute of Environmental Health Sciences (Funder Id: http://dx.doi.org/10.13039/100000066, R01ES028800).
Disclosure: The authors declare no conflict of interest.
Supplementary Material
The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/em-2019-0032).
© 2020 Walter de Gruyter GmbH, Berlin/Boston