
The Use of Logic Regression in Epidemiologic Studies to Investigate Multiple Binary Exposures: An Example of Occupation History and Amyotrophic Lateral Sclerosis

  • Andrea Bellavia, Ran S. Rotem, Aisha S. Dickerson, Johnni Hansen, Ole Gredal and Marc G. Weisskopf
Published/Copyright: February 25, 2020

Abstract

Investigating joint exposure to several risk factors is becoming a key component of epidemiologic studies. Individuals are exposed to multiple factors, often simultaneously, and evaluating patterns of exposure and high-dimensional interactions may allow for a better understanding of health risks at the individual level. When jointly evaluating high-dimensional exposures, common statistical methods should be integrated with machine learning techniques that may better account for complex settings. Among these, Logic regression was developed to investigate a large number of binary exposures as they relate to a given outcome. This method may be of interest in several public health settings, yet it has never been presented to an epidemiologic audience. In this paper, we review and discuss Logic regression as a potential tool for epidemiologic studies, using an example of occupation history (68 binary exposures of primary occupations) and amyotrophic lateral sclerosis in a population-based Danish cohort. Logic regression identifies predictors that are Boolean combinations of the original (binary) exposures while operating fully within the regression framework of interest (e.g., linear or logistic). Combinations of exposures are presented graphically as Logic trees, and techniques for selecting the best Logic model are available and highly important. While highlighting several advantages of the method, we also discuss specific drawbacks and practical issues that should be considered when using Logic regression in population-based studies. With this paper, we encourage researchers to explore the use of machine learning techniques when evaluating high-dimensional epidemiologic data, and we advocate for further methodological work in this area.
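To make the idea of a Boolean-combination predictor concrete, the following is a minimal sketch, not the authors' implementation. It assumes three hypothetical binary exposures (x1, x2, x3), a made-up toy data set, and a single Logic tree that is fixed by hand rather than found by a model search. It relies on the fact that, in a logistic model with one binary predictor and an intercept, the exponentiated coefficient equals the sample odds ratio from the 2x2 table, so no fitting library is needed.

```python
# Hypothetical toy data (invented for illustration only).
# Each row is (x1, x2, x3, case), all coded 0/1.
rows = [
    (1, 1, 0, 1),
    (1, 1, 0, 1),
    (0, 0, 1, 1),
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 0, 0),
    (0, 0, 1, 0),
    (1, 0, 0, 1),
]


def logic_tree(x1, x2, x3):
    """Hypothetical Logic tree: (x1 AND x2) OR x3, returned as 0/1."""
    return int((x1 and x2) or x3)


def odds_ratio(data, tree):
    """Odds ratio of the outcome for tree == 1 versus tree == 0.

    For a logistic model with this single binary predictor,
    exp(coefficient) equals this 2x2-table odds ratio.
    """
    a = b = c = d = 0  # a: L=1,y=1  b: L=1,y=0  c: L=0,y=1  d: L=0,y=0
    for x1, x2, x3, y in data:
        combined = tree(x1, x2, x3)
        if combined and y:
            a += 1
        elif combined:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return (a * d) / (b * c)  # assumes no empty cell in the table


if __name__ == "__main__":
    print(odds_ratio(rows, logic_tree))
```

In an actual analysis the tree itself is what Logic regression searches for; here it is fixed only to show how a Boolean combination of exposures enters the model as one ordinary binary covariate.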

Funding statement: This work was supported by the National Institute of Neurological Disorders and Stroke (Funder Id: http://dx.doi.org/10.13039/100000065, R21NS099910) and by the National Institute of Environmental Health Sciences (Funder Id: http://dx.doi.org/10.13039/100000066, R01ES028800).

Disclosure: The authors declare no conflict of interest.


Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/em-2019-0032).


Received: 2019-02-21
Revised: 2019-12-12
Accepted: 2020-01-12
Published Online: 2020-02-25

© 2020 Walter de Gruyter GmbH, Berlin/Boston
