Outliers in nutrient intake data for U.S. adults: national health and nutrition examination survey 2017–2018

Sara Burcham, Yuki Liu, Ashley L. Merianos and Angelico Mendy
Published/Copyright: November 10, 2023

Abstract

Objectives

An important step in preparing data for statistical analysis is outlier detection and removal, yet no gold standard exists in the current literature. The objective of this study is to identify the ideal decision test using the National Health and Nutrition Examination Survey (NHANES) 2017–2018 dietary data.

Methods

We conducted a secondary analysis of NHANES 24-h dietary recalls, considering the survey's multi-stage cluster design. Six outlier detection and removal strategies were assessed by evaluating the decision tests' impact on the Pearson's correlation coefficient among macronutrients. Furthermore, we assessed changes in the effect size estimates based on pre-defined sample sizes. The data were collected as part of the 2017–2018 24-h dietary recall among adult participants (N=4,893).
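The evaluation described above can be illustrated with a minimal sketch. It applies one decision test (values more than 2 standard deviations from the mean) to two macronutrient variables, then reports the proportion of records removed and the relative change in Pearson's correlation coefficient. The function names, the synthetic data, and the unweighted calculation are assumptions made for illustration only; the sketch ignores the NHANES multi-stage cluster design and is not the authors' code.

```python
import math
import random


def pearson_r(x, y):
    """Unweighted Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sd_rule_mask(values, k=2.0):
    """True for each value lying within k sample standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [abs(v - mean) <= k * sd for v in values]


def effect_change(x, y, k=2.0):
    """Drop records flagged on either variable; return
    (r_before, r_after, percent_removed, percent_change_in_r)."""
    keep = [a and b for a, b in zip(sd_rule_mask(x, k), sd_rule_mask(y, k))]
    xk = [v for v, m in zip(x, keep) if m]
    yk = [v for v, m in zip(y, keep) if m]
    r0, r1 = pearson_r(x, y), pearson_r(xk, yk)
    pct_removed = 100.0 * (len(x) - len(xk)) / len(x)
    pct_change = 100.0 * abs(r1 - r0) / abs(r0)
    return r0, r1, pct_removed, pct_change


# Synthetic, right-skewed intakes in grams (hypothetical values, NOT NHANES data).
rng = random.Random(0)
protein = [rng.lognormvariate(4.3, 0.4) for _ in range(500)]
fat = [0.9 * p + rng.gauss(0, 15) for p in protein]

r0, r1, removed, change = effect_change(protein, fat)
print(f"r before={r0:.3f}, r after={r1:.3f}, removed={removed:.1f} %, change in r={change:.1f} %")
```

Repeating the same comparison for each decision test and each sample size would give a table of effect estimate changes comparable in spirit to the one summarized in the Results.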

Results

Effect estimate changes for macronutrients varied from 6.5 % for protein to 39.3 % for alcohol across all decision tests. The largest proportion of outliers removed was 4.0 %, observed in the large sample size under the decision test of >2 standard deviations from the mean. The smallest sample size, particularly for the alcohol analysis, was most affected by the six decision tests when compared to no decision test.

Conclusions

This study, the first to use 2017–2018 NHANES dietary data for outlier evaluation, emphasizes the importance of selecting an appropriate decision test considering factors such as statistical power, sample size, normality assumptions, the proportion of data removed, effect estimate changes, and the consistency of estimates across sample sizes. We recommend the use of non-parametric tests for non-normally distributed variables of interest.
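As one example of a non-parametric decision test, the sketch below flags values outside Tukey's fences, which are built from quartiles rather than from the mean and standard deviation. The abstract does not name the specific non-parametric tests that were evaluated, so the choice of Tukey's fences, the multiplier k = 1.5, the function names, and the sample values are all assumptions for illustration.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """True for each value that falls outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v < lower or v > upper for v in values]


# Hypothetical, zero-inflated alcohol intakes in grams (NOT NHANES data).
alcohol_g = [0, 0, 0, 0, 0, 0, 14, 14, 28, 28, 42, 210]
flags = flag_outliers(alcohol_g)
print([v for v, f in zip(alcohol_g, flags) if f])
```

Because the fences depend only on quartiles, a single extreme intake does not widen them the way it inflates a standard deviation, which is one reason a rule of this kind may suit skewed variables such as alcohol intake.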


Corresponding author: Sara Burcham, MPH, Division of Epidemiology, Department of Environmental & Public Health Sciences, The University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA, E-mail:

Award Identifier / Grant number: K01DA044313

Award Identifier / Grant number: P30 ES006096

Award Identifier / Grant number: R21ES032161

Acknowledgments

We thank Dr. Mary Beth Genter, Professor at the University of Cincinnati College of Medicine Department of Environmental and Public Health Sciences for her review and comments. All authors read and approved the final manuscript.

  1. Research ethics: This study was conducted according to the guidelines laid down in the Declaration of Helsinki and all procedures involving research study participants were approved by the National Center for Health Statistics Ethics Review Board. Written informed consent was obtained from all subjects/patients.

  2. Informed consent: Written informed consent was obtained from all subjects/patients.

  3. Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Competing interests: The authors state no conflict of interest.

  5. Research funding: Ashley L. Merianos’ contribution was partly funded by grants K01DA044313 and R21ES032161 from the National Institutes of Health. Angelico Mendy’s contribution was partly funded by grant P30 ES006096 from the National Institutes of Health. Yuki Liu is employed by Intuitive Surgical and contributed to this research in a personal capacity outside of work. The research and research results are not, in any way, associated with Stanford University. YL has no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed. The funders had no role in the design, analysis or writing of this article.

  6. Data availability: Not applicable.

Received: 2023-05-18
Accepted: 2023-10-18
Published Online: 2023-11-10

© 2023 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 13.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/em-2023-0018/html