MBPCA-OS: an exploratory multiblock method for variables of different measurement levels. Application to study the immune response to SARS-CoV-2 infection and vaccination

Martin Paries, Evelyne Vigneau, Adeline Huneau, Olivier Lantz and Stéphanie Bougeard
Published/Copyright: December 13, 2023

Abstract

Studying a large number of variables measured on the same observations and organized in blocks, known as multiblock data, is becoming standard in several domains, especially in biology. To explore the relationships between all these variables, at both the block level and the variable level, several exploratory multiblock methods have been proposed. However, most of them are designed only for numeric variables, whereas real data sets often contain variables of different measurement levels (i.e., numeric, nominal, ordinal). In this article, we focus on exploratory multiblock methods that handle variables at their appropriate measurement level. Multi-Block Principal Component Analysis with Optimal Scaling (MBPCA-OS) is proposed and applied to multiblock data from the CURIE-O-SA French cohort. In this study, variables are of different measurement levels and organized in four blocks. The objective is to study the immune responses according to the SARS-CoV-2 infection and vaccination statuses, the symptoms and the participants’ characteristics.


Corresponding author: Stéphanie Bougeard, Anses, Epidemiology, Health and Welfare, Laboratory of Ploufragan-Plouzané-Niort, Ploufragan, France, E-mail:

Acknowledgments

We thank, from the Institut Pasteur, the Recombinant Protein Production and Purification core facility for SARS-CoV-2 protein preparation, the Molecular Biophysics core facility for quality checking, and Yves L. Janin (Unit of Chemistry and Biocatalysis) for providing Hikarazine, the LuLISA substrate. For serum sample management, we also thank the whole ICAReB team and the COVID-19 support staff at Institut Pasteur, as well as the teams from the Eurofins Biomnis Sample Library and from CerbaHealthcare. Finally, we thank the personnel of the Institut Curie who volunteered to participate in the Curiosa study, which was set up and managed by the staff of the clinical and laboratory departments of the Institut Curie.

  1. Research ethics: Not applicable.

  2. Author contributions: Wrote the manuscript: MP, EV, AH, SB. Designed the study: OL. Managed the data: OL. Analyzed and visualized the data: MP, EV, SB. Revised the manuscript: MP, EV, AH, SB, OL.

  3. Competing interests: The authors state no competing interests.

  4. Research funding: The blood and clinical study at Institut Curie was funded by Fondation de France, Agence Nationale de la Recherche (ANR-21-COVR-002) and institutional funding from Institut Curie.

Appendix A

CURIE-O-SA data (N = 4383). Visualization of the correlation matrix of X̃ (the quantified variables) obtained through MBPCA-OS with two components. Color intensity reflects the strength of the correlation: a strong positive correlation (close to 1) is colored in red, whereas a strong negative correlation (close to −1) is colored in blue.

Appendix B

Participants’ categories   log10LN mean [CI, 95 %]   log10LS mean [CI, 95 %]   log10PNT mean [CI, 95 %]
Covid- Non.vacc            3.35 [3.33;3.36]          3.16 [3.14;3.18]          4.7 [4.68;4.71]
Covid- vacc                3.2 [3.17;3.25]           5.04 [5;5.08]             2.7 [2.63;2.79]
Covid+ Non.vacc            4.3 [4.22;4.39]           4.25 [4.15;4.35]          4.09 [4;4.19]
Covid+ vacc                4.21 [3.92;4.52]          5.23 [4.12;5.37]          2.57 [2.18;2.95]

CURIE-O-SA data (N = 4383). Means and 95 % confidence intervals of the serological assays (block X3) for the participants’ categories obtained by crossing infection and vaccination statuses.
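
As an illustration of how a per-category summary of this kind could be computed, the following is a minimal Python sketch. The interval method behind the table is not stated here, so the sketch assumes a normal-approximation interval on log10-transformed values. The function names (`mean_ci`, `summarize`) and the toy titers are hypothetical and are not taken from the CURIE-O-SA data.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) with a normal-approximation interval.

    Assumption: the interval is mean +/- z * standard error; the method
    actually used for the published table is not known from this page.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(values)
    se = stdev(values) / math.sqrt(n)          # sample SD / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 %
    return m, m - z * se, m + z * se


def summarize(records, level=0.95):
    """Group (category, raw_value) pairs and summarize log10 values per category."""
    groups = {}
    for category, value in records:
        groups.setdefault(category, []).append(math.log10(value))
    return {cat: mean_ci(vals, level) for cat, vals in groups.items()}


# Hypothetical toy measurements, for illustration only.
records = [
    ("Covid- Non.vacc", 1000.0),
    ("Covid- Non.vacc", 10000.0),
    ("Covid- Non.vacc", 100000.0),
    ("Covid+ vacc", 100.0),
    ("Covid+ vacc", 100.0),
    ("Covid+ vacc", 10000.0),
]

summary = summarize(records)
for cat, (m, lo, hi) in summary.items():
    # Same "mean [lower;upper]" layout as the table.
    print(f"{cat}: {m:.2f} [{lo:.2f};{hi:.2f}]")
```

With a sample as large as the cohort (N = 4383), a normal approximation and a Student t interval would be nearly identical; for small categories a t quantile would be the more cautious choice.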

Received: 2023-06-05
Accepted: 2023-11-02
Published Online: 2023-12-13

© 2023 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 15.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/ijb-2023-0062/html