Abstract
Integrative analysis typically relies on either parametric or nonparametric methods. Parametric methods are easier to interpret but are not robust, whereas nonparametric methods are robust but make it hard to interpret the relationships among the different types of variables. To combine the advantages of both and to gain flexibility, we propose a system of semiparametric projection non-linear regression models for integrative analysis that models the innate coordinate structure of these different types of data, and we construct a diagnostic tool to classify new subjects into the case or control group. Simulation studies conducted to evaluate the performance of the proposed method show promising results. The method is then applied to real omics data from The Cancer Genome Atlas study, and the results are compared with those from similarity network fusion, another integrative analysis method; the results from our method are more reasonable.
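To make the idea of a semiparametric projection regression concrete, the following is a minimal sketch of one such model, a single-index regression y = f(βᵀx) + ε with an unknown nondecreasing link f. It is an illustration under stated assumptions, not the estimator of this paper: the monotone link, the two-dimensional covariate, the grid search over directions, and all function names (`pava`, `profile_rss`, `fit_single_index_2d`) are hypothetical choices made for the example.

```python
import numpy as np


def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y (equal weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last block mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def profile_rss(beta, X, y):
    """Residual sum of squares after fitting a monotone link to y against X @ beta."""
    order = np.argsort(X @ beta)
    fitted = pava(y[order])
    return float(np.sum((y[order] - fitted) ** 2))


def fit_single_index_2d(X, y, n_grid=360):
    """Grid search over unit vectors beta = (cos t, sin t); assumes X has two columns."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    betas = np.column_stack([np.cos(angles), np.sin(angles)])
    rss = np.array([profile_rss(b, X, y) for b in betas])
    return betas[np.argmin(rss)]


# Simulated data (assumption: tanh link, Gaussian covariates and noise).
rng = np.random.default_rng(0)
beta_true = np.array([0.6, 0.8])
X = rng.normal(size=(400, 2))
y = np.tanh(2.0 * X @ beta_true) + 0.1 * rng.normal(size=400)

beta_hat = fit_single_index_2d(X, y)
print("estimated direction:", beta_hat)
```

The parametric part (the direction β, constrained to unit length for identifiability) stays interpretable, while the link is left unspecified apart from monotonicity; this is the trade-off between interpretability and robustness that the abstract describes.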
Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Proof of Theorem 1
The conditional density of Y given X is
Let g(x) be the density of X; then the joint density of (Y, X) is
be the log-likelihood ratio. Let P be the probability measure of
Note that
Denote
so if we show that
which is the desired result.
Now we show that
By our specification of
By our model specification, and (C1)–(C4) (note that (C1) together with
So,
By (C1),
Thus, for some generic constant
and so by Theorem 2.4.1 in van der Vaart and Wellner [29],
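For reference, Theorem 2.4.1 in van der Vaart and Wellner [29] is the bracketing Glivenko–Cantelli theorem. Written in that book's standard notation (the symbols below are not taken from the present paper, and the function class to which the proof applies it is not reproduced here), it can be stated as:

```latex
% \mathcal{F}: a class of measurable functions; P: the underlying probability measure;
% N_{[\,]}: bracketing number; \mathbb{P}_n: empirical measure of n i.i.d. observations.
N_{[\,]}\bigl(\varepsilon, \mathcal{F}, L_1(P)\bigr) < \infty
\ \text{ for every } \varepsilon > 0
\quad\Longrightarrow\quad
\sup_{f \in \mathcal{F}} \bigl| \mathbb{P}_n f - P f \bigr| \to 0
\ \text{ outer almost surely.}
```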
Proof of Theorem 2
We give only an outline of the proof; the detailed proof is similar to that of Theorem 2 in Yuan, Yin and Tan [30] and is omitted.
Below we identify the asymptotic covariance matrices of
We first compute the efficient score for β and α. For a function
are the scores for β and α. We assume dim(β) > 1. The score operator for f at direction g = (g1, g2, g3)⊺ is, with
Note that
Let Pn and P be as defined in the proof of Theorem 1, Z = (Y, X), and
This will complete the proof.
Since we proved the asymptotic equivalence of
and
Applying the constraint
Denote
Since
where the op(1) is in the vector sense, and
From these facts we get
and
Denote
Using a Taylor expansion, and noting that
The above and (A.2) give
Similarly,
The above and (A.3) give
Now, subtracting (A.4) from (A.5) we get
It is known that
So (A.6) is
Denote
which gives the desired result.
Proof of Theorem 3
We only give the proof for
and
Then
The mean on the right-hand side above is
By the Lemma in Yuan, Yin and Tan [30], the last term on the right-hand side above is
and so we can just write
The rest of the proof is similar to that of Theorem 3 in Yuan, Yin and Tan [30] and is omitted.
References
1. Ramaswamy, V, Chanin, ML, Angell, J, Barnett, J, Gaffen, D, Gelman, M, et al. Stratospheric temperature trends: observations and model simulations. Rev Geophys 2001;39:71–122. https://doi.org/10.1029/1999rg000065.
2. Mikeska, T, Alsop, K, Australian Ovarian Cancer Study Group, Mitchell, G, Bowtell, DD, Dobrovic, A. No evidence for PALB2 methylation in high-grade serous ovarian cancer. J Ovarian Res 2013;6:26. https://doi.org/10.1186/1757-2215-6-26.
3. Curran, PJ, Hussong, AM. Integrative data analysis: the simultaneous analysis of multiple data sets. Psychol Methods 2009;14:81–100. https://doi.org/10.1037/a0015914.
4. Gao, J, Aksoy, BA, Dogrusoz, U, Dresdner, G, Gross, B, Sumer, SO, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal 2013;6:pl1. https://doi.org/10.1126/scisignal.2004088.
5. Roadmap Epigenomics Consortium, Kundaje, A, Meuleman, W, Ernst, J, Bilenky, M, Yen, A, et al. Integrative analysis of 111 reference human epigenomes. Nature 2015;518:317–30. https://doi.org/10.1038/nature14248.
6. Li, W, Zhou, H, Abujarour, R, Zhu, S, Joo, JY, Lin, T, et al. Generation of human-induced pluripotent stem cells in the absence of exogenous Sox2. Stem Cell 2009;27:2992–3000. https://doi.org/10.1002/stem.240.
7. Cacchiarelli, D, Trapnell, C, Ziller, MJ, Soumillon, M, Cesana, M, Karnik, R, et al. Integrative analyses of human reprogramming reveal dynamic nature of induced pluripotency. Cell 2015;162:412–24. https://doi.org/10.1016/j.cell.2015.06.016.
8. Castro, FG, Kellison, JG, Boyd, SJ, Kopak, A. A methodology for conducting integrative mixed methods research and data analyses. J Mix Methods Res 2010;4:342–60. https://doi.org/10.1177/1558689810382916.
9. Zhao, Q, Shi, X, Huang, J, Liu, J, Li, Y, Ma, S. Integrative analysis of ‘-Omics’ data using penalty functions. Wiley Interdiscip Rev Comput Stat 2015;7:99–108. https://doi.org/10.1002/wics.1322.
10. Fang, H, Huang, H, Yuan, A, Fan, R, Tan, MT. Structural equation modelling for cancer early detection with integrative data; 2019. (Submitted).
11. Shen, R, Wang, S, Mo, Q. Sparse integrative clustering of multiple omics data sets. Ann Appl Stat 2013;7:269–94. https://doi.org/10.1214/12-aoas578.
12. Lock, EF, Dunson, DB. Bayesian consensus clustering. Bioinformatics 2013;29:2610–6. https://doi.org/10.1093/bioinformatics/btt425.
13. Wang, B, Mezlini, AM, Demir, F, Fiume, M, Tu, Z, Brudno, M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods 2014;11:333–7. https://doi.org/10.1038/nmeth.2810.
14. Zhang, S, Liu, CC, Li, W, Shen, H, Laird, PW, Zhou, XJ. Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res 2012;40:9379–91. https://doi.org/10.1093/nar/gks725.
15. Wei, Y. Integrative analyses of cancer data: a review from a statistical perspective. Cancer Inform 2015;14:173–81. https://doi.org/10.4137/cin.s17303.
16. Klein, RW, Spady, RH. An efficient semiparametric estimator for binary response models. Econometrica 1993;61:387. https://doi.org/10.2307/2951556.
17. Cox, DR. Regression models and life-tables. J Roy Stat Soc B 1972;34:187–202. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x.
18. Qin, J, Garcia, TP, Ma, Y, Tang, MX, Marder, K, Wang, Y. Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. Ann Appl Stat 2014;8:1182–208. https://doi.org/10.1214/14-aoas730.
19. Yuan, A, Chen, X, Zhou, Y, Tan, MT. Subgroup analysis with semiparametric models toward precision medicine. Stat Med 2018;37:1830–45. https://doi.org/10.1002/sim.7638.
20. Tibshirani, R. Regression shrinkage and selection via the lasso. J Roy Stat Soc B 1996;58:267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
21. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat Med 1997;16:385–95. https://doi.org/10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3.
22. Edwards, D. An introduction to graphical modelling, 2nd ed. New York: Springer Verlag; 2000. https://doi.org/10.1007/978-1-4612-0493-0.
23. Anandkumar, A, Tan, VYF, Huang, F, Willsky, A. High-dimensional Gaussian graphical model selection: walk summability and local separation criterion. J Mach Learn Res 2012;13:2293–337.
24. Yuan, M, Lin, Y. Model selection and estimation in the Gaussian graphical model. Biometrika 2007;94:19–35. https://doi.org/10.1093/biomet/asm018.
25. Friedman, J, Hastie, T, Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2008;9:432–41. https://doi.org/10.1093/biostatistics/kxm045.
26. Robertson, T, Wright, FT, Dykstra, R. Order restricted statistical inference. Chichester, New York, Brisbane, Toronto, Singapore: John Wiley & Sons; 1988.
27. Best, MJ, Chakravarti, N. Active set algorithms for isotonic regression; a unifying framework. Math Program 1990;47:425–39. https://doi.org/10.1007/bf01580873.
28. van der Vaart, A. Semiparametric statistics, in part III. Lectures on probability theory and statistics. Berlin: Springer; 2002.
29. van der Vaart, A, Wellner, J. Weak convergence and empirical processes. New York: Springer; 1996. https://doi.org/10.1007/978-1-4757-2545-2.
30. Yuan, A, Yin, A, Tan, MT. Semiparametric subgroup causal inference on treatment difference; 2019. (Submitted).
© 2020 Walter de Gruyter GmbH, Berlin/Boston