Cluster-Localized Sparse Logistic Regression for SNP Data

Harald Binder; Tina Müller; Holger Schwender; Klaus Golka; Michael Steffens; Jan G. Hengstler; Katja Ickstadt; Martin Schumacher

doi:10.1515/1544-6115.1694

Published/Copyright: August 14, 2012

From the journal Statistical Applications in Genetics and Molecular Biology Volume 11 Issue 4

Abstract

The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, these groups of individuals are identified using a clustering approach, where each group may be defined via different SNPs. This allows for representing complex interaction patterns, such as compositional epistasis, that might not be detected by a single main effects model. In a simulation study, the CLR approach improves both prediction performance and the identification of important SNPs in several scenarios, compared to the main effects approach. Improved prediction performance is also obtained for an application example considering urinary bladder cancer. Some of the identified SNPs are predictive for all individuals, while others are only relevant for a specific group. Together with the sets of SNPs that define the groups, potential interaction patterns are uncovered.
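The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain componentwise gradient boosting as a stand-in for whatever boosting variant the article employs, it takes the groups as known instead of estimating them by clustering, and the function name `weighted_componentwise_boost`, the out-of-group weight of 0.1, the step size, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def weighted_componentwise_boost(X, y, w, n_steps=30, nu=0.1):
    """Componentwise gradient boosting for logistic regression with
    observation weights (hypothetical helper, not from the article).

    X: (n, p) covariates, e.g. SNPs coded 0/1/2; y: (n,) 0/1 labels;
    w: (n,) non-negative weights that localize the fit to one group.
    Returns (intercept, beta). Each step updates a single coefficient,
    so beta stays sparse when n_steps is small.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    mu = w @ X                           # weighted column means
    Xc = X - mu                          # weighted-centred covariates
    denom = w @ (Xc ** 2)                # weighted variance per column
    usable = denom > 1e-12               # skip constant columns
    beta = np.zeros(X.shape[1])
    p0 = np.clip(w @ y, 1e-6, 1 - 1e-6)  # weighted case proportion
    intercept = np.log(p0 / (1 - p0))
    for _ in range(n_steps):
        prob = 1.0 / (1.0 + np.exp(-(intercept + Xc @ beta)))
        r = y - prob                     # negative gradient of logistic loss
        num = (w * r) @ Xc               # weighted covariance with residuals
        coef = np.where(usable, num / np.where(usable, denom, 1.0), 0.0)
        gain = coef * num                # drop in weighted squared error
        j = int(np.argmax(gain))
        if gain[j] <= 0:
            break
        beta[j] += nu * coef[j]          # update only the best covariate
        intercept += nu * (w @ r)
    return intercept - mu @ beta, beta   # intercept on the uncentred scale

# Simulated illustration (assumed setup): SNP 0 matters only in group A,
# SNP 1 only in group B. Groups are taken as known here.
rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
in_a = np.arange(n) < n // 2
eta = -1.2 + 2.0 * np.where(in_a, X[:, 0], X[:, 1])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# One localized fit per group: full weight inside the group,
# a small (assumed) weight of 0.1 for everyone else.
_, beta_a = weighted_componentwise_boost(X, y, np.where(in_a, 1.0, 0.1))
_, beta_b = weighted_componentwise_boost(X, y, np.where(in_a, 0.1, 1.0))
print("SNPs selected for group A:", np.flatnonzero(beta_a))
print("SNPs selected for group B:", np.flatnonzero(beta_b))
```

Comparing the nonzero entries of `beta_a` and `beta_b` shows how separate weighted fits can pick up SNPs that are relevant only for one group, which a single main effects model would average across all individuals.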

Keywords: single nucleotide polymorphisms; weighted regression; clustering
