Exploring Multicollinearity Using a Random Matrix Theory Approach
Kristen Feher
Clustering of gene expression data is often done with the latent aim of dimension reduction, by finding groups of genes that have a common response to potentially unknown stimuli. However, the behaviour of a low dimensional signal embedded in high dimensions is still poorly understood. This paper introduces a multicollinear model that is based on random matrix theory results and shows potential for characterising a gene cluster's correlation matrix. The model projects a one dimensional signal into many dimensions. It is based on the spiked covariance model, but characterises the behaviour of the corresponding correlation matrix instead. The eigenspectrum of the correlation matrix is examined empirically by simulation, as noise is added to the original signal. The simulation results are then used to propose a procedure for estimating the dimension of clusters from data. Moreover, the simulation results warn against considering pairwise correlations in isolation, as the model provides a mechanism whereby an apparently 'low' correlation between a pair of genes may simply be due to the interaction of high dimension and noise. Instead, collective information about all the variables is given by the eigenspectrum.
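The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the paper's exact model: the equal loadings, the Gaussian noise, and the names `simulate_cluster_eigenspectrum`, `n_samples`, `n_genes` and `noise_sd` are all assumptions made here for illustration. The sketch copies a one dimensional signal into many variables, adds independent noise, and returns the eigenvalues of the sample correlation matrix together with the pairwise correlations.

```python
import numpy as np


def simulate_cluster_eigenspectrum(n_samples=50, n_genes=200, noise_sd=1.0, seed=0):
    """Embed a one dimensional signal in n_genes dimensions and add noise.

    Illustrative assumptions (not taken from the paper): every gene loads
    equally on the signal, and the noise is i.i.d. Gaussian.
    Returns the eigenvalues of the sample correlation matrix (descending)
    and the vector of pairwise (off-diagonal) correlations.
    """
    rng = np.random.default_rng(seed)
    signal = rng.standard_normal(n_samples)            # latent one dimensional signal
    loadings = np.ones(n_genes)                        # hypothetical equal loadings
    noise = noise_sd * rng.standard_normal((n_samples, n_genes))
    data = np.outer(signal, loadings) + noise          # samples x genes

    corr = np.corrcoef(data, rowvar=False)             # genes x genes correlation matrix
    eigvals = np.linalg.eigvalsh(corr)[::-1]           # sorted largest first
    pairwise = corr[np.triu_indices(n_genes, k=1)]     # each gene pair once
    return eigvals, pairwise


if __name__ == "__main__":
    for sd in (0.5, 1.0, 3.0):
        eigvals, pairwise = simulate_cluster_eigenspectrum(noise_sd=sd)
        print(f"noise_sd={sd}: mean pairwise correlation={pairwise.mean():.3f}, "
              f"largest eigenvalue={eigvals[0]:.1f}")
```

Under these assumptions, increasing `noise_sd` lowers the individual pairwise correlations even though every variable carries the same underlying signal, while the leading eigenvalue summarises the shared structure across all variables at once.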
©2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Article
- Exploring Multicollinearity Using a Random Matrix Theory Approach
- The Beta-Binomial SGoF method for multiple dependent tests
- Detecting Sample Misidentifications in Genetic Association Studies
- Borrowing Information Across Genes and Experiments for Improved Error Variance Estimation in Microarray Data Analysis
- Hierarchical Bayes Model for Predicting Effectiveness of HIV Combination Therapies
- The practical effect of batch on genomic prediction
- Normalization, bias correction, and peak calling for ChIP-seq
- Combining Multiple Laser Scans of Spotted Microarrays by Means of a Two-Way ANOVA Model
- Empirical Bayes Interval Estimates that are Conditionally Equal to Unadjusted Confidence Intervals or to Default Prior Credibility Intervals
- Detection of Differentially Expressed Gene Sets in a Partially Paired Microarray Data Set
- Non-Iterative, Regression-Based Estimation of Haplotype Associations with Censored Survival Outcomes
- Graph Selection with GGMselect
- Sample Size Calculations for Designing Clinical Proteomic Profiling Studies Using Mass Spectrometry
- A New Approach for the Joint Analysis of Multiple Chip-Seq Libraries with Application to Histone Modification
- Software Communication
- GENOVA: Gene Overlap Analysis of GWAS Results