Sparse Canonical Covariance Analysis for High-throughput Data

Woojoo Lee; Donghwan Lee; Youngjo Lee; Yudi Pawitan

doi:10.2202/1544-6115.1638

Published/Copyright: July 6, 2011

From the journal Statistical Applications in Genetics and Molecular Biology Volume 10 Issue 1


Canonical covariance analysis (CCA) has gained popularity as a method for analyzing two sets of high-dimensional genomic data. However, the results are often difficult to interpret because canonical vectors are linear combinations of all variables, and the coefficients are typically nonzero. Several sparse CCA methods have recently been proposed to reduce the number of nonzero coefficients, but these existing methods are not satisfactory because they still give too many nonzero coefficients. In this paper, we propose a new random-effect model approach for sparse CCA; the proposed algorithm can adapt arbitrary penalty functions to CCA without much computational demand. Through simulation studies, we compare various penalty functions in terms of how well they identify the correct model. We also develop an extension of sparse CCA to handle more than two sets of variables measured on the same set of observations. We illustrate the method with an analysis of the NCI cancer dataset.
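
To make the idea of sparse canonical vectors concrete, the sketch below alternates soft-thresholded updates of two unit-norm vectors u and v so that the covariance u'Cv stays large while small coefficients are set exactly to zero. It is a generic L1-penalty (lasso-type) illustration of sparse canonical covariance analysis, not the random-effect model algorithm proposed in the paper. The function names, the penalty levels lam_u and lam_v, and the toy matrix are assumptions chosen for illustration only.

```python
import math


def soft_threshold(vec, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]


def normalize(vec):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def cross_covariance(X, Y):
    """Sample cross-covariance (p x q) of X (n x p) and Y (n x q), rows = observations."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    mean_x = [sum(row[j] for row in X) / n for j in range(p)]
    mean_y = [sum(row[k] for row in Y) / n for k in range(q)]
    return [
        [
            sum((X[i][j] - mean_x[j]) * (Y[i][k] - mean_y[k]) for i in range(n)) / (n - 1)
            for k in range(q)
        ]
        for j in range(p)
    ]


def sparse_canonical_vectors(C, lam_u, lam_v, n_iter=100):
    """Alternating soft-thresholded power iterations on the cross-covariance C.

    Illustrative L1-penalty scheme (an assumption of this sketch), not the
    random-effect model approach described in the abstract.
    """
    p, q = len(C), len(C[0])
    v = normalize([1.0] * q)  # simple deterministic starting value
    u = [0.0] * p
    for _ in range(n_iter):
        Cv = [sum(C[j][k] * v[k] for k in range(q)) for j in range(p)]
        u = normalize(soft_threshold(Cv, lam_u))
        Ctu = [sum(C[j][k] * u[j] for j in range(p)) for k in range(q)]
        v = normalize(soft_threshold(Ctu, lam_v))
    return u, v


# Toy cross-covariance with one dominant pair of variables (hypothetical values).
C_toy = [[3.0, 0.1], [0.1, 0.1], [0.05, 0.0]]
u_hat, v_hat = sparse_canonical_vectors(C_toy, lam_u=0.2, lam_v=0.2)
```

With the penalties set to zero the same iteration reduces to an ordinary power method for the leading singular vectors of C, whose coefficients are generally all nonzero; raising lam_u and lam_v zeroes out more coefficients. Other penalty functions would replace soft_threshold with a different shrinkage rule.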

Keywords: canonical covariance analysis; sparsity; random-effect model; high-dimensional genomic data



