
Exploratory Analysis of Multiple Omics Datasets Using the Adjusted RV Coefficient

Claus-Dieter Mayer, Julie Lorent and Graham W Horgan
Published/Copyright: March 2, 2011

The integration of multiple high-dimensional data sets (omics data) has been a very active but challenging area of bioinformatics research in recent years. Various adaptations of non-standard multivariate statistical tools have been suggested that allow such data sets to be analyzed and visualized simultaneously. However, these methods can typically deal with only two data sets, whereas systems biology experiments often generate larger numbers of high-dimensional data sets. For this reason, we suggest an exploratory analysis of the similarity between data sets as an initial analysis step. This analysis is based on the RV coefficient, a matrix correlation that can be interpreted as a generalization of the squared correlation from two single variables to two sets of variables. It has been shown before, however, that the high dimensionality of the data introduces substantial bias into the RV. We therefore introduce an alternative version, the adjusted RV, which is unbiased in the case of independent data sets. We also show that in many situations, particularly for very high-dimensional data sets, the adjusted RV is a better estimator than previously proposed RV versions in terms of both the mean square error and the power of the independence test based on it.
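The following minimal NumPy sketch illustrates the RV coefficient and the bias described above. The classical definition is standard: for column-centred matrices X (n x p) and Y (n x q), RV = tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)). The adjusted version shown here is an assumption made for illustration: it rewrites the RV in terms of squared correlations and replaces each r^2 with the adjusted-R^2 analogue 1 - (1 - r^2)(n - 1)/(n - 2), which has expectation zero for independent normal variables because E[r^2] = 1/(n - 1) in that case. It follows the spirit of the abstract but may differ in detail from the authors' exact definition. The function names are hypothetical, and the data are simulated, not taken from the study.

```python
import numpy as np


def _centered(X):
    """Centre each column (variable) of an n x p matrix."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)


def rv_coefficient(X, Y):
    """Classical RV coefficient between n x p matrix X and n x q matrix Y."""
    Xc, Yc = _centered(X), _centered(Y)
    Sx, Sy = Xc @ Xc.T, Yc @ Yc.T          # n x n configuration matrices
    num = np.sum(Sx * Sy)                  # = tr(Sx Sy), both are symmetric
    den = np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))
    return num / den


def _adjust(r2, n):
    # Assumed adjustment: adjusted-R^2 analogue for a single predictor.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - 2)


def _standardized(X):
    # Unit-norm centred columns, so dot products are Pearson correlations.
    # Assumes no zero-variance columns (these would divide by zero).
    Xc = _centered(X)
    return Xc / np.linalg.norm(Xc, axis=0)


def adjusted_rv(X, Y):
    """Hypothetical adjusted RV: every squared correlation is adjusted."""
    Xs, Ys = _standardized(X), _standardized(Y)
    n = Xs.shape[0]
    num = _adjust((Xs.T @ Ys) ** 2, n).sum()       # cross-set correlations
    den_x = _adjust((Xs.T @ Xs) ** 2, n).sum()     # within-X correlations
    den_y = _adjust((Ys.T @ Ys) ** 2, n).sum()     # within-Y correlations
    return num / np.sqrt(den_x * den_y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q = 30, 200, 200                 # few samples, many variables
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))        # generated independently of X
    print("classical RV:", rv_coefficient(X, Y))  # far from 0 despite independence
    print("adjusted RV: ", adjusted_rv(X, Y))     # close to 0
```

With p much larger than n, the classical RV of two independent matrices is pushed towards 1, whereas the adjusted sketch stays near 0. Unlike the classical RV, the adjusted value can become slightly negative.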

We demonstrate the usefulness of the adjusted RV by applying it to a collection of 19 different multivariate data sets from a systems biology experiment. The pairwise RV values between the data sets define a similarity matrix that we use as input to hierarchical clustering or multidimensional scaling. We show that this reveals biologically meaningful subgroups of data sets in our study.
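As a sketch of how such a similarity matrix could be fed into multidimensional scaling, the code below converts similarities s_ij to distances and applies classical (Torgerson) MDS with NumPy. The conversion d_ij = sqrt(2(1 - s_ij)) is an assumption, chosen because for the classical RV it is exactly the Euclidean distance between the normalized configuration matrices; the abstract does not say which transformation the authors used. The 4 x 4 similarity matrix is made up for illustration and does not come from the study. The same distance matrix, in condensed form, could also be passed to a hierarchical clustering routine such as scipy.cluster.hierarchy.linkage.

```python
import numpy as np


def similarity_to_distance(S):
    """Assumed transform: d_ij = sqrt(2 * (1 - s_ij)), clipped at zero."""
    S = np.asarray(S, dtype=float)
    return np.sqrt(np.clip(2.0 * (1.0 - S), 0.0, None))


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS of an m x m distance matrix into k dimensions."""
    D = np.asarray(D, dtype=float)
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # keep the k largest
    vals, vecs = vals[order], vecs[:, order]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))


if __name__ == "__main__":
    # Illustrative similarities among four data sets forming two groups.
    S = np.array([[1.0, 0.8, 0.1, 0.1],
                  [0.8, 1.0, 0.1, 0.1],
                  [0.1, 0.1, 1.0, 0.7],
                  [0.1, 0.1, 0.7, 1.0]])
    coords = classical_mds(similarity_to_distance(S), k=2)
    print(coords)  # the first axis separates data sets {0, 1} from {2, 3}
```

The first MDS axis places the two strongly related pairs on opposite sides, which is the kind of subgroup structure the abstract describes. The eigenvalue clipping matters for adjusted RV values, since the resulting distances need not be exactly Euclidean.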

Published Online: 2011-3-2

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston

Articles in the same Issue

  1. Invited Editorial
  2. Measurement of Evidence and Evidence of Measurement
  3. Article
  4. Fully Moderated T-statistic for Small Sample Size Gene Expression Arrays
  5. Determining Coding CpG Islands by Identifying Regions Significant for Pattern Statistics on Markov Chains
  6. Assessing Modularity Using a Random Matrix Theory Approach
  7. Choice of Summary Statistic Weights in Approximate Bayesian Computation
  8. Genetic Linkage Analysis in the Presence of Germline Mosaicism
  9. Fitting Boolean Networks from Steady State Perturbation Data
  10. Adaptive Elastic-Net Sparse Principal Component Analysis for Pathway Association Testing
  11. Bayesian Learning from Marginal Data in Bionetwork Models
  12. Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome
  13. Multiple Testing in Candidate Gene Situations: A Comparison of Classical, Discrete, and Resampling-Based Procedures
  14. Modeling Read Counts for CNV Detection in Exome Sequencing Data
  15. Multiscale Characterization of Signaling Network Dynamics through Features
  16. A Calibrated Multiclass Extension of AdaBoost
  17. False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies
  18. A Markov-Chain Model for the Analysis of High-Resolution Enzymatically 18O-Labeled Mass Spectra
  19. Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery
  20. Learning Monotonic Genotype-Phenotype Maps
  21. A Comparison of Multifactor Dimensionality Reduction and L1-Penalized Regression to Identify Gene-Gene Interactions in Genetic Association Studies
  22. Accuracy and Computational Efficiency of a Graphical Modeling Approach to Linkage Disequilibrium Estimation
  23. Learning from Past Treatments and Their Outcome Improves Prediction of In Vivo Response to Anti-HIV Therapy
  24. A Three Component Latent Class Model for Robust Semiparametric Gene Discovery
  25. Log-Linear Modelling of Protein Dipeptide Structure Reveals Interesting Patterns of Side-Chain-Backbone Interactions
  26. A Robust Statistical Method to Detect Null Alleles in Microsatellite and SNP Datasets in Both Panmictic and Inbred Populations
  27. Large Sample Approximations of Probabilities of Correct Evolutionary Tree Estimation and Biases of Maximum Likelihood Estimation
  28. Interval Estimation of Familial Correlations from Pedigrees
  29. Information Metrics in Genetic Epidemiology
  30. Linear Combination Test for Hierarchical Gene Set Analysis
  31. Exploratory Analysis of Multiple Omics Datasets Using the Adjusted RV Coefficient
  32. Application of the Lasso to Expression Quantitative Trait Loci Mapping
  33. A Variance-Components Model for Distance-Matrix Phylogenetic Reconstruction
  34. Imputation Estimators Partially Correct for Model Misspecification
  35. On the Statistical Properties of SGoF Multitesting Method
  36. Meta-Analysis of Family-Based and Case-Control Genetic Association Studies that Use the Same Cases
  37. A Non-Parametric Method for Detecting Specificity Determining Sites in Protein Sequence Alignments
  38. Performance of Matrix Representation with Parsimony for Inferring Species from Gene Trees
  39. Disequilibrium Coefficient: A Bayesian Perspective
  40. Analyzing Time-Course Microarray Data Using Functional Data Analysis - A Review
  41. The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq
  42. Inferring Gene Networks using Robust Statistical Techniques
  43. A Two-Stage Poisson Model for Testing RNA-Seq Data
  44. Quantifying the Relative Contribution of the Heterozygous Class to QTL Detection Power
  45. The Joint Null Criterion for Multiple Hypothesis Tests
  46. Multiple Imputation of Missing Phenotype Data for QTL Mapping
  47. Sparse Canonical Covariance Analysis for High-throughput Data
  48. Comparison of Clinical Subgroup aCGH Profiles through Pseudolikelihood Ratio Tests
  49. Random Forests for Genetic Association Studies
  50. Deviance Information Criteria for Model Selection in Approximate Bayesian Computation
  51. High-Dimensional Regression and Variable Selection Using CAR Scores
  52. Surveying the Manifold Divergence of an Entire Protein Class for Statistical Clues to Underlying Biochemical Mechanisms
  53. Smoothing Gene Expression Data with Network Information Improves Consistency of Regulated Genes
  54. Entropy Based Genetic Association Tests and Gene-Gene Interaction Tests
  55. Weighted Lasso with Data Integration
  56. MA-SNP -- A New Genotype Calling Method for Oligonucleotide SNP Arrays Modeling the Batch Effect with a Normal Mixture Model
  57. A Modified Maximum Contrast Method for Unequal Sample Sizes in Pharmacogenomic Studies
Downloaded on 7.9.2025 from https://www.degruyterbrill.com/document/doi/10.2202/1544-6115.1540/html