Dimension Reduction of Microarray Data in the Presence of a Censored Survival Response: A Simulation Study
Tuan S Nguyen
An important aspect of microarray studies is the prediction of patient survival from gene expression levels. To cope with the high dimensionality of microarray gene expression data, it is customary to first reduce the dimension of the data with a dimension reduction method and then use the Cox proportional hazards model to predict patient survival. In this paper, we propose a variant of Partial Least Squares, denoted Rank-based Modified Partial Least Squares (RMPLS), that is insensitive to outlying values of both the response and the gene expressions. We assess the performance of RMPLS and several other dimension reduction methods using a simulation model for gene expression data with a censored response. In particular, Principal Component Analysis (PCA), Modified Partial Least Squares (MPLS), RMPLS, Sliced Inverse Regression (SIR), Correlation Principal Component Regression (CPCR), Supervised Principal Component Regression (SPCR) and Univariate Selection (UNIV) are compared in terms of the mean squared error of the estimated survival function and of the estimated covariate coefficients, and in terms of the bias of the estimated survival function. RMPLS outperforms all other methods in terms of the mean squared error and the bias of the survival function in the presence of outliers in the response. In the absence of outliers, RMPLS is comparable to MPLS, and both outperform all other methods considered in this study in terms of mean squared error and bias of the estimated survival function.
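The two comparison criteria named in the abstract, bias and mean squared error of the estimated survival function, can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the study, so the pointwise averaging over simulation replicates below is an assumption, and the function name `bias_and_mse` and the toy numbers are hypothetical.

```python
from statistics import mean


def bias_and_mse(estimates, truth):
    """Pointwise bias and MSE of estimated survival curves.

    estimates: list of replicates; each replicate is a list of estimated
               survival probabilities on a common grid of time points.
    truth:     list of true survival probabilities on the same grid.
    (Assumed definitions: averages are taken over replicates at each time point.)
    """
    bias, mse = [], []
    for k, s_true in enumerate(truth):
        # Estimation error of each replicate at time point k
        errors = [rep[k] - s_true for rep in estimates]
        bias.append(mean(errors))
        mse.append(mean([e * e for e in errors]))
    return bias, mse


# Toy illustration: two replicates, two time points (made-up values)
truth = [0.9, 0.5]
estimates = [[0.8, 0.6], [1.0, 0.6]]
bias, mse = bias_and_mse(estimates, truth)
```

In a simulation like the one described, each replicate would be the survival curve predicted by a Cox model fitted on the reduced components of one simulated data set; those steps are outside this sketch.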
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Sparse Canonical Correlation Analysis with Application to Genomic Data Integration
- Orthology-Based Multilevel Modeling of Differentially Expressed Mouse and Human Gene Pairs
- Sequential Analysis for Microarray Data Based on Sensitivity and Meta-Analysis
- Dimension Reduction of Microarray Data in the Presence of a Censored Survival Response: A Simulation Study
- A Nonlinear Mixed-Effects Model for Estimating Calibration Intervals for Unknown Concentrations in Two-Color Microarray Data with Spike-Ins
- Composite Likelihood Modeling of Neighboring Site Correlations of DNA Sequence Substitution Rates
- A Multiple Testing Approach to High-Dimensional Association Studies with an Application to the Detection of Associations between Risk Factors of Heart Disease and Genetic Polymorphisms
- Hypothesis Tests for Point-Mass Mixture Data with Application to 'Omics Data with Many Zero Values
- Inferring Dynamic Genetic Networks with Low Order Independencies
- Normalization Method for Transcriptional Studies of Heterogeneous Samples - Simultaneous Array Normalization and Identification of Equivalent Expression
- A Bayesian Analysis Strategy for Cross-Study Translation of Gene Expression Biomarkers
- Modified FDR Controlling Procedure for Multi-Stage Analyses
- Detecting Outlier Samples in Microarray Data
- Survival Analysis with High-Dimensional Covariates: An Application in Microarray Studies
- Two-Stage Model-Based Clustering for Liquid Chromatography Mass Spectrometry Data Analysis
- Score Statistics for Mapping Quantitative Trait Loci
- Impact of Population Stratification on Family-Based Association Tests with Longitudinal Measurements
- A Multilocus Model for Constructing a Linkage Disequilibrium Map in Human Populations
- Testing of Chromosomal Clumping of Gene Properties
- Balanced Gradient Boosting from Imbalanced Data for Clinical Outcome Prediction
- Univariate Shrinkage in the Cox Model for High Dimensional Data
- Multilevel Comparison of Dendrograms: A New Method with an Application for Genetic Classifications
- Weighted Multiple Hypothesis Testing Procedures
- Incorporating Duplicate Genotype Data into Linear Trend Tests of Genetic Association: Methods and Cost-Effectiveness
- Increase of Rejection Rate in Case-Control Studies with the Differential Genotyping Error Rates
- A Parametric Model for Analyzing Anticipation in Genetically Predisposed Families
- Bayesian Unsupervised Learning with Multiple Data Types
- Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data
- A Non-Homogeneous Hidden-State Model on First Order Differences for Automatic Detection of Nucleosome Positions
- Adaptive Transmission Disequilibrium Test for Family Trio Design
- Model Selection Based on FDR-Thresholding Optimizing the Area under the ROC-Curve
- Estimation of Selection Intensity under Overdominance by Bayesian Methods
- A Multivariate Growth Curve Model for Ranking Genes in Replicated Time Course Microarray Data
- Rotation Testing in Gene Set Enrichment Analysis for Small Direct Comparison Experiments
- Ancestral Recombination Graphs under Non-Random Ascertainment, with Applications to Gene Mapping
- Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data
- Identifying Individuals in a Complex Mixture of DNA with Unknown Ancestry
- A Statistical Model for Genetic Mapping of Viral Infection by Integrating Epidemiological Behavior
- Calculating Asymptotic Significance Levels of the Constrained Likelihood Ratio Test with Application to Multivariate Genetic Linkage Analysis
- Modeling Dependence in Methylation Patterns with Application to Ovarian Carcinomas
- M-quantile Regression Analysis of Temporal Gene Expression Data
- MC-Normalization: A Novel Method for Dye-Normalization of Two-Channel Microarray Data
- Characterizing the D2 Statistic: Word Matches in Biological Sequences
- Transmission Disequilibrium Test Power and Sample Size in the Presence of Locus Heterogeneity
- A Regularized Regression Approach for Dissecting Genetic Conflicts that Increase Disease Risk in Pregnancy
- Statistical Screening Method for Genetic Factors Influencing Susceptibility to Common Diseases in a Two-Stage Genome-Wide Association Study
- A Unified Mixed Effects Model for Gene Set Analysis of Time Course Microarray Experiments