Application of the Lasso to Expression Quantitative Trait Loci Mapping
Andrew Anand Brown
Univariate methods have frequently, and often successfully, been used to discover Quantitative Trait Loci (QTL) for gene expression measurements. However, correlations caused by Linkage Disequilibrium (LD) can make a single causative region produce multiple signals. So can chance correlations, which arise from the large number of markers typically used in such studies. Traditional ways of determining the number of QTL for a given phenotype, such as visual inspection of likelihood plots, are not feasible when thousands of phenotypes are considered. Stepwise methods have been suggested to address this, but they are known to produce unstable models, and deriving significance estimates from them is difficult. The Lasso is a shrinkage method that has often been employed to discover true signals when the number of variables exceeds the number of observations. We propose a test statistic based on the threshold at which variables enter the Lasso model. We prove analytic properties of this statistic that show parallels with univariate methods, and we demonstrate its utility in proposing candidate QTL. We show that this method controls for LD structure and that its estimates of statistical significance have superior properties to those derived from stepwise methods. We study the performance of our method in simulation studies, which find that the ratio of true discoveries to false positives is often higher for our method than for univariate and stepwise approaches. Finally, we apply the method to data from a previous eQTL mapping experiment to investigate the nature of genetic regulation in this population.
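The proposed statistic is built on the penalty level at which each marker first enters the Lasso model. The paper's exact definition, scaling and significance calculation are not reproduced here. The code below is therefore only a minimal sketch of that underlying quantity, under these stated assumptions:

- It uses the objective (1/2n)||y - Xb||^2 + lam*||b||_1 with centred genotype columns and no intercept.
- It fits this objective by cyclic coordinate descent over a log-spaced grid of lam values. This is a standard Lasso solver, not necessarily the one used by the author.
- It records, for each marker, the largest grid value at which that marker's coefficient is non-zero.
- The function names `lasso_cd` and `entry_thresholds` and the toy data are hypothetical.

Under this scaling, no marker is in the model for lam >= max_j |x_j^T y| / n. The first marker to enter is the one with the largest absolute inner product with y. For standardised genotypes, that is the marker with the strongest univariate correlation. This is one simple way to see a link between Lasso entry thresholds and a univariate scan.

```python
"""Sketch: approximate entry thresholds of markers along a Lasso path.

Assumed objective (the scaling is an assumption, not taken from the paper):
    (1 / 2n) * ||y - X b||^2 + lam * ||b||_1
with centred genotype columns and no intercept. Names are hypothetical.
"""


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(cols, y, lam, beta=None, max_sweeps=1000, tol=1e-10):
    """Cyclic coordinate descent for the Lasso at a single penalty lam.

    cols: list of p columns (each a list of n floats); y: list of n floats.
    beta: optional warm start. Returns the fitted coefficient list.
    """
    n, p = len(y), len(cols)
    beta = list(beta) if beta is not None else [0.0] * p
    # Current residual r = y - X beta.
    r = [y[i] - sum(cols[j][i] * beta[j] for j in range(p)) for i in range(n)]
    sq = [sum(v * v for v in c) / n for c in cols]  # (1/n) * ||x_j||^2
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue  # a constant column can never enter
            # (1/n) x_j^T (residual with x_j's own contribution added back)
            rho = sum(cols[j][i] * r[i] for i in range(n)) / n + sq[j] * beta[j]
            new = soft_threshold(rho, lam) / sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= cols[j][i] * delta  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def entry_thresholds(cols, y, n_grid=200, ratio=1e-3):
    """For each marker, the largest grid lam at which its coefficient is non-zero.

    Scans a log-spaced grid from lam_max down to lam_max * ratio using warm
    starts. None means the marker never entered on this grid.
    """
    n = len(y)
    # At or above lam_max = max_j |x_j^T y| / n, every coefficient is zero.
    lam_max = max(abs(sum(c[i] * y[i] for i in range(n))) / n for c in cols)
    grid = [lam_max * ratio ** (k / (n_grid - 1)) for k in range(n_grid)]
    entry = [None] * len(cols)
    beta = None
    for lam in grid:
        beta = lasso_cd(cols, y, lam, beta)  # warm start from the previous lam
        for j, b in enumerate(beta):
            if b != 0.0 and entry[j] is None:
                entry[j] = lam
    return lam_max, entry


# Hypothetical toy data: three centred, mutually orthogonal "markers".
x1 = [1.0, 1.0, -1.0, -1.0]
x2 = [1.0, -1.0, 1.0, -1.0]
x3 = [1.0, -1.0, -1.0, 1.0]
y = [2 * a + 0.5 * b for a, b in zip(x1, x2)]  # x3 has no effect on y
lam_max, entry = entry_thresholds([x1, x2, x3], y)
print(lam_max, entry)
```

Because the grid is discrete, each recorded value lies at most one grid step below the exact entry threshold. An exact path algorithm, such as LARS with the Lasso modification, would give the thresholds directly. A marker can also leave the path and later re-enter it; the sketch keeps only its first (largest) entry value.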
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Invited Editorial
- Measurement of Evidence and Evidence of Measurement
- Article
- Fully Moderated T-statistic for Small Sample Size Gene Expression Arrays
- Determining Coding CpG Islands by Identifying Regions Significant for Pattern Statistics on Markov Chains
- Assessing Modularity Using a Random Matrix Theory Approach
- Choice of Summary Statistic Weights in Approximate Bayesian Computation
- Genetic Linkage Analysis in the Presence of Germline Mosaicism
- Fitting Boolean Networks from Steady State Perturbation Data
- Adaptive Elastic-Net Sparse Principal Component Analysis for Pathway Association Testing
- Bayesian Learning from Marginal Data in Bionetwork Models
- Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome
- Multiple Testing in Candidate Gene Situations: A Comparison of Classical, Discrete, and Resampling-Based Procedures
- Modeling Read Counts for CNV Detection in Exome Sequencing Data
- Multiscale Characterization of Signaling Network Dynamics through Features
- A Calibrated Multiclass Extension of AdaBoost
- False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies
- A Markov-Chain Model for the Analysis of High-Resolution Enzymatically 18O-Labeled Mass Spectra
- Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery
- Learning Monotonic Genotype-Phenotype Maps
- A Comparison of Multifactor Dimensionality Reduction and L1-Penalized Regression to Identify Gene-Gene Interactions in Genetic Association Studies
- Accuracy and Computational Efficiency of a Graphical Modeling Approach to Linkage Disequilibrium Estimation
- Learning from Past Treatments and Their Outcome Improves Prediction of In Vivo Response to Anti-HIV Therapy
- A Three Component Latent Class Model for Robust Semiparametric Gene Discovery
- Log-Linear Modelling of Protein Dipeptide Structure Reveals Interesting Patterns of Side-Chain-Backbone Interactions
- A Robust Statistical Method to Detect Null Alleles in Microsatellite and SNP Datasets in Both Panmictic and Inbred Populations
- Large Sample Approximations of Probabilities of Correct Evolutionary Tree Estimation and Biases of Maximum Likelihood Estimation
- Interval Estimation of Familial Correlations from Pedigrees
- Information Metrics in Genetic Epidemiology
- Linear Combination Test for Hierarchical Gene Set Analysis
- Exploratory Analysis of Multiple Omics Datasets Using the Adjusted RV Coefficient
- Application of the Lasso to Expression Quantitative Trait Loci Mapping
- A Variance-Components Model for Distance-Matrix Phylogenetic Reconstruction
- Imputation Estimators Partially Correct for Model Misspecification
- On the Statistical Properties of SGoF Multitesting Method
- Meta-Analysis of Family-Based and Case-Control Genetic Association Studies that Use the Same Cases
- A Non-Parametric Method for Detecting Specificity Determining Sites in Protein Sequence Alignments
- Performance of Matrix Representation with Parsimony for Inferring Species from Gene Trees
- Disequilibrium Coefficient: A Bayesian Perspective
- Analyzing Time-Course Microarray Data Using Functional Data Analysis - A Review
- The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq
- Inferring Gene Networks using Robust Statistical Techniques
- A Two-Stage Poisson Model for Testing RNA-Seq Data
- Quantifying the Relative Contribution of the Heterozygous Class to QTL Detection Power
- The Joint Null Criterion for Multiple Hypothesis Tests
- Multiple Imputation of Missing Phenotype Data for QTL Mapping
- Sparse Canonical Covariance Analysis for High-throughput Data
- Comparison of Clinical Subgroup aCGH Profiles through Pseudolikelihood Ratio Tests
- Random Forests for Genetic Association Studies
- Deviance Information Criteria for Model Selection in Approximate Bayesian Computation
- High-Dimensional Regression and Variable Selection Using CAR Scores
- Surveying the Manifold Divergence of an Entire Protein Class for Statistical Clues to Underlying Biochemical Mechanisms
- Smoothing Gene Expression Data with Network Information Improves Consistency of Regulated Genes
- Entropy Based Genetic Association Tests and Gene-Gene Interaction Tests
- Weighted Lasso with Data Integration
- MA-SNP -- A New Genotype Calling Method for Oligonucleotide SNP Arrays Modeling the Batch Effect with a Normal Mixture Model
- A Modified Maximum Contrast Method for Unequal Sample Sizes in Pharmacogenomic Studies