Genotype Copy Number Variations using Gaussian Mixture Models: Theory and Algorithms
Chang-Yun Lin, Yungtai Lo and Kenny Q. Ye
Abstract
Copy number variations (CNVs) are important in disease association studies and are targeted by most recent microarray platforms developed for GWAS. However, probes targeting the same CNV region can vary greatly in performance, and some carry little information beyond pure noise. In this paper, we investigate how best to combine measurements from multiple probes to estimate the copy numbers of individuals under the framework of the Gaussian mixture model (GMM). First, we show that, under two regularity conditions and assuming that all parameters except the mixing proportions are known, optimal weights can be obtained so that the univariate GMM based on the weighted average gives exactly the same classification as the multivariate GMM. We then develop an algorithm that iteratively estimates the parameters, obtains the optimal weights, and uses them for classification. The algorithm performs well on simulated data and on two real data sets, showing a clear advantage over classification based on the equally weighted average.
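The equivalence between the weighted univariate GMM and the multivariate GMM can be illustrated numerically. The abstract does not state the two regularity conditions, so the sketch below makes its own assumptions, which may differ from the paper's: all components share one covariance matrix, and the component means lie on a line (a hypothetical baseline `a` plus copy number times a hypothetical probe sensitivity `b`). All parameter values are made up for illustration and treated as known; this is not the iterative estimation algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (all values are illustrative assumptions).
p = 5                                       # number of probes
copies = np.array([1.0, 2.0, 3.0])          # copy-number classes
pi = np.array([0.2, 0.5, 0.3])              # mixing proportions
a = rng.normal(size=p)                      # per-probe baseline
b = np.array([1.0, 0.8, 0.5, 0.1, 0.0])     # per-probe sensitivity; last probe is pure noise
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)             # shared covariance (positive definite)
means = a + copies[:, None] * b             # component means, shape (3, p)

# Simulate probe measurements for n individuals.
n = 2000
z = rng.choice(len(copies), size=n, p=pi)
x = means[z] + rng.multivariate_normal(np.zeros(p), sigma, size=n)


def posterior(log_scores):
    """Normalize per-class log scores into posterior probabilities."""
    log_scores = log_scores - log_scores.max(axis=1, keepdims=True)
    probs = np.exp(log_scores)
    return probs / probs.sum(axis=1, keepdims=True)


def univariate_posterior(x, w):
    """Posterior of the univariate GMM fitted to the weighted average x @ w."""
    y = x @ w
    m = means @ w                           # projected component means
    s2 = w @ sigma @ w                      # projected common variance
    return posterior(np.log(pi) - 0.5 * (y[:, None] - m[None, :]) ** 2 / s2)


# Multivariate GMM posterior with known parameters.
sigma_inv = np.linalg.inv(sigma)
d = x[:, None, :] - means[None, :, :]
maha = np.einsum("nkp,pq,nkq->nk", d, sigma_inv, d)
post_mv = posterior(np.log(pi) - 0.5 * maha)

# Weights proportional to sigma^{-1} b; the scale does not affect classification.
w_opt = sigma_inv @ b
w_opt = w_opt / np.abs(w_opt).sum()
post_uv = univariate_posterior(x, w_opt)

# Equal weights, for comparison.
post_eq = univariate_posterior(x, np.full(p, 1.0 / p))

print("max posterior difference:", np.abs(post_mv - post_uv).max())
print("accuracy, weighted average:", (post_uv.argmax(axis=1) == z).mean())
print("accuracy, equal weights:   ", (post_eq.argmax(axis=1) == z).mean())
```

Under these assumptions the class-dependent part of the multivariate log-likelihood depends on the measurements only through the weighted sum, which is why the two posteriors agree up to floating-point error, while the equally weighted average generally discards information.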
©2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Article
- Large-scale Parentage Inference with SNPs: an Efficient Algorithm for Statistical Confidence of Parent Pair Allocations
- ExactDAS: An Exact Test Procedure for the Detection of Differential Alternative Splicing in Microarray Experiments
- Incorporating Genomic Annotation into a Hidden Markov Model for DNA Methylation Tiling Array Data
- Variational Bayes Procedure for Effective Classification of Tumor Type with Microarray Gene Expression Data
- Detecting Differential Expression in RNA-sequence Data Using Quasi-likelihood with Shrunken Dispersion Estimates
- Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data
- Analyzing Genetic Association Studies with an Extended Propensity Score Approach
- Genotype Copy Number Variations using Gaussian Mixture Models: Theory and Algorithms
- Estimators of the local false discovery rate designed for small numbers of tests
- A PAUC-based Estimation Technique for Disease Classification and Biomarker Selection
- Comparison of Targeted Maximum Likelihood and Shrinkage Estimators of Parameters in Gene Networks
- DNA Pooling and Statistical Tests for the Detection of Single Nucleotide Polymorphisms