Modeling Read Counts for CNV Detection in Exome Sequencing Data
Michael I. Love
Variation in the depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, which makes CNVs harder to identify. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project, identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
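To make the idea of a hidden Markov model over read counts concrete, the following is a minimal sketch and not the exomeCopy implementation. It assumes Poisson emissions whose mean is the control-set background depth scaled by copy number, a single "stay" probability for transitions, and no GC-content or other covariates; the abstract does not specify these details, so the function names, the parameters `stay` and `floor`, and the toy data are all hypothetical.

```python
import math


def poisson_logpmf(k, mu):
    """Log-probability of observing count k under a Poisson with mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def viterbi_copy_number(counts, background, states=(0, 1, 2, 3),
                        normal=2, stay=0.99, floor=0.05):
    """Most likely copy-number path for one sample (illustrative only).

    counts     -- observed read counts per target region
    background -- expected counts at normal copy number (must be > 0),
                  e.g. derived from a control set
    states     -- candidate copy numbers (assumed state space)
    stay       -- assumed probability of remaining in the same state
    floor      -- assumed minimum depth ratio, so copy number 0 still
                  has a positive Poisson mean
    """
    n = len(states)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n - 1))

    def emit(t, s):
        # Expected depth scales with copy number relative to normal.
        ratio = max(states[s] / normal, floor)
        return poisson_logpmf(counts[t], background[t] * ratio)

    def trans(r, s):
        return log_stay if r == s else log_move

    # Uniform initial distribution over states.
    score = [math.log(1.0 / n) + emit(0, s) for s in range(n)]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: score[r] + trans(r, s))
            new_score.append(score[best] + trans(best, s) + emit(t, s))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    s = max(range(n), key=lambda r: score[r])
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    path.reverse()
    return [states[s] for s in path]


if __name__ == "__main__":
    # Toy data: twelve regions with a half-depth stretch in the middle.
    background = [100] * 12
    counts = [100, 98, 103, 101, 50, 52, 48, 51, 99, 102, 100, 97]
    print(viterbi_copy_number(counts, background))
```

In this toy input the middle four regions have roughly half the background depth, which is the kind of signal a heterozygous deletion would produce in a diploid region. A high `stay` value penalizes state changes, so isolated noisy regions are less likely to be called as CNVs than sustained shifts in depth.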
Author Notes:
We thank our collaborators on the XLID project, Prof. Dr. H.-Hilger Ropers, Wei Chen, Hao Hu, Reinhard Ullmann and the EUROMRX consortium for providing the XLID data, validation of CNVs and for helpful discussion. We also thank Ho-Ryun Chung for suggestions. Part of this work was financed by the European Union’s Seventh Framework Program under grant agreement number 241995, project GENCODYS.
©2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston