
A penalized regression approach for DNA copy number study using the sequencing data

Jaeeun Lee and Jie Chen
Published/Copyright: 30 May 2019

Abstract

Modeling high-throughput next generation sequencing (NGS) data from experiments that profile tumor and control samples to study DNA copy number variants (CNVs) remains challenging in several ways. In this applied work, we provide an efficient method for detecting multiple CNVs using NGS read ratio data. The method is based on a multiple change-point statistical model fitted by the 1d fused LASSO, a penalized regression approach designed for data ordered along one dimension. Because the path algorithm traces the solution as a function of the tuning parameter, the number and locations of potential CNV region boundaries can be estimated simultaneously and efficiently. For tuning parameter selection, we propose a new modified Bayesian information criterion, called JMIC, and compare it with three Bayesian information criteria used in the literature. In simulations, JMIC performed better at tuning parameter selection than the other three criteria. We applied our approach to sequencing data of read ratios between the breast tumor cell line HCC1954 and its matched normal cell line BL 1954; the results are in line with those reported in the literature.
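The abstract describes the estimator only in words, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the 1d fused LASSO is the fusion-only form, minimizing ½Σ(y_i − β_i)² + λΣ|β_{i+1} − β_i| over ordered read-ratio values y (for example log2 tumor/normal ratios per genomic bin; the binning and transform are assumptions). In this one-dimensional problem, neighbours that have been fused never split again as λ grows, so the whole solution path is a sequence of merges, and every knot gives the number and locations of candidate boundaries. The sketch traces that merge-based path instead of using a general solver such as the generalized-LASSO path of Tibshirani and Taylor (2011), and it selects a segmentation with a plain BIC-type score. JMIC's penalty is not stated in the abstract and is not reproduced here. The function names and the `max_cp` cap are hypothetical.

```python
import numpy as np


def _group_fit(y, starts):
    """Segment means, sizes and shrinkage rates c_g for a fixed segmentation.

    From the KKT conditions of the 1d fused LASSO, a fused group g takes the
    value beta_g(lam) = mean_g - lam * c_g with c_g = (s_left - s_right) / n_g,
    where s_left, s_right are the signs of the jumps into and out of g
    (0 at either end of the sequence).
    """
    bounds = list(starts) + [len(y)]
    means = np.array([y[a:b].mean() for a, b in zip(bounds[:-1], bounds[1:])])
    sizes = np.diff(bounds).astype(float)
    d = np.sign(np.diff(means))                  # sign of each jump
    s_left = np.concatenate(([0.0], d))
    s_right = np.concatenate((d, [0.0]))
    return means, sizes, (s_left - s_right) / sizes


def fused_lasso_path(y):
    """Exact path of min_b 0.5*||y - b||^2 + lam * sum_i |b[i+1] - b[i]|.

    Returns [(lam_knot, starts), ...]; each `starts` list (segment start
    indices) is the segmentation for lam in [lam_knot, next lam_knot).
    """
    y = np.asarray(y, dtype=float)
    # At lam = 0 the fit is y itself; runs of equal values are one segment.
    starts = [0] + [i for i in range(1, len(y)) if y[i] != y[i - 1]]
    lam = 0.0
    path = [(lam, list(starts))]
    while len(starts) > 1:
        means, _, c = _group_fit(y, starts)
        dm, dc = np.diff(means), np.diff(c)
        with np.errstate(divide="ignore", invalid="ignore"):
            # Neighbours j, j+1 meet where dm - lam*dc = 0, if the gap shrinks.
            hit = np.where(dc * np.sign(dm) > 0, dm / dc, np.inf)
        hit = np.where(dm == 0, lam, hit)        # equal means: fuse now
        j = int(np.argmin(hit))
        lam = max(lam, float(hit[j]))
        del starts[j + 1]                        # merge segment j+1 into j
        path.append((lam, list(starts)))
    return path


def fused_lasso_fit(y, path, lam):
    """Piecewise-constant fused-LASSO estimate at a given lam, read off the path."""
    y = np.asarray(y, dtype=float)
    starts = [s for knot, s in path if knot <= lam][-1]
    means, sizes, c = _group_fit(y, starts)
    return np.repeat(means - lam * c, sizes.astype(int))


def select_by_bic(y, path, max_cp=10):
    """Pick one segmentation on the path with a BIC-type score.

    Illustrative assumption, NOT the paper's JMIC: each candidate with
    k <= max_cp change points is refit by segment means and scored with
    n*log(RSS/n) + (2k + 1)*log(n).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    best_score, best = np.inf, None
    for _, starts in path:
        k = len(starts) - 1
        if k > max_cp:
            continue
        means, sizes, _ = _group_fit(y, starts)
        rss = float(np.sum((y - np.repeat(means, sizes.astype(int))) ** 2))
        score = n * np.log(max(rss, 1e-12) / n) + (2 * k + 1) * np.log(n)
        if score < best_score:
            best_score, best = score, starts
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical simulated ratio profile with one gained region on [30, 50).
    y = np.r_[np.zeros(30), np.full(20, 3.0), np.zeros(30)] + rng.normal(0, 0.05, 80)
    print("estimated CNV boundaries:", select_by_bic(y, fused_lasso_path(y))[1:])
```

The `max_cp` cap is a deliberate design choice: near λ = 0 almost every bin is its own segment, the RSS approaches zero, and n·log(RSS/n) would otherwise favour the saturated model. A criterion such as JMIC would replace the `score` line.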

References

Abyzov, A., A. E. Urban, M. Snyder and M. Gerstein (2011): “CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing,” Genome Res., 21, 974–984. doi:10.1101/gr.114876.110

BIOBASE (2013): Biological Databases. www.Biobase-international.com.

Boeva, V., A. Zinovyev, K. Bleakley, J. Vert, I. Janoueix-Lerosey, O. Delattre and E. Barillot (2011): “Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization,” Bioinformatics, 27, 268–269. doi:10.1093/bioinformatics/btq635

Chen, J. and Y. P. Wang (2009): “A statistical change point model approach for the detection of DNA copy number variations in array CGH data,” IEEE/ACM Trans. Comput. Biol. Bioinform., 6, 529–541. doi:10.1109/TCBB.2008.129

Chiang, D. Y., G. Getz, D. B. Jaffe, M. J. T. O’Kelly, X. Zhao, S. L. Carter, C. Russ, C. Nusbaum, M. Meyerson and E. S. Lander (2009): “High-resolution mapping of copy-number alterations with massively parallel sequencing,” Nature Methods, 6, 99–103. doi:10.1038/nmeth.1276

Duan, J., J. Zhang, H. Deng and Y. Wang (2013): “CNV-TV: a robust method to discover copy number variation from short sequencing reads,” BMC Bioinformatics, 14, 150. doi:10.1186/1471-2105-14-150

Eilers, P. and R. De Menezes (2005): “Quantile smoothing of array CGH data,” Bioinformatics, 21, 1146–1153. doi:10.1093/bioinformatics/bti148

Gill, P. E., W. Murray and M. A. Saunders (1997): User’s guide for SQOPT 5.3: A Fortran package for large-scale linear and quadratic programming. Tech. Rep. NA 97-4, Department of Mathematics, University of California, San Diego.

Huang, T., B. Wu, P. Lizardi and H. Zhao (2005): “Detection of DNA copy number alterations using penalized least squares regression,” Bioinformatics, 21, 3811–3817. doi:10.1093/bioinformatics/bti646

Ji, T. and J. Chen (2015): “Modeling the next generation sequencing read count data for DNA copy number variant study,” Stat. Appl. Genet. Mol. Biol., 14, 361–374. doi:10.1515/sagmb-2014-0054

Lee, J. (2017): A modified information criterion in the 1d fused lasso for DNA copy number variant detection using next generation sequencing data. Ph.D. dissertation presented to the Graduate School at Augusta University, Augusta, Georgia, USA.

Lévy-Leduc, C. and Z. Harchaoui (2008): “Catching change-points with lasso,” Adv. Neural Inf. Process. Syst., 20, 617–624.

Li, Y. and J. Zhu (2007): “Analysis of array CGH data for cancer studies using fused quantile regression,” Bioinformatics, 23, 2470–2476. doi:10.1093/bioinformatics/btm364

Magi, A., L. Tattini, T. Pippucci, F. Torricelli and M. Benelli (2012): “Read count approach for DNA copy number variants detection,” Bioinformatics, 28, 470–478. doi:10.1093/bioinformatics/btr707

Olshen, A. B., E. S. Venkatraman, R. Lucito and M. Wigler (2004): “Circular binary segmentation for the analysis of array-based DNA copy number data,” Biostatistics, 5, 557–572. doi:10.1093/biostatistics/kxh008

Pan, J. and J. Chen (2006): “Application of modified information criterion to multiple change point problems,” J. Multivar. Anal., 97, 2221–2241. doi:10.1016/j.jmva.2006.05.009

Picard, F., S. Robin, M. Lavielle, C. Vaisse and J. Daudin (2005): “A statistical approach for array CGH data analysis,” BMC Bioinformatics, 6, 1. doi:10.1186/1471-2105-6-1

Qian, J. and L. Su (2016): “Shrinkage estimation of regression models with multiple structural changes,” Econometric Theory, 32, 1376–1433. doi:10.1017/S0266466615000237

Scheinin, I., D. Sie, H. Bengtsson, M. A. Van De Wiel, A. B. Olshen, H. F. Van Thuijl, P. P. Eijk, F. Rustenburg, G. A. Meijer, J. C. Reijneveld, P. Wesseling, D. Pinkel, D. G. Albertson and B. Ylstra (2014): “DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly,” Genome Res., 24, 2022–2032. doi:10.1101/gr.175141.114

Schwarz, G. (1978): “Estimating the dimension of a model,” Ann. Stat., 6, 461–464. doi:10.1214/aos/1176344136

Teo, S. M., Y. Pawitan, C. S. Ku, K. S. Chia and A. Salim (2012): “Statistical challenges associated with detecting copy number,” Bioinformatics, 28, 2711–2718. doi:10.1093/bioinformatics/bts535

Tibshirani, R. J. (1996): “Regression shrinkage and selection via the LASSO,” J. Royal Stat. Soc. Ser. B, 58, 267–288. doi:10.1111/j.2517-6161.1996.tb02080.x

Tibshirani, R. and P. Wang (2008): “Spatial smoothing and hot spot detection for CGH data using the fused lasso,” Biostatistics, 9, 18–29. doi:10.1093/biostatistics/kxm013

Tibshirani, R. J. and J. Taylor (2011): “The solution path of the generalized LASSO,” Ann. Stat., 39, 1335–1371. doi:10.1214/11-AOS878

Tibshirani, R., M. A. Saunders, S. Rosset, J. Zhu and K. Knight (2005): “Sparsity and smoothness via the fused LASSO,” J. Royal Stat. Soc. Ser. B, 67, 91–108. doi:10.1111/j.1467-9868.2005.00490.x

Wang, P., Y. Kim, J. Pollack, B. Narasimhan and R. Tibshirani (2005): “A method for calling gains and losses in array CGH data,” Biostatistics, 6, 45–58. doi:10.1093/biostatistics/kxh017

Xie, C. and M. Tammi (2009): “CNV-seq, a new method to detect copy number variation using high-throughput sequencing,” BMC Bioinformatics, 10, 80. doi:10.1186/1471-2105-10-80

Yao, Y. C. and S. T. Au (1989): “Least-squares estimation of a step function,” Sankhya Ser. A, 51, 370–381.

Zhang, N. R. and D. O. Siegmund (2007): “A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data,” Biometrics, 63, 22–32. doi:10.1111/j.1541-0420.2006.00662.x


Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0001).


Published Online: 2019-05-30

©2019 Walter de Gruyter GmbH, Berlin/Boston
