Abstract
Copy number alteration (CNA) data have been collected to study disease-related chromosomal amplifications and deletions. The CUSUM procedure and related plots have been used to explore CNA data. In practice, outliers can occur in such data, and modifications of the CUSUM procedure may then be required. This paper develops an outlier reset modification of the CUSUM (ORCUSUM) procedure. The threshold value for detecting outliers or significant CUSUMs can be derived using results for sums of independent truncated normal random variables. Bartels' non-parametric test for autocorrelation is also introduced to the analysis of copy number variation data. Our simulation results indicate that the ORCUSUM procedure can still be used when the degree of autocorrelation is low. Furthermore, the results show the impact of outliers on the performance of the traditional CUSUM and illustrate the advantage of the ORCUSUM's outlier reset feature. Additionally, we discuss how the ORCUSUM can be applied to examine CNA data using a simulated data set. To illustrate the procedure, recently collected single nucleotide polymorphism (SNP) based CNA data from The Cancer Genome Atlas (TCGA) Research Network are analyzed. The method is applied to a data set collected in an ovarian cancer study, and three cytogenetic bands (cytobands) are considered. The cytobands 11q13 and 9p21 have been shown to be related to ovarian cancer and are presented as positive examples. The cytoband 3q22, which is less likely to be disease-related, is presented as a negative example. These results illustrate the usefulness of the ORCUSUM procedure as an exploratory tool for the analysis of SNP-based CNA data.
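The abstract names three computational ingredients but does not give their formulas. The Python sketch below illustrates them under explicit assumptions and is not the authors' implementation.

* `truncated_normal_var` uses the standard variance of a normal variable truncated symmetrically at ±cσ.
* `orcusum_sketch` assumes a hypothetical reset rule. A value with |x_i| > cσ is flagged as an outlier and its contribution to the cumulative sum is reset to zero. A CUSUM is then flagged when it exceeds a normal-approximation threshold for the sum of the remaining truncated normal terms.
* `bartels_rvn` computes Bartels' rank version of von Neumann's ratio. Under randomness this ratio has mean 2, and small values indicate positive autocorrelation.

The function names, the cutoff c = 3, the level α = 0.05, the reset rule, and the assumption that σ is known are all illustrative choices.

```python
import numpy as np
from scipy.stats import norm, rankdata


def truncated_normal_var(c: float, sigma: float = 1.0) -> float:
    """Variance of N(0, sigma^2) truncated symmetrically to [-c*sigma, c*sigma]."""
    # Standard result: sigma^2 * (1 - 2*c*phi(c) / (2*Phi(c) - 1)); the mean is 0 by symmetry.
    return sigma**2 * (1.0 - 2.0 * c * norm.pdf(c) / (2.0 * norm.cdf(c) - 1.0))


def orcusum_sketch(x, sigma=1.0, c=3.0, alpha=0.05):
    """Hypothetical outlier-reset CUSUM; NOT the paper's exact update rule.

    Assumption: a value with |x_i| > c*sigma is flagged as an outlier and its
    contribution to the running sum is reset to zero, so the remaining terms
    behave like i.i.d. N(0, sigma^2) variables truncated to [-c*sigma, c*sigma].
    """
    x = np.asarray(x, dtype=float)
    outlier = np.abs(x) > c * sigma
    cusum = np.cumsum(np.where(outlier, 0.0, x))
    # Number of non-outlier terms in each partial sum (at least 1 to keep the threshold positive).
    k = np.maximum(np.cumsum(~outlier), 1)
    # Normal (CLT) approximation to a sum of k truncated normals: mean 0, variance k * v(c).
    z = norm.ppf(1.0 - alpha / 2.0)
    threshold = z * np.sqrt(k * truncated_normal_var(c, sigma))
    significant = np.abs(cusum) > threshold
    return cusum, threshold, outlier, significant


def bartels_rvn(x):
    """Bartels' (1982) rank von Neumann ratio and its normal-approximation z-score."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 observations")
    r = rankdata(x)  # average ranks in case of ties
    rvn = np.sum(np.diff(r) ** 2) / np.sum((r - r.mean()) ** 2)
    # Under randomness E[RVN] = 2; exact variance as given by Bartels (1982).
    var = 4.0 * (n - 2) * (5 * n**2 - 2 * n - 9) / (5.0 * n * (n + 1) * (n - 1) ** 2)
    z = (rvn - 2.0) / np.sqrt(var)
    # A small RVN (negative z) points to positive autocorrelation.
    return float(rvn), float(z)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated log-ratios: a null segment, then a shifted segment, plus one planted outlier.
    x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(0.8, 1.0, 50)])
    x[30] = 8.0
    cusum, threshold, outlier, significant = orcusum_sketch(x)
    print("outliers at", np.flatnonzero(outlier))
    print("any significant CUSUM:", bool(significant.any()))
    print("Bartels RVN, z:", bartels_rvn(x))
```

Under this assumed rule, a single large outlier cannot push the running sum across the threshold on its own. A sustained shift, such as an amplified region, accumulates and eventually does. The threshold here relies on a central-limit approximation. For short segments, exact results on sums of symmetrically truncated normals (Birnbaum and Andrews, 1949) could replace it.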
References
Aravidis, C., A. D. Panani, Z. Kosmaidou, N. Thomakos, A. Rodolakis and A. Antsaklis (2012): “Detection of numerical abnormalities of chromosome 9 and p16/CDKN2A gene alterations in ovarian cancer with FISH analysis,” Anticancer Res., 32, 5309–5313.
Bartels, R. (1982): “The rank version of von Neumann’s ratio test for randomness,” J. Am. Stat. Assoc., 77, 40–46.
Birnbaum, Z. W. and F. C. Andrews (1949): “On sums of symmetrically truncated normal random variables,” Ann. Math. Stat., 20, 458–461.
Chen, H., H. Xing and N. R. Zhang (2011): “Estimation of parent specific DNA copy number in tumors using high-density genotyping arrays,” PLoS Comput. Biol., 7, e1001060.
Chiang, D. Y., G. Getz, D. B. Jaffe, M. J. T. O’Kelly, X. Zhao, S. L. Carter, C. Russ, C. Nusbaum, M. Meyerson and E. S. Lander (2009): “High-resolution mapping of copy-number alterations with massively parallel sequencing,” Nat. Methods, 6, 99–103.
Hawkins, D. M. and D. H. Olwell (1998): Cumulative sum charts and charting for quality improvement, New York, NY, USA: Springer. doi:10.1007/978-1-4612-1686-5.
Hui, W., Y. R. Gel and J. L. Gastwirth (2008): “lawstat: an R package for law, public policy and biostatistics,” J. Stat. Software, 28. http://www.jstatsoft.org/v28/i03.
Li, W., A. Lee and P. K. Gregersen (2009): “Copy-number-variation and copy-number-alteration region detection by cumulative plots,” BMC Bioinformatics, 10(Suppl 1), S67. doi:10.1186/1471-2105-10-S1-S67.
Lockhart, D. J., H. Dong, M. C. Byrne, M. T. Follettie, M. V. Gallo, M. S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Norton and E. L. Brown (1996): “Expression monitoring by hybridization to high-density oligonucleotide arrays,” Nat. Biotechnol., 14, 1675–1680.
McDaniel, S., J. Minnier, R. A. Betensky, G. Mohapatra, Y. Shen, J. F. Gusella, D. N. Louis and T. Cai (2010): “Assessing population level genetic instability via moving average,” Stat. Biosci., 2, 120–136.
McLachlan, G. and D. Peel (2000): Finite mixture models. Wiley series in probability and statistics, New York, NY, USA: John Wiley & Sons, Inc. doi:10.1002/0471721182.
Niu, Y. S. and H. Zhang (2012): “The screening and ranking algorithm to detect DNA copy number variations,” Ann. Appl. Stat., 6, 1306–1326.
Olshen, A. B., H. Bengtsson, P. Neuvial, P. T. Spellman, R. A. Olshen and V. E. Seshan (2011): “Parent-specific copy number in paired tumor-normal studies using circular binary segmentation,” Bioinformatics, 27, 2038–2046. doi:10.1093/bioinformatics/btr329.
Pejovic, T. (1995): “Genetic changes in ovarian cancer,” Ann. Med., 27, 73–78.
Schena, M., D. Shalon, R. W. Davis and P. O. Brown (1995): “Quantitative monitoring of gene expression patterns with a complementary DNA microarray,” Science, 270, 467–470. doi:10.1126/science.270.5235.467.
The Cancer Genome Atlas Network (2008): “Comprehensive genomic characterization defines human glioblastoma genes and core pathways,” Nature, 455, 1061–1068. doi:10.1038/nature07385.
Tukey, J. W. (1960): A survey of sampling from contaminated distributions. In: Contributions to probability and statistics, Stanford, California: Stanford University Press.
Weitzel, J. N., J. Patel, D. M. Smith, A. Goodman, H. Safaii and H. G. Ball (1994): “Molecular genetic changes associated with ovarian cancer,” Gynecol. Oncol., 55, 245–252.
Zhao, X., C. Li, J. G. Paez, K. Chin, P. A. Janne, T.-H. Chen, L. Girard, J. Minna, D. Christiani, C. Leo, J. W. Gray, W. R. Sellers and M. Meyerson (2004): “An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays,” Cancer Res., 64, 3060–3071.
©2015 by De Gruyter