Weighted Kolmogorov Smirnov testing: an alternative for Gene Set Enrichment Analysis

Konstantina Charmpi and Bernard Ycart
Published/Copyright: May 30, 2015

Abstract

Gene Set Enrichment Analysis (GSEA) is a basic tool for genomic data treatment. Its test statistic is based on a cumulated weight function, and its distribution under the null hypothesis is evaluated by Monte-Carlo simulation. Here, it is proposed to subtract its asymptotic expectation from the cumulated weight function, then to scale it. Under the null hypothesis, the convergence in distribution of the new test statistic is proved using the theory of empirical processes. The limiting distribution needs to be computed only once and can then be used for many different gene sets, which results in large savings in computing time. The test defined in this way has been called the Weighted Kolmogorov Smirnov (WKS) test. Using expression data from the GEO repository, tested against the MSig Database C2, a comparison between the classical GSEA test and the new procedure has been conducted. Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative than the classical GSEA test in many cases.
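The abstract describes the WKS idea only in words: centre the cumulated weight function of a gene set by its expectation, scale it, and compare the result with a limiting distribution that is computed once per dataset and reused for every gene set. The Python/NumPy sketch below illustrates that workflow under explicit assumptions; it is not the authors' implementation. The normalisation of the cumulated weight function (dividing by k times the mean weight), the finite-population factor, the two-sided supremum, and the simulation of the limiting Gaussian process as a weighted discrete bridge are all choices made for this sketch, and every function name is hypothetical.

```python
import numpy as np


def cumulated_weight_gap(weights, in_set):
    """Centred cumulated weight function of one gene set (hypothetical normalisation).

    weights: nonnegative weights of the n genes, in ranked order.
    in_set: boolean mask of length n marking the k genes of the set (0 < k < n).
    Returns F_hat(j/n) - F(j/n) for j = 1..n.
    """
    w = np.asarray(weights, dtype=float)
    mask = np.asarray(in_set, dtype=bool)
    k = int(mask.sum())
    # F_hat: weights of the set's genes cumulated along the ranking, divided by
    # k times the mean weight, so that its expectation is exactly F when the
    # set is a uniformly random sample of k positions.
    f_hat = np.cumsum(np.where(mask, w, 0.0)) / (k * w.mean())
    # F: weights of all genes cumulated along the ranking, ending at 1.
    f = np.cumsum(w) / w.sum()
    return f_hat - f


def wks_statistic(weights, in_set):
    """Two-sided sup statistic sqrt(k) * max|F_hat - F|, with a finite-population factor."""
    n = len(weights)
    k = int(np.sum(in_set))
    # Sampling k of n genes without replacement shrinks the covariance by
    # (n - k) / (n - 1); dividing by its square root undoes that shrinkage.
    fpc = np.sqrt((n - k) / (n - 1))
    return np.sqrt(k) * np.max(np.abs(cumulated_weight_gap(weights, in_set))) / fpc


def simulate_null(weights, n_sim=2000, seed=0):
    """Draws of max|Z| for a Gaussian process Z, computed once per dataset.

    Z(j/n) = sum_{i<=j} (w_i / w_bar) U_i / sqrt(n) - F(j/n) * sum_i U_i / sqrt(n),
    with U_i i.i.d. standard normal. Z has the same covariance as
    sqrt(k) * (F_hat - F) / fpc under the null, whatever the set size k.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    n = w.size
    f = np.cumsum(w) / w.sum()
    u = rng.standard_normal((n_sim, n)) / np.sqrt(n)
    z = np.cumsum(u * (w / w.mean()), axis=1) - np.outer(u.sum(axis=1), f)
    return np.max(np.abs(z), axis=1)


def p_value(stat, null_sample):
    """Monte-Carlo p-value: share of simulated maxima at least as large as stat."""
    return (1 + np.sum(null_sample >= stat)) / (1 + null_sample.size)


# Usage on synthetic data (hypothetical weights, not data from the paper).
rng = np.random.default_rng(1)
n = 1000
scores = np.sort(np.abs(rng.standard_normal(n)))[::-1]  # ranked weights, largest first
null = simulate_null(scores)  # one null sample, reused for every gene set below

random_set = np.zeros(n, dtype=bool)
random_set[rng.choice(n, 50, replace=False)] = True
top_set = np.zeros(n, dtype=bool)
top_set[:50] = True  # the 50 top-ranked genes: a strongly "enriched" set

for name, gene_set in [("random", random_set), ("top-ranked", top_set)]:
    stat = wks_statistic(scores, gene_set)
    print(f"{name}: statistic = {stat:.2f}, p-value = {p_value(stat, null):.4f}")
```

Because `simulate_null` depends only on the ranked weight vector, its output can be reused for every gene set tested against the same list; only `wks_statistic` is recomputed per set, which is where the saving over a per-set Monte-Carlo permutation scheme comes from. The simulated array has `n_sim × n` entries, so for genome-wide lists (tens of thousands of genes) it would need to be generated in chunks.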

AMS Subject Classification: Primary 62F03; Secondary 60F17

Corresponding author: Bernard Ycart, 51 rue des Mathématiques, 38041 GRENOBLE cedex 9, France; Université Grenoble Alpes, France; Laboratoire Jean Kuntzmann, CNRS UMR5224, Grenoble, France; and Laboratoire d’Excellence TOUCAN, Toulouse, France, e-mail:

Acknowledgments

This work was supported by Laboratoire d’Excellence TOUCAN (Toulouse Cancer). The authors are grateful to Sophie Rousseaux and Jean-Jacques Fournié for using the WKS test on many different datasets; their feedback helped improve the code. They are also indebted to the reviewers for important remarks and suggestions.

Funding: Labex Toucan (Toulouse Cancer).


Published Online: 2015-05-30
Published in Print: 2015-06-01

©2015 by De Gruyter

DOI: 10.1515/sagmb-2014-0077