Rough assessment of GPU capabilities for parallel PCC-based biclustering method applied to microarray data sets

Patryk Orzechowski; Krzysztof Boryczko

doi:10.1515/bams-2015-0033

Published/Copyright: December 2, 2015

From the journal Bio-Algorithms and Med-Systems Volume 11 Issue 4

Abstract

Parallel computing architectures have been shown to significantly shorten computation time for a range of clustering algorithms. Nonetheless, some characteristics of the architecture limit the application of graphics processing units (GPUs) to the biclustering task, whose purpose is to find focal similarities within the data. This might be one of the reasons why few biclustering algorithms for GPUs have been proposed so far. In this article, we examine whether complex biclustering calculations can benefit from a heterogeneous (CPU+GPU) architecture. We introduce minimax with Pearson correlation, a complex biclustering method. The algorithm uses Pearson's correlation to determine the similarity between rows of the input matrix. We present two implementations of the algorithm, sequential and parallel, which are designed for heterogeneous environments. We measure the weak scaling efficiency to assess whether a heterogeneous architecture can shorten the time of heavy biclustering computations.
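The row-similarity step mentioned in the abstract can be illustrated with a minimal sequential sketch. It is not the authors' implementation: the function names `pearson` and `row_correlations`, the list-of-lists input, and the treatment of constant rows (correlation set to 0) are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("rows must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        # Assumption: a constant row has undefined correlation; report 0 here.
        return 0.0
    return cov / norm


def row_correlations(matrix):
    """Symmetric matrix of pairwise Pearson correlations between rows."""
    n = len(matrix)
    # Assumption: the diagonal is fixed at 1.0, even for constant rows.
    corr = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            corr[i][j] = corr[j][i] = pearson(matrix[i], matrix[j])
    return corr


if __name__ == "__main__":
    # Hypothetical toy data: rows play the role of genes, columns of conditions.
    data = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    for row in row_correlations(data):
        print([round(v, 3) for v in row])
```

Each pair of rows is processed independently of the others, which is the property that makes this step a natural candidate for data-parallel execution on a GPU.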

Keywords: biclustering; data mining; graphics processing unit (GPU); OpenCL; parallel algorithms

Corresponding author: Patryk Orzechowski, Department of Automatics and Bioengineering, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, AGH University of Science and Technology, Mickiewicza Av. 30, 30-059 Cracow, Poland, E-mail: patrick@agh.edu.pl

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This research was funded by the Polish National Science Center (Narodowe Centrum Nauki, grant no. 2013/11/N/ST6/03204). This research was supported in part by PL-Grid Infrastructure.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

Received: 2015-09-25

Accepted: 2015-10-20

Published Online: 2015-12-02

Published in Print: 2015-12-01
