Abstract
Parallel computing architectures have been shown to significantly shorten computation time for various clustering algorithms. Nonetheless, some characteristics of the architecture limit the application of graphics processing units (GPUs) to the biclustering task, whose function is to find focal similarities within the data. This might be one of the reasons why few biclustering algorithms for GPUs have been proposed so far. In this article, we verify whether heterogeneous (CPU+GPU) computing has any potential for complex biclustering calculations. We introduce minimax with Pearson correlation, a complex biclustering method. The algorithm uses Pearson's correlation to determine the similarity between rows of the input matrix. We present two implementations of the algorithm, sequential and parallel, which are designed for heterogeneous environments. We measure weak scaling efficiency to assess whether a heterogeneous architecture can successfully shorten the time of heavy biclustering computations.
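As a rough illustration of the row-similarity step named in the abstract, the sketch below computes Pearson's correlation for every pair of rows of a small matrix in plain Python. It is not the authors' implementation: the function names `pearson` and `row_similarity`, the handling of constant rows, and the sample data are all assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("rows must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        # Assumption for this sketch: a constant row is treated as uncorrelated.
        return 0.0
    return cov / norm


def row_similarity(matrix):
    """Symmetric matrix of Pearson correlations between all pairs of rows."""
    m = len(matrix)
    # The diagonal is set to 1.0 by convention (each row compared with itself).
    sim = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            sim[i][j] = sim[j][i] = pearson(matrix[i], matrix[j])
    return sim


# Hypothetical toy data: row 1 is a scaled copy of row 0, row 2 is row 0 reversed.
data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
similarity = row_similarity(data)
```

Each pair of rows is processed independently of every other pair, which is the kind of workload that is generally a candidate for parallel execution; whether that translates into a real speedup on a CPU+GPU system is the question the article sets out to assess.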
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This research was funded by the Polish National Science Center (Narodowe Centrum Nauki, grant no. 2013/11/N/ST6/03204). This research was supported in part by PL-Grid Infrastructure.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.
©2015 by De Gruyter
Articles in the same Issue
- Frontmatter
- Reviews
- Genetic background of urinary incontinence – state-of-the-art and perspectives
- Modeling of metabolic diseases – a review of selected methods
- Research Articles
- Evaluation of Emotiv EEG neuroheadset
- Experimental research and modeling of selected layers of the trachea based on EBUS
- Application of S-transform to signal analysis
- The application of spherical standards for the evaluation of the accuracy of mapping shape in computed tomography
- Branched iterated function system (IFS) models with positioners for biological visualizations
- Rough assessment of GPU capabilities for parallel PCC-based biclustering method applied to microarray data sets
- Research on dynamics of the knee joint for different types of loads
- Modeling of lumbar spine equipped with fixator