
A Kernel Probabilistic Model for Semi-supervised Co-clustering Ensemble

Yinghui Zhang
Published/Copyright: December 30, 2017

Abstract

Co-clustering is used to analyze the row and column clusters of a dataset, and it is widely used in recommendation systems. In general, different co-clustering models often obtain very different results for the same dataset because each algorithm has its own optimization criterion. Combining different co-clustering results to produce a final consensus is an alternative way to improve the quality of co-clustering. In this paper, a semi-supervised co-clustering ensemble is illustrated in detail based on semi-supervised learning and ensemble learning. A semi-supervised co-clustering ensemble is a framework for combining multiple base co-clusterings and the side information of a dataset to obtain a stable and robust consensus co-clustering. First, the objective function of the semi-supervised co-clustering ensemble is formulated according to normalized mutual information. Then, a kernel probabilistic model for semi-supervised co-clustering ensemble (KPMSCE) is presented, and the inference of KPMSCE is illustrated in detail. Furthermore, the corresponding algorithm is designed. Moreover, different algorithms and the proposed algorithm are used for experiments on real datasets. The experimental results demonstrate that the proposed algorithm can significantly outperform the compared algorithms in terms of several indices.

MSC 2010: 97R40

1 Introduction

Co-clustering [4], [5] has recently received much attention in recommendation system applications. Co-clustering and its motivations were first discussed in [16], and the term co-clustering was later used by Mirkin. A variance-based co-clustering algorithm [5] was proposed for the analysis of biological gene expression data and became one of the most influential works in gene expression analysis. Several other studies [6], [7] applied clustering algorithms to bioinformatics problems. Two algorithms [9] were presented to co-cluster documents and words; they were designed based on bipartite spectral graph partitioning and information theory, respectively. A co-clustering algorithm based on a weighted Bregman distance instead of the KL divergence [2] was proposed, and it is suitable for any kind of matrix. A new preference-based multi-objective optimization algorithm [12] was proposed to compete with the gradient ascent approach; it uses multiple heuristics to solve the co-clustering problem and makes a preference selection between the gradient ascent algorithm and the heuristics. A scalable algorithm [18] was designed to co-cluster massive, sparse, and high-dimensional data and to combine individual clustering results into a better final result; the experiments show that it is particularly suitable for distributed computing environments, and it is implemented on the Hadoop platform with the MapReduce programming framework. For higher-order data, a new tensor spectral co-clustering method [28] was developed that applies to any non-negative data tensor. A new distributed framework [8] was presented to support efficient implementations of co-clustering algorithms with sequential updates, and the framework was evaluated on both a local cluster of machines and the Amazon EC2 cloud.

Co-clustering results can be improved by ensemble learning techniques. An ensemble approach [13] is used to improve the performance of co-clustering methods: the bagged co-clustering method generates a collection of co-clusters from bootstrap samples of the original data and aggregates them into new co-clusters, so the principle consists in generating a set of co-clusters and aggregating the results. A novel ensemble technique for co-clustering solutions using mutual information [1] has been presented. Asteris et al. [3] presented the first algorithm with provable approximation guarantees for Max-Agree, which relies on formulating the problem as a constrained bilinear maximization over the sets of cluster assignment matrices. Ensembles [17], [25] are a very popular way to improve the accuracy, robustness, and flexibility of learning. A robust spectral ensemble clustering approach [20], [24] was proposed for the cluster ensemble, which learns a robust representation of the co-association matrix through a low-rank constraint. Random projection [23] has been used in ensemble fuzzy clustering. A co-clustering ensemble method [11] overcomes some limitations of the plaid model by repeatedly applying it with different parameters. Hanczar and Nadif [14] proposed a new method that improves the accuracy of co-clustering with ensemble methods, showing that ensemble co-clustering can be formalized as a binary tri-clustering problem and designing a simple and efficient algorithm to solve it; they [15] also used a bagging approach for gene expression data. In order to generate more diverse and high-quality co-clusters to be fused from an ensemble perspective, a well-known multi-modal particle swarm optimization algorithm has been adopted [21]. An ensemble method for the co-clustering problem [1] that uses optimization techniques to generate a consensus has been presented. Manifold ensemble learning [19] is used to improve co-clustering performance by maximally approximating the intrinsic manifolds of both the feature and sample spaces. Besides the co-clustering ensemble algorithms described above, there are also several semi-supervised co-clustering ensemble algorithms. Wang et al. [27] proposed a non-parametric Bayesian approach to co-clustering ensembles; similar to clustering ensembles, co-clustering ensembles combine several base co-clustering results to obtain a more robust consensus co-clustering. Pio et al. [22] used a co-clustering method to discover miRNA regulatory networks. Teng and Tan [26] proposed a semi-supervised co-clustering algorithm to find a combinatorial histone code, which is a successful application of co-clustering.

In general, most of the existing algorithms above do not take advantage of ensemble learning and semi-supervised learning at the same time. There are two motivations for this paper. First, ensemble learning and semi-supervised learning are integrated to improve the accuracy of co-clustering, inspired by the advantages of ensemble learning. Second, the model selection problem of co-clustering is partially solved by ensemble learning, which is of practical use in recommendation systems.

The rest of the paper is organized as follows. In Section 2, the objective function of semi-supervised co-clustering ensemble is proposed in detail. In Section 3, a kernel probabilistic model for semi-supervised co-clustering ensemble (KPMSCE) is designed, and the corresponding algorithm is illustrated in detail. Experimental results are presented in Section 4, and the paper ends with the conclusions in Section 5.

2 Semi-supervised Co-clustering Ensemble

In this section, the pairwise constraints (side information) of co-clustering, which are the extensions of clustering pairwise constraints, are introduced. In general, the pairwise constraints are a popular way for semi-supervised learning. Then, the semi-supervised co-clustering ensemble is illustrated in detail.

2.1 Pairwise Constraints of Co-clustering

A popular approach in semi-supervised clustering algorithms is to use background information in the form of pairwise constraints, such as must-link (ML) and cannot-link (CL) constraints. An ML constraint means that two data points must be in the same cluster, while a CL constraint means that two data points must be in different clusters. In the co-clustering problem, pairwise constraints are extended as follows: co-cluster ML constraints specify that two objects, two features, or one object and one feature must be related.

Suppose $g_i$ and $g_j$ are two connected components. Let $x_i$ and $x_j$ be the entities in $g_i$ and $g_j$, respectively. Let M denote the set of ML constraints. We have

$(x_i, x_j) \in M, \quad x_i \in g_i, \; x_j \in g_j.$

CL constraints denote that two entities, two features, or one entity and one feature cannot be placed in the same cluster, and CL constraints can also be entailed. Suppose $g_i$ and $g_j$ are two connected components (subgraphs completely connected by ML constraints). $x_i$ and $x_j$ denote the entities in $g_i$ and $g_j$, respectively. Denote by C the set of CL constraints. Then

$(x_i, x_j) \in C, \quad x_i \in g_i, \; x_j \in g_j.$

Given a data matrix $X_{mn}$ with m rows and n columns, $o_i$ and $o_j$ denote the ith and jth objects (rows) of $X_{mn}$, while $f_i$ and $f_j$ denote the ith and jth features (columns) of $X_{mn}$. Let $k_i$, $k_j$, $k'_i$, and $k'_j$ be four connected components.

Then, the corresponding pairwise ML constraint sets (including object ML constraint set Mo and feature ML constraint set Mf) are

$M_o = \{(o_i, o_j) \mid o_i \in k_i,\ o_j \in k_j\}, \qquad M_f = \{(f_i, f_j) \mid f_i \in k'_i,\ f_j \in k'_j\}.$

Moreover, the pairwise CL constraint sets (including object CL constraint set Co and feature CL constraint set Cf) are

$C_o = \{(o_i, o_j) \mid o_i \in k_i,\ o_j \in k_j,\ k_i \neq k_j\}, \qquad C_f = \{(f_i, f_j) \mid f_i \in k'_i,\ f_j \in k'_j,\ k'_i \neq k'_j\}.$
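
To make the entailment concrete, the following is a minimal sketch (a Python illustration, not taken from the paper; the function name and the representation of constraints as 0-based index pairs are assumptions) of how a handful of given ML/CL pairs over objects can be expanded into the full sets M_o and C_o by first forming the ML-connected components. The same routine can be reused for features.

```python
def expand_constraints(n_objects, ml_pairs, cl_pairs):
    """Entail the full object ML/CL sets (M_o, C_o) from a few given index pairs."""
    # Union-find over ML constraints: ML is transitive, so linked objects
    # collapse into connected components g_i.
    parent = list(range(n_objects))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for i, j in ml_pairs:
        parent[find(i)] = find(j)

    comps = {}
    for x in range(n_objects):
        comps.setdefault(find(x), []).append(x)

    # Every pair inside one component is must-link.
    M_o = {(i, j) for comp in comps.values() for i in comp for j in comp if i < j}

    # A CL pair between two components entails CL for every cross pair.
    C_o = set()
    for i, j in cl_pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            raise ValueError("inconsistent constraints: CL inside an ML component")
        C_o |= {(a, b) for a in comps[ri] for b in comps[rj]}
    return M_o, C_o
```

For example, with ML pairs {(0, 1), (1, 2)} and the CL pair (2, 3), the routine entails (0, 2) as an additional must-link pair and (0, 3), (1, 3) as additional cannot-link pairs.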

2.2 Semi-supervised Co-clustering Ensemble Problem Formulation

In this subsection, the objective function of the semi-supervised co-clustering ensemble is defined. Suppose there is an original data matrix $X_{mn}$ with m rows (i.e. objects) and n columns (i.e. features).

These m objects can be simultaneously grouped into κ row clusters and the n columns into ℓ column clusters, so there are κ×ℓ co-clusters in total. Moreover, a co-clustering can be considered as a set of κ sets of objects $\{\alpha_r \mid r = 1, \ldots, \kappa\}$ and a set of ℓ sets of features $\{\beta_c \mid c = 1, \ldots, \ell\}$. In general, the procedure delivers row labels for the objects and column labels for the features. If several base co-clustering algorithms are run on the same dataset, several sets of row labels and column labels are obtained. The co-clustering ensemble uses a consensus function Γ to combine a set of q row labels $\mu^{(1)}, \ldots, \mu^{(q)}$ into a single row label μ, and it simultaneously combines the corresponding column labels $\nu^{(1)}, \ldots, \nu^{(q)}$ into a single column label ν.

Commonly, a dataset has groupings $(\mu^{(q)}, \nu^{(q)})$ consisting of $\kappa^{(q)}$ row clusters and $\ell^{(q)}$ column clusters. Γ is defined as a consensus function $\mathbb{N}^{\{m \times t, n \times t\}} \rightarrow \mathbb{N}^{\{m, n\}}$ projecting a set of co-clusterings onto an integrated co-clustering:

(1) $\Gamma: \{(\mu^{(q)}, \nu^{(q)}) \mid q \in \{1, \ldots, t\}\} \rightarrow \{(\mu, \nu), (C_d \cup M_d)\}.$

Let the set of groupings $\{(\mu^{(q)}, \nu^{(q)}) \mid q \in \{1, \ldots, t\}\}$ be denoted by Φ. The co-clustering ensemble seeks a consensus co-clustering that shares the most information with the original co-clusterings.

Moreover, the side information of the dataset consists of the two constraint sets, the CL set C and the ML set M; when this side information is used in the combining step of co-clustering, the procedure is called a semi-supervised co-clustering ensemble. In order to measure the statistical information shared between two co-clusterings, the objective function of the semi-supervised co-clustering ensemble is defined as follows:

(2) $(\mu, \nu)^{(\kappa, \ell\text{-opt})} = \arg\max_{(\widehat{\mu, \nu})} \sum_{q=1}^{t} \phi^{(\mathrm{NMI})}\left\{(\widehat{\mu, \nu}), (\mu^{(q)}, \nu^{(q)}), (C_d \cup M_d)\right\},$

where $(\mu, \nu)^{(\kappa, \ell\text{-opt})}$ is the consensus co-clustering result, i.e. one of the results that maximize the average mutual information with all individual co-clustering labels $(\mu^{(q)}, \nu^{(q)})$ in Φ. We define a measure between a set of t co-clustering labels, Φ, and a single co-clustering label, $(\widehat{\mu, \nu})$, as the average normalized mutual information (ANMI), based on the pairwise measure of mutual information; the definition of ANMI for co-clustering is as follows:

(3) $\phi^{(\mathrm{ANMI})}(\Phi, (\widehat{\mu, \nu})) = \frac{1}{t} \sum_{q=1}^{t} \phi^{(\mathrm{NMI})}\left((\widehat{\mu, \nu}), (\mu^{(q)}, \nu^{(q)})\right).$

Mutual information is a sound indication of the shared information between a pair of co-clusterings. The normalized mutual information (NMI) was defined as follows:

(4) $\mathrm{NMI}(X, Y) = \frac{I(X, Y)}{\sqrt{H(X)\, H(Y)}},$

where X and Y denote the variables described by the cluster labelings, I(X, Y) denotes the mutual information between X and Y, and H(X) denotes the entropy of X; note that I(X, X) = H(X), so NMI(X, X) = 1.

In co-clustering, suppose there are two co-clustering labeling variables (Xr, Xc) and (Yr, Yc), i.e. (Xr, Yr) and (Xc, Yc) denote the row cluster labeling variables and column cluster labeling variables, respectively. To obtain the mutual information between the two co-clustering variables, we must measure the mutual information of the row cluster labels (Xr, Yr) and of the column cluster labels (Xc, Yc), respectively. We define the NMI of co-cluster labeling as follows:

(5) $\mathrm{NMI}((X_r, X_c), (Y_r, Y_c)) = \mathrm{NMI}(X_r, Y_r) + \mathrm{NMI}(X_c, Y_c) = \frac{I(X_r, Y_r)}{\sqrt{H(X_r)\, H(Y_r)}} + \frac{I(X_c, Y_c)}{\sqrt{H(X_c)\, H(Y_c)}}.$

One can easily find that NMI(Xr, Xr) = NMI(Yc, Yc) = 1. Equation (3) needs to be estimated by using the sampled quantities provided by the co-clusterings. Then, from Eq. (5), the estimate of the NMI, $\phi^{(\mathrm{NMI})}$, is

(6) $\phi^{(\mathrm{NMI})}((\mu_i, \nu_i), (\mu_j, \nu_j)) = \phi^{(\mathrm{NMI})}(\mu_i, \mu_j) + \phi^{(\mathrm{NMI})}(\nu_i, \nu_j) = \dfrac{\sum_{\alpha=1}^{\kappa^{(i)}} \sum_{\beta=1}^{\kappa^{(j)}} O_{\alpha,\beta} \log\left(\dfrac{|O|\, O_{\alpha,\beta}}{O_\alpha^i O_\beta^j}\right)}{\sqrt{\left(\sum_{\alpha=1}^{\kappa^{(i)}} O_\alpha^i \log \dfrac{O_\alpha^i}{|O|}\right) \left(\sum_{\beta=1}^{\kappa^{(j)}} O_\beta^j \log \dfrac{O_\beta^j}{|O|}\right)}} + \dfrac{\sum_{\alpha=1}^{\ell^{(i)}} \sum_{\beta=1}^{\ell^{(j)}} F_{\alpha,\beta} \log\left(\dfrac{|F|\, F_{\alpha,\beta}}{F_\alpha^i F_\beta^j}\right)}{\sqrt{\left(\sum_{\alpha=1}^{\ell^{(i)}} F_\alpha^i \log \dfrac{F_\alpha^i}{|F|}\right) \left(\sum_{\beta=1}^{\ell^{(j)}} F_\beta^j \log \dfrac{F_\beta^j}{|F|}\right)}},$

where |O| and |F| denote the total number of objects and features, respectively; $O_\alpha^i$ and $F_\alpha^i$ denote the number of objects and features in co-cluster $Co_\alpha$ according to $(\mu_i, \nu_i)$; and $O_\beta^j$ and $F_\beta^j$ denote the number of objects and features in co-cluster $Co_\beta$ according to $(\mu_j, \nu_j)$. $O_{\alpha,\beta}$ and $F_{\alpha,\beta}$ denote the number of objects and features, respectively, that are in co-cluster $Co_\alpha$ according to $(\mu_i, \nu_i)$ as well as in co-cluster $Co_\beta$ according to $(\mu_j, \nu_j)$.
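
As a concrete reference, the following is a minimal Python sketch of Eqs. (4)-(6): the NMI of two label vectors is estimated from their contingency counts, and the co-clustering NMI is the sum of the row-label and column-label NMIs. It assumes labels are non-negative integers; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Sample-based NMI between two label vectors, as in Eqs. (4)/(6)."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    n = labels_a.size
    eps = 1e-12
    # Contingency table: counts of points in cluster a of the first labeling
    # and cluster b of the second labeling.
    table = np.zeros((labels_a.max() + 1, labels_b.max() + 1))
    for a, b in zip(labels_a, labels_b):
        table[a, b] += 1
    p_ab = table / n
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)
    mi = np.sum(p_ab * np.log((p_ab + eps) / (np.outer(p_a, p_b) + eps)))
    h_a = -np.sum(p_a * np.log(p_a + eps))
    h_b = -np.sum(p_b * np.log(p_b + eps))
    return mi / (np.sqrt(h_a * h_b) + eps)

def co_clustering_nmi(row_a, col_a, row_b, col_b):
    """NMI between two co-clusterings, Eq. (5): row-label NMI + column-label NMI."""
    return nmi(row_a, row_b) + nmi(col_a, col_b)
```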

3 Semi-supervised Co-clustering Ensemble Based on Kernel Probabilistic Model

In this section, a generative model for the semi-supervised co-clustering ensemble based on kernel probabilistic theory is proposed, and the gradient descent method is used to infer the model. Finally, the corresponding algorithm is illustrated step by step.

3.1 Kernel Probabilistic Model for Semi-supervised Co-clustering Ensemble

In the KPMSCE model, a zero-mean Gaussian process over $U_{:,d}$ and $V_{:,d}$ is regarded as the prior distribution for the latent features of a dataset. In the general case, a Gaussian process can be viewed as a generalization of the multivariate Gaussian distribution: just as a mean vector and a covariance matrix determine a multivariate Gaussian, a mean function $m(x)$ and a covariance function $k(x, x')$ determine a Gaussian process.

For the semi-supervised co-clustering ensemble problem, $x$ indexes either the rows or the columns of the data matrix. If $m(x)$ equals 0, the kernel function $k(x, x')$ represents the covariance of the corresponding pair of objects or features. $K_U \in \mathbb{R}^{N \times N}$ is set to be a full covariance matrix for objects, and it acts as a prior that forces the factorization to capture the covariance among rows. Meanwhile, $K_V \in \mathbb{R}^{M \times M}$ is set to be a full covariance matrix for features, and it acts as a prior that forces the factorization to capture the covariance among columns.
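
The paper does not specify how $K_U$ and $K_V$ are constructed. As one hedged illustration only, an RBF kernel over the rows (for $K_U$) and over the columns (for $K_V$) of the data matrix could serve as such full covariance priors; the function name, length scale, and jitter below are assumptions, not the authors' prescription.

```python
import numpy as np

def rbf_covariance(X, length_scale=1.0, jitter=1e-6):
    """One possible full covariance matrix: an RBF kernel over the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-0.5 * d2 / length_scale ** 2)
    return K + jitter * np.eye(X.shape[0])   # jitter keeps K well conditioned

# Illustrative use (X is the m-by-n data matrix):
#   K_U = rbf_covariance(X)      # covariance among objects (rows)
#   K_V = rbf_covariance(X.T)    # covariance among features (columns)
```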

If $K_U$ and $K_V$ are the priors and they are assumed to be known, the generative process of KPMSCE is as follows:

  1. Sample $U_{:,d} \sim \mathcal{GP}(0, K_U)$ for $d = 1, \ldots, D$.

  2. Sample $V_{:,d} \sim \mathcal{GP}(0, K_V)$ for $d = 1, \ldots, D$.

  3. For each entry $R_{n,m}$, sample $R_{n,m} \sim \mathcal{N}(U_{n,:} V_{m,:}^T, \sigma^2)$, where σ is a constant.
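
The three steps can be written directly as a short sampling routine. The sketch below is illustrative only (it assumes $K_U$, $K_V$, the latent dimension D, and σ are given); since the GP is evaluated only at the N row indices (or M column indices), each draw reduces to a zero-mean multivariate Gaussian with the corresponding covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_kpmsce(K_U, K_V, D, sigma):
    """Draw U, V and a synthetic data matrix R following the three steps above."""
    N, M = K_U.shape[0], K_V.shape[0]
    # Steps 1-2: each latent column is a zero-mean GP draw with covariance K_U (K_V).
    U = rng.multivariate_normal(np.zeros(N), K_U, size=D).T    # N x D
    V = rng.multivariate_normal(np.zeros(M), K_V, size=D).T    # M x D
    # Step 3: every entry is Gaussian around the corresponding inner product.
    R = U @ V.T + sigma * rng.standard_normal((N, M))
    return U, V, R
```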

If U and V are known, the likelihood over the observed entries of the target matrix R (indicated by $\delta_{n,m} = 1$) is

(7) $p(R \mid U, V, \sigma^2) = \prod_{n=1}^{N} \prod_{m=1}^{M} \left[ \mathcal{N}(R_{n,m} \mid U_{n,:} V_{m,:}^T, \sigma^2) \right]^{\delta_{n,m}},$

and U and V are given by

(8) $p(U \mid K_U) = \prod_{d=1}^{D} \mathcal{GP}(U_{:,d} \mid 0, K_U),$
(9) $p(V \mid K_V) = \prod_{d=1}^{D} \mathcal{GP}(V_{:,d} \mid 0, K_V).$

For simplicity, we denote $K_U^{-1}$ by $S_U$ and $K_V^{-1}$ by $S_V$. The log-posterior over U and V can be calculated by

(10) $\log p(U, V \mid R, \sigma^2, K_U, K_V) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} \delta_{n,m} \left(R_{n,m} - U_{n,:} V_{m,:}^T\right)^2 - \frac{1}{2} \sum_{d=1}^{D} U_{:,d}^T S_U U_{:,d} - \frac{1}{2} \sum_{d=1}^{D} V_{:,d}^T S_V V_{:,d} - A \log \sigma^2 - \frac{D}{2} \left(\log |K_U| + \log |K_V|\right) + C,$

where A is the total number of observed entries and |K| is the determinant of K. The proposed model is a generative model that simulates how the base co-clustering results are sampled. Then, by inferring the latent labels in the graphical model, the final semi-supervised co-clustering ensemble results are obtained.

3.2 Inference of KPMSCE Based on Gradient Descent

There are several latent variables to be inferred in this model. Expectation maximization or maximum a posteriori estimation can be used to estimate them. In this paper, maximum a posteriori estimation is applied to estimate the latent matrices U and V, i.e. the matrices that maximize the posterior of the model. In other words, the following objective function is minimized:

(11) $E = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} \delta_{n,m} \left(R_{n,m} - U_{n,:} V_{m,:}^T\right)^2 + \frac{1}{2} \sum_{d=1}^{D} U_{:,d}^T S_U U_{:,d} + \frac{1}{2} \sum_{d=1}^{D} V_{:,d}^T S_V V_{:,d}.$

In general, gradient descent can be used for minimizing the function E. The gradient of objects (rows) is as follows:

(12) $\frac{\partial E}{\partial U_{n,d}} = -\frac{1}{\sigma^2} \sum_{m=1}^{M} \delta_{n,m} \left(R_{n,m} - U_{n,:} V_{m,:}^T\right) V_{m,d} + e(n)^T S_U U_{:,d},$

where e(n) is an indicator vector whose nth entry is 1 and all other entries are 0. Then, the gradient of features (columns) is defined as

(13) $\frac{\partial E}{\partial V_{m,d}} = -\frac{1}{\sigma^2} \sum_{n=1}^{N} \delta_{n,m} \left(R_{n,m} - U_{n,:} V_{m,:}^T\right) U_{n,d} + e(m)^T S_V V_{:,d},$

where e(m) is also an indicator vector with the corresponding entry being 1 and all others being 0. Given an initial guess and the priors, U is updated by

(14) $U_{n,d}^{(t+1)} = U_{n,d}^{(t)} - \eta \frac{\partial E}{\partial U_{n,d}},$

where η is the learning rate, which can be set between 0 and 1. V is updated by

(15) $V_{m,d}^{(t+1)} = V_{m,d}^{(t)} - \eta \frac{\partial E}{\partial V_{m,d}}.$

According to these update rules, U and V are updated alternately until convergence. Because $K_U$ and $K_V$ remain fixed throughout the iterations, $S_U$ and $S_V$ are computed only once. In the extreme case, entire rows or columns are missing but the corresponding side information is known. The update rules with missing values then reduce to the following equations:

(16) $U_{n,d}^{(t+1)} = U_{n,d}^{(t)} - \eta\, e(n)^T S_U U_{:,d} = U_{n,d}^{(t)} - \eta \sum_{n'=1}^{N} S_U(n, n')\, U_{n',d},$

and

(17) $V_{m,d}^{(t+1)} = V_{m,d}^{(t)} - \eta\, e(m)^T S_V V_{:,d} = V_{m,d}^{(t)} - \eta \sum_{m'=1}^{M} S_V(m, m')\, V_{m',d}.$

In this case, $U_{n,:}$ is updated according to a weighted average of the current U over all rows, whether or not the rows are missing; the weights $S_U(n, n')$ reflect the correlation between row n and all the other rows. Likewise, $V_{m,:}$ is updated according to a weighted average of the current V over all columns, whether or not the columns are missing; the weights $S_V(m, m')$ reflect the correlation between column m and all the other columns.
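
Putting Eqs. (11)-(17) together, one alternating gradient-descent iteration can be sketched as follows (a Python illustration under the paper's notation; `mask` holds the indicators $\delta_{n,m}$, and $S_U = K_U^{-1}$, $S_V = K_V^{-1}$ are precomputed once). Rows or columns with no observed entries have a zero residual, so their update automatically reduces to the prior-only rules of Eqs. (16) and (17).

```python
import numpy as np

def gradient_step(R, mask, U, V, S_U, S_V, sigma, eta):
    """One alternating gradient-descent step on E (Eqs. 11-15)."""
    # Gradient w.r.t. U (Eq. 12), all rows at once, followed by the update of Eq. (14).
    residual = mask * (R - U @ V.T)            # delta_{n,m} (R_{n,m} - U_{n,:} V_{m,:}^T)
    U = U - eta * (-(1.0 / sigma ** 2) * residual @ V + S_U @ U)
    # Recompute the residual with the updated U, then Eq. (13) and the update of Eq. (15).
    residual = mask * (R - U @ V.T)
    V = V - eta * (-(1.0 / sigma ** 2) * residual.T @ U + S_V @ V)
    return U, V
```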

3.3 Algorithm

In this subsection, the KPMSCE algorithm is described. The diversity of the base co-clustering results is an important factor for improving the semi-supervised co-clustering ensemble, so KPMSCE combines a set of diverse base co-clustering results with side information to obtain the final semi-supervised co-clustering ensemble result. According to the above model and inference, a kernel probabilistic algorithm for the semi-supervised co-clustering ensemble is designed. The algorithm procedure is described step by step below, followed by a schematic code sketch after the steps.

  • Algorithm: KPMSCE Algorithm.

  • Input: Pairwise constraint set P(i, j), original data matrix Xmn, number of row clusters κ, and number of column clusters ℓ (i.e. κ×ℓ co-clusters in total).

  • Output: The final consensus co-clustering result.

  1. Divide Xmn into κ row clusters and ℓ column clusters by the co-clustering algorithms, and the base co-clustering labels are obtained.

  2. Compute the NMI among the base co-clusters and obtain a new data matrix.

  3. Calculate the likelihood for each column according to the equation

    $P(R_{n(m),m} \mid U, V) = \mathcal{N}(R_{n(m),m} \mid U_{n(m),:} V_{m,:}^T, \sigma^2 I),$ where $n(m)$ denotes the indices of the observed rows in column m.

  4. Marginalize the probability over V, obtaining $P(R \mid U) = \prod_{m=1}^{M} p(R_{n(m),m} \mid U)$.

  5. Compute the objective function $E = \sum_{m=1}^{M} \left( R_{n(m),m}^T C^{-1} R_{n(m),m} + \log |C| \right) + \sum_{d=1}^{D} U_{:,d}^T S_U U_{:,d}$. Note that V no longer appears in this objective, so gradient descent can be performed on U, which is updated at each iteration using the inverse of C.

  6. The maximum likelihood estimate is computed by $\hat{R}_{n,m} = \hat{U}_{n,:} \hat{V}_{m,:}^T$.

  7. Obtain the column and row cluster ensemble according to maximum likelihood.

  8. Integrate the final row and column clusters.
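
The following schematic driver (a hedged Python sketch reusing the `gradient_step` function from Section 3.2's sketch) illustrates how steps 5-8 might be realized for a given data matrix R produced by the preceding steps. The paper leaves several details open, in particular the construction of C in step 5 and how the final clusters are read off; the simplifications here (optimizing E of Eq. (11) directly instead of the marginalized objective, and a k-means read-out of U and V) are assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def kpmsce_consensus(R, mask, K_U, K_V, D, kappa, ell,
                     sigma=1.0, eta=0.01, iters=500, seed=0):
    """Sketch of steps 5-8 for a given N x M matrix R and its mask of deltas.

    K_U (N x N) and K_V (M x M) are the kernel priors over rows and columns;
    kappa and ell are the numbers of row and column clusters.
    """
    rng = np.random.default_rng(seed)
    S_U, S_V = np.linalg.inv(K_U), np.linalg.inv(K_V)   # S_U = K_U^{-1}, S_V = K_V^{-1}
    U = 0.01 * rng.standard_normal((R.shape[0], D))
    V = 0.01 * rng.standard_normal((R.shape[1], D))
    for _ in range(iters):
        U, V = gradient_step(R, mask, U, V, S_U, S_V, sigma, eta)

    R_hat = U @ V.T                                              # step 6
    row_labels = KMeans(n_clusters=kappa, n_init=10).fit_predict(U)   # steps 7-8
    col_labels = KMeans(n_clusters=ell, n_init=10).fit_predict(V)
    return row_labels, col_labels, R_hat
```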

4 Empirical Study

In this section, 10 datasets are used in the experiments. In particular, eight datasets are from the UCI machine learning repository, and one dataset is from the KDD Cup. The last dataset, called the yeast cell data, has been analyzed by many clustering and co-clustering algorithms. For all reported results, there are two steps to obtain the final co-clustering ensemble results. First, a set of base co-clustering labels is obtained by running the base co-clustering algorithms. Second, KPMSCE is applied to the base co-clustering labels to generate the final consensus co-clustering.

The standard deviation of the co-clusters is used as the evaluation criterion. A final co-clustering ensemble result has two equally important aspects of quality: it should be good not only for the clusters of objects but also for the clusters of features, and the quality of the co-cluster is the comprehensive assessment measure that takes both aspects into account. RSD is defined as the standard deviation of all rows in a co-cluster, CSD as the standard deviation of all columns in a co-cluster, and CoSD as the standard deviation of all entries in the co-cluster. The smaller the RSD, CSD, and CoSD, the better the quality of the row clustering, column clustering, and co-clustering, respectively.

Because all datasets have class labels, micro-precision (MP) is used to measure the accuracy of the clustering with respect to the true labels. MP is defined as $\mathrm{MP} = \sum_{i=1}^{k} a_i / N$, where k is the number of clusters, N is the number of objects, and $a_i$ denotes the number of objects in cluster i that are correctly assigned to the corresponding class [29]. Moreover, AMP denotes the average MP.
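
For reference, MP can be computed as follows (a Python sketch; mapping each cluster to a class via a one-to-one Hungarian matching is one common convention for determining the $a_i$, assumed here rather than taken from [29]).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def micro_precision(pred_labels, true_labels):
    """MP = sum_i a_i / N, after matching clusters to classes."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    k, c = pred_labels.max() + 1, true_labels.max() + 1
    counts = np.zeros((k, c), dtype=int)
    for p, t in zip(pred_labels, true_labels):
        counts[p, t] += 1
    # a_i: correctly assigned objects once each cluster is mapped to its best class.
    rows, cols = linear_sum_assignment(-counts)
    return counts[rows, cols].sum() / pred_labels.size
```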

4.1 Comparison of Experimental Results Among SCC, Information Theoretic Co-clustering, Bregman Co-clustering, and KPMSCE

To illustrate the performance of KPMSCE, the results obtained by KPMSCE are compared with the co-clustering results generated by SCC [9], information theoretic co-clustering (ITCC) [10], and Bregman co-clustering (BCC) [2]. The experimental procedure is described as follows. First, the base co-clusterings are obtained by running each co-clustering algorithm three times on each dataset, i.e. nine co-clusterings of each dataset are obtained. Then, the final consensus co-clustering is obtained by combining the base co-clusterings via KPMSCE. The experimental results are shown in Table 1.

Among all the algorithms on the 10 datasets, KPMSCE achieved the best result six times, while the other three algorithms together achieved the best result only four times. Table 1 shows that the performance of KPMSCE is better than that of the other three algorithms most of the time. All results show that the performance of co-clustering can be enhanced by the ensemble method. Table 1 also shows that the row clustering performance of BCC is better than that of the other two base co-clustering algorithms, because BCC obtains the largest number of best row clustering results. Similarly, the column clustering performance of ITCC is the best among the three base co-clustering algorithms.

4.2 Comparison of Experimental Results Between KPMSCE and Relational Multi-manifold Co-clustering Ensemble

To demonstrate how the method works for the co-clustering problem and improves co-clustering performance, it is compared with the co-clustering ensemble method named relational multi-manifold co-clustering ensemble (RMCCE) [19]. The base co-clustering labels are obtained by running each co-clustering algorithm five times on each dataset, and the results are shown in Table 2. Table 2 shows that KPMSCE outperforms RMCCE most of the time; in other words, the proposed co-clustering ensemble method actually gives better results than RMCCE.

4.3 AMP Results

In this subsection, SCC, ITCC, BCC, KPMSCE, and RMCCE are used in experiments on all datasets. There are two steps in this experiment. First, SCC, ITCC, and BCC with random initializations are run many times to produce the base co-clusterings, and the average MP (AMP) values are recorded for different numbers of base co-clusterings. Second, all base co-clustering results are used as the input of KPMSCE and RMCCE for the co-clustering ensemble, and the ensemble results are recorded. The results are reported in Figure 1. The x-axis shows the number of base co-clusterings, and the y-axis shows the AMP for different numbers of base co-clusterings. We can see that KPMSCE obtains the best AMP result, and RMCCE obtains the second best. The results show that ensemble learning can improve the performance of co-clustering. Moreover, semi-supervised learning can positively leverage the base co-clusterings and the co-clustering ensemble.

5 Conclusions

In this paper, the semi-supervised co-clustering ensemble was illustrated in detail based on semi-supervised learning and ensemble learning. Semi-supervised co-clustering ensembles provide a framework for combining multiple base co-clusterings and the side information of a dataset to generate a stable and robust consensus co-clustering. Moreover, the objective function of the semi-supervised co-clustering ensemble was formulated in detail. Then, KPMSCE was presented, and the inference of KPMSCE was illustrated in detail. Furthermore, the corresponding algorithm was designed. In addition, different algorithms and the proposed algorithm were used for experiments on real datasets. The experimental results demonstrated that the proposed algorithm can significantly outperform the compared algorithms in terms of several indices.

Future work will focus on the diversity of the base co-clustering labels for the co-clustering ensemble.

Table 1:

RSD, CSD, and CoSD of Each Data Obtained by Base Co-clustering Algorithms and KPMSCE.

Dataset | RSD: SCC, ITCC, BCC, KPMSCE | CSD: SCC, ITCC, BCC, KPMSCE | CoSD: SCC, ITCC, BCC, KPMSCE
sonar 127.80 125.26 113.87 124.73 20.92 21.49 22.36 20.47 1.32 1.30 1.23 1.28
spectheart 246.40 255.15 254.8295 242.53 20.16 20.10 20.31 20.17 2.12 2.15 2.13 2.11
breast 2178.17 2214.55 2161.27 2126.92 47.06 47.04 47.96 46.62 11.87 11.32 11.35 11.12
semeion 7649.95 7814.65 7543.46 7442.21 1178.87 1175.61 1205.44 1171.19 54.17 56.73 56.30 54.66
hepatitis 12,746.32 11,137.8 3459.69 8338.81 359.14 339.87 362.61 340.78 182.77 166.92 190.58 124.39
cred 93,168.57 5643.97 5652.27 50,286.18 414.66 414.05 417.38 423.19 312.04 351.50 356.21 186.26
yeast 231,596.70 224,648.70 223,290.60 222,387.00 26,132.06 24,633.90 25,933.71 24,354.13 8389.86 8053.27 8106.78 7444.40
Kdd99sub 625,626.50 544,649.20 208,676.10 417,943.00 16,336.52 19,701.23 8433.84 15,932.78 7426.51 4312.35 2811.02 3951.52
hvwnt 927,011.00 966,412.60 752,109.30 927,012.40 3,954,112.0 3,964,344.0 3,170,334.0 3,954,368.0 79,431.88 79,561.0 63,554.5 79,424.0
secom 2,446,928.0 4,492,293.0 3,799,032.0 2,104,492.0 75,289.87 74,446.81 75,154.10 74,193.08 3227.71 5968.66 6881.71 2875.36
Note: Bold values denote the best results for each criterion.

Table 2:

RSD, CSD, and CoSD obtained by KPMSCE and RMCCE.

Dataset sonar spectheart breast semeion hepatitis cred yeast Kdd99sub hvwnt secom
RSD
 RMCCE 114.85 305.26 2221.23 8676.22 12,782.15 93,588.21 236,639.07 208,682.96 957,721.38 5,349,347.15
 KPMSCE 124.74 237.84 2240.81 7604.99 8338.88 50,286.18 231,664.26 417,943.33 927,012.43 1,926,235.25
CSD
 RMCCE 21.93 20.35 47.56 1208.49 401.30 423.75 27,394.69 20,470.31 3,602,929.88 73,621.96
 KPMSCE 20.41 20.18 46.66 1171.46 340.78 423.19 24,357.67 17,444.44 3,953,442.72 74,467.02
CoSD
 RMCCE 1.18 2.41 11.38 61.11 188.90 322.33 8601.97 4739.50 72,101.06 7055.71
 KPMSCE 1.28 2.10 11.23 54.72 124.39 186.26 7912.92 4493.20 79,424.02 2873.34
Note: Bold values denote the best results for each criterion.

Figure 1: AMP Results of Algorithms on 10 Datasets.

Bibliography

[1] G. Aggarwal and N. Gupta, BEMI bicluster ensemble using mutual information, in: 2013 12th International Conference on Machine Learning and Applications (ICMLA), vol. 1, pp. 321–324, IEEE, 2013. DOI: 10.1109/ICMLA.2013.65.

[2] A. Banerjee, A generalized maximum entropy approach to Bregman co-clustering and matrix approximation, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 509–514, 2004. DOI: 10.1145/1014052.1014111.

[3] M. Asteris, A. Kyrillidis, D. Papailiopoulos and A. G. Dimakis, Bipartite correlation clustering – maximizing agreements, in: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 121–129, 2016.

[4] A. Beutel, A. Ahmed and A. J. Smola, ACCAMS: additive co-clustering to approximate matrices succinctly, in: International Conference on World Wide Web, pp. 119–129, 2015. DOI: 10.1145/2736277.2741091.

[5] J. Cheng, Z.-S. Tong and L. Zhang, Scaling behavior of nucleotide cluster in DNA sequences, J. Zhejiang Univ. Sci. B 8 (2007), 359–364. DOI: 10.1631/jzus.2007.B0359.

[6] J. Cheng and L.-x. Zhang, Statistical properties of nucleotide clusters in DNA sequences, J. Zhejiang Univ. Sci. B 6 (2005), 408–412. DOI: 10.1631/jzus.2005.B0408.

[7] X. Cheng, S. Su, L. Gao and J. Yin, Co-ClusterD: a distributed framework for data co-clustering with sequential updates, IEEE Trans. Knowl. Data Eng. 27 (2015), 3231–3244. DOI: 10.1109/TKDE.2015.2451634.

[8] Y. Cheng and G. M. Church, Biclustering of expression data, in: International Conference on Intelligent Systems for Molecular Biology, vol. 8, pp. 93–103, 2000.

[9] I. S. Dhillon, Co-clustering documents and words using bipartite spectral graph partitioning, in: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 269–274, 2001. DOI: 10.1145/502512.502550.

[10] I. S. Dhillon, S. Mallela and D. S. Modha, Information-theoretic co-clustering, in: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 89–98, ACM, 2003. DOI: 10.1145/956750.956764.

[11] P. Georg, Ensemble Methods for Plaid Bicluster Algorithm, 2010.

[12] F. Gullo, A. K. M. K. A. Talukder, S. Luke, C. Domeniconi and A. Tagarelli, Multiobjective optimization of co-clustering ensembles, in: Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference Companion, pp. 1495–1496, 2012. DOI: 10.1145/2330784.2331010.

[13] B. Hanczar and M. Nadif, Bagged biclustering for microarray data, in: ECAI, pp. 1131–1132, 2010.

[14] B. Hanczar and M. Nadif, Using the bagging approach for biclustering of gene expression data, Neurocomputing 74 (2011), 1595–1605. DOI: 10.1016/j.neucom.2011.01.013.

[15] B. Hanczar and M. Nadif, Ensemble methods for biclustering tasks, Pattern Recognit. 45 (2012), 3938–3949. DOI: 10.1016/j.patcog.2012.04.010.

[16] J. A. Hartigan, Direct clustering of a data matrix, J. Am. Stat. Assoc. 67 (1972), 123–129. DOI: 10.1080/01621459.1972.10481214.

[17] D. Huang, C. D. Wang and J. H. Lai, Locally weighted ensemble clustering, IEEE Trans. Cybern. PP (2017), 1–14. DOI: 10.1109/TCYB.2017.2702343.

[18] Q. Huang, X. Chen, J. Huang, S. Feng and J. Fan, Scalable ensemble information-theoretic co-clustering for massive data, in: Proceedings of the International MultiConference of Engineers and Computer Scientists, vol. 1, 2012.

[19] P. Li, J. Bu, C. Chen and Z. He, Relational co-clustering via manifold ensemble learning, in: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1687–1691, ACM, 2012. DOI: 10.1145/2396761.2398498.

[20] H. Liu, J. Wu, T. Liu, D. Tao and Y. Fu, Spectral ensemble clustering via weighted K-means: theoretical and practical evidence, IEEE Trans. Knowl. Data Eng. 29 (2017), 1129–1143. DOI: 10.1109/TKDE.2017.2650229.

[21] L. Menezes and A. L. V. Coelho, On ensembles of biclusters generated by NichePSO, in: 2011 IEEE Congress on Evolutionary Computation (CEC), pp. 601–607, IEEE, 2011. DOI: 10.1109/CEC.2011.5949674.

[22] G. Pio, D. Malerba, D. D'Elia and M. Ceci, Integrating microRNA target predictions for the discovery of gene regulatory networks: a semi-supervised ensemble learning approach, BMC Bioinformatics 15 (2014), S4. DOI: 10.1186/1471-2105-15-S1-S4.

[23] P. Rathore, J. C. Bezdek, S. M. Erfani, S. Rajasegarar and M. Palaniswami, Ensemble fuzzy clustering using cumulative aggregation on random projections, IEEE Trans. Fuzzy Syst. PP (2017), 1–1. DOI: 10.1109/TFUZZ.2017.2729501.

[24] Z. Tao, H. Liu and Y. Fu, Simultaneous clustering and ensemble, in: AAAI, 2017. DOI: 10.1609/aaai.v31i1.10720.

[25] Z. Tao, H. Liu, S. Li and Y. Fu, Robust spectral ensemble clustering, in: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pp. 367–376, 2016. DOI: 10.1145/2983323.2983745.

[26] L. Teng and K. Tan, Finding combinatorial histone code by semi-supervised biclustering, BMC Genomics 13 (2012), 301. DOI: 10.1186/1471-2164-13-301.

[27] P. Wang, K. B. Laskey, C. Domeniconi and M. I. Jordan, Nonparametric Bayesian co-clustering ensembles, in: SDM, pp. 331–342, SIAM, 2011. DOI: 10.1137/1.9781611972818.29.

[28] T. Wu, A. R. Benson and D. F. Gleich, General tensor spectral co-clustering for higher-order data, 2016.

[29] Z. Zhou and W. Tang, Clusterer ensemble, Knowl. Based Syst. 19 (2006), 77–83. DOI: 10.1016/j.knosys.2005.11.003.

Received: 2017-10-12
Published Online: 2017-12-30

©2020 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 Public License.
