
A Kernel Probabilistic Model for Semi-supervised Co-clustering Ensemble

Yinghui Zhang
Published/Copyright: December 30, 2017

Abstract

Co-clustering is used to analyze the row and column clusters of a dataset, and it is widely used in recommendation systems. In general, different co-clustering models often obtain very different results for the same dataset because each algorithm has its own optimization criterion. Combining different co-clustering results to produce a final consensus is an alternative way to improve the quality of co-clustering. In this paper, a semi-supervised co-clustering ensemble is illustrated in detail based on semi-supervised learning and ensemble learning. A semi-supervised co-clustering ensemble is a framework for combining multiple base co-clusterings and the side information of a dataset to obtain a stable and robust consensus co-clustering. First, the objective function of the semi-supervised co-clustering ensemble is formulated according to normalized mutual information. Then, a kernel probabilistic model for semi-supervised co-clustering ensemble (KPMSCE) is presented, and the inference of KPMSCE is illustrated in detail. Furthermore, the corresponding algorithm is designed. Moreover, different algorithms and the proposed algorithm are used for experiments on real datasets. The experimental results demonstrate that the proposed algorithm can significantly outperform the compared algorithms in terms of several indices.

MSC 2010: 97R40

1 Introduction

Co-clustering [4], [5] has recently received much attention in recommendation system applications. Co-clustering and its motivations were first discussed in [16], and the term co-clustering was later used by Mirkin. A variance-based co-clustering algorithm [5] was proposed for the analysis of biological gene expression data and became one of the most influential works in gene expression analysis. Several other studies [6], [7] applied clustering algorithms to bioinformatics problems. Two algorithms [9] were presented to co-cluster documents and words; they were designed based on bipartite spectral graph partitioning and information theory, respectively. A co-clustering algorithm based on a weighted Bregman distance instead of the KL divergence [2] was proposed, and it is suitable for any kind of matrix. A new preference-based multi-objective optimization algorithm [12] was proposed to compete with the gradient ascent approach; it uses multiple heuristics to solve the co-clustering problem and makes a preference selection between the gradient ascent algorithm and the heuristics. A scalable algorithm [18] was designed to co-cluster massive, sparse, and high-dimensional data and to combine individual clustering results into a better final result; the experiments show that it is particularly suitable for distributed computing environments, and it is implemented on the Hadoop platform with the MapReduce programming framework. For higher-order data, a new tensor spectral co-clustering method [28] was developed that applies to any non-negative data tensor. A new distributed framework [8] was presented to support efficient implementations of co-clustering algorithms with sequential updates, and the framework was evaluated on both a local cluster of machines and the Amazon EC2 cloud.

Co-clustering results can be improved by ensemble learning techniques. An ensemble approach [13] is used to improve the performance of co-clustering methods: the bagged co-clustering method generates a collection of co-clusters from bootstrap samples of the original data and aggregates them into new co-clusters, so the principle consists in generating a set of co-clusters and aggregating the results. A novel ensemble technique for co-clustering solutions using mutual information [1] has been presented. Asteris et al. [3] presented the first algorithm with provable approximation guarantees for Max-Agree, which relies on formulating the problem as a constrained bilinear maximization over the sets of cluster assignment matrices. Ensembles [17], [25] are a very popular way to improve the accuracy, robustness, and flexibility of learning. A robust spectral ensemble clustering approach [20], [24] was proposed for the cluster ensemble, which learns a robust representation of the co-association matrix through a low-rank constraint. Random projection [23] has been used in ensemble fuzzy clustering. A co-clustering ensemble method [11] overcomes some limitations of the plaid model by repeatedly applying it with different parameters. Hanczar and Nadif [14] proposed a new method that improves the accuracy of co-clustering with ensemble methods, showing that ensemble co-clustering can be formalized as a binary tri-clustering problem and designing a simple and efficient algorithm to solve it; they [15] also used a bagging approach for gene expression data. In order to generate more diverse and high-quality co-clusters to be fused from an ensemble perspective, a well-known multi-modal particle swarm optimization algorithm has been adopted [21]. An ensemble method for the co-clustering problem [1] that uses optimization techniques to generate a consensus has been presented. Manifold ensemble learning [19] is used to improve co-clustering performance by maximally approximating the intrinsic manifolds of both the feature and sample spaces. Besides the co-clustering ensemble algorithms described above, there are also several semi-supervised co-clustering ensemble algorithms. Wang et al. [27] proposed a non-parametric Bayesian approach to co-clustering ensembles; similar to clustering ensembles, co-clustering ensembles combine several base co-clustering results to obtain a more robust consensus co-clustering. Pio et al. [22] used a co-clustering method to discover miRNA regulatory networks. Teng and Tan [26] proposed a semi-supervised co-clustering algorithm to find a combinatorial histone code, which is a successful application of co-clustering.

In general, most of the existing algorithms above do not take advantage of ensemble learning and semi-supervised learning at the same time. There are two motivations for this paper. First, ensemble learning and semi-supervised learning are integrated to improve the accuracy of co-clustering, inspired by the advantages of ensemble learning. Second, the model selection problem of co-clustering is partially solved by ensemble learning, which is of practical use in recommendation systems.

The rest of the paper is organized as follows. In Section 2, the objective function of semi-supervised co-clustering ensemble is proposed in detail. In Section 3, a kernel probabilistic model for semi-supervised co-clustering ensemble (KPMSCE) is designed, and the corresponding algorithm is illustrated in detail. Experimental results are presented in Section 4, and the paper ends with the conclusions in Section 5.

2 Semi-supervised Co-clustering Ensemble

In this section, the pairwise constraints (side information) of co-clustering, which are the extensions of clustering pairwise constraints, are introduced. In general, the pairwise constraints are a popular way for semi-supervised learning. Then, the semi-supervised co-clustering ensemble is illustrated in detail.

2.1 Pairwise Constraints of Co-clustering

A popular approach in semi-supervised clustering algorithms is to use background information in the form of pairwise constraints, such as must-link (ML) and cannot-link (CL) constraints. An ML constraint means that two data points must be in the same cluster, while a CL constraint means that two data points must be in different clusters. In the co-clustering problem, pairwise constraints are extended as follows: co-cluster ML constraints specify that two objects, two features, or one object and one feature must be related.

Suppose $g_i$ and $g_j$ are two connected components. Let $x_i$ and $x_j$ be the entities in $g_i$ and $g_j$, respectively. Let M denote the set of ML constraints. We have

$(x_i, x_j) \in M, \quad x_i \in g_i, \; x_j \in g_j.$

CL constraints denote that two entities, two features, or one entity and one feature cannot be placed in the same cluster, and CL constraints can also be entailed. Suppose $g_i$ and $g_j$ are two connected components (subgraphs completely connected by ML constraints). $x_i$ and $x_j$ denote the entities in $g_i$ and $g_j$, respectively. Denote by C the set of CL constraints. Then

$(x_i, x_j) \in C, \quad x_i \in g_i, \; x_j \in g_j.$

Given a data matrix $X_{mn}$ with m rows and n columns, $o_i$ and $o_j$ denote the ith and jth objects (rows) of $X_{mn}$, while $f_i$ and $f_j$ denote the ith and jth features (columns) of $X_{mn}$. Let $k_i$, $k_j$, $k'_i$, and $k'_j$ be four connected components.

Then, the corresponding pairwise ML constraint sets (including object ML constraint set Mo and feature ML constraint set Mf) are

$M_o = \{(o_i, o_j) \mid o_i \in k_i,\ o_j \in k_j\}, \qquad M_f = \{(f_i, f_j) \mid f_i \in k'_i,\ f_j \in k'_j\}.$

Moreover, the pairwise CL constraint sets (including object CL constraint set Co and feature CL constraint set Cf) are

$C_o = \{(o_i, o_j) \mid o_i \in k_i,\ o_j \in k_j,\ k_i \neq k_j\}, \qquad C_f = \{(f_i, f_j) \mid f_i \in k'_i,\ f_j \in k'_j,\ k'_i \neq k'_j\}.$
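
To make the entailment concrete, the following is a minimal sketch (a Python illustration, not taken from the paper; the function name and the representation of constraints as 0-based index pairs are assumptions) of how a handful of given ML/CL pairs over objects can be expanded into the full sets M_o and C_o by first forming the ML-connected components. The same routine can be reused for features.

```python
def expand_constraints(n_objects, ml_pairs, cl_pairs):
    """Entail the full object ML/CL sets (M_o, C_o) from a few given index pairs."""
    # Union-find over ML constraints: ML is transitive, so linked objects
    # collapse into connected components g_i.
    parent = list(range(n_objects))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for i, j in ml_pairs:
        parent[find(i)] = find(j)

    comps = {}
    for x in range(n_objects):
        comps.setdefault(find(x), []).append(x)

    # Every pair inside one component is must-link.
    M_o = {(i, j) for comp in comps.values() for i in comp for j in comp if i < j}

    # A CL pair between two components entails CL for every cross pair.
    C_o = set()
    for i, j in cl_pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            raise ValueError("inconsistent constraints: CL inside an ML component")
        C_o |= {(a, b) for a in comps[ri] for b in comps[rj]}
    return M_o, C_o
```

For example, with ML pairs {(0, 1), (1, 2)} and the CL pair (2, 3), the routine entails (0, 2) as an additional must-link pair and (0, 3), (1, 3) as additional cannot-link pairs.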

2.2 Semi-supervised Co-clustering Ensemble Problem Formulation

In this subsection, the objective function of the semi-supervised co-clustering ensemble is defined. Suppose there is an original data matrix $X_{mn}$ with m rows (i.e. objects) and n columns (i.e. features).

These m objects can be simultaneously grouped into κ row clusters and the n columns into ℓ column clusters, so there are κ×ℓ co-clusters in total. Moreover, a co-clustering can be considered as a set of κ sets of objects $\{\alpha_r \mid r = 1, \ldots, \kappa\}$ and a set of ℓ sets of features $\{\beta_c \mid c = 1, \ldots, \ell\}$. In general, the procedure delivers row labels for the objects and column labels for the features. If several base co-clustering algorithms are run on the same dataset, several sets of row labels and column labels are obtained. The co-clustering ensemble uses a consensus function Γ to combine a set of q row labels $\mu^{(1)}, \ldots, \mu^{(q)}$ into a single row label μ, and it simultaneously combines the corresponding column labels $\nu^{(1)}, \ldots, \nu^{(q)}$ into a single column label ν.

Commonly, a dataset has groupings $(\mu^{(q)}, \nu^{(q)})$ consisting of $\kappa^{(q)}$ row clusters and $\ell^{(q)}$ column clusters. Γ is defined as a consensus function $\mathbb{N}^{\{m \times t, n \times t\}} \rightarrow \mathbb{N}^{\{m, n\}}$ projecting a set of co-clusterings onto an integrated co-clustering:

(1) $\Gamma: \{(\mu^{(q)}, \nu^{(q)}) \mid q \in \{1, \ldots, t\}\} \rightarrow \{(\mu, \nu), (C_d \cup M_d)\}.$

Let the set of groupings $\{(\mu^{(q)}, \nu^{(q)}) \mid q \in \{1, \ldots, t\}\}$ be denoted by Φ. The co-clustering ensemble seeks a consensus co-clustering that shares the most information with the original co-clusterings.

Moreover, the side information of the dataset consists of the two constraint sets, the CL set C and the ML set M; when this side information is used in the combining step of co-clustering, the procedure is called a semi-supervised co-clustering ensemble. In order to measure the statistical information shared between two co-clusterings, the objective function of the semi-supervised co-clustering ensemble is defined as follows:

(2) $(\mu, \nu)^{(\kappa, \ell\text{-opt})} = \arg\max_{(\widehat{\mu, \nu})} \sum_{q=1}^{t} \phi^{(\mathrm{NMI})}\left\{(\widehat{\mu, \nu}), (\mu^{(q)}, \nu^{(q)}), (C_d \cup M_d)\right\},$

where $(\mu, \nu)^{(\kappa, \ell\text{-opt})}$ is the consensus co-clustering result, i.e. one of the results that maximize the average mutual information with all individual co-clustering labels $(\mu^{(q)}, \nu^{(q)})$ in Φ. We define a measure between a set of t co-clustering labels, Φ, and a single co-clustering label, $(\widehat{\mu, \nu})$, as the average normalized mutual information (ANMI), based on the pairwise measure of mutual information; the definition of ANMI for co-clustering is as follows:

(3) $\phi^{(\mathrm{ANMI})}(\Phi, (\widehat{\mu, \nu})) = \frac{1}{t} \sum_{q=1}^{t} \phi^{(\mathrm{NMI})}\left((\widehat{\mu, \nu}), (\mu^{(q)}, \nu^{(q)})\right).$

Mutual information is a sound indication of the shared information between a pair of co-clusterings. The normalized mutual information (NMI) was defined as follows:

(4) $\mathrm{NMI}(X, Y) = \frac{I(X, Y)}{\sqrt{H(X)\, H(Y)}},$

where X and Y denote the variables described by the cluster labelings, I(X, Y) denotes the mutual information between X and Y, and H(X) denotes the entropy of X; note that I(X, X) = H(X), so NMI(X, X) = 1.

In co-clustering, suppose there are two co-clustering labeling variables (Xr, Xc) and (Yr, Yc), i.e. (Xr, Yr) and (Xc, Yc) denote the row cluster labeling variables and column cluster labeling variables, respectively. To obtain the mutual information between the two co-clustering variables, we must measure the mutual information of the row cluster labels (Xr, Yr) and of the column cluster labels (Xc, Yc), respectively. We define the NMI of co-cluster labeling as follows:

(5) $\mathrm{NMI}((X_r, X_c), (Y_r, Y_c)) = \mathrm{NMI}(X_r, Y_r) + \mathrm{NMI}(X_c, Y_c) = \frac{I(X_r, Y_r)}{\sqrt{H(X_r)\, H(Y_r)}} + \frac{I(X_c, Y_c)}{\sqrt{H(X_c)\, H(Y_c)}}.$

One can easily find that NMI(Xr, Xr) = NMI(Yc, Yc) = 1. Equation (3) needs to be estimated by using the sampled quantities provided by the co-clusterings. Then, from Eq. (5), the estimate of the NMI, $\phi^{(\mathrm{NMI})}$, is

(6) $\phi^{(\mathrm{NMI})}((\mu_i, \nu_i), (\mu_j, \nu_j)) = \phi^{(\mathrm{NMI})}(\mu_i, \mu_j) + \phi^{(\mathrm{NMI})}(\nu_i, \nu_j) = \dfrac{\sum_{\alpha=1}^{\kappa^{(i)}} \sum_{\beta=1}^{\kappa^{(j)}} O_{\alpha,\beta} \log\left(\dfrac{|O|\, O_{\alpha,\beta}}{O_\alpha^i O_\beta^j}\right)}{\sqrt{\left(\sum_{\alpha=1}^{\kappa^{(i)}} O_\alpha^i \log \dfrac{O_\alpha^i}{|O|}\right) \left(\sum_{\beta=1}^{\kappa^{(j)}} O_\beta^j \log \dfrac{O_\beta^j}{|O|}\right)}} + \dfrac{\sum_{\alpha=1}^{\ell^{(i)}} \sum_{\beta=1}^{\ell^{(j)}} F_{\alpha,\beta} \log\left(\dfrac{|F|\, F_{\alpha,\beta}}{F_\alpha^i F_\beta^j}\right)}{\sqrt{\left(\sum_{\alpha=1}^{\ell^{(i)}} F_\alpha^i \log \dfrac{F_\alpha^i}{|F|}\right) \left(\sum_{\beta=1}^{\ell^{(j)}} F_\beta^j \log \dfrac{F_\beta^j}{|F|}\right)}},$

where |O| and |F| denote the total number of objects and features, respectively; $O_\alpha^i$ and $F_\alpha^i$ denote the number of objects and features in co-cluster $Co_\alpha$ according to $(\mu_i, \nu_i)$; and $O_\beta^j$ and $F_\beta^j$ denote the number of objects and features in co-cluster $Co_\beta$ according to $(\mu_j, \nu_j)$. $O_{\alpha,\beta}$ and $F_{\alpha,\beta}$ denote the number of objects and features, respectively, that are in co-cluster $Co_\alpha$ according to $(\mu_i, \nu_i)$ as well as in co-cluster $Co_\beta$ according to $(\mu_j, \nu_j)$.
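
As a concrete reference, the following is a minimal Python sketch of Eqs. (4)-(6): the NMI of two label vectors is estimated from their contingency counts, and the co-clustering NMI is the sum of the row-label and column-label NMIs. It assumes labels are non-negative integers; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Sample-based NMI between two label vectors, as in Eqs. (4)/(6)."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    n = labels_a.size
    eps = 1e-12
    # Contingency table: counts of points in cluster a of the first labeling
    # and cluster b of the second labeling.
    table = np.zeros((labels_a.max() + 1, labels_b.max() + 1))
    for a, b in zip(labels_a, labels_b):
        table[a, b] += 1
    p_ab = table / n
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)
    mi = np.sum(p_ab * np.log((p_ab + eps) / (np.outer(p_a, p_b) + eps)))
    h_a = -np.sum(p_a * np.log(p_a + eps))
    h_b = -np.sum(p_b * np.log(p_b + eps))
    return mi / (np.sqrt(h_a * h_b) + eps)

def co_clustering_nmi(row_a, col_a, row_b, col_b):
    """NMI between two co-clusterings, Eq. (5): row-label NMI + column-label NMI."""
    return nmi(row_a, row_b) + nmi(col_a, col_b)
```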

3 Semi-supervised Co-clustering Ensemble Based on Kernel Probabilistic Model

In this section, a generative model for the semi-supervised co-clustering ensemble based on kernel probabilistic theory is proposed, and the gradient descent method is used to infer the model. Finally, the corresponding algorithm is illustrated step by step.

3.1 Kernel Probabilistic Model for Semi-supervised Co-clustering Ensemble

In the KPMSCE model, a zero-mean Gaussian process over $U_{:,d}$ and $V_{:,d}$ is regarded as the prior distribution for the latent features of a dataset. In the general case, a Gaussian process can be viewed as a generalization of the multivariate Gaussian distribution: just as a mean vector and a covariance matrix determine a multivariate Gaussian, a mean function $m(x)$ and a covariance function $k(x, x')$ determine a Gaussian process.

For the semi-supervised co-clustering ensemble problem, $x$ indexes either the rows or the columns of the data matrix. If $m(x)$ equals 0, the kernel function $k(x, x')$ represents the covariance of the corresponding pair of objects or features. $K_U \in \mathbb{R}^{N \times N}$ is set to be a full covariance matrix for objects, and it acts as a prior that forces the factorization to capture the covariance among rows. Meanwhile, $K_V \in \mathbb{R}^{M \times M}$ is set to be a full covariance matrix for features, and it acts as a prior that forces the factorization to capture the covariance among columns.
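
The paper does not specify how $K_U$ and $K_V$ are constructed. As one hedged illustration only, an RBF kernel over the rows (for $K_U$) and over the columns (for $K_V$) of the data matrix could serve as such full covariance priors; the function name, length scale, and jitter below are assumptions, not the authors' prescription.

```python
import numpy as np

def rbf_covariance(X, length_scale=1.0, jitter=1e-6):
    """One possible full covariance matrix: an RBF kernel over the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-0.5 * d2 / length_scale ** 2)
    return K + jitter * np.eye(X.shape[0])   # jitter keeps K well conditioned

# Illustrative use (X is the m-by-n data matrix):
#   K_U = rbf_covariance(X)      # covariance among objects (rows)
#   K_V = rbf_covariance(X.T)    # covariance among features (columns)
```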

If $K_U$ and $K_V$ are the priors and they are assumed to be known, the generative process of KPMSCE is as follows:

  1. Sample $U_{:,d} \sim \mathcal{GP}(0, K_U)$ for $d = 1, \ldots, D$.

  2. Sample $V_{:,d} \sim \mathcal{GP}(0, K_V)$ for $d = 1, \ldots, D$.

  3. For each entry $R_{n,m}$, sample $R_{n,m} \sim \mathcal{N}(U_{n,:} V_{m,:}^T, \sigma^2)$, where σ is a constant.
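
The three steps can be written directly as a short sampling routine. The sketch below is illustrative only (it assumes $K_U$, $K_V$, the latent dimension D, and σ are given); since the GP is evaluated only at the N row indices (or M column indices), each draw reduces to a zero-mean multivariate Gaussian with the corresponding covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_kpmsce(K_U, K_V, D, sigma):
    """Draw U, V and a synthetic data matrix R following the three steps above."""
    N, M = K_U.shape[0], K_V.shape[0]
    # Steps 1-2: each latent column is a zero-mean GP draw with covariance K_U (K_V).
    U = rng.multivariate_normal(np.zeros(N), K_U, size=D).T    # N x D
    V = rng.multivariate_normal(np.zeros(M), K_V, size=D).T    # M x D
    # Step 3: every entry is Gaussian around the corresponding inner product.
    R = U @ V.T + sigma * rng.standard_normal((N, M))
    return U, V, R
```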

If U and V are known, the likelihood over the observed entries of the target matrix R (indicated by $\delta_{n,m} = 1$) is

(7) $p(R \mid U, V, \sigma^2) = \prod_{n=1}^{N} \prod_{m=1}^{M} \left[ \mathcal{N}(R_{n,m} \mid U_{n,:} V_{m,:}^T, \sigma^2) \right]^{\delta_{n,m}},$

and U and V are given by

(8) $p(U \mid K_U) = \prod_{d=1}^{D} \mathcal{GP}(U_{:,d} \mid 0, K_U),$
(9) $p(V \mid K_V) = \prod_{d=1}^{D} \mathcal{GP}(V_{:,d} \mid 0, K_V).$

For simplicity, we denote $K_U^{-1}$ by $S_U$ and $K_V^{-1}$ by $S_V$. The log-posterior over U and V can be calculated by

(10) $\log p(U, V \mid R, \sigma^2, K_U, K_V) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} \delta_{n,m} \left(R_{n,m} - U_{n,:} V_{m,:}^T\right)^2 - \frac{1}{2} \sum_{d=1}^{D} U_{:,d}^T S_U U_{:,d} - \frac{1}{2} \sum_{d=1}^{D} V_{:,d}^T S_V V_{:,d} - A \log \sigma^2 - \frac{D}{2} \left(\log |K_U| + \log |K_V|\right) + C,$

where A is the total number of observed entries and |K| is the determinant of K. The proposed model is a generative model that simulates how the base co-clustering results are sampled. Then, by inferring the latent labels in the graphical model, the final semi-supervised co-clustering ensemble results are obtained.

3.2 Inference of KPMSCE Based on Gradient Descent

There are several latent variables to be inferred in this model. Expectation maximization or maximum a posteriori estimation can be used to estimate them. In this paper, maximum a posteriori estimation is applied to estimate the latent matrices U and V, i.e. the matrices that maximize the posterior of the model. In other words, the following objective function is minimized:

(11) $E = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} \delta_{n,m} \left(R_{n,m} - U_{n,:} V_{m,:}^T\right)^2 + \frac{1}{2} \sum_{d=1}^{D} U_{:,d}^T S_U U_{:,d} + \frac{1}{2} \sum_{d=1}^{D} V_{:,d}^T S_V V_{:,d}.$

In general, gradient descent can be used for minimizing the function E. The gradient of objects (rows) is as follows:

(12) $\frac{\partial E}{\partial U_{n,d}} = -\frac{1}{\sigma^2} \sum_{m=1}^{M} \delta_{n,m} \left(R_{n,m} - U_{n,:} V_{m,:}^T\right) V_{m,d} + e(n)^T S_U U_{:,d},$

where e(n) is an indicator vector whose nth entry is 1 and all other entries are 0. Then, the gradient of features (columns) is defined as

(13) $\frac{\partial E}{\partial V_{m,d}} = -\frac{1}{\sigma^2} \sum_{n=1}^{N} \delta_{n,m} \left(R_{n,m} - U_{n,:} V_{m,:}^T\right) U_{n,d} + e(m)^T S_V V_{:,d},$

where e(m) is also an indicator vector with the corresponding entry being 1 and all others being 0. Given an initial guess and the priors, U is updated by

(14) $U_{n,d}^{(t+1)} = U_{n,d}^{(t)} - \eta \frac{\partial E}{\partial U_{n,d}},$

where η is the learning rate, which can be set between 0 and 1. V is updated by

(15) $V_{m,d}^{(t+1)} = V_{m,d}^{(t)} - \eta \frac{\partial E}{\partial V_{m,d}}.$

According to these update rules, U and V are updated alternately until convergence. Because $K_U$ and $K_V$ remain fixed throughout the iterations, $S_U$ and $S_V$ are computed only once. In the extreme case, entire rows or columns are missing but the corresponding side information is known. The update rules with missing values then reduce to the following equations:

(16) $U_{n,d}^{(t+1)} = U_{n,d}^{(t)} - \eta\, e(n)^T S_U U_{:,d} = U_{n,d}^{(t)} - \eta \sum_{n'=1}^{N} S_U(n, n')\, U_{n',d},$

and

(17) $V_{m,d}^{(t+1)} = V_{m,d}^{(t)} - \eta\, e(m)^T S_V V_{:,d} = V_{m,d}^{(t)} - \eta \sum_{m'=1}^{M} S_V(m, m')\, V_{m',d}.$

In this case, $U_{n,:}$ is updated according to a weighted average of the current U over all rows, whether or not the rows are missing; the weights $S_U(n, n')$ reflect the correlation between row n and all the other rows. Likewise, $V_{m,:}$ is updated according to a weighted average of the current V over all columns, whether or not the columns are missing; the weights $S_V(m, m')$ reflect the correlation between column m and all the other columns.
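
Putting Eqs. (11)-(17) together, one alternating gradient-descent iteration can be sketched as follows (a Python illustration under the paper's notation; `mask` holds the indicators $\delta_{n,m}$, and $S_U = K_U^{-1}$, $S_V = K_V^{-1}$ are precomputed once). Rows or columns with no observed entries have a zero residual, so their update automatically reduces to the prior-only rules of Eqs. (16) and (17).

```python
import numpy as np

def gradient_step(R, mask, U, V, S_U, S_V, sigma, eta):
    """One alternating gradient-descent step on E (Eqs. 11-15)."""
    # Gradient w.r.t. U (Eq. 12), all rows at once, followed by the update of Eq. (14).
    residual = mask * (R - U @ V.T)            # delta_{n,m} (R_{n,m} - U_{n,:} V_{m,:}^T)
    U = U - eta * (-(1.0 / sigma ** 2) * residual @ V + S_U @ U)
    # Recompute the residual with the updated U, then Eq. (13) and the update of Eq. (15).
    residual = mask * (R - U @ V.T)
    V = V - eta * (-(1.0 / sigma ** 2) * residual.T @ U + S_V @ V)
    return U, V
```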

3.3 Algorithm

In this subsection, the KPMSCE algorithm is described. The diversity of the base co-clustering results is an important factor for improving the semi-supervised co-clustering ensemble, so KPMSCE combines a set of diverse base co-clustering results with side information to obtain the final semi-supervised co-clustering ensemble result. According to the above model and inference, a kernel probabilistic algorithm for the semi-supervised co-clustering ensemble is designed. The algorithm procedure is described step by step below, followed by a schematic code sketch after the steps.

  • Algorithm: KPMSCE Algorithm.

  • Input: Pairwise constraint set P(i, j), original data matrix Xmn, number of row clusters κ, and number of column clusters ℓ (i.e. κ×ℓ co-clusters in total).

  • Output: The final consensus co-clustering result.

  1. Divide Xmn into κ row clusters and ℓ column clusters by the co-clustering algorithms, and the base co-clustering labels are obtained.

  2. Compute the NMI among the base co-clusters and obtain a new data matrix.

  3. Calculate the likelihood for each column according to the equation

    $P(R_{n(m),m} \mid U, V) = \mathcal{N}(R_{n(m),m} \mid U_{n(m),:} V_{m,:}^T, \sigma^2 I),$ where $n(m)$ denotes the indices of the observed rows in column m.

  4. Marginalize the probability over V, obtaining $P(R \mid U) = \prod_{m=1}^{M} p(R_{n(m),m} \mid U)$.

  5. Compute the objective function $E = \sum_{m=1}^{M} \left( R_{n(m),m}^T C^{-1} R_{n(m),m} + \log |C| \right) + \sum_{d=1}^{D} U_{:,d}^T S_U U_{:,d}$. Note that V no longer appears in this objective, so gradient descent can be performed on U, which is updated at each iteration using the inverse of C.

  6. The maximum likelihood estimate is computed by $\hat{R}_{n,m} = \hat{U}_{n,:} \hat{V}_{m,:}^T$.

  7. Obtain the column and row cluster ensemble according to maximum likelihood.

  8. Integrate the final row and column clusters.
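
The following schematic driver (a hedged Python sketch reusing the `gradient_step` function from Section 3.2's sketch) illustrates how steps 5-8 might be realized for a given data matrix R produced by the preceding steps. The paper leaves several details open, in particular the construction of C in step 5 and how the final clusters are read off; the simplifications here (optimizing E of Eq. (11) directly instead of the marginalized objective, and a k-means read-out of U and V) are assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def kpmsce_consensus(R, mask, K_U, K_V, D, kappa, ell,
                     sigma=1.0, eta=0.01, iters=500, seed=0):
    """Sketch of steps 5-8 for a given N x M matrix R and its mask of deltas.

    K_U (N x N) and K_V (M x M) are the kernel priors over rows and columns;
    kappa and ell are the numbers of row and column clusters.
    """
    rng = np.random.default_rng(seed)
    S_U, S_V = np.linalg.inv(K_U), np.linalg.inv(K_V)   # S_U = K_U^{-1}, S_V = K_V^{-1}
    U = 0.01 * rng.standard_normal((R.shape[0], D))
    V = 0.01 * rng.standard_normal((R.shape[1], D))
    for _ in range(iters):
        U, V = gradient_step(R, mask, U, V, S_U, S_V, sigma, eta)

    R_hat = U @ V.T                                              # step 6
    row_labels = KMeans(n_clusters=kappa, n_init=10).fit_predict(U)   # steps 7-8
    col_labels = KMeans(n_clusters=ell, n_init=10).fit_predict(V)
    return row_labels, col_labels, R_hat
```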

4 Empirical Study

In this section, 10 datasets are used in the experiments. In particular, eight datasets are from the UCI machine learning repository, and one dataset is from the KDD Cup. The last dataset, called the yeast cell data, has been analyzed by many clustering and co-clustering algorithms. For all reported results, there are two steps to obtain the final co-clustering ensemble results. First, a set of base co-clustering labels is obtained by running the base co-clustering algorithms. Second, KPMSCE is applied to the base co-clustering labels to generate the final consensus co-clustering.

The standard deviation of the co-clusters is used as the evaluation criterion. A final co-clustering ensemble result has two equally important aspects of quality: it should be good not only for the clusters of objects but also for the clusters of features, and the quality of the co-cluster is the comprehensive assessment measure that takes both aspects into account. RSD is defined as the standard deviation of all rows in a co-cluster, CSD as the standard deviation of all columns in a co-cluster, and CoSD as the standard deviation of all entries in the co-cluster. The smaller the RSD, CSD, and CoSD, the better the quality of the row clustering, column clustering, and co-clustering, respectively.

Because all datasets have class labels, micro-precision (MP) is used to measure the accuracy of the clustering with respect to the true labels. MP is defined as $\mathrm{MP} = \sum_{i=1}^{k} a_i / N$, where k is the number of clusters, N is the number of objects, and $a_i$ denotes the number of objects in cluster i that are correctly assigned to the corresponding class [29]. Moreover, AMP denotes the average MP.
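
For reference, MP can be computed as follows (a Python sketch; mapping each cluster to a class via a one-to-one Hungarian matching is one common convention for determining the $a_i$, assumed here rather than taken from [29]).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def micro_precision(pred_labels, true_labels):
    """MP = sum_i a_i / N, after matching clusters to classes."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    k, c = pred_labels.max() + 1, true_labels.max() + 1
    counts = np.zeros((k, c), dtype=int)
    for p, t in zip(pred_labels, true_labels):
        counts[p, t] += 1
    # a_i: correctly assigned objects once each cluster is mapped to its best class.
    rows, cols = linear_sum_assignment(-counts)
    return counts[rows, cols].sum() / pred_labels.size
```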

4.1 Comparison of Experimental Results Among SCC, Information Theoretic Co-clustering, Bregman Co-clustering, and KPMSCE

To illustrate the performance of KPMSCE, the results obtained by KPMSCE are compared with the co-clustering results generated by SCC [9], information theoretic co-clustering (ITCC) [10], and Bregman co-clustering (BCC) [2]. The experimental procedure is described as follows. First, the base co-clusterings are obtained by running each co-clustering algorithm three times on each dataset, i.e. nine co-clusterings of each dataset are obtained. Then, the final consensus co-clustering is obtained by combining the base co-clusterings via KPMSCE. The experimental results are shown in Table 1.

Among all the algorithms on the 10 datasets, KPMSCE achieved the best result six times, while the other three algorithms together achieved the best result only four times. Table 1 shows that the performance of KPMSCE is better than that of the other three algorithms most of the time. All results show that the performance of co-clustering can be enhanced by the ensemble method. Table 1 also shows that the row clustering performance of BCC is better than that of the other two base co-clustering algorithms, because BCC obtains the largest number of best row clustering results. Similarly, the column clustering performance of ITCC is the best among the three base co-clustering algorithms.

4.2 Comparison of Experimental Results Between KPMSCE and Relational Multi-manifold Co-clustering Ensemble

To demonstrate how the method works for the co-clustering problem and improves co-clustering performance, it is compared with the co-clustering ensemble method named relational multi-manifold co-clustering ensemble (RMCCE) [19]. The base co-clustering labels are obtained by running each co-clustering algorithm five times on each dataset, and the results are shown in Table 2. Table 2 shows that KPMSCE outperforms RMCCE most of the time; in other words, the proposed co-clustering ensemble method actually gives better results than RMCCE.

4.3 AMP Results

In this subsection, SCC, ITCC, BCC, KPMSCE, and RMCCE are used in experiments on all datasets. There are two steps in this experiment. First, SCC, ITCC, and BCC with random initializations are run many times to produce the base co-clusterings, and the average MP (AMP) values are recorded for different numbers of base co-clusterings. Second, all base co-clustering results are used as the input of KPMSCE and RMCCE for the co-clustering ensemble, and the ensemble results are recorded. The results are reported in Figure 1. The x-axis shows the number of base co-clusterings, and the y-axis shows the AMP for different numbers of base co-clusterings. We can see that KPMSCE obtains the best AMP result, and RMCCE obtains the second best. The results show that ensemble learning can improve the performance of co-clustering. Moreover, semi-supervised learning can positively leverage the base co-clusterings and the co-clustering ensemble.

5 Conclusions

In this paper, the semi-supervised co-clustering ensemble was illustrated in detail based on semi-supervised learning and ensemble learning. Semi-supervised co-clustering ensembles provide a framework for combining multiple base co-clusterings and the side information of a dataset to generate a stable and robust consensus co-clustering. Moreover, the objective function of the semi-supervised co-clustering ensemble was formulated in detail. Then, KPMSCE was presented, and the inference of KPMSCE was illustrated in detail. Furthermore, the corresponding algorithm was designed. In addition, different algorithms and the proposed algorithm were used for experiments on real datasets. The experimental results demonstrated that the proposed algorithm can significantly outperform the compared algorithms in terms of several indices.

Future work will focus on the diversity of the base co-clustering labels for the co-clustering ensemble.

Table 1:

RSD, CSD, and CoSD of Each Data Obtained by Base Co-clustering Algorithms and KPMSCE.

Dataset | RSD: SCC, ITCC, BCC, KPMSCE | CSD: SCC, ITCC, BCC, KPMSCE | CoSD: SCC, ITCC, BCC, KPMSCE
sonar 127.80 125.26 113.87 124.73 20.92 21.49 22.36 20.47 1.32 1.30 1.23 1.28
spectheart 246.40 255.15 254.8295 242.53 20.16 20.10 20.31 20.17 2.12 2.15 2.13 2.11
breast 2178.17 2214.55 2161.27 2126.92 47.06 47.04 47.96 46.62 11.87 11.32 11.35 11.12
semeion 7649.95 7814.65 7543.46 7442.21 1178.87 1175.61 1205.44 1171.19 54.17 56.73 56.30 54.66
hepatitis 12,746.32 11,137.8 3459.69 8338.81 359.14 339.87 362.61 340.78 182.77 166.92 190.58 124.39
cred 93,168.57 5643.97 5652.27 50,286.18 414.66 414.05 417.38 423.19 312.04 351.50 356.21 186.26
yeast 231,596.70 224,648.70 223,290.60 222,387.00 26,132.06 24,633.90 25,933.71 24,354.13 8389.86 8053.27 8106.78 7444.40
Kdd99sub 625,626.50 544,649.20 208,676.10 417,943.00 16,336.52 19,701.23 8433.84 15,932.78 7426.51 4312.35 2811.02 3951.52
hvwnt 927,011.00 966,412.60 752,109.30 927,012.40 3,954,112.0 3,964,344.0 3,170,334.0 3,954,368.0 79,431.88 79,561.0 63,554.5 79,424.0
secom 2,446,928.0 4,492,293.0 3,799,032.0 2,104,492.0 75,289.87 74,446.81 75,154.10 74,193.08 3227.71 5968.66 6881.71 2875.36
Note: Bold values denote the best results for each criterion.

Table 2:

RSD, CSD, and CoSD obtained by KPMSCE and RMCCE.

Dataset sonar spectheart breast semeion hepatitis cred yeast Kdd99sub hvwnt secom
RSD
 RMCCE 114.85 305.26 2221.23 8676.22 12,782.15 93,588.21 236,639.07 208,682.96 957,721.38 5,349,347.15
 KPMSCE 124.74 237.84 2240.81 7604.99 8338.88 50,286.18 231,664.26 417,943.33 927,012.43 1,926,235.25
CSD
 RMCCE 21.93 20.35 47.56 1208.49 401.30 423.75 27,394.69 20,470.31 3,602,929.88 73,621.96
 KPMSCE 20.41 20.18 46.66 1171.46 340.78 423.19 24,357.67 17,444.44 3,953,442.72 74,467.02
CoSD
 RMCCE 1.18 2.41 11.38 61.11 188.90 322.33 8601.97 4739.50 72,101.06 7055.71
 KPMSCE 1.28 2.10 11.23 54.72 124.39 186.26 7912.92 4493.20 79,424.02 2873.34
Note: Bold values denote the best results for each criterion.

Figure 1: AMP Results of Algorithms on 10 Datasets.

Bibliography

[1] G. Aggarwal and N. Gupta, BEMI bicluster ensemble using mutual information, in: 2013 12th International Conference on Machine Learning and Applications (ICMLA), vol. 1, pp. 321–324, IEEE, 2013. DOI: 10.1109/ICMLA.2013.65.

[2] A. Banerjee, A generalized maximum entropy approach to Bregman co-clustering and matrix approximation, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 509–514, 2004. DOI: 10.1145/1014052.1014111.

[3] M. Asteris, A. Kyrillidis, D. Papailiopoulos and A. G. Dimakis, Bipartite correlation clustering – maximizing agreements, in: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 121–129, 2016.

[4] A. Beutel, A. Ahmed and A. J. Smola, ACCAMS: additive co-clustering to approximate matrices succinctly, in: International Conference on World Wide Web, pp. 119–129, 2015. DOI: 10.1145/2736277.2741091.

[5] J. Cheng, Z.-S. Tong and L. Zhang, Scaling behavior of nucleotide cluster in DNA sequences, J. Zhejiang Univ. Sci. B 8 (2007), 359–364. DOI: 10.1631/jzus.2007.B0359.

[6] J. Cheng and L.-x. Zhang, Statistical properties of nucleotide clusters in DNA sequences, J. Zhejiang Univ. Sci. B 6 (2005), 408–412. DOI: 10.1631/jzus.2005.B0408.

[7] X. Cheng, S. Su, L. Gao and J. Yin, Co-ClusterD: a distributed framework for data co-clustering with sequential updates, IEEE Trans. Knowl. Data Eng. 27 (2015), 3231–3244. DOI: 10.1109/TKDE.2015.2451634.

[8] Y. Cheng and G. M. Church, Biclustering of expression data, in: International Conference on Intelligent Systems for Molecular Biology, vol. 8, pp. 93–103, 2000.

[9] I. S. Dhillon, Co-clustering documents and words using bipartite spectral graph partitioning, in: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 269–274, 2001. DOI: 10.1145/502512.502550.

[10] I. S. Dhillon, S. Mallela and D. S. Modha, Information-theoretic co-clustering, in: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 89–98, ACM, 2003. DOI: 10.1145/956750.956764.

[11] P. Georg, Ensemble Methods for Plaid Bicluster Algorithm, 2010.

[12] F. Gullo, A. K. M. K. A. Talukder, S. Luke, C. Domeniconi and A. Tagarelli, Multiobjective optimization of co-clustering ensembles, in: Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference Companion, pp. 1495–1496, 2012. DOI: 10.1145/2330784.2331010.

[13] B. Hanczar and M. Nadif, Bagged biclustering for microarray data, in: ECAI, pp. 1131–1132, 2010.

[14] B. Hanczar and M. Nadif, Using the bagging approach for biclustering of gene expression data, Neurocomputing 74 (2011), 1595–1605. DOI: 10.1016/j.neucom.2011.01.013.

[15] B. Hanczar and M. Nadif, Ensemble methods for biclustering tasks, Pattern Recognit. 45 (2012), 3938–3949. DOI: 10.1016/j.patcog.2012.04.010.

[16] J. A. Hartigan, Direct clustering of a data matrix, J. Am. Stat. Assoc. 67 (1972), 123–129. DOI: 10.1080/01621459.1972.10481214.

[17] D. Huang, C. D. Wang and J. H. Lai, Locally weighted ensemble clustering, IEEE Trans. Cybern. PP (2017), 1–14. DOI: 10.1109/TCYB.2017.2702343.

[18] Q. Huang, X. Chen, J. Huang, S. Feng and J. Fan, Scalable ensemble information-theoretic co-clustering for massive data, in: Proceedings of the International MultiConference of Engineers and Computer Scientists, vol. 1, 2012.

[19] P. Li, J. Bu, C. Chen and Z. He, Relational co-clustering via manifold ensemble learning, in: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1687–1691, ACM, 2012. DOI: 10.1145/2396761.2398498.

[20] H. Liu, J. Wu, T. Liu, D. Tao and Y. Fu, Spectral ensemble clustering via weighted K-means: theoretical and practical evidence, IEEE Trans. Knowl. Data Eng. 29 (2017), 1129–1143. DOI: 10.1109/TKDE.2017.2650229.

[21] L. Menezes and A. L. V. Coelho, On ensembles of biclusters generated by NichePSO, in: 2011 IEEE Congress on Evolutionary Computation (CEC), pp. 601–607, IEEE, 2011. DOI: 10.1109/CEC.2011.5949674.

[22] G. Pio, D. Malerba, D. D'Elia and M. Ceci, Integrating microRNA target predictions for the discovery of gene regulatory networks: a semi-supervised ensemble learning approach, BMC Bioinformatics 15 (2014), S4. DOI: 10.1186/1471-2105-15-S1-S4.

[23] P. Rathore, J. C. Bezdek, S. M. Erfani, S. Rajasegarar and M. Palaniswami, Ensemble fuzzy clustering using cumulative aggregation on random projections, IEEE Trans. Fuzzy Syst. PP (2017), 1–1. DOI: 10.1109/TFUZZ.2017.2729501.

[24] Z. Tao, H. Liu and Y. Fu, Simultaneous clustering and ensemble, in: AAAI, 2017. DOI: 10.1609/aaai.v31i1.10720.

[25] Z. Tao, H. Liu, S. Li and Y. Fu, Robust spectral ensemble clustering, in: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pp. 367–376, 2016. DOI: 10.1145/2983323.2983745.

[26] L. Teng and K. Tan, Finding combinatorial histone code by semi-supervised biclustering, BMC Genomics 13 (2012), 301. DOI: 10.1186/1471-2164-13-301.

[27] P. Wang, K. B. Laskey, C. Domeniconi and M. I. Jordan, Nonparametric Bayesian co-clustering ensembles, in: SDM, pp. 331–342, SIAM, 2011. DOI: 10.1137/1.9781611972818.29.

[28] T. Wu, A. R. Benson and D. F. Gleich, General tensor spectral co-clustering for higher-order data, 2016.

[29] Z. Zhou and W. Tang, Clusterer ensemble, Knowl. Based Syst. 19 (2006), 77–83. DOI: 10.1016/j.knosys.2005.11.003.

Received: 2017-10-12
Published Online: 2017-12-30

©2020 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 Public License.
