Article, Open Access

Fractional Fuzzy Clustering and Particle Whale Optimization-Based MapReduce Framework for Big Data Clustering

  • Omkaresh Kulkarni, Sudarson Jena and C. H. Sanjay
Published/Copyright: July 17, 2019

Abstract

The recent advancements in information technology and the web tend to increase the volume of data used in day-to-day life. The result is a big data era, which has become a key research issue due to the complexity of analyzing big data. This paper presents a technique called FPWhale-MRF for big data clustering using the MapReduce framework (MRF), by proposing two clustering algorithms. In FPWhale-MRF, the mapper function estimates the cluster centroids using the Fractional Tangential-Spherical Kernel clustering algorithm, which is developed by integrating fractional theory into a Tangential-Spherical Kernel clustering approach. The reducer combines the mapper outputs to find the optimal centroids for clustering using the proposed Particle-Whale (P-Whale) algorithm. The P-Whale algorithm combines the Whale Optimization Algorithm with Particle Swarm Optimization to improve the clustering performance. Two datasets, namely the localization and skin segmentation datasets, are used for the experimentation, and the performance is evaluated in terms of two metrics: clustering accuracy and DB-index. The maximum accuracy attained by the proposed FPWhale-MRF technique is 87.91% and 90% for the localization and skin segmentation datasets, respectively, thus proving its effectiveness in big data clustering.

1 Introduction

One of the popular and significant topics in computing science research is big data [16]. Traditionally, datasets were composed of at most a few hundred items, but advancements in technology now make it possible to store and process datasets of much larger volume. Such data is known as big data: a collection of large-sized and complex datasets. Big data is classically characterized by three V's: volume, variety, and velocity. Volume refers to the huge size of the dataset, variety refers to the various types of data, and velocity refers to the speed at which new data accumulates. When any of these three characteristics exceeds the capacity of a system to store, analyze, and process the data, the data becomes "big". More recently, two additional V's have been included, so that big data is characterized by five V's: volume, velocity, variety, veracity, and value. Big data concerns not only large-sized data; it is also a new concept that offers a chance to discover new insights in already existing data. Big data is applicable in various fields, such as business, telecommunication, technology, health care, medicine, bioinformatics, e-commerce, science, finance, and information search [14]. Social networking sites like Facebook, Twitter, and Instagram have billions of users who create several gigabytes of content per minute, and retail shops continuously collect data about their customers [18]. Hence, such "big data" can create "big challenges" [8].

Besides the challenges of data collection, a major issue is the processing of these enormous volumes of data. Processing big data using traditional data processing techniques is usually difficult: analytics and traditional inference approaches on individual processors are often infeasible due to the dimensionality and massive size of the data [20]. Optimal knowledge discovery mechanisms are therefore essential to handle such data, and one of the common knowledge discovery tools used for this purpose is data mining. Clustering is a data mining approach in which the data is split into groups such that the objects within each group are more similar to one another than to the objects in other groups [14]. The major idea behind clustering is to find the correct target cluster for every case in the dataset. Data clustering is an accepted method in different fields of computer science and other related areas. The clustering methods [4], [13] for big data can be categorized into two classes: single-machine and multiple-machine clustering techniques. Nowadays, clustering methods based on multiple machines gain more attention, as they are faster and more adaptable to most of the challenges of big data [27]. Moreover, big data is typically heterogeneous, which makes it difficult to exploit.

Clustering techniques can be a powerful solution to these challenges. Clustering, also known as unsupervised classification, is the process of dividing a dataset into clusters or groups of homogeneous data such that the elements in each cluster are similar. However, clustering methods have their own limitations in processing big data. One of the main challenges is to devise a clustering technique that generates a satisfactory quality of clustering within a reasonable time [17]; the strategies, techniques, and architectural models presently in use are not suitable for handling big data. Accordingly, the MapReduce programming model [2], [11] has been introduced for large-scale data analytic applications, overcoming the challenges of query processing, analysis, and data modeling. The MapReduce model is generally implemented using Hadoop, a parallel programming framework, to handle issues related to large-scale datasets. The MapReduce framework (MRF) comprises mapper and reducer functions, where the mapper performs filtering and sorting and the reducer performs a summary operation to obtain the result. With its simple Hadoop architecture, this framework can provide better performance for processing large-scale data if the configuration factors are tuned accurately [19]. Parallelization using MapReduce has become attractive due to its programming model, which automatically processes tasks in parallel while providing load balancing and fault tolerance.
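The map/shuffle/reduce flow just described can be made concrete with a small, self-contained sketch. The following Java fragment is only an in-memory illustration using word counting as a stand-in task, not the paper's clustering pipeline; a real deployment would implement Hadoop's Mapper and Reducer classes over a distributed file system.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory illustration of the map -> shuffle/sort -> reduce flow.
// Word counting stands in for the clustering task of this paper.
public class MapReduceFlow {
    public static void main(String[] args) {
        List<String> records = List.of("big data", "data clustering", "big data");

        Map<String, Long> summary = records.stream()
                // Map phase: each record is split into (key, value) pairs.
                .flatMap(r -> Arrays.stream(r.split("\\s+")))
                // Shuffle/sort + reduce phase: values are grouped by key
                // and each group is summarized (here, counted).
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));

        System.out.println(summary); // counts: big=2, data=3, clustering=1 (map order may vary)
    }
}
```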

1.1 Novelties of the Paper

In this paper, an MRF, FPWhale-MRF, is proposed for big data clustering by developing two hybrid clustering algorithms: Fractional Tangential-Spherical Kernel (FTSK) and Particle-Whale (P-Whale). Due to the complexity of mapping large datasets, a novel clustering algorithm, FTSK, is designed by including fractional theory in the Tangential-Spherical Kernel (TSK) clustering approach [12]. TSK clustering combines tangential and spherical kernels, and can improve scalability by reducing the mapping problems. To further improve the performance of the TSK clustering approach, fractional theory is included in the centroid computation so that the convergence speed, and thereby the performance, is enhanced. Fractional theory is a division of applied mathematics that solves fractional-order equations using Laplace transforms. FTSK is utilized in the mapper to find the cluster centroids for clustering, whereas the reducers use P-Whale, which combines Particle Swarm Optimization (PSO) with the Whale Optimization Algorithm (WOA), to provide optimal clustering based on the intermediate results obtained from the mappers. The function of a reducer is to combine the local centroids generated by the mappers to find the feasible cluster centroids. In P-Whale, the update process of PSO is modified using WOA, making the clustering task effective, and the P-Whale algorithm determines the optimal centroids. With the proposed P-Whale, the reducer clusters the data, where the number of clusters is user-defined. Thus, the proposed MRF with the newly designed clustering algorithms can perform big data clustering effectively.

The major contributions of the proposed FPWhale-MRF used for big data clustering are as follows:

  1. Introducing FTSK clustering by integrating fractional theory into the TSK clustering algorithm such that the mapper can locate the most appropriate cluster centroids for the clustering.

  2. Combining PSO with WOA to design a novel algorithm, P-Whale, and utilizing it in the reducer for the optimal clustering of big data, based on the given input, which are the intermediate results of the mappers.

  3. Designing the FPWhale-MRF clustering technique for big data clustering using FTSK clustering and the P-Whale algorithm in the mapper and reducer functions, respectively, for the effective clustering of data.

The organization of the paper is as follows: The literature survey presenting various techniques of big data clustering is given in Section 2. Section 3 explains the proposed FPWhale-MRF technique developed for big data clustering using the proposed FTSK and P-Whale algorithms adopted in MRF with a suitable block diagram. In Section 4, the results and the comparative analysis to evaluate the performance of the proposed technique are demonstrated. Finally, the paper is concluded in Section 5.

2 Literature Survey

Different techniques based on the MapReduce model used in the literature for big data clustering are explained in this section, stating their drawbacks along with the challenges discovered.

2.1 Clustering Techniques

Xia et al. [24] designed an MRF using the Parallel Three-Phase K-means (Par3PKM) algorithm based on Hadoop. The algorithm utilized a modified distance measure with the K-means algorithm for initialization, and the MRF was employed with the optimized K-means algorithm to cluster large-scale taxi trajectory data. The Par3PKM algorithm performs clustering with better efficiency and higher scalability; however, it suffers from the drawback of lacking a kernel space for the distance measurement.

Traganitis et al. [20] presented two variants, sketch and validate, of kernel-based K-means clustering for big data. The former is based on batch processing, whereas the latter is sequential and provides computational efficiency. The algorithm can cluster data efficiently; however, its failure to consider an MRF is a limitation.

Vadivel and Raghunath [23] designed a hierarchical clustering approach to group big data based on the MRF. The clustering task utilized co-occurrence-based feature selection over a distributed architecture that shuffled the mapper results according to queues. Even though the method has low overall computational time, the merging step of the hierarchical clustering takes considerable time.

Fries et al. [7] extended the state-of-the-art projected clustering algorithm P3C by investigating its suitability for large-scale, high-dimensional datasets. The authors showed that the original algorithm was not appropriate for handling large datasets; hence, they designed a MapReduce-based implementation, known as the P3C+-MR algorithm, introducing the required changes to the basic clustering model. It achieves better scalability through the MRF, but the curse of dimensionality remains a challenge.

Akthar et al. [1] modified the K-means clustering algorithm for big data clustering by choosing optimized centers according to the data dimensional density. The modification is based on selecting the optimal "K" data points in highly dense areas as the initial centers, so that data points outside the specified areas are eliminated from the computation of the final clusters. Even though the algorithm provides better results, it has several limitations: (i) its results are compared with only a single algorithm, which is inadequate for an effective performance analysis, and (ii) the distance computation takes considerable time. Overall, these clustering techniques have the following limitations: a fixed number of clusters makes the K-value difficult to predict; they do not work well on non-globular clusters; and different initial partitions can result in different final clusters.

2.2 Optimization Techniques

Hans et al. [9] developed a technique to implement Genetic Algorithm (GA)-based clustering in parallel using Hadoop MapReduce by extending the coarse-grained parallel model of GAs. Thereby, a two-phase clustering process could be performed on the given dataset using the MapReduce architecture. However, the technique requires further improvement in accuracy and speed. Such optimization techniques have drawbacks: they are time consuming and require many repetitions.

2.3 Supervised Learning Techniques

Chen et al. [5] developed the Parallel Semi-supervised Extreme Learning Machine (PASS-ELM) algorithm with a series of optimizations, using the MapReduce model to enhance the performance of SS-ELM. PASS-ELM is based on the Approximate Adjacent Similarity Matrix (AASM) algorithm, which uses Locality-Sensitive Hashing to compute the AASM, thereby reducing the complexity and the required storage space. However, the need for several optimizations to reach good efficiency is a drawback.

Kamal et al. [10] presented a distributed clustering approach based on imbalanced-data reduction with the K-nearest neighbor classification method. Its major contribution is to reduce the number of instances in real training datasets so that the data classification can be performed quickly. The difficulties during data reduction were managed by an MRF, developed over several clusters of automated contents with different algorithms. The approach can handle large datasets with better speedup, low reduction time, and less storage capacity. Such supervised learning techniques, however, have drawbacks: they can be inaccurate and unable to detect emerging, unknown anomalies.

2.4 Challenges

The methods discussed in the literature survey pose various challenges, which the proposed technique tries to overcome. Some of the challenges found in big data clustering are as follows:

  1. In Ref. [20], the clustering task is carried out using the kernel space, which is not applicable to the determination of clusters that are non-linear. As the accuracy of the algorithm in clustering depends on those clusters, it is a major issue in clustering big data.

    The proposed technique for big data clustering is applicable for the determination of all types of clusters, as it takes the advantages of fractional theory, which is used to model the non-linearity.

  2. One of the main challenges in big data is in data clustering, due to the large size and varieties of data to be considered. The common tools available for the processing of such data are not effective, even if various computer clusters are employed. Hence, it is necessary to find a better approach for clustering and handling large datasets [14].

    The proposed FTSK clustering approach handles the large datasets. FTSK is designed by including fractional theory in the TSK clustering approach. The TSK clustering approach can improve scalability by reducing the mapping problems.

  3. Another challenge in big data is its complexity, which increases with the amount of data. Conventional techniques based on relational database tools cannot provide a satisfactory outcome meeting all the requirements [27].

    The proposed FPWhale-MRF technique offers satisfactory results, as it takes the advantages of both the FTSK and P-Whale algorithms.

  4. Increasing demands for data is another issue, which can be managed only by increasing the capacity and performance of the techniques utilized, within the resources provided.

    The proposed method manages the big data as it utilizes the MRF, which performs clustering by processing the partitioned data in parallel. Thus, it is possible to handle large-scale datasets using the MapReduce model, by sharing a task on several cluster nodes.

  5. MapReduce models can handle big data issues to a great extent. However, the techniques utilized in Refs. [7], [23], [24] are not effective without additional processes to enhance their performances in terms of computational time and accuracy.

    The proposed MRF offers good performance in terms of computational time and accuracy.

3 Proposed MRF Using Fractional P-Whale-Based Clustering for Big Data Clustering

This section presents the proposed FPWhale-MRF technique, used for big data clustering based on two hybrid clustering algorithms adopted in the MRF. Each mapper in the MRF utilizes a novel clustering algorithm, FTSK, which finds the cluster centroids using tangential and spherical kernels combined with fractional theory. The centroids obtained by the mappers are merged and fed as input to the reducers, where the proposed P-Whale algorithm determines the optimal clusters for data clustering. The P-Whale algorithm is developed by integrating WOA into PSO and is used in clustering to estimate the centroids of the final clusters. The block diagram of the proposed technique is depicted in Figure 1.

Figure 1: Block Diagram of the Proposed FPWhale-MRF.

3.1 Proposed Technique of Big Data Clustering Using FPWhale-MRF

In this subsection, the proposed FPWhale-MRF technique designed for big data clustering is described. The MRF, a programming model, is composed of mappers and reducers. It performs clustering by processing the partitioned data in parallel; thus, large-scale datasets can be handled with the MapReduce model by sharing a task over several cluster nodes. In FPWhale-MRF, the mappers, which receive the partitioned data as input, find the cluster centroids using the FTSK clustering algorithm, while the reducer functions use the P-Whale algorithm for the optimal selection of centroids for the clustering. Let $B$ denote the database of $m$ data records with $n$ attributes each, represented as

(1) $B = \{b_{k,l}\};\ 0 \le k \le m;\ 0 \le l \le n.$

The data $b_{k,l}$ in the database is split into a finite number of partitions, equal to the number of mappers in the MRF. The partitioned data is given by

(2) $b_{k,l} = \{D_q\};\ 1 \le q \le N,$

where $N$ is the total number of mappers. The $N$ mappers in the MRF are represented as

(3) $M = \{M_1, M_2, \ldots, M_q, \ldots, M_N\};\ 1 \le q \le N.$

Hence, the input to the $q$th mapper is given as

(4) $D_q = \{d_{r,l}\};\ 1 \le r \le m_q;\ 1 \le l \le n,$

where $d_{r,l}$ is the partitioned data given to the $q$th mapper for processing and $m_q$ is the number of data records in the $q$th mapper. Each mapper maps its input data based on the user-defined number of clusters and generates intermediate results; the outcome of each mapper, which depends on the data pairs and the attributes, represents the cluster centroids. Together, the $N$ mappers generate an output of size $m \times n$, represented as

(5) $I = \{p_1 \| p_2 \| \ldots \| p_N\},$

where $p$ indicates the mapper outcome, which represents the cluster centroid.

The generated output from the mappers is given as input to the set of reducers, represented as

(6) $R = \{R_1, R_2, \ldots, R_j, \ldots, R_P\};\ 1 \le j \le P,$

where $P$ is the number of reducers. The reducers merge the cluster centroids obtained from the mappers to produce the final clusters, given by

(7) $c^j = \{c_{x,l}^j\};\ 1 \le x \le C_j;\ 1 \le l \le n;\ 1 \le j \le P,$

where $C_j$ is the size of the cluster assigned to the $j$th reducer and $c_{x,l}^j$ represents the cluster in the $j$th reducer.
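As a minimal sketch of the partitioning in Eqs. (1)-(4), the Java fragment below splits a database B of m rows and n attributes row-wise into chunks, one per mapper. The even row-wise split and the helper names are illustrative assumptions; the paper does not specify the partitioning rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hedged sketch of the row-wise data partitioning of Eqs. (1)-(4):
// a database B of m rows and n attributes is split into at most
// numMappers chunks, D_q being the input of the q-th mapper.
public class PartitionSketch {
    static List<double[][]> partition(double[][] b, int numMappers) {
        int m = b.length;
        int chunk = (int) Math.ceil(m / (double) numMappers); // rows per mapper
        List<double[][]> parts = new ArrayList<>();
        for (int q = 0; q < numMappers; q++) {
            int from = q * chunk, to = Math.min(m, from + chunk);
            if (from >= to) break;
            parts.add(Arrays.copyOfRange(b, from, to));       // D_q
        }
        return parts;
    }

    public static void main(String[] args) {
        double[][] b = new double[10][4];       // m = 10 records, n = 4 attributes
        List<double[][]> d = partition(b, 6);   // six mappers, as in Section 4.1
        d.forEach(p -> System.out.println(p.length + " rows"));
    }
}
```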

3.2 Proposed FTSK-Based Mapper Function for Centroid Estimation

The mapper function designed using the proposed FTSK clustering algorithm is explained in this section. Due to the complexity in the mapping of large datasets, a novel clustering algorithm is designed by including fractional theory in the TSK clustering approach [12]. TSK clustering is a clustering approach designed by combining tangential and spherical kernels. This approach can improve the scalability by reducing the mapping problems. For further improvement in the performance of the TSK clustering approach, the fractional theory has been included. Hence, every mapper in the MRF finds the cluster centroids based on the proposed FTSK clustering algorithm, which will be discussed below.

Initially, the cluster centers are chosen at random from the input data. Let $p_i$ be a selected cluster center; the partitioned data is then assigned to the cluster centers depending on the measured similarity. In the computed distance matrix, the columns denote the data and the rows indicate the clusters. For the first column, the distance matrix is obtained by calculating the Euclidean distance between the first centroid and the data; similarly, for the second column, the Euclidean distance is measured between the second centroid and the data. Then, the tangential and spherical distances between the cluster centroids and the data are computed at iteration zero. The data is clustered by minimizing the distance measures based on the tangential and spherical kernels. The tangential kernel-based distance matrix computation is expressed as

(8) $K_{r,i}^T = \dfrac{E^T(d_r, p_i)}{\sum_{i=1}^{N_C} E^T(d_r, p_i)},$

where $E^T(d_r, p_i)$ represents the tangential Euclidean distance between the data $d_r$ and the cluster centroid $p_i$, and $N_C$ is the number of clusters. Meanwhile, the spherical kernel-based distance measure is computed as follows:

(9) $K_{r,i}^S = \dfrac{E^S(d_r, p_i)}{\sum_{i=1}^{N_C} E^S(d_r, p_i)},$

where $E^S(d_r, p_i)$ represents the spherical Euclidean distance between the data and the cluster centroid. The tangential and spherical Euclidean distances are measured as

(10) $E^T(d_r, p_i) = \tanh\left(\alpha\,\|d_r^T p_i\| + a\right),$

(11) $E^S(d_r, p_i) = 1 - \dfrac{3}{2}\left(\dfrac{\|d_r^T p_i\|}{\sigma}\right) + \dfrac{1}{2}\left(\dfrac{\|d_r^T p_i\|}{\sigma}\right)^3,$

where $\tanh$ represents the hyperbolic tangent, $\alpha$ represents the slope, $\|d_r^T p_i\|$ is the distance measure, $a$ represents the constant, and $\sigma$ represents the standard deviation. The centroids for each cluster are computed using these kernel functions, and the distance measurement and the computation of new cluster centroids continue until the convergence condition is reached. Hence, the cluster centroids are formulated based on the tangential and spherical kernels as

(12) $p_i^T(t+1) = \dfrac{1}{m_i}\sum_{\substack{r=1 \\ r \ne i}}^{m_i}\left(\dfrac{K_{r,i}^T \times d_r}{\sum_{i=1}^{N_C} K_{r,i}^T}\right),$

(13) $p_i^S(t+1) = \dfrac{1}{m_i}\sum_{\substack{r=1 \\ r \ne i}}^{m_i}\left(\dfrac{K_{r,i}^S \times d_r}{\sum_{i=1}^{N_C} K_{r,i}^S}\right),$

where $m_i$ is the number of data records in the $i$th mapper. Combining both kernels, the TSK clustering approach increases the clustering accuracy by finding the cluster centroids as

(14) $p_i^F(t+1) = \dfrac{1}{2}\left(p_i^T(t+1) + p_i^S(t+1)\right),$

where $p_i^T$ is the cluster centroid obtained from the tangential kernel and $p_i^S$ is the cluster centroid obtained from the spherical kernel. In the proposed FTSK clustering algorithm, fractional calculus [3] is incorporated into the centroid computation so that the convergence speed, and thereby the performance, is improved. Fractional calculus is a division of applied mathematics that solves fractional-order equations using Laplace transforms. Accordingly, Eq. (14) can be rearranged by subtracting the cluster centroid estimated at the current iteration:

(15) $p_i^F(t+1) - p_i^F(t) = \dfrac{1}{2}\left(p_i^T(t+1) + p_i^S(t+1)\right) - p_i^F(t),$

where $p_i^F(t)$ is the centroid estimated at iteration $t$. Treating the left-hand side of the above equation as a derivative of order $\gamma$, Eq. (15) can be represented as follows:

(16) $D^{\gamma}\left[p_i^F(t+1)\right] = \dfrac{1}{2}\left(p_i^T(t+1) + p_i^S(t+1)\right) - p_i^F(t),$

where $\gamma$ takes a value between 0 and 1, and only the first four terms of the fractional derivative expansion are considered. Thus, Eq. (15) becomes

(17) $p_i^F(t+1) = \gamma\, p_i^F(t) + \dfrac{1}{2}\gamma\, p_i^F(t-1) + \dfrac{1}{6}\gamma(1-\gamma)\, p_i^F(t-2) + \dfrac{1}{24}\gamma(1-\gamma)(2-\gamma)\, p_i^F(t-3) + \dfrac{1}{2}\left(p_i^T(t+1) + p_i^S(t+1)\right) - p_i^F(t).$

Rearranging the equation, the cluster centroid estimated by the proposed FTSK clustering algorithm is given by

(18) $p_i^F(t+1) = p_i^F(t)\left[\gamma - 1\right] + \dfrac{1}{2}\gamma\, p_i^F(t-1) + \dfrac{1}{6}\gamma(1-\gamma)\, p_i^F(t-2) + \dfrac{1}{24}\gamma(1-\gamma)(2-\gamma)\, p_i^F(t-3) + \dfrac{1}{2}\left(p_i^T(t+1) + p_i^S(t+1)\right),$

where $p_i^F(t-2)$ and $p_i^F(t-3)$ are the cluster centroids measured at iterations $(t-2)$ and $(t-3)$, respectively. Hence, the mapper function calculates the cluster centroids based on Eq. (18), formulated using the proposed FTSK clustering algorithm, so that the clustering performance improves over that of TSK clustering [12].
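The per-iteration computation of Eqs. (8)-(18) can be sketched as follows. The fragment assumes the distance term in Eqs. (10) and (11) is the Euclidean norm between a data point and a centroid, treats α, a, σ, and the fractional order γ as free parameter choices, and omits the r ≠ i exclusion of Eqs. (12) and (13) for simplicity; none of these settings are fixed by the paper.

```java
import java.util.Arrays;

// Hedged sketch of one FTSK centroid update, Eqs. (8)-(18).
// Assumptions: dist() is the Euclidean norm ||d - p||; alpha, a, sigma
// and the fractional order gamma are illustrative parameter choices.
public class FtskUpdate {

    static double dist(double[] d, double[] p) {
        double s = 0;
        for (int l = 0; l < d.length; l++) s += (d[l] - p[l]) * (d[l] - p[l]);
        return Math.sqrt(s);
    }

    // Tangential kernel distance, Eq. (10).
    static double eT(double[] d, double[] p, double alpha, double a) {
        return Math.tanh(alpha * dist(d, p) + a);
    }

    // Spherical kernel distance, Eq. (11).
    static double eS(double[] d, double[] p, double sigma) {
        double u = dist(d, p) / sigma;
        return 1.0 - 1.5 * u + 0.5 * u * u * u;
    }

    // Kernel-weighted centroid of cluster i, Eqs. (8)/(9) and (12)/(13).
    // The r != i exclusion of Eqs. (12)/(13) is omitted here for simplicity.
    static double[] kernelCentroid(double[][] data, double[][] centroids, int i,
                                   boolean tangential, double alpha, double a, double sigma) {
        int n = data[0].length;
        double[] c = new double[n];
        for (double[] d : data) {
            double num = tangential ? eT(d, centroids[i], alpha, a) : eS(d, centroids[i], sigma);
            double den = 0;                               // normalization over all clusters
            for (double[] p : centroids)
                den += tangential ? eT(d, p, alpha, a) : eS(d, p, sigma);
            double k = num / den;                         // K^T_{r,i} or K^S_{r,i}
            for (int l = 0; l < n; l++) c[l] += k * d[l];
        }
        for (int l = 0; l < n; l++) c[l] /= data.length;  // 1/m_i factor
        return c;
    }

    // Fractional update, Eq. (18): the TSK average of Eq. (14) is blended
    // with a memory of the last four centroid positions.
    static double[] fractionalUpdate(double[] pT, double[] pS, double[][] hist, double gamma) {
        // hist[0] = p(t), hist[1] = p(t-1), hist[2] = p(t-2), hist[3] = p(t-3)
        double[] p = new double[pT.length];
        for (int l = 0; l < p.length; l++) {
            p[l] = (gamma - 1.0) * hist[0][l]
                 + 0.5 * gamma * hist[1][l]
                 + (1.0 / 6.0) * gamma * (1 - gamma) * hist[2][l]
                 + (1.0 / 24.0) * gamma * (1 - gamma) * (2 - gamma) * hist[3][l]
                 + 0.5 * (pT[l] + pS[l]);
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {2, 1}, {8, 9}, {9, 8}};
        double[][] centroids = {{1, 1}, {9, 9}};
        double[][] hist = {centroids[0], centroids[0], centroids[0], centroids[0]};

        double[] pT = kernelCentroid(data, centroids, 0, true, 0.5, 1.0, 10.0);
        double[] pS = kernelCentroid(data, centroids, 0, false, 0.0, 0.0, 10.0);
        System.out.println(Arrays.toString(fractionalUpdate(pT, pS, hist, 0.6)));
    }
}
```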

3.3 Proposed P-Whale-Based Centroid Estimation for Big Data Clustering

This section illustrates the reducer function, which is executed using the proposed P-Whale algorithm. In P-Whale, the update process of PSO is modified using WOA, so that the clustering task is made effective. The function of a reducer is to combine the local centroids generated by the mappers to find the feasible cluster centroids. The reducer utilizes the intermediate results produced by the mappers, given as

(19) $I = \{I_{y,l}\};\ 1 \le y \le N_C \times N;\ 1 \le l \le n,$

where $N_C$ is the number of clusters and $I_{y,l}$ is the intermediate result generated by the $N$ mappers. Processing $I$ obtained from the mappers, the reducer finds the clusters, as defined below:

(20) $R(I) = c^j,$

where $c^j$ is the number of clusters in the $j$th reducer. Thus, the reducer generates the output given by the following function:

(21) $R_P = \{p_{i,l}\};\ 1 \le q \le N;\ 1 \le i \le N_C^q;\ 1 \le l \le n,$

where $N_C^q$ is the number of clusters in the $q$th mapper. This calculation is done using the proposed P-Whale algorithm, which generates the optimal cluster centroids for the clustering based on a fitness defined using the DB-index. Thus, with the proposed P-Whale, the reducer clusters the data, where the number of clusters is user-defined. PSO [26] is a stochastic optimization method that mimics the social behavior of fish schooling, whereas WOA [15] is a nature-inspired algorithm based on the hunting behavior of humpback whales. By integrating WOA into the update process of PSO, the clustering performance of the proposed algorithm is improved. Adopting P-Whale, the reducer processes the data, represented in vector form, to find the clusters.

3.3.1 Solution Representation

The solution encoding presents the simplest view of representing the proposed P-Whale algorithm designed for finding the clusters in the proposed FPWhale-MRF. Here, a solution is a cluster centroid, initialized at random based on the intermediate data produced by the mappers. Thus, the solution is a vector whose size is determined by the number of clusters and the data. Based on the fitness evaluated using the DB-index, the cluster centroids are determined optimally using the P-Whale algorithm.

3.3.2 Fitness Evaluation

The fitness function, which decides the quality of the solution, is designed using a DB-index [6], similar to that utilized in Ref. [12]. The DB-index measures the similarity between the clusters and is suitable for clustering algorithms that depend on distance conditions. The fitness function is formulated based on the DB-index as

(22) $DB = \dfrac{1}{c^j}\sum_{e=1}^{c^j} F_e,$

where $F_e$ is a function that selects the maximum similarity value measured between the clusters, as defined by the following equation:

(23) $F_e = \max_{f \ne e} S_{e,f},$

where $S_{e,f}$ is the similarity measure between two clusters, based on the Euclidean distance between them, and is represented as

(24) $S_{e,f} = \dfrac{A_e + A_f}{E_{e,f}},$

where $A_e$ and $A_f$ are the measures of scattering in the two clusters and $E_{e,f}$ is the Euclidean distance between the two cluster centroids. The lower the distance between the cluster centroid and the data points, the better the clustering performance:

(25) $E_{e,f} = \|L_e - L_f\|,$

where $L_e$ and $L_f$ denote the centroids of the two clusters. Based on the data assigned to the cluster, $A_e$ is computed with respect to the Euclidean distance between the data and the cluster centroid:

(26) $A_e = \dfrac{1}{m_e}\sum_{y=1}^{m_e} \|I_{y,l} - L_e\|,$

where $m_e$ is the number of data points associated with $L_e$. While the distance between a centroid and its data points should be minimal, the distance between two centroids has to be maximal for better clustering.
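A compact sketch of this fitness computation is given below. Cluster scatter and centroid separation are taken as plain Euclidean quantities per Eqs. (22)-(26); representing each cluster as a list of points plus a centroid is an illustrative assumption.

```java
import java.util.List;

// Hedged sketch of the DB-index fitness, Eqs. (22)-(26). Lower is better.
public class DbIndexFitness {

    static double euclid(double[] x, double[] y) {
        double s = 0;
        for (int l = 0; l < x.length; l++) s += (x[l] - y[l]) * (x[l] - y[l]);
        return Math.sqrt(s);
    }

    // Scatter A_e of one cluster, Eq. (26): mean distance of members to centroid L_e.
    static double scatter(List<double[]> members, double[] centroid) {
        double s = 0;
        for (double[] x : members) s += euclid(x, centroid);
        return s / members.size();
    }

    // DB-index, Eqs. (22)-(25): average over clusters of the worst-case
    // similarity S_{e,f} = (A_e + A_f) / E_{e,f} with any other cluster.
    static double dbIndex(List<List<double[]>> clusters, List<double[]> centroids) {
        int c = clusters.size();
        double sum = 0;
        for (int e = 0; e < c; e++) {
            double fe = Double.NEGATIVE_INFINITY;         // F_e, Eq. (23)
            for (int f = 0; f < c; f++) {
                if (f == e) continue;
                double s = (scatter(clusters.get(e), centroids.get(e))
                          + scatter(clusters.get(f), centroids.get(f)))
                          / euclid(centroids.get(e), centroids.get(f));
                fe = Math.max(fe, s);
            }
            sum += fe;
        }
        return sum / c;                                   // Eq. (22)
    }

    public static void main(String[] args) {
        List<List<double[]>> clusters = List.of(
            List.of(new double[]{1, 2}, new double[]{2, 1}),
            List.of(new double[]{8, 9}, new double[]{9, 8}));
        List<double[]> centroids = List.of(new double[]{1.5, 1.5}, new double[]{8.5, 8.5});
        System.out.println(dbIndex(clusters, centroids)); // small value: well separated
    }
}
```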

3.3.3 Proposed P-Whale Algorithm

The proposed P-Whale algorithm, designed for the optimal selection of cluster centroids to perform clustering, is summarized in this section. The P-Whale algorithm is designed by modifying the update process of PSO using WOA. In PSO [26], a number of particles interact to find optimal solutions in the search space, learning from the personal best and global best solutions. This way of learning may lead to premature convergence, which can be addressed by WOA, whose convergence behavior is better and which avoids local optima. The steps involved in the proposed P-Whale algorithm are described below.

I. Initialization

The foremost step is the random initialization of the swarm population with a number of solutions, represented as

(27) $H = \{H_1, H_2, \ldots, H_s, \ldots, H_z\};\ 1 \le s \le z,$

where $H_s$ represents the position of the $s$th solution, the dimension of each solution is $1 \times J$, and $z$ is the number of swarm particles.

II. Fitness Calculation

Once the population is initialized, the fitness of all the solutions is computed using the fitness function formulated in Section 3.3.2. The solution having the minimum fitness value is considered the best solution. Thus, the algorithm selects the personal best and the global best solutions, represented as $G_{pb}$ and $G_{gb}$, respectively.

III. Whale-Based Update Process

The update process of PSO involves velocity and position updates. The velocity of the $s$th particle is updated based on the personal and global best solutions together with the velocity at the current iteration $t$, as given below:

(28) $v_s(t+1) = W v_s(t) + k_1 h_1 \left(G_{pb} - H_s(t)\right) + k_2 h_2 \left(G_{gb} - H_s(t)\right),$

where $W$ is the inertia weight, $k_1$ and $k_2$ are the acceleration rates, $h_1$ and $h_2$ are two numbers chosen at random between 0 and 1, $v_s(t)$ is the velocity at iteration $t$, and $H_s(t)$ is the position of the $s$th particle at the current iteration. Based on the velocity update, the position of the $s$th particle is updated as follows:

(29) $H_s(t+1) = H_s(t) + v_s(t+1),$

where $v_s(t+1)$ is the velocity at the $(t+1)$th iteration. The P-Whale algorithm is introduced by modifying the above equation using the position update equation of WOA, represented as

(30) $H_s(t+1) = X' e^{ug}\cos(2\pi g) + G_{gb},$

where the distance measure $X'$ is given by $X' = \left|G_{gb} - H_s(t)\right|$, $u$ is a constant, $g$ is a random number ranging from $-1$ to $1$, and the best solution in WOA is replaced by the global best solution $G_{gb}$. The competitive performance of WOA ensures the effectiveness of the proposed algorithm. Equation (30) is rearranged as

(31) $G_{gb} = H_s(t+1) - X' e^{ug}\cos(2\pi g).$

Substituting Eq. (31) into the velocity update equation of PSO, given in Eq. (28):

(32) $v_s(t+1) = W v_s(t) + k_1 h_1 \left(G_{pb} - H_s(t)\right) + k_2 h_2 \left(H_s(t+1) - X' e^{ug}\cos(2\pi g) - H_s(t)\right).$

This newly obtained velocity update equation is substituted into the position update equation [Eq. (29)], as expressed below:

(33) $H_s(t+1) = H_s(t) + W v_s(t) + k_1 h_1 \left(G_{pb} - H_s(t)\right) + k_2 h_2 \left(H_s(t+1) - X' e^{ug}\cos(2\pi g) - H_s(t)\right),$

(34) $H_s(t+1) = H_s(t) + W v_s(t) + k_1 h_1 \left(G_{pb} - H_s(t)\right) + k_2 h_2 H_s(t+1) - k_2 h_2 \left(X' e^{ug}\cos(2\pi g) + H_s(t)\right),$

(35) $H_s(t+1) - k_2 h_2 H_s(t+1) = H_s(t) + W v_s(t) + k_1 h_1 \left(G_{pb} - H_s(t)\right) - k_2 h_2 \left(X' e^{ug}\cos(2\pi g) + H_s(t)\right),$

(36) $H_s(t+1)\left[1 - k_2 h_2\right] = H_s(t) + W v_s(t) + k_1 h_1 \left(G_{pb} - H_s(t)\right) - k_2 h_2 \left(X' e^{ug}\cos(2\pi g) + H_s(t)\right),$

(37) $H_s(t+1) = \dfrac{1}{\left[1 - k_2 h_2\right]}\left[H_s(t) + W v_s(t) + k_1 h_1 \left(G_{pb} - H_s(t)\right) - k_2 h_2 \left(X' e^{ug}\cos(2\pi g) + H_s(t)\right)\right],$

(38) $H_s(t+1) = \dfrac{1}{Z}\left[H_s(t) + d(t)\right],$

where $Z = \left[1 - k_2 h_2\right]$ and $d(t) = W v_s(t) + k_1 h_1\left(G_{pb} - H_s(t)\right) - k_2 h_2\left(X' e^{ug}\cos(2\pi g) + H_s(t)\right)$. Equation (38) forms the position update equation of the proposed P-Whale algorithm, which improves the performance of the algorithm in clustering.
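A minimal sketch of the update of Eq. (38) for one particle follows. The constant u, the uniform draws h1, h2, and g, and the guard against a near-zero denominator 1 - k2*h2 (the paper does not discuss this singularity) are illustrative assumptions.

```java
import java.util.Random;

// Hedged sketch of the P-Whale position update, Eq. (38), for one particle.
// Assumptions: u and the uniform draws h1, h2, g are illustrative; h2 is
// redrawn if (1 - k2*h2) is near zero, a guard the paper does not discuss.
public class PWhaleUpdate {
    static final Random RNG = new Random(42);

    static double[] update(double[] h, double[] v, double[] gpb, double[] ggb,
                           double w, double k1, double k2, double u) {
        double h1 = RNG.nextDouble();
        double h2 = RNG.nextDouble();
        while (Math.abs(1 - k2 * h2) < 1e-6) h2 = RNG.nextDouble(); // guard (assumption)
        double g = 2 * RNG.nextDouble() - 1;                        // g in [-1, 1]
        double spiral = Math.exp(u * g) * Math.cos(2 * Math.PI * g);
        double z = 1 - k2 * h2;                                     // Z in Eq. (38)

        double[] next = new double[h.length];
        for (int l = 0; l < h.length; l++) {
            double x = Math.abs(ggb[l] - h[l]);                     // X' = |Ggb - Hs(t)|
            double d = w * v[l] + k1 * h1 * (gpb[l] - h[l])
                     - k2 * h2 * (x * spiral + h[l]);               // d(t) in Eq. (38)
            next[l] = (h[l] + d) / z;                               // Hs(t+1) = (1/Z)[Hs(t) + d(t)]
        }
        return next;
    }

    public static void main(String[] args) {
        double[] h = {0.2, 0.8}, v = {0.05, -0.03};
        double[] gpb = {0.3, 0.7}, ggb = {0.25, 0.75};
        double[] next = update(h, v, gpb, ggb, 0.7, 2.0, 2.0, 1.0);
        System.out.printf("next position: (%.4f, %.4f)%n", next[0], next[1]);
    }
}
```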

IV. Determining the Best Solution

After the position update, the new solutions generated are evaluated using the same fitness function. Among the fitness-evaluated solutions, the solution with the minimum fitness replaces the existing one and thus becomes the optimal solution.

V. Termination

The above steps are repeated until the iteration counter t reaches the maximum count, within which the optimal solution for clustering is determined.

The pseudocode of the proposed P-Whale algorithm is presented in Table 1.

Table 1:

Pseudocode of the P-Whale Algorithm.

Proposed P-Whale algorithm
1 Input: Intermediate data
2 Output: Global best solution Ggb
3 Parameters: t → iteration, vs(t) → velocity at iteration t, Ggb → global best solution, Gpb → personal best solution
4 Begin
5            Initialize the random population
6            Assign the velocity vs(t) to each particle in the population
7            While (t < tmax)
8                    for each particle
9                              Evaluate the fitness using Eq. (22)
10                              Determine Gpb and Ggb
11                              Update the velocity vs(t + 1) using Eq. (32)
12                              Update the position of the particle Hs(t + 1) using Eq. (38)
13                    end for
14                    Determine the best solution by replacing the existing solution based on the fitness function
15                    t = t + 1
16            end while
17            Return Ggb
18 Terminate

4 Results and Discussion

The results of the proposed FPWhale-MRF technique used for big data clustering are demonstrated in this section with the experimental setup and comparative analysis.

4.1 Experimental Setup

The experiment is carried out in a system operated using Windows 10 with the following configurations: RAM, 2 GB; system type, 64-bit Operating System (OS); and processor, Intel Pentium. The proposed technique is implemented using the JAVA software tool (Sun Microsystems, Oracle Corporation, Redwood City, CA, USA). The number of mappers and reducers used for the experimentation is six and seven, respectively.

Dataset Description: Two datasets are utilized for the experimentation, namely the localization dataset (dataset 1) [21] and the skin segmentation dataset (dataset 2) [22], both taken from the UCI Machine Learning Repository. The first dataset includes data from various activities recorded from five people, each wearing four tags: ankle left, ankle right, chest, and belt. The dataset has 164,860 instances, each representing localization data for one tag, and eight attributes that can be used to identify the tag. The second, the skin segmentation dataset, is built by sampling the R, G, and B values of skin and non-skin samples from the FERET and PAL databases. It includes four attributes and 245,057 instances, with 50,859 skin samples and 194,198 non-skin samples.

4.2 Comparative Techniques

The performance of the proposed FPWhale-MRF is compared with four existing techniques: (i) the Multiple Kernel and Swarm-Based Map-Reduce Framework (MKS-MRF) [12], (ii) K-means-MRF [24], (iii) Fuzzy C-means-MRF (FCM-MRF) [25], and (iv) Kernel Fuzzy C-means-MRF (KFCM-MRF) [28]. Clustering is performed with each of these existing techniques using an MRF for processing the big data, and their performance is evaluated using two metrics in the comparative analysis.

4.3 Performance Evaluation Measures

The comparison of the performance of the comparative techniques is based on two evaluation metrics: the DB-index, which is explained in Section 3.3.2, and the clustering accuracy, which is defined as

(39) $ACC = \dfrac{1}{m}\sum_{i=1}^{N_C} \max_{j=1}^{C_L}\left|c_i \cap c_j^l\right|,$

where $m$ is the number of data points, $N_C$ is the number of clusters, $C_L$ is the number of classes, $c_i$ denotes the $i$th cluster, and $c_j^l$ denotes the $j$th class.
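As a hedged sketch of Eq. (39), the fragment below counts, for each cluster, its largest overlap with any ground-truth class and normalizes by m. Representing cluster and class assignments as integer label arrays is an illustrative choice.

```java
// Hedged sketch of the clustering accuracy of Eq. (39): for each cluster,
// take its largest overlap with any ground-truth class, sum, and divide by m.
// Representing assignments as integer label arrays is an illustrative choice.
public class ClusteringAccuracy {

    static double accuracy(int[] cluster, int[] label, int numClusters, int numClasses) {
        int[][] overlap = new int[numClusters][numClasses]; // |c_i intersect c_j^l|
        for (int r = 0; r < cluster.length; r++) overlap[cluster[r]][label[r]]++;

        int sum = 0;
        for (int i = 0; i < numClusters; i++) {
            int best = 0;                                   // max over classes j
            for (int j = 0; j < numClasses; j++) best = Math.max(best, overlap[i][j]);
            sum += best;
        }
        return (double) sum / cluster.length;               // 1/m factor
    }

    public static void main(String[] args) {
        int[] cluster = {0, 0, 1, 1, 1, 0};                 // predicted cluster per point
        int[] label   = {0, 0, 1, 1, 0, 0};                 // ground-truth class per point
        System.out.println(accuracy(cluster, label, 2, 2)); // 0.8333...
    }
}
```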

4.4 Evaluation of Performance

This section illustrates the performance of the proposed technique, evaluated using the accuracy and DB-index measures on the two datasets.

4.4.1 Accuracy Analysis

The accuracy analysis performed on the two datasets, localization and skin segmentation, is explained in this subsection using Figure 2, which shows the accuracy analysis for dataset 1 and dataset 2. In Figure 2A, the accuracy for dataset 1 is plotted for various numbers of mappers, denoted here as M, against the number of clusters varied from 2 to 6. The maximum clustering accuracy of 90% is produced for M = 2 and M = 4, when the cluster size is 3 and 4, respectively. For 6 clusters, the maximum accuracy attained is 88.95%, which is 1.18% less than the maximum accuracy produced. The accuracy analysis plot for dataset 2 is sketched in Figure 2B. When M = 2, the accuracy obtained for 2 clusters is 90%, which reduces to 83.33% when M = 5. Increasing the number of clusters to 6, the maximum accuracy of 90% is attained for M = 3.

Figure 2: Accuracy Analysis Using (A) Dataset 1 and (B) Dataset 2.

4.4.2 DB-Index Analysis

Figure 3 presents the results of the DB-index analysis for the two datasets. The DB-index analysis for dataset 1 is given in Figure 3A; the lower the DB-index, the better the clustering performance. Here, the minimum value computed is 6.24 for M = 4 when the number of clusters is 2. When the number of clusters is 6, the DB-index for M = 3 peaks at 365.98, which reduces to 91.79 for M = 5. In Figure 3B, the DB-index analysis for the second dataset is plotted. For M = 2, 3, 4, and 5, the DB-index measured is 19.24, 15.9, 5.85, and 10.96, respectively, for 2 clusters; the minimum DB-index obtained using the proposed FPWhale-MRF technique is thus 5.85. When the number of clusters is 6, the DB-index produced is 83.63, 172.53, 49.83, and 85.44, respectively, for M = 2, 3, 4, and 5.

Figure 3: DB-Index Analysis Using (A) Dataset 1 and (B) Dataset 2.

4.5 Comparative Analysis

To evaluate the performance of the proposed technique against the existing techniques, a comparative analysis is performed. The analysis is based on the accuracy and the DB-index using the two datasets.

4.5.1 Using Dataset 1

The comparative analysis of the proposed technique and the four existing techniques using dataset 1 is depicted in Figure 4. Figure 4A presents the accuracy analysis for the first dataset with a varying number of clusters. When the number of clusters is 2, the accuracy obtained using the existing MKS-MRF, K-means-MRF, and FCM-MRF is the same, 75.58%, while that of FPWhale-MRF is 87.91%. As the number of clusters is increased to 5, the accuracy attained by the proposed technique is 85%, whereas 82.43% is the maximum accuracy produced by the existing MKS-MRF. The DB-index values of the comparative techniques for dataset 1 are shown in Figure 4B. The minimum value is observed for FPWhale-MRF for all the cluster sizes considered. Where FPWhale-MRF provides a minimum DB-index of 10.83 for a cluster size of 3, MKS-MRF, K-means-MRF, FCM-MRF, and KFCM-MRF have DB-indices of 29.79, 172.68, 187.72, and 185.26, respectively. Thus, from the analysis using dataset 1, i.e. the localization data, the proposed FPWhale-MRF is observed to perform better than the other techniques compared.

Figure 4: Comparative Analysis Using Dataset 1: (A) Accuracy and (B) DB-Index.

4.5.2 Using Dataset 2

In Figure 5, the comparative analysis results obtained for the five considered techniques using dataset 2 are sketched. In the accuracy analysis graph shown in Figure 5A, the maximum accuracy produced by the proposed FPWhale-MRF is 90%, for 2 clusters. For the same setting, the accuracy obtained using the existing MKS-MRF, FCM-MRF, and KFCM-MRF is 79.24%, while K-means-MRF has a clustering accuracy of 78.57%. The analysis based on the DB-index using dataset 2 is depicted in Figure 5B, where the minimum value, a DB-index of 7.73 for two clusters, is achieved by the proposed technique. For the same case, the minimum DB-index produced among the existing techniques is 12.01, by MKS-MRF. Hence, from the results of the analysis, the proposed FPWhale-MRF shows better performance than the other techniques considered for the comparison.

Figure 5: Comparative Analysis Using Dataset 2: (A) Accuracy and (B) DB-Index.

4.6 Convergence Analysis

Figure 6 shows the convergence analysis of the proposed technique for dataset 1 and dataset 2, performed over 1–100 iterations. Figure 6A shows the convergence for dataset 1. When the number of iterations is 20, the DB-index of the proposed method is 98.36, and it gradually decreases as the number of iterations increases. At the 100th iteration, the proposed method has a DB-index of 10.83, which is smaller than the DB-index of the existing methods. Figure 6B shows the convergence for dataset 2, where the proposed method reaches the minimum DB-index of 7.73 at the 100th iteration, lower than that of the existing methods.

Figure 6: Convergence Analysis Using (A) Dataset 1 and (B) Dataset 2.

4.7 Analysis Based on Computational Time

Table 2 shows the computational time of the proposed method and the existing methods. From the table, it can be seen that the proposed method has a minimum computational time of 5 s, while the existing methods, such as K-means-MRF, FCM-MRF, KFCM-MRF, and MKS-MRF, have computational times of 8, 7, 6, and 6.5 s, respectively.

Table 2:

Computational Time of the Comparative Methods.

Methods        Computational time (s)
K-means-MRF    8
FCM-MRF        7
KFCM-MRF       6
MKS-MRF        6.5
FPWhale-MRF    5

Note: The bold value represents the best performance.

4.8 Discussion

Based on the comparative analysis made in Section 4.5, a discussion is carried out regarding the maximum performance measures in MKS-MRF, K-means-MRF, FCM-MRF, KFCM-MRF, and FPWhale-MRF, as presented in Table 3.

Table 3:

Performance Comparison Based on Maximum Performance.

Methods        Dataset 1               Dataset 2
               Accuracy   DB-Index     Accuracy   DB-Index
MKS-MRF        82.43      29.79        85.06      12.01
K-means-MRF    75.58      83.2         79.24      18.11
FCM-MRF        75.59      34.04        79.24      88.17
KFCM-MRF       79.52      30.27        79.24      23.07
FPWhale-MRF    87.91      10.83        90         7.73

Note: The bold values represent the best performance.

Table 3 lists the maximum performance obtained by the comparative techniques for the two datasets by varying the cluster size from 2 to 5. For dataset 1, when MKS-MRF had an accuracy of 82.43%, the proposed FPWhale-MRF could provide 87.91% accuracy. For the same dataset, the minimum DB-index measured using the existing technique is 29.79 by MKS-MRF. Meanwhile, FPWhale-MRF has only 10.83 as the DB-index. On the analysis using dataset 2, the accuracy and the DB-index produced by the existing MKS-MRF are 85.06% and 12.01, whereas those in K-means-MRF are 79.24% and 18.11, respectively. In the meantime, the proposed FPWhale-MRF could obtain an accuracy of 90% and a DB-index of 7.73.

Table 4 shows the minimum performance attained by the comparative techniques for the two datasets by varying the cluster size from 2 to 5. For dataset 1, the minimum accuracy attained by the proposed FPWhale-MRF is 84.44%, while the minimum accuracy attained by the other existing methods is 75.58%. For the same dataset, the maximum DB-index measured using the proposed technique is 91.86, which is smaller than that of the other existing techniques. In the analysis using dataset 2, the minimum accuracy and maximum DB-index produced by the existing MKS-MRF are 79.24% and 115.09, whereas those of K-means-MRF are 78.57% and 167.56, respectively. In the meantime, the proposed FPWhale-MRF could obtain an accuracy of 85% and a DB-index of 69.47.

Table 4:

Performance Comparison Based on Minimum Performance.

Methods        Dataset 1               Dataset 2
               Accuracy   DB-Index     Accuracy   DB-Index
MKS-MRF        75.58      216.91       79.24      115.09
K-means-MRF    75.58      325.92       78.57      167.56
FCM-MRF        75.58      214.35       79.24      114.36
KFCM-MRF       75.58      267.28       79.24      769.56
FPWhale-MRF    84.44      91.86        85         69.47

Note: The bold values represent the best performance.

Table 5 lists the mean performance obtained by the comparative techniques for the two datasets by varying the cluster size from 2 to 5. For dataset 1, when MKS-MRF has an accuracy of 78.81%, the proposed FPWhale-MRF has 86.06% accuracy. For the same dataset, the DB-index measured using the existing MKS-MRF technique is 91.79. Meanwhile, FPWhale-MRF has only 45.33 as the DB-index. On the analysis using dataset 2, the accuracy and the DB-index produced by the existing MKS-MRF are 81.37% and 55.23, whereas those in K-means-MRF are 79.03% and 68.59, respectively. In the meantime, the proposed FPWhale-MRF obtains an accuracy of 87.01% and a DB-index of 37.04.

Table 5:

Performance Comparison Based on the Mean Performance.

Methods        Dataset 1               Dataset 2
               Accuracy   DB-Index     Accuracy   DB-Index
MKS-MRF        78.81      91.79        81.37      55.23
K-means-MRF    75.58      185.25       79.03      68.59
FCM-MRF        75.58      158.02       79.24      101.01
KFCM-MRF       76.59      149.08       79.24      263.14
FPWhale-MRF    86.06      45.33        87.01      37.04

Note: The bold values represent the best performance.

5 Conclusion

In this paper, a technique for big data clustering, named FPWhale-MRF, is presented using the MRF and based on two clustering algorithms, FTSK and P-Whale. FTSK, adopted in the mapper, is developed by integrating fractional calculus into the TSK clustering algorithm to find the cluster centroids. Meanwhile, the reducers contain an optimization-based clustering algorithm, P-Whale, developed by modifying PSO using WOA, for optimal clustering. Thus, the proposed FPWhale-MRF technique performs big data clustering effectively using the proposed clustering algorithms. The experiment is performed using two datasets, localization and skin segmentation, and the results are compared with those of the existing techniques, namely MKS-MRF, K-means-MRF, FCM-MRF, and KFCM-MRF. The performance of the proposed technique is evaluated using two metrics, clustering accuracy and DB-index. FPWhale-MRF could attain maximum accuracies of 87.91% and 90% for the localization and skin segmentation datasets, whereas the existing MKS-MRF attains 82.43% and 85.06%, respectively. Therefore, it can be concluded that the proposed FPWhale-MRF technique can perform big data clustering effectively, with maximum clustering accuracy compared with the existing comparative techniques.

Bibliography

[1] N. Akthar, M. V. Ahamad and S. Khan, Clustering on big data using Hadoop MapReduce, in: 2015 International Conference on Computational Intelligence and Communication Networks (CICN), Jabalpur, pp. 789–795, IEEE, Piscataway, NJ, USA, 2015. doi:10.1109/CICN.2015.161.

[2] A. Banharnsakun, A MapReduce-based artificial bee colony for large-scale data clustering, Pattern Recognit. Lett. 93 (2017), 78–84. doi:10.1016/j.patrec.2016.07.027.

[3] P. R. Bhaladhare and D. C. Jinwala, A clustering approach for the l-diversity model in privacy preserving data mining using fractional calculus-bacterial foraging optimization algorithm, Adv. Comput. Eng. 2014 (2014), 1–12. doi:10.1155/2014/396529.

[4] N. Bharill, A. Tiwari and A. Malviya, Fuzzy based scalable clustering algorithms for handling big data using Apache Spark, IEEE Trans. Big Data 2 (2016), 339–352. doi:10.1109/TBDATA.2016.2622288.

[5] C. Chen, K. Li, A. Ouyang and K. Li, A parallel approximate SS-ELM algorithm based on MapReduce for large-scale datasets, J. Parallel Distrib. Comput. 108 (2017), 85–94. doi:10.1016/j.jpdc.2017.01.007.

[6] D. L. Davies and D. W. Bouldin, A cluster separation measure, IEEE Trans. Pattern Anal. Mach. Intell. 1 (1979), 224–227. doi:10.1109/TPAMI.1979.4766909.

[7] S. Fries, S. Wels and T. Seidl, Projected clustering for huge data sets in MapReduce, in: Proceedings of International Conference on Extending Database Technology (EDBT), Athens, Greece, pp. 49–60, OpenProceedings.org, Konstanz, Germany, 2014.

[8] K. Grolinger, M. Hayes, W. A. Higashino, A. L'Heureux, D. S. Allison and M. A. M. Capretz, Challenges for MapReduce in big data, in: 2014 IEEE World Congress on Services, Anchorage, AK, pp. 182–189, IEEE, Piscataway, NJ, USA, 2014. doi:10.1109/SERVICES.2014.41.

[9] N. Hans, S. Mahajan and S. N. Omkar, Big data clustering using genetic algorithm on Hadoop Mapreduce, Int. J. Sci. Technol. Res. 4 (2015), 58–62.

[10] S. Kamal, S. H. Ripon, N. Dey, A. S. Ashour and V. Santhi, A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset, Comput. Methods Progr. Biomed. 131 (2016), 191–206. doi:10.1016/j.cmpb.2016.04.005.

[11] H. Ke, P. Li, S. Guo and M. Guo, On traffic-aware partition and aggregation in MapReduce for big data applications, IEEE Trans. Parallel Distrib. Syst. 27 (2016), 818–828. doi:10.1109/TPDS.2015.2419671.

[12] O. Kulkarni and S. Jena, MKS-MRF: a multiple kernel and a swarm-based MapReduce framework for big data clustering, Int. Rev. Comput. Softw. 11 (2016). doi:10.15866/irecos.v11i11.10117.

[13] D. Kumar, J. C. Bezdek, M. Palaniswami, S. Rajasegarar, C. Leckie and T. C. Havens, A hybrid approach to clustering in big data, IEEE Trans. Cybernet. 46 (2016), 2372–2385. doi:10.1109/TCYB.2015.2477416.

[14] O. Kurasova, V. Marcinkevicius, V. Medvedev, A. Rapecka and P. Stefanovic, Strategies for big data clustering, in: Proceedings of IEEE 26th International Conference on Tools with Artificial Intelligence, Limassol, pp. 740–747, IEEE, Piscataway, NJ, USA, 2014. doi:10.1109/ICTAI.2014.115.

[15] S. Mirjalili and A. Lewis, The whale optimization algorithm, Adv. Eng. Softw. 95 (2016), 51–67. doi:10.1016/j.advengsoft.2016.01.008.

[16] F. Pulgar-Rubio, A. J. Rivera-Rivas, M. D. Pérez-Godoy, P. González, C. J. Carmona and M. J. del Jesus, MEFASD-BD: multi-objective evolutionary fuzzy algorithm for subgroup discovery in big data environments – a MapReduce solution, Knowl.-Based Syst. 117 (2017), 70–78. doi:10.1016/j.knosys.2016.08.021.

[17] H. Rehioui, A. Idrissi, M. Abourezq and F. Zegrari, DENCLUE-IM: a new approach for big data clustering, Proc. Comput. Sci. 83 (2016), 560–567. doi:10.1016/j.procs.2016.04.265.

[18] A. S. Shirkhorshidi, S. Aghabozorgi, T. Y. Wah and T. Herawan, Big data clustering: a review, in: Proceedings of International Conference on Computational Science and Its Applications, ICCSA 2014, pp. 707–720, 2014. doi:10.1007/978-3-319-09156-3_49.

[19] H. Singh and S. Bawa, A MapReduce-based scalable discovery and indexing of structured big data, Future Gen. Comput. Syst. 73 (2017), 32–43. doi:10.1016/j.future.2017.03.028.

[20] P. A. Traganitis, K. Slavakis and G. B. Giannakis, Sketch and validate for big data clustering, IEEE J. Select. Topics Signal Process. 9 (2015), 678–690. doi:10.1109/JSTSP.2015.2396477.

[21] UCI Machine Learning Repository, Localization data for person activity data set, available at: https://archive.ics.uci.edu/ml/datasets/Localization+Data+for+Person+Activity, accessed 24 March 2018.

[22] UCI Machine Learning Repository, Skin segmentation data set, available at: https://archive.ics.uci.edu/ml/datasets/skin+segmentation, accessed 24 March 2018.

[23] M. Vadivel and V. Raghunath, Enhancing Map-Reduce framework for big data with hierarchical clustering, Innov. Res. Comput. Commun. Eng. 2 (2014), 490–498.

[24] D. Xia, B. Wang, Y. Li, Z. Rong and Z. Zhang, An efficient MapReduce-based parallel clustering algorithm for distributed traffic subarea division, Discrete Dynam. Nat. Soc. 2015 (2015), Article ID 793010, 18 pp. doi:10.1155/2015/793010.

[25] Q. Yu and Z. Ding, An improved fuzzy C-means algorithm based on MapReduce, in: Proceedings of 8th International Conference on Biomedical Engineering and Informatics (BMEI), Shenyang, China, IEEE, Piscataway, NJ, USA, 2015. doi:10.1109/BMEI.2015.7401581.

[26] M. Yuwono, S. W. Su, B. D. Moulton and H. T. Nguyen, Data clustering using variants of rapid centroid estimation, IEEE Trans. Evol. Comput. 18 (2014), 366–377. doi:10.1109/TEVC.2013.2281545.

[27] B. Zerhari, A. A. Lahcen and S. Mouline, Big data clustering: algorithms and challenges, in: Proceedings of International Conference on Big Data, Cloud and Applications BDCA'15, May 2015.

[28] H. Zhu, Y. Guo, M. Niu, G. Yang and L. Jiao, Distributed SAR image change detection based on Spark, in: Proceedings of IEEE International Conference on Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, IEEE, Piscataway, NJ, USA, 2015. doi:10.1109/IGARSS.2015.7326739.

Received: 2018-01-31
Published Online: 2019-07-17

©2020 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 Public License.

  75. Optimizing the Self-Organizing Team Size Using a Genetic Algorithm in Agile Practices
  76. Aspect-Oriented Sentiment Analysis: A Topic Modeling-Powered Approach
  77. Feature Pair Index Graph for Clustering
  78. Tangramob: An Agent-Based Simulation Framework for Validating Urban Smart Mobility Solutions
  79. A New Algorithm Based on Magic Square and a Novel Chaotic System for Image Encryption
  80. Video Steganography Using Knight Tour Algorithm and LSB Method for Encrypted Data
  81. Clay-Based Brick Porosity Estimation Using Image Processing Techniques
  82. AGCS Technique to Improve the Performance of Neural Networks
  83. A Color Image Encryption Technique Based on Bit-Level Permutation and Alternate Logistic Maps
  84. A Hybrid of Deep CNN and Bidirectional LSTM for Automatic Speech Recognition
  85. Database Creation and Dialect-Wise Comparative Analysis of Prosodic Features for Punjabi Language
  86. Trapezoidal Linguistic Cubic Fuzzy TOPSIS Method and Application in a Group Decision Making Program
  87. Histopathological Image Segmentation Using Modified Kernel-Based Fuzzy C-Means and Edge Bridge and Fill Technique
  88. Proximal Support Vector Machine-Based Hybrid Approach for Edge Detection in Noisy Images
  89. Early Detection of Parkinson’s Disease by Using SPECT Imaging and Biomarkers
  90. Image Compression Based on Block SVD Power Method
  91. Noise Reduction Using Modified Wiener Filter in Digital Hearing Aid for Speech Signal Enhancement
  92. Secure Fingerprint Authentication Using Deep Learning and Minutiae Verification
  93. The Use of Natural Language Processing Approach for Converting Pseudo Code to C# Code
  94. Non-word Attributes’ Efficiency in Text Mining Authorship Prediction
  95. Design and Evaluation of Outlier Detection Based on Semantic Condensed Nearest Neighbor
  96. An Efficient Quality Inspection of Food Products Using Neural Network Classification
  97. Opposition Intensity-Based Cuckoo Search Algorithm for Data Privacy Preservation
  98. M-HMOGA: A New Multi-Objective Feature Selection Algorithm for Handwritten Numeral Classification
  99. Analogy-Based Approaches to Improve Software Project Effort Estimation Accuracy
  100. Linear Regression Supporting Vector Machine and Hybrid LOG Filter-Based Image Restoration
  101. Fractional Fuzzy Clustering and Particle Whale Optimization-Based MapReduce Framework for Big Data Clustering
  102. Implementation of Improved Ship-Iceberg Classifier Using Deep Learning
  103. Hybrid Approach for Face Recognition from a Single Sample per Person by Combining VLC and GOM
  104. Polarity Analysis of Customer Reviews Based on Part-of-Speech Subcategory
  105. A 4D Trajectory Prediction Model Based on the BP Neural Network
  106. A Blind Medical Image Watermarking for Secure E-Healthcare Application Using Crypto-Watermarking System
  107. Discriminating Healthy Wheat Grains from Grains Infected with Fusarium graminearum Using Texture Characteristics of Image-Processing Technique, Discriminant Analysis, and Support Vector Machine Methods
  108. License Plate Recognition in Urban Road Based on Vehicle Tracking and Result Integration
  109. Binary Genetic Swarm Optimization: A Combination of GA and PSO for Feature Selection
  110. Enhanced Twitter Sentiment Analysis Using Hybrid Approach and by Accounting Local Contextual Semantic
  111. Cloud Security: LKM and Optimal Fuzzy System for Intrusion Detection in Cloud Environment
  112. Power Average Operators of Trapezoidal Cubic Fuzzy Numbers and Application to Multi-attribute Group Decision Making