
TA-ABC: Two-Archive Artificial Bee Colony for Multi-objective Software Module Clustering Problem

Amarjeet and Jitender Kumar Chhabra
Published/Copyright: May 4, 2017

Abstract

The multi-objective software module clustering problem (M-SMCP) aims to automatically produce clustering solutions that optimize multiple conflicting clustering criteria simultaneously. Multi-objective evolutionary algorithms (MOEAs) have been among the most appropriate alternatives for solving M-SMCPs. Recently, it has been observed that the performance of MOEAs based on the Pareto dominance selection technique degrades on multi-objective optimization problems having more than three objective functions. To alleviate this issue for M-SMCPs containing more than three objective functions, we propose a two-archive based artificial bee colony (TA-ABC) algorithm. In this contribution, a two-archive concept has been incorporated into the TA-ABC algorithm, and an improved indicator-based selection method is used instead of the Pareto dominance selection technique. To validate the performance of TA-ABC, an empirical study is conducted with two well-known M-SMCP formulations, i.e. the equal-size cluster approach and the maximizing cluster approach, each containing five objective functions. The clustering results produced by TA-ABC are compared with the existing genetic based two-archive algorithm (TAA) and the non-dominated sorting genetic algorithm II (NSGA-II) over seven un-weighted and 10 weighted practical problems. The comparison results show that the proposed TA-ABC significantly outperforms TAA and NSGA-II in terms of modularization quality, coupling, cohesion, Pareto optimality, inverted generational distance, hypervolume, and spread performance metrics.

1 Introduction

Heavily used commercial software systems frequently need to be modified in response to changing customer, business, and technological requirements. These changes normally have to be performed under tight deadlines and within a limited budget. Hence, developers often modify the systems without considering the design guidelines of the original software. Such maintenance practices often degrade the structural design of software systems [30]. Making further changes becomes very complicated once a system's structural quality has deteriorated to the point where it is difficult to understand [34]. A modular structure with low cohesion and high coupling is one of the main causes of structural design deterioration. Software module clustering, in which software entities are organized into disjoint clusters according to predefined criteria [43], is one of the successful methods to improve the modular structure of complicated software systems.

The software module clustering process takes software modules with their dependencies as input and partitions the modules into several disjoint clusters based on predefined rules so that the software becomes more understandable and maintainable [43]. The predefined rules can be various software structural design criteria [7, 8, 20], such as minimum coupling and maximum cohesion. The decomposition of software modules into clusters based on such structural design criteria is defined as the software module clustering problem (SMCP) [15, 32, 34, 35, 43]. Many approaches have been proposed in the research literature to address SMCPs [15, 36, 43, 46].

These approaches can be broadly divided into two main groups: (1) search-based and (2) non-search-based software module clustering approaches. In search-based approaches, the problem is transformed into a search-based optimization problem and solved using search-based meta-heuristic algorithms [e.g. genetic algorithms (GAs)] [16, 45], while in non-search-based approaches the problem is solved using deterministic algorithms (e.g. hierarchical clustering). The software partitioning problem (i.e. SMCP) belongs to the class of non-deterministic polynomial-time-hard (NP-hard) problems [43]; hence, deterministic algorithms are not a good alternative because exponential time is needed to solve it. This observation motivates the use of search-based meta-heuristics for solving SMCPs. Meta-heuristic approaches do not guarantee an optimal solution, since they evaluate only part of the feasible search space, but they explore different parts of the search space effectively so as to obtain a near-optimal solution at reasonable computation time and cost [41].

Mostly, search-based approaches first transform the software system into a module dependency graph (MDG) [35] and then solve the SMCP as a graph partitioning problem [34, 38, 43]. MDGs are directed graphs in which vertices and edges represent modules and their relationships, respectively. Based on the number of optimization criteria, SMCPs can be designated as single-objective software module clustering problems (S-SMCPs) or multi-objective software module clustering problems (M-SMCPs) and can be formulated as single- or multi-objective optimization problems. An S-SMCP has a single solution that optimizes a single software quality criterion, while an M-SMCP has many solutions that simultaneously optimize more than one software quality criterion.

Most researchers [1, 2, 15, 32, 35, 37, 38] have formulated SMCPs as single-objective optimization problems and solved them using different single-objective meta-heuristic algorithms [e.g. hill climbing (HC) and GA]. The main limitation of the single-objective optimization approach is that it optimizes a single criterion and generates only a single solution in each run. Thus, little information can be provided to decision makers about different aspects of the quality criteria. Consequently, formulating the SMCP as a multi-objective optimization problem and solving it with a multi-objective evolutionary approach has recently become more practically useful.

The current search-based multi-objective software module clustering approaches [5, 28, 29, 43] have been applied successfully to solve M-SMCPs. Most of the multi-objective software module clustering approaches for M-SMCPs are genetic-based multi-objective evolutionary algorithms (MOEAs) (e.g. [4, 5, 29, 43]). Even though these approaches have shown many advantages in solving M-SMCPs, some challenges related to the characteristics of MOEAs, such as uncertainty, conflicting attributes, and a large number of quality criteria, still need to be addressed. In this paper, we address the following two main challenges:

  • The SMCP is naturally a multi-objective optimization problem; hence, an MOEA has to solve it by optimizing multiple objective functions simultaneously. However, the performance of MOEAs, for instance, the non-dominated sorting genetic algorithm II (NSGA-II) [13], deteriorates when the number of objective functions grows beyond three.

  • Most MOEAs contain many control parameters that need to be set by users according to domain knowledge/experience and problem characteristics to achieve satisfactory performance. However, this is a very challenging task for SMCPs because software problems fall into diverse categories, and identifying suitable control parameters for every different problem is difficult and time consuming.

To address the above challenges, there is a growing need for MOEAs that can efficiently solve M-SMCPs with a large number of objective functions. Recent works [12, 19, 40, 47, 48] in the optimization literature have proposed several algorithms to address such optimization problems using, for instance, preference-based [19], objective-reduction [40], reference-based [12], decomposition-based [47], and indicator-based [48] approaches. However, to the best of our knowledge, these techniques have not yet been explored to solve M-SMCPs. We propose a multi-objective software module clustering technique by integrating the two-archive [44] and indicator-based selection [48] concepts into the original artificial bee colony (ABC) algorithm [26].

To assess the effectiveness of the proposed approach, we applied it to seven un-weighted and 10 weighted open software projects. We report the results of our proposed approach and compare them with existing MOEAs [the two-archive algorithm (TAA) and NSGA-II] that have been used to solve M-SMCPs by previous researchers [5, 43]. The results indicate that our proposed approach significantly outperforms the TAA and NSGA-II based approaches in terms of the modularization quality (MQ) [43], coupling [43], cohesion [43], Pareto optimality [43], inverted generational distance (IGD) [50], hypervolume (HV) [49], and spread [13] performance metrics.

The rest of this paper is organized as follows: Section 2 presents related research works. Section 3 briefly describes the SMCPs and problem formulation. Section 4 gives a short description of two-archive evolutionary algorithm and detailed description of two-archive based artificial bee colony (TA-ABC). Section 5 presents details of the experimental setup. Section 6 presents the results and compares them to the best performing algorithms from the existing literature to demonstrate the superiority of the TA-ABC algorithm. Section 7 discusses the implications of the results. Section 8 provides threats to validity. Finally, Section 9 gives the concluding remarks and future research directions.

2 Related Works

It is a commonly accepted fact that a software system with a well-modularized structure is easier to design, develop, test, maintain, and evolve [6, 9]. However, maintaining a large software system becomes difficult, especially if its modular structure degrades and is not documented well [15, 24]. To address SMCPs, various search-based and deterministic approaches have been proposed in the literature [3, 29, 36, 38, 43]. Our approach formulates the SMCP as a search-based M-SMCP and solves it using a multi-objective two-archive ABC algorithm; hence, the related works discussed here are centered on the search-based software module clustering literature.

Mancoridis et al. [35] were the first to propose a search-based optimization approach to address SMCPs. That work [35] formulated the characteristics of well-modularized software as objective functions; the evaluation of these objective functions directs the optimization process towards good clusterings. Further, the authors developed an automated tool named Bunch for clustering software systems. Following this search-based optimization concept for SMCPs, many other search-based optimization methods have been applied in previous works, such as GA, the HC algorithm, simulated annealing (SA), and so forth [15, 21, 29, 32, 33, 37, 38, 43].

Mancoridis et al. [34] applied the Bunch tool to the maintenance and architecture recovery of software systems. To guide the search process, the authors designed an objective function, namely, MQ. MQ is a trade-off between interconnectivity and intraconnectivity and has been integrated into the Bunch tool [34]. Using the Bunch tool with a set of meta-heuristic clustering algorithms (GA, HC, and SA), a software system is partitioned into subsystems at a high level. The clustering process produced groupings of good quality and proved effective for medium as well as large systems.

Several studies [15, 21, 38] have demonstrated that, for module clustering, the HC algorithm outperforms standard search techniques such as GA and SA in terms of both execution time and solution quality. However, it is well known that the HC algorithm suffers from early convergence to local optima [32]. To overcome this problem, the authors of [32] proposed a multiple-HC approach to address software module clustering.

Praditwong [42] proposed two evolutionary algorithms based on GA, namely, the Grouping Genetic Algorithm (GGA) and Group Number Encoding (GNE), to solve SMCPs. The author also performed a comparative study of these two genetic-based algorithms on various real-world SMCPs in terms of the MQ quality measure. The results demonstrated that GGA produced higher-quality solutions than the GNE approach. Further, the same authors [43] formulated the SMCP as a multi-objective search problem [namely, the maximizing cluster approach (MCA) and the equal-size cluster approach (ECA)] and used the two-archive evolutionary algorithm [44].

Even after the formulation of the SMCP as a search-based optimization problem, some other widely used search-based algorithms (e.g. ABC [26] and the Grey Wolf Algorithm [27]) have not gained much attention. The ABC algorithm has been demonstrated to be effective and well suited for solving various optimization problems in science and engineering [11, 23, 25, 31]. Recently, Dahiya et al. [11] demonstrated the applicability of ABC in software testing; however, the applicability and usefulness of the ABC algorithm for solving SMCPs have not yet been studied. This paper formulates the SMCP as a search-based multi-objective optimization problem and solves it using the ABC meta-heuristic algorithm.

3 Software Module Clustering Problems

The SMCP is the problem of automatically grouping software modules into disjoint clusters to improve the software design structure [43]. The SMCP is basically a graph partitioning problem, which belongs to the class of NP-hard problems [17, 43]. An SMCP can be represented as an MDG, defined as a graph G = (V, E), where V represents the set of modules and E the set of relationships between modules. All modules need to be partitioned into k non-overlapping clusters C1, C2, …, Ck; that is, C1 ∪ C2 ∪ ··· ∪ Ck = V, Ci ≠ ∅, and Ci ∩ Cj = ∅ for i, j = 1, 2, …, k with i ≠ j. A good partitioning of the MDG is one with minimum interconnection and maximum intraconnection. The number of ways to partition an MDG containing n vertices into k nonempty clusters can be computed using the Stirling number of the second kind, S(n, k) [22]. Searching for an optimal partition of an MDG therefore becomes intractable as the number of modules increases. Solving such problems using deterministic or exhaustive methods requires very high computing time; hence, formulating SMCPs as search-based optimization problems is the best alternative to find a near-optimal solution. Search-based SMCPs can be formulated as single-objective or multi-objective optimization problems. A brief description of the multi-objective optimization formulation for the SMCP is given in the following subsection.
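To give a feel for the size of this search space, the following Python sketch (an illustrative example, not part of the original study) computes S(n, k) with the standard recurrence and sums it over k to obtain the total number of possible partitions of a 20-module MDG such as Mtunis.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind S(n, k): the number of ways
    to partition n labelled modules into k non-empty clusters."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Recurrence: the n-th module either forms a new cluster (S(n-1, k-1))
    # or joins one of the k existing clusters (k * S(n-1, k)).
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

# Total number of partitions of a 20-module MDG (the Bell number), i.e. the
# size of the clustering search space for the smallest test system.
print(sum(stirling2(20, k) for k in range(1, 21)))  # 51724158235372
```

Even for the smallest test system the number of candidate partitions exceeds 5 × 10^13, which is why exhaustive enumeration is infeasible.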

3.1 Multi-objective Formulation

In a multi-objective SMCP, more than one conflicting objective is optimized simultaneously. It determines a clustering solution x* for which

(1)
$$
f(x) = \begin{cases}
\min\ [f_1(x),\, f_2(x),\, \ldots,\, f_M(x)]^T, & M \ge 2\\
g_j(x) \ge 0, & j = 1, \ldots, P\\
h_k(x) = 0, & k = 1, \ldots, Q\\
x_i^{L} \le x_i \le x_i^{U}, & i = 1, \ldots, n
\end{cases}
$$

where M and $f_i$ represent the number of objective functions and the ith objective function, respectively, Q is the number of equality constraints, P is the number of inequality constraints, and $x_i^{U}$ and $x_i^{L}$ represent the upper and lower bounds of the decision variable $x_i$.

3.2 Module Clustering Objective Functions

The main goal of software module clustering is to improve the quality of clustering by optimizing various conflicting software attributes. Praditwong et al. [43] have proposed two multi-objective formulations (i.e. ECA and MCA) that capture attributes of a well-clustered software system. Moreover, these formulations also help in guiding the optimization process towards good clustering. The objective functions defined under MCA and ECA formulations are as follows: (1) maximization of cohesion (i.e. sum of intracluster edges), (2) minimization of coupling (i.e. sum of intercluster edges), (3) minimization of number of clusters, (4) maximization of MQ, (5) minimization of the number of isolated clusters, and (6) minimization of the differences between maximum size cluster and minimum size cluster.

The MCA formulation includes the objective functions numbered 1, 2, 3, 4, and 5, and the ECA formulation includes the objective functions numbered 1, 2, 3, 4, and 6. The computation of all identified objective functions except MQ is straightforward. The computation of MQ is defined as follows:

(2)
$$
MQ = \sum_{k=1}^{m} MF_k, \qquad
MF_k = \begin{cases}
0, & \text{if } i = 0\\[4pt]
\dfrac{i}{\,i + \tfrac{1}{2}j\,}, & \text{if } i > 0
\end{cases}
$$

where i is the number of intracluster edges and j is the number of intercluster edges of cluster k for an un-weighted MDG, while for a weighted MDG, i represents the total weight of the intracluster edges and j the total weight of the intercluster edges of cluster k.
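For concreteness, the following Python sketch computes MQ as defined in Eq. (2) for an integer-encoded clustering; the edge list and cluster assignment below are hypothetical and used only for illustration, not data from the paper.

```python
def modularization_quality(edges, assignment):
    """MQ of a clustering of an (optionally weighted) MDG, per Eq. (2).

    edges      : iterable of (source, target, weight) module dependencies
    assignment : dict mapping each module to its cluster label
    """
    intra = {}  # cluster -> total weight of intra-cluster edges
    inter = {}  # cluster -> total weight of edges entering or leaving the cluster
    for src, dst, w in edges:
        cs, cd = assignment[src], assignment[dst]
        if cs == cd:
            intra[cs] = intra.get(cs, 0.0) + w
        else:
            inter[cs] = inter.get(cs, 0.0) + w
            inter[cd] = inter.get(cd, 0.0) + w
    mq = 0.0
    for c in set(assignment.values()):
        i, j = intra.get(c, 0.0), inter.get(c, 0.0)
        if i > 0:
            mq += i / (i + 0.5 * j)  # modularization factor MF_k
    return mq

# Hypothetical eight-module MDG grouped into three clusters.
edges = [(1, 2, 1), (3, 4, 1), (4, 5, 1), (6, 7, 1), (7, 8, 1), (2, 3, 1), (5, 6, 1)]
assignment = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3}
print(modularization_quality(edges, assignment))  # ≈ 2.13
```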

4 Two-Archive based Artificial Bee Colony

The basic ABC algorithm [26] was designed to solve single-objective continuous optimization problems. However, software module clustering is naturally a multi-objective optimization problem, where various conflicting quality criteria need to be optimized to obtain a good-quality software structure. An M-SMCP can be recast as an S-SMCP by aggregating all objective functions into a single objective function, which can then be solved using the single-objective ABC algorithm. However, such a formulation has the following shortcoming: as the population evolves, all individual solutions suffer early convergence to local optima within very few generations. This may lead the single-objective ABC algorithm to produce populations with low diversity in successive generations [37]. Hence, for complex M-SMCPs, we propose the TA-ABC algorithm, which adapts the two-archive concept and can produce good clustering solutions with good convergence, satisfactory diversity, and acceptable complexity.

4.1 The Basic Concept of ABC Algorithm

The ABC, a meta-heuristic algorithm based on the foraging behavior of honey bees, has gained wide attention and has been demonstrated to be effective and well suited for solving various types of optimization problems in science and engineering [11, 23, 25, 31]. The main steps of the basic ABC algorithm are as follows:

  • Population initialization phase: The initial population of the basic ABC algorithm is generated randomly. Let vi = {vi1, vi2, …, vin} represent the ith food source in a population with n decision variables. To initialize the population, each food source is generated as follows:

    (3) $v_{ij} = v_{ij}^{\min} + \left(v_{ij}^{\max} - v_{ij}^{\min}\right) \times r, \qquad j = 1, \ldots, n;\; i = 1, \ldots, SN,$

    where $v_{ij}^{\max}$ and $v_{ij}^{\min}$ represent the upper and lower bounds of decision variable j, respectively, and r is a uniform random number in [0, 1].

  • Employed bee phase: In the employed bee phase, the ith food source vi of the population is assigned to the ith employed bee, which generates a new neighboring solution around the assigned food source as follows:

    (4) $v_{\mathrm{new},j} = v_{ij} + U(-1, 1) \times (v_{ij} - v_{kj})$

    where i ∈ {1, …, SN} and k ∈ {1, …, SN}, k ≠ i, indexes a randomly chosen food source. After the new solution vnew is generated, it is evaluated and compared with vi, and the solution with the higher fitness value is retained.

  • Onlooker bee phase: The onlooker bees decide whether or not to accept the food sources found by the employed bees. To do this, the onlooker bees use probability values, calculated using Eq. (5), to select food sources for discovering promising regions of the search space.

    (5) $p_i = \dfrac{fit_i}{\sum_{n=1}^{SN} fit_n}$

    where fiti is the fitness value of the ith food source.

  • Scout bee phase: If a food source cannot be improved further within a limited number of trials, the food source is assumed to be abandoned and is replaced by a randomly produced food source.
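The following Python sketch illustrates the employed-bee neighborhood move of Eq. (4) and the onlooker-bee selection probabilities of Eq. (5) for the continuous, single-objective ABC; it is a minimal illustration with assumed helper names, not the authors' implementation.

```python
import random

def neighbor(food, population, i):
    """Employed-bee move: perturb one dimension of food source i towards/away
    from a randomly chosen partner k != i (Eq. 4)."""
    k = random.choice([idx for idx in range(len(population)) if idx != i])
    j = random.randrange(len(food))
    new = list(food)
    new[j] = food[j] + random.uniform(-1.0, 1.0) * (food[j] - population[k][j])
    return new

def selection_probabilities(fitness_values):
    """Onlooker-bee roulette-wheel probabilities (Eq. 5)."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]

# Example: four food sources in three dimensions with dummy fitness values.
population = [[random.uniform(0.0, 1.0) for _ in range(3)] for _ in range(4)]
fitness = [1.0, 2.0, 3.0, 4.0]
print(neighbor(population[0], population, 0))
print(selection_probabilities(fitness))  # [0.1, 0.2, 0.3, 0.4]
```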

Algorithm 1:

The ABC Algorithm [26].

1. Input: parameter values;
2.   NFS: population size (i.e. number of food sources);
3.   NIC: number of iterations;
4.   NLMT: maximum number of trials;
5. Output: optimal solution;
6. begin
7.  for i = 1 to NFS do  //generation of food sources for the initial population;
8.   FSi ← generate food source i using Eq. (7);
9.   fi ← f(FSi)  //calculate the fitness function of food source i;
10.   Tr(i) ← 0  //initialize the trial counter to zero;
11.  Itr ← 1;  //initialize the iteration counter to one
12.  while Itr < NIC do
13.   for i = 1 to NFS do  //employed bee phase;
14.    CSi ← generate a candidate solution using Eq. (8)
15.    f(CSi) ← evaluate the fitness function of the candidate solution
16.    if f(CSi) < f(FSi) then  //greedy selection;
17.     FSi ← CSi
18.     f(FSi) ← f(CSi)
19.     Tr(i) ← 0;
20.    else Tr(i) ← Tr(i) + 1;
21.   Calculate each onlooker bee's probability using Eq. (9);
22.   i ← 1, t ← 0;  //onlooker bee phase;
23.   while t < NFS do
24.    r ← rand(0, 1);  //random generation;
25.    if r < pi then
26.     t ← t + 1;
27.     CSi ← a candidate solution by Eq. (3);
28.     f(CSi) ← evaluate the candidate solution;
29.     if f(CSi) < f(FSi) then  //greedy selection;
30.      FSi ← CSi
31.      f(FSi) ← f(CSi)
32.      Tr(i) ← 0;
33.     else
34.      Tr(i) ← Tr(i) + 1;
35.    i ← (i + 1) mod NFS
36.   //scout bee phase;
37.   ind = {i : Tr(i) = max(Tr)};
38.   if Tr(ind) > NLMT then
39.    FSind ← random solution by Eq. (7);
40.    find ← f(FSind);
41.    Tr(ind) ← 0
42.   Itr ← Itr + 1

The basic ABC algorithm above was designed to solve single-objective optimization problems with continuous decision variables; hence, the original form of the algorithm cannot be used directly for solving combinatorial/discrete multi-objective optimization problems. The M-SMCP is a discrete multi-objective optimization problem; therefore, in this work, some alterations to the basic ABC algorithm have been made to make it suitable for M-SMCPs.

4.2 Proposed TA-ABC Algorithm

This section presents a TA-ABC approach to solve M-SMCPs. The proposed TA-ABC has the following main features:

  • The approach can work efficiently for more than three objective functions.

  • The approach provides a good balance between exploration and exploitation.

  • The proposed approach can be easily implemented.

To impart the above features, the TA-ABC algorithm exploits the concepts of the two-archive technique [44] and indicator-based ranking [48]. The combination of these two concepts allows the TA-ABC algorithm to perform efficiently with more than three objective functions. On the other hand, the use of ABC concepts keeps the approach free of many parameters and supports good exploitation and exploration of the search space. The flow chart of TA-ABC is given in Figure 1.

Figure 1: Flow Chart of Proposed TA-ABC Algorithm.

The working of the TA-ABC method is divided into six parts: food source representation, population initialization, send employed bees, send onlooker bees, send scout bees, and update the archives. Detailed explanations of these parts are provided in the subsequent subsections.

4.2.1 Food Source Representation

To solve M-SMCPs with the TA-ABC algorithm, solutions must be modeled appropriately so that the problem can be solved efficiently. In search-based techniques, a solution is typically encoded as a string of (often binary) numbers. In our TA-ABC approach, each module clustering solution is encoded as a string of integers instead of binary numbers. With integer encoding, a single integer perturbation can separate a module clustering solution into two distinct module clustering solutions, while a binary representation requires a large number of perturbations. Hence, with integer encoding, individual module clustering solutions lie at a smaller distance from one another, which significantly increases the power of exploration and exploitation [10].

Let {m1, m2, …, mn} be the set of n modules in the software system. A solution is then represented as a vector of n integers, m = [m1, m2, …, mn]. In this representation, the value mi (with 0 < mi ≤ n) at the ith position indicates the cluster to which the ith module is assigned. A clustering solution with the same value for all modules means that all modules are grouped into a single cluster, while a clustering solution containing all possible values (from 1 to n) denotes that each cluster holds only a single module. To illustrate this, let us consider the hypothetical software system depicted in Figure 2.

Figure 2: Representation of a Simple Food Source (i.e. Software Clustering Solution).

In Figure 2, the clustering solution (i.e. food source) of the software system contains eight modules (numbered 1–8) distributed over three clusters, namely, C1, C2, and C3. Hence, it can be represented as the vector C = [1, 1, 2, 2, 2, 3, 3, 3], where modules 1 and 2 are in cluster C1, modules 3, 4, and 5 are in cluster C2, and modules 6, 7, and 8 are in cluster C3.
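A minimal Python sketch of this integer encoding is given below; the helper names are illustrative and not taken from the paper.

```python
import random

def random_clustering(n_modules):
    """Integer-encoded food source: position i holds the cluster label
    (1..n_modules) assigned to module i."""
    return [random.randint(1, n_modules) for _ in range(n_modules)]

def clusters_of(solution):
    """Group module indices (1-based) by their cluster label."""
    groups = {}
    for module, label in enumerate(solution, start=1):
        groups.setdefault(label, []).append(module)
    return groups

# The eight-module solution of Figure 2: modules 1-2 in C1, 3-5 in C2, 6-8 in C3.
solution = [1, 1, 2, 2, 2, 3, 3, 3]
print(clusters_of(solution))  # {1: [1, 2], 2: [3, 4, 5], 3: [6, 7, 8]}
```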

4.2.2 Population Initialization

The TA-ABC algorithm receives the population size (PS), MaxTrial, the number of dimensions (D), the number of scouts (Scouts), and the two external archives, namely, the convergence archive (CA) and the diversity archive (DA), each of variable size but with a constant total size equal to PS. The number of food sources (clustering solutions) is set equal to PS. After initialization of the basic parameters, the initial food sources are generated randomly and their nectar amounts (clustering fitness functions) are determined. In a multi-objective approach, instead of a single solution, a set of non-dominated solutions is collected. To this end, non-dominated food sources are collected and stored in the two external archives CA and DA according to their updating rules (details are given in Subsection 4.2.6). Algorithm 2 provides the pseudo-code of the population initialization.

Algorithm 2:

Pseudo-Code of the Initialization of Food Sources.

1. TA-ABC (Dataset, CA, DA, FoodNumber, MaxTrial)
2. Generate food sources c = (c1, c2, …, cFoodNumber) randomly
3. For i = 1 to FoodNumber
4.  For d = 1 to D   /* D represents the dimension (i.e. total number of classes) */
5.   cid ← RandInt(UBd − LBd)   /* LB = 1 and UB = total number of classes */
6.  End For
7. End For
8. Calculate each objective function of food source ci based on the considered multi-objective formulation
9. Initialize Trial1, Trial2, …, TrialFoodNumber to 0
10. Update the external archives CA and DA

RandInt(UBd − LBd) generates a random integer in the range from 1 [i.e. the Lower Bound (LB)] to the total number of classes [i.e. the Upper Bound (UB)], where UBd and LBd are the upper and lower bounds along the dth dimension, respectively.
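The following Python sketch mirrors Algorithm 2 at a high level: it randomly generates integer-encoded food sources, evaluates their objective vectors, and resets the trial counters. The evaluate callable here is a hypothetical stand-in for the MCA/ECA objective computations, not the paper's implementation.

```python
import random

def initialize_population(n_modules, food_number, evaluate):
    """Random integer initialization of food sources (in the spirit of Algorithm 2).

    evaluate : callable mapping a solution to its objective vector.
    """
    foods = [[random.randint(1, n_modules) for _ in range(n_modules)]
             for _ in range(food_number)]
    objectives = [evaluate(f) for f in foods]
    trials = [0] * food_number
    return foods, objectives, trials

# Hypothetical two-objective evaluation used only for illustration:
# (number of clusters, spread of cluster labels).
evaluate = lambda sol: (len(set(sol)), max(sol) - min(sol))
foods, objs, trials = initialize_population(n_modules=8, food_number=5, evaluate=evaluate)
print(objs)
```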

4.2.3 Send Employed Bees

Algorithm 3 presents the pseudo-code of the Send Employed Bees module of the TA-ABC algorithm. After random initialization of the food sources (population initialization), the employed bees are sent to search for new food sources. To do so, the employed bees use the history information stored in the combined |CA + DA| archives. The main reason for using the external archive solutions is that they contain the best solutions found so far by the employed bees, and they may guide the bees towards better food sources. The main steps of the working process of the employed bees are as follows: (1) each employed bee searches for a new food source with the help of the food sources stored in the archives (Lines 1−4); (2) if the newly discovered food source differs from the old food source, the new food source is compared with the old food source using the dominance relation (Lines 5−6); (3) if the new food source dominates the old food source, it replaces the old food source; otherwise, the old food source remains in the population and its trial value is incremented by 1 (Lines 7−13).

Algorithm 3:

Pseudo-Code of the Send Employed Bees.

1. For i = 1 to FoodNumber
2.  Select a random component d, d ∈ {1, 2, …, D}, from food source ci
3.  Select a random food source k from the archive |CA + DA|, k ≠ i, k ∈ {1, 2, …, |CA + DA|}
4.   vid ← xkd
5.   If vi ≠ ci, then
6.    Calculate the objective functions of the new food source vi
7.    If the new food source vi dominates the old food source ci
8.     Replace the old food source ci with the new food source vi
9.    Else
10.     Increment Triali by 1
11.    End If
12.  End If
13. End For

4.2.4 Send Onlooker Bees

Algorithm 4 presents the pseudo-code of the Send Onlooker Bees module. In the Send Employed Bees module, all the employed bees search for optimal food sources using the information provided by the CA and DA external archives. After searching, all employed bees return to the hive and share their information about the newly discovered food sources with the onlooker bees waiting in the hive. Based on the collected information, each onlooker bee decides which food source to select. To do this, the onlooker bees compute the selection probability pi of each food source ci provided by the corresponding employed bee using Eq. (6).

(6) $p_i = \dfrac{fit(c_i)}{\sum_{m=1}^{FoodNumber} fit(c_m)}$

The selection probability pi of the food source provided by employed bee i is proportional to the fitness of that food source. To calculate the fitness of a food source advertised by the employed bees, we use the quality indicator Iε+ given in IBEA [48]. Iε+ is an indicator that calculates the minimum distance by which one food source (i.e. solution) must be translated in the objective space in order to dominate another food source. The value of Iε+ between two solutions c1 and c2 is computed as follows:

(7) $I_{\varepsilon+}(c_1, c_2) = \min_{\varepsilon} \left\{ \varepsilon \;\middle|\; f_i(c_1) - \varepsilon \le f_i(c_2),\; 1 \le i \le m \right\}$

where m is the number of objectives. Using Eq. (7), we assign the fitness to each solution according to the following equation.

(8) $fit(c_i) = \sum_{c_j \in P \setminus \{c_i\}} e^{-I_{\varepsilon+}(c_j,\, c_i)/0.05}$

After computing the selection probabilities, the onlooker bees use a greedy technique to select a food source advertised by the employed bees. Further, each onlooker bee selects a food source randomly from the archive members and performs the same steps as an employed bee to update its current food source.
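The sketch below shows one way to compute the additive epsilon indicator of Eq. (7) and the exponentially scaled fitness of Eq. (8) in Python; it is an illustration under simplifying assumptions (objective normalization, which IBEA normally applies, is omitted) rather than the exact implementation used in the paper.

```python
import math

def eps_indicator(obj1, obj2):
    """Additive epsilon indicator I_eps+(c1, c2): the smallest shift eps such
    that c1, translated by eps in every objective, weakly dominates c2
    (minimization assumed), i.e. max_i (f_i(c1) - f_i(c2))."""
    return max(f1 - f2 for f1, f2 in zip(obj1, obj2))

def ibea_fitness(objectives, kappa=0.05):
    """Indicator-based fitness of every solution: each solution accumulates
    exponentially scaled indicator values against all other solutions."""
    fitness = []
    for i, oi in enumerate(objectives):
        fitness.append(sum(math.exp(-eps_indicator(oj, oi) / kappa)
                           for j, oj in enumerate(objectives) if j != i))
    return fitness

# Hypothetical three-solution population with two minimization objectives.
objs = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(ibea_fitness(objs))
```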

Algorithm 4:

Pseudo-Code of the Send Onlooker Bees.

1. Calculate the probability value pi of each food source ci based on Eq. (6)
2. For i = 1 to FoodNumber
3.  If rand < pi, Then   /* select employed bee ci to follow */
4.   Select a random component d, d ∈ {1, 2, …, D}, from food source ci
5.   Select a random food source k from the archive |CA + DA|, k ≠ i, k ∈ {1, 2, …, |CA + DA|}
6.    vid ← xkd
7.    If vi ≠ ci, Then
8.     Calculate the objective functions of the new food source vi
9.     If the new food source vi dominates the old food source ci
10.      Replace the old food source ci with the new food source vi
11.     Else
12.      Increment Triali by 1
13.     End If
14.    End If
15.  End If
16.  If i > FoodNumber, Then i = 1   /* reset the value of i */
17. End For

4.2.5 Send Scout Bees

At each cycle of the algorithm, the employed and onlooker bees search for new food sources around each old food source and evaluate them; if an old food source cannot be improved within a certain number of trials, called MaxTrial, the old food source is abandoned. In the Send Scout Bees module, the algorithm sends a scout bee for each abandoned food source; the scout bee randomly searches for a new food source, which replaces the abandoned food source if the newly generated food source dominates it. Otherwise, the old food source is kept in the population.

Algorithm 5:

Pseudo-Code of the Send Scout Bees.

1. If there exists some ci with Triali > MaxTrial, then
2.  Select one such ci randomly
3.  For each component d, d ∈ {1, 2, …, D}
4.   vid ← RandInt(UBd − LBd)   /* LB = 1 and UB = total number of classes */
5.  End For
6.  Calculate the objective functions of the new food source vi
7.  Replace the old food source ci with the new food source vi
8.  Set Triali = 0
9. End If

4.2.6 Update CA and DA Archives

To guide the employed and onlooker bees in a good direction, the TA-ABC algorithm uses the two-external-archive concept inspired by the work presented in [44] to store the non-dominated solutions. These archives are the convergence archive (CA) and the diversity archive (DA), each of variable size but with a fixed total size. Both the CA and DA archives are updated as follows: (1) the algorithm first selects the non-dominated solutions from the population; (2) the selected non-dominated solutions are compared with the solutions stored in the CA and DA archives; (3) if a solution is dominated by any solution stored in the CA or DA archive, it is discarded; (4) if the solution dominates any solutions stored in the CA or DA archives, the dominated solutions are removed and the new solution is added to CA; (5) if the solution is non-dominated with respect to all solutions stored in the CA and DA archives, it is added to DA; (6) finally, if the number of non-dominated solutions in both archives exceeds the total size of CA and DA, the extra solutions with the minimal Euclidean distances to the CA archive are deleted from the DA archive.

Algorithm 6:

Pseudo-Code of the Update the External Archive (CA and DA).

1. Collect FSnd, the set of non-dominated food sources in the current population   /* addition strategy */
2. for i = 1 to |FSnd| do
3.  if FSnd[i] is not dominated by any food source stored in either the CA or DA archive, then
4.   if FSnd[i] dominates any food source stored in either the CA or DA archive, then
5.    The dominated food sources stored in CA and DA are removed
6.    Add FSnd[i] to archive CA
7.   else
8.    Add FSnd[i] to archive DA
9.   end if
10.  end if
11. end for
12. if |CA| + |DA| > limit then   /* removal strategy */
13.  Select the food source of DA with the minimal Euclidean distance to the CA archive
14.  Delete the selected food source from the DA archive
15. end if
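A compact Python sketch of the addition and removal strategies of Algorithm 6 is given below, operating directly on objective vectors; treating archive members as plain objective tuples and the pruning rule shown here are simplifications for illustration, not the paper's exact implementation.

```python
def dominates(a, b):
    """Pareto dominance for minimization objective vectors."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archives(candidates, ca, da, limit):
    """Two-archive update: a candidate dominated by an archived member is
    discarded; a candidate that removes dominated members goes to CA,
    otherwise to DA; DA members closest to CA are pruned over the limit."""
    for cand in candidates:
        archive = ca + da
        if any(dominates(m, cand) for m in archive):
            continue                       # discard dominated candidate
        removed = [m for m in archive if dominates(cand, m)]
        ca[:] = [m for m in ca if m not in removed]
        da[:] = [m for m in da if m not in removed]
        (ca if removed else da).append(cand)
    while len(ca) + len(da) > limit and da:
        # drop the DA member with the smallest Euclidean distance to CA
        dist = lambda m: min(sum((x - y) ** 2 for x, y in zip(m, c)) ** 0.5
                             for c in ca) if ca else 0.0
        da.remove(min(da, key=dist))
    return ca, da

ca, da = [], []
update_archives([(3.0, 3.0), (1.0, 4.0), (2.0, 2.0), (4.0, 1.0)], ca, da, limit=3)
print(ca, da)  # [(2.0, 2.0)] [(1.0, 4.0), (4.0, 1.0)]
```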

4.2.7 Termination

Each of the four modules (i.e. Send Employed Bees, Send Onlooker Bees, Send Scout Bees, and Update Archives) of TA-ABC iterates cycle by cycle until the specified termination condition is reached. At termination of the TA-ABC algorithm, the solutions stored in both the CA and DA archives are returned as the output. In our implementation, TA-ABC terminates after a predefined number of function evaluations, the same as in the TAA and NSGA-II implementations that have been used to solve M-SMCPs [43].

5 Experimental Setup

This section describes the experimental setup conducted to evaluate the proposed TA-ABC algorithm over 10 weighted and seven un-weighted MDGs with MCA and ECA multi-objective formulations. Further, an experiment is also performed to compare the results of the TA-ABC with the existing TAA algorithm and NSGA-II.

5.1 Test Problems

In this paper, a variety of MDGs of software systems with different characteristics are used. Two types of MDGs (weighted and un-weighted) are used to evaluate the proposed approach. Table 1 provides a brief description of the number of modules and links of the MDGs of the considered software systems. In un-weighted MDGs, each connection (link) represents the existence of a unidirectional variable or method reference between two modules. In weighted MDGs, each connection carries a weight calculated from the number of unidirectional variable and method references between the modules. Larger connection weights indicate stronger interconnection between modules and increase the probability that they should be placed in the same cluster.
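As a small illustration (the module names below are hypothetical), an un-weighted MDG can be stored as a set of directed edges, whereas a weighted MDG maps each edge to its reference count:

```python
# Un-weighted MDG: each edge records only the existence of a dependency.
unweighted_mdg = {("ModuleA", "ModuleB"), ("ModuleB", "ModuleC")}

# Weighted MDG: each edge carries the number of unidirectional variable and
# method references between the two modules; larger weights indicate stronger
# interconnection and a higher chance of sharing a cluster.
weighted_mdg = {("ModuleA", "ModuleB"): 5, ("ModuleB", "ModuleC"): 1}

print(len(unweighted_mdg), sum(weighted_mdg.values()))
```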

Table 1:

Descriptions of Testing Problems [43].

Systems name    Modules    Links
Un-weighted
 Mtunis         20         57
 Ispell         24         103
 Rcs            29         163
 Bison          37         179
 Grappa         86         295
 Bunch          116        365
 Incl           174        360
Weighted
 Icecast        60         650
 gnupg          88         601
 inn            90         624
 bitchx         97         1653
 xntp           111        729
 exim           118        1225
 Mod_ssl        135        1095
 ncurses        138        682
 lynx           148        1745
 nmh            198        3262

5.2 Research Questions

In our study, we evaluate the performance of our proposed TA-ABC approach for M-SMCPs by determining whether it can generate good modularizations in terms of various structural quality metrics (i.e. MQ, coupling, and cohesion) compared with other existing algorithms. In addition to the structural quality metrics, we also use the IGD [50], HV [49], spread [43], Pareto optimality [43], and execution time to compare the algorithms. The major goal of our study is to address the following research questions.

RQ1. MQ value as assessment criterion: How well does the proposed TA-ABC perform when compared against TAA and NSGA-II algorithms using the MQ as the assessment criterion?

RQ2. Coupling as assessment criterion: How well does the proposed TA-ABC perform when compared against TAA and NSGA-II algorithms in terms of coupling?

RQ3. Cohesion as assessment criterion: How well does the proposed TA-ABC perform when compared against TAA and NSGA-II algorithms in terms of cohesion?

RQ4. Pareto optimality as assessment criterion: How well does the TA-ABC algorithm perform at producing good approximations to the Pareto front compared to TAA and NSGA-II algorithms?

RQ5. IGD, hypervolume, and spread as assessment criterion: How well does the proposed TA-ABC perform when compared against TAA and NSGA-II algorithms in terms of IGD, HV, and spread as the assessment criterion?

Note that the IGD metric corresponds to the average Euclidean distance separating each reference solution (true Pareto front) from its closest non-dominated one (Pareto front obtained by the algorithm). For each studied software project, we use the set of Pareto optimal solutions produced by all algorithms over all runs as a true Pareto front.
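The following Python sketch shows this IGD computation for two hypothetical bi-objective fronts; it is illustrative only and assumes minimization of both objectives.

```python
def igd(reference_front, approximation_front):
    """Inverted generational distance: the average Euclidean distance from each
    reference (true Pareto front) point to its nearest point in the obtained front."""
    def euclid(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return sum(min(euclid(r, a) for a in approximation_front)
               for r in reference_front) / len(reference_front)

# Hypothetical two-objective fronts (both objectives minimized).
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
obtained = [(0.1, 1.0), (0.6, 0.6), (1.0, 0.1)]
print(igd(reference, obtained))  # ≈ 0.114
```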

5.3 Competitor Algorithms and Parameter Setup

This subsection provides a brief description of the competitor algorithms and the parameter settings used in this study. TAA and NSGA-II are two popular algorithms that have been used to solve M-SMCPs by previous researchers [4, 5, 43]. In this paper, the results of TA-ABC are compared with TAA and NSGA-II. The parameter settings of the TAA and NSGA-II algorithms are the same as suggested in [5, 43]. Different search-based optimization approaches usually consume different numbers of fitness computations. To make a fair comparison between such meta-heuristic algorithms, each algorithm is allowed an equal number of fitness function computations. The number of fitness evaluations (NFE) for the TA-ABC approach is computed as NFE ≤ (SN + SN + 1)*MCN + SN, where SN and MCN are the number of onlooker bees and the maximum number of iterations, respectively. The parameters for NSGA-II are the same as for TAA. The parameter values of the algorithms are assigned according to the number of modules (N) in the problem instance. The crossover and mutation operators are single-point crossover and single-point mutation, respectively. The mutation probability is set as 0.004*log2(N). The crossover probability is set as 0.8 for population sizes less than 100 and 1.0 otherwise. The maximum number of generations, population size, and total archive size are 200N, 10N, and 10N, respectively. The limit parameter for TA-ABC is set as (D*PS)/2, where D and PS are the dimension of the problem and the population size, respectively.
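As an illustrative calculation, the NFE bound can be evaluated as follows; the concrete SN and MCN values below are assumptions chosen to mirror the 10N population size and 200N generation budget for a 20-module system, not figures reported in the paper.

```python
def max_fitness_evaluations(sn, mcn):
    """Upper bound on fitness evaluations for TA-ABC:
    SN initial evaluations, plus at most SN employed-bee, SN onlooker-bee,
    and one scout-bee evaluation per cycle, over MCN cycles."""
    return (sn + sn + 1) * mcn + sn

# Hypothetical budget for a system with N = 20 modules.
sn, mcn = 10 * 20, 200 * 20
print(max_fitness_evaluations(sn, mcn))  # 1604200
```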

5.4 Collecting Results from Experiment

Search-based optimization algorithms are stochastic in nature; i.e. they can produce different values on each run. We therefore collect the results of each algorithm on each MDG over 30 executions, following the same approach as in [5, 43].

6 Results and Analysis

This section illustrates the results obtained by TA-ABC for the solution of M-SMCPs and compares them with the current evolutionary multi-objective approaches (i.e. TAA and NSGA-II) that have already been used to solve M-SMCPs. Each subsection addresses one of the five research questions given in Section 5.2.

6.1 The MQ Value as Assessment Criterion

This section presents the results of the experiments that answer the RQ1. To answer this research question, we compared the TA-ABC with TAA and NSGA-II algorithms over seven un-weighted and 10 weighted MDGs with MCA and ECA multi-objective formulations in terms of MQ values.

Table 2 presents the MQ values obtained by TA-ABC, TAA, and NSGA-II algorithms with MCA formulation.

Table 2:

Comparison of MQ Values Obtained by TA-ABC, TAA, and NSGA-II Algorithm (with MCA Approach).

Systems | TA-ABC (Mean, STD) | TAA (Mean, STD) | NSGA-II (Mean, STD) | p-Value (TA-ABC vs. TAA) | p-Value (TA-ABC vs. NSGA-II)
Un-weighted
 Mtunis2.3520.0122.2940.0132.1340.0870.012 [−]<0.001 [−]
 Ispell2.2580.0682.2690.0432.0750.0460.168 [≈]<0.001 [−]
 Rcs2.2960.0362.1450.0342.0620.034<0.001 [−]<0.001 [−]
 Bison2.2570.0512.4160.0382.1870.045<0.001 [+]<0.001 [−]
 Grappa12.8510.23511.5860.10610.4870.214<0.001 [−]<0.001 [−]
 Bunch11.7650.32112.1450.22510.6540.0250.013 [+]<0.001 [−]
 Incl12.8690.35611.8110.35110.5980.342<0.001 [−]<0.001 [−]
Weighted
 Icecast2.2160.0652.4010.0572.1580.054<0.001 [+]<0.001 [−]
 gnupg6.4180.0876.2590.0725.8640.044<0.001 [−]<0.001 [−]
 inn8.0260.0797.4210.0776.8750.053<0.001 [−]<0.001 [−]
 bitchx3.6020.0383.5720.0553.2540.0280.086 [≈]<0.001 [−]
 xntp6.8690.0616.4820.1106.1570.0890.034 [−]<0.001 [−]
 exim5.4580.1045.3160.1325.0240.067<0.001 [−]<0.001 [−]
 Mod_ssl9.8540.2548.8320.0978.7980.154<0.001 [−]<0.001 [−]
 ncurses11.5620.34610.2110.14510.1250.351<0.001 [−]<0.001 [−]
 lynx3.4810.0733.4470.0863.1450.0250.472 [≈]<0.001 [−]
 nmh6.9780.2576.6710.1776.6580.131<0.001 [−]<0.001 [−]

The 8th and 9th columns of the table give the p-values (a p-value below 0.05 is considered statistically significant). The symbol [−] denotes that the result is significantly in favor of TA-ABC compared with the corresponding approach, the symbol [+] denotes the opposite, and the symbol [≈] is used when there is no significant favor towards either approach. First, if we compare the MQ results of the TA-ABC approach with the TAA approach on un-weighted MDGs, the results show that the TA-ABC approach outperforms the TAA approach in five of the seven MDGs. There are four cases in which TA-ABC performs significantly better than the TAA approach. Hence, there is good evidence to suggest that, for un-weighted MDGs, the TA-ABC approach outperforms the TAA approach. Similarly, for weighted MDGs, the results provide sufficient evidence that TA-ABC outperforms the TAA approach in all but one case. That is, the TA-ABC approach beats the TAA approach in nine weighted MDGs, including seven in which the results are statistically significant. Second, if we compare the results of the TA-ABC approach with the NSGA-II approach, the results show that the TA-ABC approach outperforms the NSGA-II approach for both weighted and un-weighted MDGs.

Table 3 presents the results obtained by TA-ABC, TAA, and NSGA-II with the ECA formulation on the weighted and un-weighted datasets. The results provided in Table 3 clearly indicate that the TA-ABC approach outperforms TAA and NSGA-II in most of the cases. The MQ results obtained by the TA-ABC approach and the TAA approach over the un-weighted MDGs show that the TA-ABC approach outperforms the TAA approach in six of the seven MDGs. There are four cases in which the TA-ABC approach performs significantly better than the TAA approach. For the weighted software applications, the results indicate that the TA-ABC approach outperforms the TAA approach in all but one case; the TA-ABC approach performs significantly better than the TAA approach in seven software applications. The comparison of TA-ABC with the NSGA-II approach shows that the TA-ABC approach outperforms the NSGA-II approach in all cases for both weighted and un-weighted software applications.

Table 3:

Comparison of MQ Values Obtained by TA-ABC, TAA, and NSGA-II Algorithm (with ECA Approach).

Systems | TA-ABC (Mean, STD) | TAA (Mean, STD) | NSGA-II (Mean, STD) | p-Value (TA-ABC vs. TAA) | p-Value (TA-ABC vs. NSGA-II)
Un-weighted
 Mtunis2.1570.0212.3140.0001.7850.032<0.001 [−]<0.001 [−]
 Ispell2.3420.0362.3390.0221.9810.0840.127 [≈]<0.001 [−]
 Rcs2.1320.0122.2390.0221.7950.036<0.001 [−]<0.001 [−]
 Bison2.4580.0542.6480.0292.2350.054<0.001 [−]<0.001 [−]
 Grappa13.6870.14812.5780.05312.5210.052<0.001 [−]<0.001 [−]
 Bunch13.8970.34213.4550.08812.3250.2450.175 [≈]<0.001 [−]
 Incl13.4980.23413.5110.05912.6420.1340.103 [≈]<0.001 [−]
Weighted
 Icecast2.8420.0362.6540.0392.5610.061<0.001 [−]<0.001 [−]
 gnupg7.6210.0856.9050.0557.1560.035<0.001 [−]<0.001 [−]
 inn7.8370.0627.8760.0467.2640.0520.121 [≈]<0.001 [−]
 bitchx4.0360.0374.2670.0273.6470.026<0.001 [−]<0.001 [−]
 xntp8.9540.0648.1680.0768.2650.087<0.001 [−]<0.001 [−]
 exim6.3510.0766.3610.0845.8670.0680.108 [≈]<0.001 [−]
 Mod_ssl9.2580.0579.7490.0718.8710.102<0.001 [+]<0.001 [−]
 ncurses12.3250.12711.2970.13311.1350.141<0.001 [−]<0.001 [−]
 lynx4.9570.0794.6940.0604.5310.062<0.001 [−]<0.001 [−]
 nmh8.9640.1098.5920.1488.4390.075<0.001 [−]<0.001 [−]

6.2 Coupling as an Assessment Criterion

The coupling values obtained by the TA-ABC, TAA, and NSGA-II approaches on un-weighted and weighted MDGs with the MCA formulation are shown in Table 4, and with the ECA formulation in Table 5. From Table 4, for un-weighted MDGs, it can be observed that the TA-ABC approach obtained lower (i.e. better) coupling values than the TAA approach in all seven cases, of which five are significantly in favor of TA-ABC. For weighted MDGs, TA-ABC outperforms TAA in all 10 cases, of which six are statistically significant. Moreover, TA-ABC performs significantly better than NSGA-II in all cases for both un-weighted and weighted MDGs.

Table 4:

Comparison of Coupling Values Obtained by TA-ABC, TAA, and NSGA-II Algorithm (with MCA Approach).

Systems | TA-ABC (Mean, STD) | TAA (Mean, STD) | NSGA-II (Mean, STD) | p-Value (TA-ABC vs. TAA) | p-Value (TA-ABC vs. NSGA-II)
Un-weighted
 Mtunis63.3913.25664.7334.18566.5613.4190.123 [≈]<0.001 [−]
 Ispell158.4581.025159.8006.440166.3811.0760.148 [≈]<0.001 [−]
 Rcs217.39112.361235.73330.669228.26112.9790.014 [−]<0.001 [−]
 Bison242.92517.564277.26716.463255.07118.442<0.001 [−]<0.001 [−]
 Grappa385.12519.261420.46722.380404.38120.224<0.001 [−]<0.001 [−]
 Bunch498.52513.256580.86716.648523.45113.9190.011 [−]<0.001 [−]
 Incl519.12522.153536.46728.048545.08123.261<0.001 [−]<0.001 [−]
Weighted
 Icecast7484.858368.2567636.200589.8437859.101386.6690.125 [≈]<0.001 [−]
 gnupg4191.188412.3575192.530335.6694400.747432.975<0.001 [−]<0.001 [−]
 inn5375.388389.2356176.730325.2605644.157408.697<0.001 [−]<0.001 [−]
 bitchx35837.36278.36535938.7005406.69737629.228292.2830.107 [≈]<0.001 [−]
 xntp3559.058356.1274460.400219.4453737.011373.9330.014 [−]<0.001 [−]
 exim11546.06835.54712347.4001127.56312123.363877.324<0.001 [−]<0.001 [−]
 Mod_ssl11137.16798.29412138.500621.96211694.018838.209<0.001 [−]<0.001 [−]
 ncurses2569.788124.5763071.130188.7852698.277130.805<0.001 [−]<0.001 [−]
 lynx22149.561234.56123150.9001726.01423257.0381296.2890.127 [≈]<0.001 [−]
 nmh19620.16562.37119921.500876.44020601.168590.4900.162 [≈]<0.001 [−]
Table 5:

Comparison of Coupling Values Obtained by TA-ABC, TAA, and NSGA-II Algorithm (with ECA Approach).

Systems | TA-ABC (Mean, STD) | TAA (Mean, STD) | NSGA-II (Mean, STD) | p-Value (TA-ABC vs. TAA) | p-Value (TA-ABC vs. NSGA-II)
Un-weighted
 Mtunis61.5572.23560.0000.00064.6352.3470.129 [≈]<0.001 [−]
 Ispell157.6241.234145.9335.595165.5051.296<0.001 [+]<0.001 [−]
 Rcs213.5579.345230.86715.719224.2359.812<0.001 [−]<0.001 [−]
Bison235.09116.237252.40012.434246.84617.049<0.001 [−]<0.001 [−]
Grappa376.29112.894387.66716.601395.10613.5390.122 [≈]<0.001 [−]
Bunch479.69118.239504.60010.611503.67619.151<0.001 [−]<0.001 [−]
Incl534.29115.765439.6007.673561.00616.5530.003 [+]<0.001 [−]
Weighted
Icecast7434.024325.1287569.670416.3787805.725341.384<0.001 [−]<0.001 [−]
gnupg4090.354384.2654413.670207.6604294.872403.478<0.001 [−]<0.001 [−]
inn4874.554368.8645046.200380.5265118.282387.307<0.001 [−]<0.001 [−]
bitchx34936.52356.19535546.8001266.13636683.346374.005<0.001 [≈]<0.001 [−]
xntp4358.224348.5623692.070109.0044576.135365.990<0.001 [+]<0.001 [−]
exim12145.22532.24812612.9001050.31012752.481558.8600.108 [≈]<0.001 [−]
Mod_ssl10136.321123.56111008.400488.34810643.1361179.739<0.001 [−]<0.001 [−]
ncurses2468.954132.6232607.270115.0302592.402139.254<0.001 [−]<0.001 [−]
lynx18148.721137.29520546.700956.03219056.1561194.160<0.001 [−]<0.001 [−]
nmh17819.32932.58418576.800473.56418710.286979.213<0.001 [−]<0.001 [−]

Table 5 presents the results comparing the performance of the TA-ABC approach and the TAA approach, and TA-ABC approach and NSGA-II approach with ECA formulation in terms of coupling measure. Coupling results shown in Table 5 for un-weighted MDGs indicate that the TA-ABC outperforms TAA in five cases, out of which three cases are statistically significant in favor of TA-ABC. For weighted MDGs, the TA-ABC outperforms TAA in nine cases out of which seven cases are statistically significant in favor of TA-ABC. However, the coupling results of the TA-ABC and NSGA-II show that the TA-ABC approach performs significantly better than NSGA-II in all cases for un-weighted and weighted MDGs.

6.3 Cohesion as an Assessment Criterion

This section compares the TA-ABC algorithm with the TAA and NSGA-II approaches, i.e. how each of the multi-objective approaches performs in terms of cohesion as an assessment criterion using the MCA and ECA formulations. Table 6 presents the cohesion results of the TA-ABC, TAA, and NSGA-II approaches with the MCA formulation. The cohesion results obtained from the TA-ABC and TAA approaches over the un-weighted MDGs given in Table 6 show that TA-ABC outperforms TAA in all cases, of which five are significantly better. For weighted MDGs, TA-ABC outperforms TAA in all cases, of which eight are significantly better. The cohesion results for TA-ABC and NSGA-II clearly show that TA-ABC performs significantly better than the NSGA-II algorithm in all cases of weighted and un-weighted MDGs.

Table 6:

Comparison of Cohesion Values Obtained by TA-ABC, TAA, and NSGA-II Algorithm (with MCA Approach).

Systems | TA-ABC (Mean, STD) | TAA (Mean, STD) | NSGA-II (Mean, STD) | p-Value (TA-ABC vs. TAA) | p-Value (TA-ABC vs. NSGA-II)
Un-weighted
Mtunis25.3041.36524.6332.09223.7193.4190.112 [≈]<0.001 [−]
Ispell23.7712.15623.1003.22019.80951.0760.212 [≈]<0.001 [−]
Rcs54.30412.35445.13315.33548.86912.9790.013 [−]<0.001 [−]
Bison57.5383.85640.3678.23151.46518.442<0.001 [−]<0.001 [−]
Grappa102.4386.38984.76711.19092.8120.224<0.001 [−]<0.001 [−]
Bunch114.7385.68773.5678.324102.27513.9190.011 [−]<0.001 [−]
Incl100.43812.56891.76714.02487.4623.261<0.001 [−]<0.001 [−]
Weighted
Icecast1685.571158.3661609.900294.9211498.45386.6690.089 [≈]<0.001 [−]
gnupg1605.404135.8541104.733167.8341500.625432.975<0.001 [−]<0.001 [−]
inn1172.304201.361771.633162.6301037.92408.697<0.001 [−]<0.001 [−]
bitchx7695.3031532.5637644.6332703.3496799.369292.2830.102 [≈]<0.001 [−]
xntp1184.47188.356733.800109.7221095.495373.9330.017 [−]<0.001 [−]
exim3679.97512.3643279.300563.7813391.319877.324<0.001 [−]<0.001 [−]
Mod_ssl3412.403256.8452911.733310.9813133.974838.209<0.001 [−]<0.001 [−]
ncurses825.10469.325574.43394.392760.8595130.805<0.001 [−]<0.001 [−]
lynx2929.237632.8622428.567863.0072375.4981296.289<0.001 [−]<0.001 [−]
nmh2182.937363.5812032.267438.2201692.433590.490<0.001 [−]<0.001 [−]

Table 7 presents the results comparing the performance of {TA-ABC approach, TAA approach} and {TA-ABC approach, NSGA-II approach} in terms of cohesion measure using ECA formulation. First, if we compare the cohesion values of TA-ABC approach and TAA approach, the results shown in Table 7 indicate that the TA-ABC approach outperforms TAA in five cases out of the seven un-weighted MDGs, out of which three cases are significantly better. In weighted MDGs, the TA-ABC approach outperforms TAA in nine out of 10 cases, out of which seven cases are significantly better. Second, if we compare the TA-ABC approach and NSGA-II approach, the comparison results indicate that the TA-ABC approach significantly outperforms NSGA-II in all cases for weighted and un-weighted MDGs.

Table 7:

Comparison of Cohesion Values Obtained by TA-ABC, TAA, and NSGA-II Algorithm (with ECA Approach).

Systems | TA-ABC (Mean, STD) | TAA (Mean, STD) | NSGA-II (Mean, STD) | p-Value (TA-ABC vs. TAA) | p-Value (TA-ABC vs. NSGA-II)
Un-weighted
Mtunis26.2210.36127.0000.00024.6822.3470.106 [≈]<0.001 [−]
Ispell24.1882.23130.0332.79820.24751.296<0.001 [+]<0.001 [−]
Rcs56.22113.26547.5677.85950.8829.812<0.001 [−]<0.001 [−]
Bison61.4558.23652.8006.21755.577517.049<0.001 [−]<0.001 [−]
Grappa106.8559.356101.1678.30197.447513.5390.078 [≈]<0.001 [−]
Bunch124.15586.349111.7005.305112.162519.1510.007 [−]<0.001 [−]
Incl92.8556.348140.2003.83679.497516.5530.002 [+]<0.001 [−]
Weighted
Icecast1710.988123.4561643.167208.1891525.138341.384<0.001 [−]<0.001 [−]
gnupg1655.82188.3461494.167103.8301553.562403.478<0.001 [−]<0.001 [−]
inn1422.72136.1231336.900190.2631300.857387.307<0.001 [−]<0.001 [−]
bitchx8145.723563.2387840.600633.0687272.31374.0050.091 [≈]<0.001 [−]
xntp784.88855.2371117.96754.502675.9325365.990<0.001 [+]<0.001 [−]
exim3380.39235.6423146.567525.1553076.76558.8600.067 [≈]<0.001 [−]
Mod_ssl3912.823145.3263476.800244.1743659.4151179.739<0.001 [−]<0.001 [−]
ncurses875.52112.365806.36757.515813.797139.254<0.001 [−]<0.001 [−]
lynx4929.657213.0233730.633478.0164475.9391194.160<0.001 [−]<0.001 [−]
nmh3083.357142.6912704.600236.7822637.874979.213<0.001 [−]<0.001 [−]

6.4 Pareto Optimality as Assessment Criterion

This section compares the TA-ABC algorithm with TAA and NSGA-II in terms of how well each performs at producing good approximations to the Pareto front. Table 8 presents the dominance relationship for the results obtained from TA-ABC and TAA with both the MCA and ECA formulations. This dominance relationship is used to compare any two solutions in multi-objective space. In this table, A denotes TA-ABC with MCA, B denotes TA-ABC with ECA, C denotes TAA with MCA, and D denotes TAA with ECA. The heading NXY indicates the number of solutions generated by approach X that are dominated by solutions produced by approach Y. In this comparison, approach X is better than approach Y if NXY is small and NYX is large.

Table 8:

Results of Dominated Comparison of TA-ABC Algorithm and TAA.

Systems | NAB | NBA | NAC | NCA | NAD | NDA | NBC | NCB | NBD | NDB
Un-weighted
Mtunis28014231622030627
Ispell30025262421224525
Rcs30014211620025722
Bison30022141211524811
Grappa27021181716030427
Bunch26016131817425826
Incl30019151721521322
Weighted
Icecast30011181415822717
gnupg30017192221217621
inn30018252127129518
bitchx30014271722324826
xntp30021222221030918
exim300191716188241117
Mod_ssl30022252325127727
ncurses30016171722718528
lynx30014191214030526
nmh30021242024030224

Table 8 shows that the number of solutions produced by {TA-ABC with ECA} outperform {TA-ABC with MCA} in all of the problems studied. The {TA-ABC with MCA} outperforms {TAA with MCA} for un-weighted problems (four out of seven problems), while, in weighted systems, the {TA-ABC with MCA} outperforms the {TAA with MCA} in only one problem. The {TA-ABC with MCA} outperforms {TAA with ECA} for un-weighted problems (three out of seven problems), while in weighted systems, the {TA-ABC with ECA} outperforms the {TAA with MCA} in eight out of 10 problems. The {TA-ABC with ECA} outperforms {TAA with MCA} in all un-weighted and weighted problems. Similarly, the {TA-ABC with ECA} outperforms {TAA with ECA} in all un-weighted and weighted problems. These findings taken together indicate that the TA-ABC is better than TAA.

Table 9 presents the dominance relationship for the results obtained from TA-ABC and NSGA-II with both MCA and ECA formulations. In this table, P denotes the TA-ABC with MCA, Q denotes the TA-ABC with ECA, R denotes the NSGA-II with MCA, and S denotes the NSGA-II with ECA.

Table 9:

Results of Dominated Comparison of TA-ABC Algorithm and NSGA-II.

Systems | NPQ | NQP | NPR | NRP | NPS | NSP | NQR | NRQ | NQS | NSQ
Un-weighted
Mtunis28026211523128725
Ispell30022242326325526
Rcs30015241522025728
Bison30023151013523516
Grappa27020171716027427
Bunch26016131817425822
Incl30018141721421624
Weighted
Icecast30012181216721619
gnupg30017192221217726
inn30019242127128514
bitchx30014271823524826
xntp300221824220301017
exim300121719186221117
Mod_ssl30022252014127725
ncurses30019172322722629
lynx30014191714130522
nmh30021242413030321

Table 9 shows that the number of solutions produced by {TA-ABC with ECA} outperforms {TA-ABC with MCA} in all of the problems studied. The {TA-ABC with MCA} outperforms {NSGA-II with MCA} for un-weighted problems (five out of seven problems), while in weighted systems, the {TA-ABC with MCA} outperforms {NSGA-II with MCA} in only two problems. The {TA-ABC with MCA} outperforms {NSGA-II with ECA} for un-weighted problems (two out of seven problems), while in weighted systems, the {TA-ABC with ECA} outperforms the {NSGA-II with MCA} in seven of 10 problems. The {TA-ABC with ECA} outperforms {NSGA-II with MCA} in all un-weighted and weighted problems. Similarly, the {TA-ABC with ECA} outperforms {NSGA-II with ECA} in all un-weighted and weighted problems. These findings taken together indicate that the TA-ABC is better than NSGA-II.

6.5 IGD, Hypervolume, and Spread as Assessment Criteria

In the previous sections, we compared our TA-ABC algorithm with the existing algorithms (i.e. TAA and NSGA-II) in terms of structural quality metrics (i.e. MQ, coupling, and cohesion) and Pareto optimality. In this section, we compare the TA-ABC algorithm with the existing algorithms in terms of IGD, HV, and spread values for both the MCA and ECA formulations. The symbol [−] denotes that the result is significantly in favor of TA-ABC compared to the corresponding approach, the symbol [+] denotes the opposite, and the symbol [≈] is used when neither approach is significantly favored. Table 10 presents the IGD values of the results obtained through TA-ABC, TAA, and NSGA-II over the weighted and un-weighted software projects. Tables 11 and 12 report the statistics of the HV and spread of the results obtained through the TA-ABC, TAA, and NSGA-II algorithms, respectively. For the MCA and ECA formulations, the IGD statistics given in Table 10 indicate that TA-ABC outperforms the other algorithms in most of the cases for weighted and un-weighted MDGs. Additionally, TAA appears to be better than NSGA-II in most of the cases. The results based on the HV metric show that TA-ABC performs better than TAA and NSGA-II in most of the cases. The results in Table 12 indicate that the spread values achieved by the Pareto fronts generated by the TA-ABC algorithm are lower than those of TAA and NSGA-II, and in most of the cases significantly lower.
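For reference, the sketch below shows one common way to compute the IGD indicator: the average Euclidean distance from each point of a reference (approximated true) Pareto front to its nearest obtained solution, so that lower values are better. This is a generic illustration under that definition; it does not reproduce the reference fronts or any normalization actually used in the experiments, and the toy data are made up.

    import math
    from typing import List, Sequence

    def igd(reference_front: List[Sequence[float]], obtained: List[Sequence[float]]) -> float:
        # Mean distance from each reference point to the closest obtained solution.
        def euclid(p: Sequence[float], q: Sequence[float]) -> float:
            return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
        return sum(min(euclid(r, s) for s in obtained) for r in reference_front) / len(reference_front)

    # Toy usage with made-up two-objective points.
    print(igd([(0.0, 1.0), (1.0, 0.0)], [(0.1, 1.0), (1.0, 0.2)]))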

Table 10:

The Statistics of IGD Metric Values Obtained at 30 Runs of TA-ABC, TAA, and NSGA-II Algorithms with MCA and ECA.

Systems | TA-ABC (MCA) | TAA (MCA) | NSGA-II (MCA) | TA-ABC (ECA) | TAA (ECA) | NSGA-II (ECA)
Un-weighted
Mtunis | 2.737×10−4 | 2.741×10−4 [≈] | 3.261×10−4 [−] | 2.534×10−4 | 2.538×10−4 [≈] | 4.652×10−4 [−]
Ispell | 3.891×10−3 | 3.958×10−3 [−] | 4.184×10−3 [≈] | 3.653×10−3 | 3.738×10−3 [−] | 4.142×10−3 [−]
Rcs | 4.486×10−3 | 4.493×10−3 [≈] | 4.274×10−3 [+] | 4.278×10−3 | 4.376×10−3 [−] | 4.678×10−3 [−]
Bison | 4.103×10−4 | 4.194×10−4 [−] | 4.229×10−4 [−] | 4.103×10−4 | 4.104×10−4 [≈] | 4.106×10−4 [≈]
Grappa | 5.912×10−4 | 6.052×10−4 [−] | 5.232×10−4 [−] | 5.768×10−4 | 5.125×10−4 [−] | 5.212×10−4 [−]
Bunch | 6.192×10−4 | 6.365×10−4 [−] | 6.413×10−4 [−] | 6.192×10−4 | 6.365×10−3 [−] | 6.324×10−3 [−]
Incl | 5.987×10−3 | 6.172×10−3 [−] | 6.215×10−3 [−] | 6.001×10−3 | 6.218×10−3 [−] | 6.187×10−3 [−]
Weighted
Icecast | 7.786×10−3 | 7.945×10−3 [−] | 7.776×10−3 [≈] | 7.674×10−3 | 7.743×10−3 [−] | 8.242×10−3 [−]
gnupg | 5.476×10−3 | 5.489×10−3 [≈] | 5.271×10−3 [+] | 5.274×10−3 | 5.386×10−3 [−] | 5.671×10−3 [−]
inn | 4.103×10−4 | 4.194×10−4 [−] | 4.229×10−4 [−] | 5.526×10−3 | 5.533×10−4 [≈] | 6.261×10−4 [−]
bitchx | 6.912×10−4 | 7.062×10−4 [−] | 7.212×10−4 [−] | 7.683×10−3 | 7.734×10−3 [−] | 8.241×10−3 [−]
xntp | 6.192×10−3 | 6.365×10−3 [−] | 6.413×10−3 [−] | 5.276×10−3 | 5.386×10−3 [−] | 5.671×10−3 [−]
exim | 7.987×10−3 | 8.172×10−3 [−] | 8.215×10−3 [−] | 5.526×10−4 | 5.533×10−4 [≈] | 6.261×10−4 [−]
Mod_ssl | 5.737×10−4 | 5.740×10−4 [≈] | 6.283×10−4 [−] | 7.683×10−3 | 7.734×10−3 [−] | 8.241×10−3 [−]
ncurses | 6.128×10−4 | 6.736×10−4 [−] | 7.128×10−4 [−] | 6.122×10−4 | 6.647×10−4 [−] | 7.123×10−4 [−]
lynx | 7.008×10−3 | 7.692×10−3 [−] | 8.314×10−3 [−] | 8.123×10−3 | 8.612×10−3 [−] | 9.341×10−4 [−]
nmh | 5.121×10−3 | 5.734×10−3 [−] | 6.502×10−3 [−] | 5.012×10−3 | 5.398×10−3 [−] | 6.328×10−3 [−]
Table 11:

The Statistics of HV Metric Values Obtained at 30 Runs of TA-ABC, TAA, and NSGA-II Algorithms with MCA and ECA.

Systems | TA-ABC (MCA) | TAA (MCA) | NSGA-II (MCA) | TA-ABC (ECA) | TAA (ECA) | NSGA-II (ECA)
Un-weighted
Mtunis | 0.2718 | 0.1515 [−] | 0.2755 [≈] | 0.5242 | 0.3286 [−] | 0.5215 [≈]
Ispell | 0.4335 | 0.3232 [−] | 0.2115 [−] | 0.4579 | 0.0821 [≈] | 0.6846 [+]
Rcs | 0.5381 | 0.3370 [−] | 0.6701 [+] | 0.7201 | 0.1429 [−] | 0.5677 [−]
Bison | 0.5122 | 0.0745 [−] | 0.3821 [−] | 0.3999 | 0.4939 [+] | 0.1717 [−]
Grappa | 0.5170 | 0.5155 [≈] | 0.1524 [−] | 0.4991 | 0.3267 [−] | 0.3483 [−]
Bunch | 0.2880 | 0.3168 [+] | 0.0393 [−] | 0.7616 | 0.6015 [−] | 0.3533 [−]
Incl | 0.5064 | 0.2263 [−] | 0.3094 [−] | 0.5065 | 0.3518 [−] | 0.0794 [−]
Weighted
Icecast | 0.6211 | 0.5946 [−] | 0.1373 [−] | 0.4204 | 0.4212 [≈] | 0.2488 [−]
gnupg | 0.5302 | 0.6758 [+] | 0.4217 [−] | 0.3085 | 0.1274 [−] | 0.0515 [−]
inn | 0.9559 | 0.1015 [−] | 0.0582 [−] | 0.8767 | 0.2050 [−] | 0.2077 [−]
bitchx | 0.6459 | 0.2439 [−] | 0.2925 [−] | 0.5272 | 0.1598 [−] | 0.1477 [−]
xntp | 0.7492 | 0.0465 [−] | 0.2305 [−] | 0.6413 | 0.5502 [−] | 0.7332 [+]
exim | 0.1244 | 0.2125 [+] | 0.2524 [+] | 0.6543 | 0.1295 [−] | 0.5680 [−]
Mod_ssl | 0.0564 | 0.0519 [≈] | 0.1041 [+] | 0.6562 | 0.4352 [−] | 0.2538 [−]
ncurses | 0.7373 | 0.3349 [−] | 0.3827 [−] | 0.3598 | 0.4141 [+] | 0.1107 [−]
lynx | 0.3703 | 0.2205 [−] | 0.1058 [−] | 0.7116 | 0.3318 [−] | 0.6337 [−]
nmh | 0.6966 | 0.6898 [≈] | 0.2562 [−] | 0.5819 | 0.3474 [−] | 0.3625 [−]
Table 12:

The Statistics of Spread Metric Values Obtained at 30 Runs of TA-ABC, TAA, and NSGA-II Algorithms with MCA and ECA.

Systems | TA-ABC (MCA) | TAA (MCA) | NSGA-II (MCA) | TA-ABC (ECA) | TAA (ECA) | NSGA-II (ECA)
Un-weighted
Mtunis | 0.0129 | 0.0646 [−] | 0.0939 [−] | 0.0332 | 0.1000 [−] | 0.3728 [−]
Ispell | 0.1081 | 0.2548 [−] | 0.2822 [−] | 0.0802 | 0.0778 [≈] | 0.1734 [−]
Rcs | 0.0249 | 0.0257 [≈] | 0.1897 [−] | 0.0510 | 0.0866 [−] | 0.1515 [−]
Bison | 0.1836 | 0.3951 [−] | 0.2836 [−] | 0.1521 | 0.8278 [−] | 0.6633 [−]
Grappa | 0.0258 | 0.0128 [+] | 0.1523 [−] | 0.3558 | 0.4608 [−] | 0.1117 [+]
Bunch | 0.0858 | 0.1044 [−] | 0.0145 [+] | 0.0428 | 0.1778 [−] | 0.1601 [−]
Incl | 0.3451 | 0.2906 [−] | 0.2747 [−] | 0.0874 | 0.3231 [−] | 0.1544 [−]
Weighted
Icecast | 0.1677 | 0.2452 [−] | 0.2873 [−] | 0.2167 | 0.7026 [−] | 0.3508
gnupg | 0.0159 | 0.0763 [−] | 0.0860 [−] | 0.0783 | 0.2372 [−] | 0.3109 [−]
inn | 0.1242 | 0.2371 [−] | 0.4045 [−] | 0.4104 | 0.7794 [−] | 0.8934 [−]
bitchx | 0.1827 | 0.2756 [−] | 0.1841 [≈] | 0.2540 | 0.6059 [−] | 0.4622 [−]
xntp | 0.2971 | 0.5802 [−] | 0.6537 [−] | 0.0530 | 0.4223 [−] | 0.1450 [−]
exim | 0.1920 | 0.5939 [−] | 0.3607 [−] | 0.1017 | 0.4426 [−] | 0.3881 [−]
Mod_ssl | 0.3569 | 0.7684 [−] | 0.4207 [−] | 0.0441 | 0.8191 [−] | 0.3010 [−]
ncurses | 0.1781 | 0.2960 [−] | 0.0386 [+] | 0.0292 | 0.6537 [−] | 0.4755 [−]
lynx | 0.0646 | 0.0613 [≈] | 0.0925 [−] | 0.1533 | 0.1994 [−] | 0.6796 [−]
nmh | 0.1044 | 0.2977 [−] | 0.6147 [−] | 0.2078 | 0.2594 [−] | 0.3948 [−]

7 Discussions

This section discusses the contributions and implications of our TA-ABC for M-SMCPs. The main contribution of the proposed TA-ABC approach, with respect to the existing approaches to software module clustering (TAA and NSGA-II), is that it integrates the external-archive concept of the TAA algorithm into the ABC algorithm so that a balance between exploration and exploitation is achieved for problems with more than three objective functions. The experimental results showed that the proposed TA-ABC approach performed better than the existing approaches in terms of MQ, coupling, cohesion, Pareto optimality, and IGD values in most of the cases. In this study, we observed that the following factors helped in improving the quality of software systems in terms of MQ, coupling, cohesion, Pareto optimality, and IGD:

  • The original TAA fails to maintain diversity when all the solutions in the CA lie on the true Pareto front and the CA has reached the size limit of the union of the DA and CA. In such a situation there is no space available for any additional member of the DA, because the CA maintains only convergence, not diversity. To achieve a good balance between diversity and convergence, the TA-ABC algorithm maintains diversity within the CA when the CA reaches its limit (a minimal sketch of this idea is given after this list).

  • Similarly, the updating strategy for the CA and DA in the TA-ABC algorithm generates better Pareto-optimal solutions than the TAA algorithm, because the archive update rules are explicitly designed to drive the search toward good approximations of the Pareto front.
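The sketch below illustrates the diversity-preserving archive update described in the first point above. It is only a schematic reconstruction under our own assumptions (minimized objectives, an NSGA-II-style crowding-distance truncation, and illustrative function names); it is not the exact update procedure of TA-ABC.

    import math
    from typing import List, Sequence

    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def crowding_distances(front: List[Sequence[float]]) -> List[float]:
        # NSGA-II-style crowding distance; small values indicate crowded regions.
        n, m = len(front), len(front[0])
        dist = [0.0] * n
        for k in range(m):
            order = sorted(range(n), key=lambda i: front[i][k])
            dist[order[0]] = dist[order[-1]] = math.inf
            span = front[order[-1]][k] - front[order[0]][k]
            if span == 0.0:
                continue
            for pos in range(1, n - 1):
                dist[order[pos]] += (front[order[pos + 1]][k] - front[order[pos - 1]][k]) / span
        return dist

    def update_convergence_archive(ca: List[Sequence[float]], newcomer: Sequence[float], capacity: int) -> None:
        # Keep the CA non-dominated; when it overflows, evict the most crowded member
        # instead of rejecting the newcomer, so the archive retains diversity as well
        # as convergence (illustrative only; not the authors' exact rules).
        if any(dominates(member, newcomer) for member in ca):
            return
        ca[:] = [m for m in ca if not dominates(newcomer, m)]
        ca.append(newcomer)
        if len(ca) > capacity:
            d = crowding_distances(ca)
            ca.pop(d.index(min(d)))

The point of the sketch is only that a full CA is truncated by a crowding criterion rather than by rejecting new non-dominated members; the DA has its own update rule in the actual algorithm.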

To conclude, we found that, in most of the cases, our approach produces better software clusterings than the existing algorithms in terms of MQ, coupling, cohesion, and Pareto optimality.

8 Threats to Validity

To explain the limitations and strengths of our proposed approach, we explore the factors that could affect the validity of the results obtained by TA-ABC. In this paper, we considered two major categories of threats (i.e. external validity and internal validity) that could affect the validity of the results. External validity (or selection validity) concerns the degree to which the findings of the approach can be generalized to wider classes of problems. In search-based software engineering, this is a very important threat to the validity of findings because of the large number of diverse software systems available to any study. In our experimentation, this threat has been mitigated by the fact that the proposed approach operates on the MDG, an abstract representation of software systems. Since there is a many-to-one relation between software systems and MDGs (i.e. many individual software systems can map onto a single MDG), findings for a set of MDGs of a particular size are relevant to a wider class of MDGs. To further mitigate possible external threats to validity, the experimentation uses MDGs of various sizes, both un-weighted and weighted.

Internal validity is the degree to which conclusions can be drawn about the causal effect of independent variables on the dependent variables [18]. In this empirical study, the choice of statistical test (i.e. the two-tailed t-test) was made to support comparability with other existing studies [28, 43]. The t-test is most appropriate for normally distributed data. However, studies [4, 14, 39] suggest that the t-test is robust, even in the presence of non-normally distributed and significantly skewed data, provided the sample sizes are sufficiently large, as in our empirical study.
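As an aside, the significance decision for each pairwise comparison can be reproduced with a standard two-tailed independent-samples t-test over the 30 per-run metric values of each algorithm. The snippet below is only an illustration with made-up numbers and an assumed 0.05 significance level; it is not the authors' analysis script.

    from scipy import stats

    # Hypothetical per-run metric values (e.g. MQ) for two algorithms over 30 runs each;
    # the numbers are illustrative only.
    ta_abc_runs = [0.52, 0.55, 0.50, 0.53, 0.54, 0.51] * 5
    taa_runs = [0.47, 0.49, 0.46, 0.48, 0.50, 0.45] * 5

    # Two-tailed independent-samples t-test; p < 0.05 is read as a significant difference.
    t_stat, p_value = stats.ttest_ind(ta_abc_runs, taa_runs)
    print(t_stat, p_value)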

9 Conclusions and Future Works

This paper presented a TA-ABC approach to address M-SMCPs. For this, the original ABC algorithm has been redesigned as a multi-objective ABC algorithm by integrating the concept of external archives. The TA-ABC has been applied to solve M-SMCPs with two well-known multi-objective formulations from the software clustering domain (ECA and MCA). The performance of the TA-ABC has been evaluated on two datasets obtained from different sources: weighted MDGs and un-weighted MDGs. The results of the TA-ABC have been compared with the results reported in the literature. Seven quality criteria (i.e. MQ, coupling, cohesion, Pareto optimality, IGD, HV, and spread) have been used to assess the quality of the obtained clustering solutions. The results clearly reveal that TA-ABC is able to obtain better clustering solutions in terms of MQ, coupling, cohesion, Pareto optimality, IGD, HV, and spread. Hence, the TA-ABC approach can be very useful for solving the software clustering problem and can thus help software managers in the better management of software. In a future study, we will customize other meta-heuristic algorithms, such as MOPSO, MODE, MOABC, and MOSOS, to address M-SMCPs.

Bibliography

[1] H. Abdeen, S. Ducasse, H. A. Sahraoui and I. Alloui, Automatic package coupling and cycle minimization, in: Proceedings of the 16th Working Conference on Reverse Engineering, France, pp. 103–112, 2009. doi: 10.1109/WCRE.2009.13.

[2] P. Amarjeet and J. K. Chhabra, Harmony search based remodularization for object-oriented software systems, Comput. Lang. Syst. Struct. 47 (2017), 153–169. doi: 10.1016/j.cl.2016.09.003.

[3] P. Amarjeet and J. K. Chhabra, Improving modular structure of software system using lexical and structural dependencies, Inform. Software Tech. 82 (2017), 96–120. doi: 10.1016/j.infsof.2016.09.011.

[4] P. Amarjeet and J. K. Chhabra, Improving package structure of object-oriented software using multi-objective optimization and weighted class connections, J. King Saud U. Comput. Inf. Sci., in press. Available online 2 November 2015. doi: 10.1016/j.jksuci.2015.09.004.

[5] M. Barros, An analysis of the effects of composite objectives in multi-objective software module clustering, in: Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation, USA, pp. 1205–1212, 2012.

[6] V. R. Basili and A. J. Turner, Iterative enhancement: a practical technique for software development, IEEE T. Software Eng. 1 (1975), 390–396. doi: 10.1109/TSE.1975.6312870.

[7] J. K. Chhabra, K. K. Aggarwal and Y. Singh, Code and data spatial complexity: two important software understandability measures, Inform. Software Tech. 45 (2003), 539–546. doi: 10.1016/S0950-5849(03)00033-8.

[8] J. K. Chhabra, K. K. Aggarwal and Y. Singh, Measurement of object-oriented software spatial complexity, Inform. Software Tech. 46 (2004), 689–699. doi: 10.1016/j.infsof.2004.01.001.

[9] L. L. Constantine and E. Yourdon, Structured Design, Prentice Hall, USA, 1979.

[10] E. Cotilla-Sanchez, P. D. H. Hines, C. Barrows, S. Blumsack and M. Patel, Multi-attribute partitioning of power networks based on electrical distance, IEEE T. Power Syst. 28 (2013), 4979–4987. doi: 10.1109/TPWRS.2013.2263886.

[11] S. S. Dahiya, J. K. Chhabra and S. Kumar, Application of artificial bee colony algorithm to software testing, in: 2010 21st Australian Software Engineering Conference, Auckland, pp. 149–154, 2010. doi: 10.1109/ASWEC.2010.30.

[12] K. Deb and H. Jain, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints, IEEE T. Evolut. Comput. 18 (2014), 577–601. doi: 10.1109/TEVC.2013.2281535.

[13] K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, A fast and elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, IEEE T. Evolut. Comput. 6 (2002), 182–197. doi: 10.1109/4235.996017.

[14] L. Devroye, Non-Uniform Random Variate Generation, Springer-Verlag, USA, 1986. doi: 10.1007/978-1-4613-8643-8.

[15] D. Doval, S. Mancoridis and B. S. Mitchell, Automatic clustering of software systems using a genetic algorithm, in: Proceedings of IEEE Conference on Software Technology and Engineering Practice, USA, pp. 73–81, 1999.

[16] A. E. Ezugwu, N. A. Okoroafor, S. M. Buhari, M. E. Frincu and S. B. Junaidu, Grid resource allocation with genetic algorithm using population based on multisets, J. Intell. Syst. 26 (2017), 169–184. doi: 10.1515/jisys-2015-0089.

[17] A. Farrugia, Vertex-partitioning into fixed additive induced hereditary properties is NP-hard, Electron. J. Combin. 11 (2004), 1–9. doi: 10.37236/1799.

[18] M. Genero, J. Olivas, M. Piattini and F. Romero, Using metrics to predict OO information systems maintainability, in: Proceedings of the 13th International Conference on Advanced Information Systems Engineering (CAiSE'01), Springer-Verlag, London, UK, pp. 388–401, 2001. doi: 10.1007/3-540-45341-5_26.

[19] D. Gong, J. Sun and X. Ji, Evolutionary algorithms with preference polyhedron for interval multi-objective optimization problems, Inform. Sciences 233 (2013), 141–161. doi: 10.1016/j.ins.2013.01.020.

[20] V. Gupta and J. K. Chhabra, Package level cohesion measurement in object-oriented software, J. Braz. Comput. Soc. 18 (2011), 251–266. doi: 10.1007/s13173-011-0052-4.

[21] M. Harman, R. Hierons and M. Proctor, A new representation and crossover operator for search-based optimization of software modularization, in: Proceedings of the Genetic and Evolutionary Computation Conference, USA, pp. 1351–1358, 2002.

[22] J. Harris, J. Hirst and M. Mossinghoff, Combinatorics and Graph Theory, Springer, New York, pp. 212–237, 2000. doi: 10.1007/978-1-4757-4803-1.

[23] H. A. Hashim, B. O. Ayinde and M. A. Abido, Optimal placement of relay nodes in wireless sensor network using artificial bee colony algorithm, J. Netw. Comput. Appl. 64 (2016), 239–248. doi: 10.1016/j.jnca.2015.09.013.

[24] S. D. Hester, D. L. Parnas and D. F. Utter, Using documentation as a software design medium, Bell Syst. Technol. J. 60 (1981), 1941–1977. doi: 10.1002/j.1538-7305.1981.tb00304.x.

[25] H. T. Jadhav and P. D. Bamane, Temperature dependent optimal power flow using g-best guided artificial bee colony algorithm, Int. J. Elec. Power Energy Syst. 77 (2016), 77–90. doi: 10.1016/j.ijepes.2015.11.026.

[26] D. Karaboga, An Idea Based on Honey Bee Swarm for Numerical Optimization, Technical Report TR06, Erciyes University, Engineering Faculty, Computer Engineering Department, 2005.

[27] V. Kumar, J. K. Chhabra and D. Kumar, Grey wolf algorithm-based clustering technique, J. Intell. Syst. 26 (2017), 153–168. doi: 10.1515/jisys-2014-0137.

[28] A. C. Kumari and K. Srinivas, Hyper-heuristic approach for multi-objective software module clustering, J. Syst. Software 117 (2016), 384–401. doi: 10.1016/j.jss.2016.04.007.

[29] A. C. Kumari, K. Srinivas and M. P. Gupta, Software module clustering using a hyper-heuristic based multi-objective genetic algorithm, in: 2013 IEEE 3rd International Advance Computing Conference (IACC), Ghaziabad, pp. 813–818, 2013. doi: 10.1109/IAdCC.2013.6514331.

[30] M. M. Lehman, On understanding laws, evolution, and conservation in the large-program life cycle, J. Syst. Softw. 1 (1980), 213–221. doi: 10.1016/0164-1212(79)90022-0.

[31] X. Li and G. Yang, Artificial bee colony algorithm with memory, Appl. Soft Comput. 41 (2016), 362–372. doi: 10.1016/j.asoc.2015.12.046.

[32] K. Mahdavi, M. Harman and R. M. Hierons, A multiple hill climbing approach to software module clustering, in: Proceedings of the International Conference on Software Maintenance, Netherlands, pp. 315–324, 2003.

[33] A. S. Mamaghani and M. R. Meybodi, Clustering of software systems using new hybrid algorithms, in: Proceedings of the Ninth IEEE International Conference on Computer and Information Technology (CIT'09), vol. 1, Bangladesh, 2009. doi: 10.1109/CIT.2009.111.

[34] S. Mancoridis, B. S. Mitchell, Y.-F. Chen and E. R. Gansner, Bunch: a clustering tool for the recovery and maintenance of software system structures, in: Proceedings of the IEEE International Conference on Software Maintenance, UK, pp. 50–59, 1999. doi: 10.1109/ICSM.1999.792498.

[35] S. Mancoridis, B. S. Mitchell, C. Rorres, Y. F. Chen and E. R. Gansner, Using automatic clustering to produce high-level system organizations of source code, in: Proceedings of the International Workshop on Program Comprehension, Italy, pp. 45–53, 1998.

[36] O. Maqbool and H. A. Babri, Hierarchical clustering for software architecture recovery, IEEE T. Software Eng. 33 (2007), 759–780. doi: 10.1109/TSE.2007.70732.

[37] B. S. Mitchell, A heuristic search approach to solving the software clustering problem, PhD dissertation, Drexel University, USA, 2002.

[38] B. S. Mitchell and S. Mancoridis, Using heuristic search techniques to extract design abstractions from source code, in: Proceedings of the Genetic and Evolutionary Computation Conference, USA, pp. 1375–1382, 2002.

[39] L. E. Moses, Think and Explain with Statistics, Addison-Wesley, USA, 1986.

[40] M. Ó Cinnéide, L. Tratt, M. Harman, S. Counsell and I.-H. Moghadam, Experimental assessment of software metrics using automated refactoring, in: ESEM'12, Sweden, pp. 49–58, 2012. doi: 10.1145/2372251.2372260.

[41] V. Plevris and M. Papadrakakis, A hybrid particle swarm – gradient algorithm for global structural optimization, Comput. Aided Civ. Inf. Eng. 26 (2011), 48–68. doi: 10.1111/j.1467-8667.2010.00664.x.

[42] K. Praditwong, Solving software module clustering problem by evolutionary algorithms, in: 2011 Eighth International Joint Conference on Computer Science and Software Engineering (JCSSE), Nakhon Pathom, pp. 154–159, 2011. doi: 10.1109/JCSSE.2011.5930112.

[43] K. Praditwong, M. Harman and X. Yao, Software module clustering as a multi-objective search problem, IEEE T. Software Eng. 37 (2011), 264–282. doi: 10.1109/TSE.2010.26.

[44] K. Praditwong and X. Yao, A new multi-objective evolutionary optimization algorithm: the two-archive algorithm, in: Proceedings of the International Conference on Computational Intelligence and Security, vol. 1, Hong Kong, pp. 286–291, 2006. doi: 10.1109/ICCIAS.2006.294139.

[45] P. Prashanth, K. K. Pattanaik and P. Singh, BAT and hybrid BAT meta-heuristic for quality of service-based web service selection, J. Intell. Syst. 26 (2017), 123–137. doi: 10.1515/jisys-2015-0032.

[46] A. Ramírez, J. R. Romero and S. Ventura, An approach for the evolutionary discovery of software architectures, Inform. Sciences 305 (2015), 234–255. doi: 10.1016/j.ins.2015.01.017.

[47] Q. Zhang and H. Li, MOEA/D: a multiobjective evolutionary algorithm based on decomposition, IEEE T. Evolut. Comput. 11 (2007), 712–731. doi: 10.1109/TEVC.2007.892759.

[48] E. Zitzler and S. Künzli, Indicator-based selection in multiobjective search, in: Parallel Problem Solving from Nature – PPSN VIII, pp. 832–842, Springer, Berlin, Germany, 2004. doi: 10.1007/978-3-540-30217-9_84.

[49] E. Zitzler and L. Thiele, Multiobjective optimization using evolutionary algorithms – a comparative case study, in: Conference on Parallel Problem Solving from Nature (PPSN V), pp. 292–301, 1998. doi: 10.1007/BFb0056872.

[50] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca and V. G. da Fonseca, Performance assessment of multi-objective optimizers: an analysis and review, IEEE T. Evolut. Comput. 7 (2003), 117–132. doi: 10.1109/TEVC.2003.810758.

Received: 2016-10-20
Published Online: 2017-05-04
Published in Print: 2018-10-25

©2018 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
