
Analogy-Based Approaches to Improve Software Project Effort Estimation Accuracy

  • V. Resmi and S. Vijayalakshmi
Published/Copyright: June 27, 2019

Abstract

In the discipline of software development, effort estimation plays a pivotal role. Successful development of a project requires an unambiguous estimate, yet there is no standard estimation method that is applicable to all projects. Finding the best way of estimating the effort therefore becomes an indispensable need of the project manager. Mathematical models alone are only mediocre at accurate estimation, so we opt for analogy-based effort estimation by means of soft computing techniques, which rely on the historical effort data of successfully completed projects. To improve accuracy, models are generated for clusters of the datasets, on the premise that data within a cluster have similar properties. This paper focuses mainly on the analysis of techniques for improving effort prediction accuracy. The research starts by analyzing the correlation coefficient of the selected datasets, and then moves through the analysis of classification accuracy, clustering accuracy, mean magnitude of relative error and prediction accuracy based on several machine learning methods. Finally, a bio-inspired firefly algorithm with fuzzy analogy is applied to the datasets to produce good estimation accuracy.

1 Introduction

The need for software project effort prediction has been growing for the last 20 years. The predicted effort is used to determine the overall cost and duration of a project. Prediction may err toward either underestimation or overestimation [5]; in both cases it disrupts the business plans of the company, causing budgeting problems and schedule slippage in particular [24].

The first notion of software effort estimation came with the rule of thumb [13] during the 1950s. Thereafter, in the 1960s, a new approach for software effort estimation was unveiled as a consequence of expert judgment, where domain experts applied their prior experience to discern the effort of a new project [22]. Models based on linear equations and regression analysis were proposed in 1965 [6]. The first automated tool for effort estimation was Interactive Productivity and Quality [13], established by IBM researchers. Subsequently, Barry Boehm put forward a new mathematical model based on regression analysis named COCOMO (COnstructive COst MOdel), which predicts software project effort based on the type of project. He later propounded another model named COCOMO II, an augmented version of COCOMO [5]. Furthermore, models such as Putnam’s Software Lifecycle Management [24], Software Evaluation and Estimation of Resources – Software Estimating Model [6] and Function Point (FP) analysis by Albrecht were also used for effort prediction [1]. Analogy-based estimation (ABE) was fostered in 1997 [27] as a comparative method.

Estimation by analogy is one form of expert judgment, also known as top-down estimating, which mainly determines the duration needed to finish the project. Analogous estimating uses the historical data of similar past projects to estimate the duration or cost of a current project, hence the term analogy.

ABE compares the project to be estimated with already completed projects on the basis of a measure, making it an uncomplicated process. It identifies the nearest analogies according to similarity [3], [16], [20], using distance measures. A distance measure expresses how close one project is to another: each attribute value used for effort estimation is fed into the distance measure to determine how close one object is to another, and similar data objects are then aggregated. The software project effort is estimated from these similar data objects [10], [25].
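
To make this concrete, here is a minimal sketch in Python (ours, with made-up data, not from the original study), assuming purely numeric, comparably scaled attributes and a plain Euclidean distance; practical ABE systems add attribute weighting and handling of categorical features.

    import numpy as np

    def euclidean_distance(p1, p2):
        # How close one project is to another over its attribute values.
        return float(np.sqrt(np.sum((np.asarray(p1) - np.asarray(p2)) ** 2)))

    def nearest_analogies(new_project, historical_features, k=3):
        # Indices of the k completed projects most similar to the new one.
        dists = [euclidean_distance(new_project, h) for h in historical_features]
        return np.argsort(dists)[:k]

    # Toy historical data: attribute vectors and known efforts (person-months).
    features = np.array([[1.0, 0.5], [0.9, 0.6], [0.1, 0.2]])
    efforts = np.array([120.0, 110.0, 30.0])

    idx = nearest_analogies([0.95, 0.55], features, k=2)
    estimate = efforts[idx].mean()  # effort estimated from the similar projects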

The machine learning approach to estimation has been popular for the last two decades because machine-learning-based estimation gives more accurate results than the previous two methods [12]. It uses artificial intelligence-based techniques to produce better results.

Software project effort estimation is especially difficult in the rudimentary stages of software development. To provide more accurate results, the effort estimation attributes of earlier projects are taken into consideration, and mining techniques are applied to these attributes to obtain the effort prediction for the current project.

Broadly, data mining serves as a method to turn raw data into profitable and intelligible information. It has numerous functionalities [10], one of which is clustering. Clustering pertains to the grouping of data objects. It follows unsupervised learning, where class labels are not used; rather, it generates labels for the data objects. Objects are grouped on the principle of minimizing interclass similarity and maximizing intraclass similarity: once a cluster is formed, all objects within it are similar to one another, while objects from different clusters are dissimilar. Clustering is otherwise known as data segmentation because it partitions large datasets into groups on the basis of similarity [25]. How clusters produced by different methods improve the accuracy of effort estimation is the core question of this paper.

2 Related Work

Estimation based on analogy compares the project to be estimated with already completed projects on the basis of some measures, mostly distance measures. The distance measure is used to find how closely one project is related to other projects. In the initial stages of software development, software project effort estimation is very difficult. To obtain more accurate results, the effort estimation attributes of previous projects are taken into consideration. On these attributes, mining techniques are applied to obtain the effort prediction for the current project.

There is scarcely any model that estimates software project effort for all domains and all kinds of applications; new models are proposed on the basis of existing ones. To derive the effort of a new project, analogy-based estimation compares it against completed projects. Khatibi et al. [18] contemplated a novel framework combining analogy-based effort estimation and neural networks to improve the accuracy of effort prediction. Humayun and Gang [12] showed that machine learning methods give more accurate effort estimates than traditional estimation methods.

Malathi and Sridhar [21] proposed an approach based on fuzzy logic, linguistic quantifiers and analogy-based reasoning. Their main aim was to enhance the performance of the effort estimation in software projects while dealing with numerical and categorical data. Azzeh and Nassif [4] together proposed a new method to discover the most prudent set of analogies from dataset characteristics to support the different size of datasets that have a lot of categorical features. Also, Prabhakar and Dutta [23] advocated a comparative study on artificial neural network (ANN) and support vector machine for predicting the software effort.

Araujo et al. [2] presented a multilayer dilation-erosion-linear perceptron (MDELP) model to solve problems in effort estimation. They used hybrid morphological operator and a linear operator to solve problems. Kaushik et al. [15] combined fuzzy inference system and cuckoo optimization (COA-FIS) for showing improved accuracy in software cost estimation.

According to the studies of Kocaguneli et al. [19], cluster subtrees perform better than cluster supertrees. The performance of analogy-based effort estimation can be improved by selecting project data from regions with small variance.

A hybrid method was proposed by Khatibi et al. [17] to filter out inconsistent projects and thereby attain higher accuracy in effort estimation. Similar projects were grouped into different clusters through the C-means clustering technique; these clusters comprise the reliable and appropriate projects for estimating the development effort and are suitable for use by the ABE and ANN methods. The fuzzy-class point (FCP) approach was proposed by Satapathy et al. [26] for evaluating the cost of different software projects. To attain better accuracy, the FCP approach employs various adaptive regression techniques for effort estimation.

Borandag et al. [7] prepared a case study on software size estimation through the MK II FPA (MK II Function Point Analysis) and FP methods, using both to estimate the size of a software product. They had different developers implement the same software in order to study the size estimation process, and the sizes of the developed software were compared.

Yücalar et al. [30] developed a new multiple linear regression analysis-based effort estimation method. They used the datasets of the 10 software projects developed by 4 well-established software companies in Turkey. The results of the proposed method were compared with the standard Use Case Point method and simple linear regression-based effort estimation method.

3 Proposed Work

Finding an accurate effort estimate for a new software project based on historical datasets is a burden for project managers, as there is no model that estimates the effort directly; they have to consider many avenues to reach an appropriate estimate. Our work concentrates on how to improve accuracy, and on which techniques, datasets and approaches yield good results. The Cocomo81, Cocomonasa60, Cocomonasa93, DESHARNAIS, ALBRECHT, Kemerer, Miyazaki1 and MAXWELL datasets are selected for our analysis; among them, ALBRECHT and Kemerer are based on FPA. We propose four steps to reach good estimation accuracy:

  1. Select the classifier.

  2. Find the best clusters by applying the selected classifier from the first step.

  3. Perform analogy and optimization together to reach optimal solutions using best clusters.

  4. Find the new effort with the help of optimal solutions.

The diagram in Figure 1 shows the model of our proposed work and presents the four steps for arriving at better estimation accuracy.

Figure 1: Proposed Learning Approach Model.

3.1 Select the Classifier

Here we have applied two classifiers on the selected datasets: multivariate linear regression and deep structured multilayer perceptron.

3.1.1 Multivariate Linear Regression

Linear regression [10] follows the equation of a straight line, in which the slope becomes a weight applied to an independent variable x, y is the predicted (response) variable and the constant is a regression coefficient. It takes the form of

(1) $y = b + wx$,

where b and w are regression coefficients. b is the Y-intercept and w is the slope of the line. These coefficients can be thought of as weights. So the above expression (1) can be rewritten as follows:

(2) $y = w_0 + w_1 x$.

Let D be a training set of tuples containing |D| data points of the form $(x_1, y_1), (x_2, y_2), \ldots, (x_{|D|}, y_{|D|})$. The regression coefficients can be calculated using the following equations:

(3) $w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}$,
(4) $w_0 = \bar{y} - w_1 \bar{x}$,

where $\bar{x}$ is the mean of the $x_i$ values and $\bar{y}$ is the mean of the $y_i$ values.

Simple linear regression is based on only one explanatory variable. Its extension is multiple linear regression [6], which is based on more than one explanatory variable. Another interesting form is multivariate linear regression, which models more than one predicted (response) variable simultaneously. We have used multivariate linear regression in our work in the expectation that future models may need more than one predicted variable.
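
As an illustration (ours, not part of the original study), the sketch below fits the coefficients by least squares with NumPy; a Y with several columns yields the multivariate case. All data values are made up.

    import numpy as np

    def fit_linear_regression(X, Y):
        # Least-squares fit of Y = b + X @ W.
        # X: (n_samples, n_features); Y: (n_samples, n_targets),
        # where n_targets > 1 gives multivariate linear regression.
        X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
        coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
        return coef[0], coef[1:]  # b (intercepts), W (weights)

    # One feature, one target: reproduces w0 and w1 of Eqs. (3)-(4).
    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([[2.1], [4.2], [5.9], [8.1]])
    b, W = fit_linear_regression(X, y)
    predicted = b + X @ W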

3.1.2 Deep Structured Multilayer Perceptron

A multilayer perceptron (MLP) is a feed-forward ANN model that maps given data to appropriate outputs. The model is represented as a directed graph in which sets of nodes form multiple layers, each layer fully connected to the next. The set of nodes receiving the input data is the input layer; one or more nodes that generate or predict the output form the output layer; the nodes in between form the hidden layers. Except for the input nodes, which simply accept input data and pass them on to the next layer, every node applies a nonlinear activation function [11] to produce its output. The network is trained with a supervised learning technique called backpropagation, in which errors are propagated backward until appropriate outputs are produced. In the deep structured variant, each node is tuned over additional parameters.

There are three kinds of layers in this model: an input layer, hidden layers and an output layer. The data are presented at the input layer, which contains one node per input attribute. The nodes that produce the output are in the output layer, and the number of output nodes represents the number of classes. The nodes between the input and output layers lie in the hidden layers. Each link between nodes carries a weight w (a number), and each node computes a weighted sum of its inputs and thresholds the result with the help of the activation function.
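
A brief sketch of such a network using scikit-learn's MLPClassifier is shown below (our illustration; the layer sizes and the synthetic data are assumptions, and the paper's per-node modifications are not reproduced).

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Illustrative stand-in data: project attribute vectors and effort classes.
    rng = np.random.default_rng(0)
    X = rng.random((60, 15))          # one input node per attribute
    y = rng.integers(0, 3, size=60)   # one output node per effort class

    # Feed-forward MLP trained by backpropagation; two hidden layers stand in
    # for the "deep structured" variant described above.
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                      max_iter=2000, random_state=0),
    )
    model.fit(X, y)
    train_accuracy = model.score(X, y)  # classification accuracy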

Figure 2 [9] shows the MLP neural network.

Figure 2: An Example of the Multilayer Perceptron Neural Network.

Of the two classifiers, the deep structured multilayer perceptron yields the better results, and it is selected for the next step, where it is applied to clusters. In our work, we modified each node so that it learns progressively more properties for estimation.

3.2 Select the Best Clustering Technique

Two types of clustering techniques are analyzed. They are vector quantized k-means clustering and Probabilistic Model-Based Expectation-Maximization (EM) clustering.

3.2.1 Vector Quantized k-Means Clustering

The k-means clustering method is the simplest form of clustering [10], [25]. By exercising a partitioning algorithm, it organizes the data into groups: it splits a dataset D of n objects into k partitions (clusters) $C_1, C_2, C_3, \ldots, C_k$, where $C_i \subset D$ and $C_i \cap C_j = \emptyset$ for $1 \le i, j \le k$. Each cluster is represented by its centroid, which can be interpreted as the mean of its objects; hence the name k-means. Initially, k objects are randomly chosen as cluster centers (centroids). In every iteration, each object is compared with each centroid using a distance measure, and the object is assigned to the cluster of the centroid to which its distance is lowest. At the end of each iteration, the mean of each cluster is recomputed and becomes its new cluster center. The process repeats until the cluster centers no longer change, that is, until the sum of squared errors between all objects in $C_i$ and the centroid $c_i$ is minimized over all k partitions:

(5) $E = \sum_{i=1}^{k} \sum_{p \in C_i} \mathrm{dist}(p, c_i)^2$.

Instead of simple k-means, we opted for vector quantized k-means to reach better results.
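
For reference, a compact sketch (ours) of the plain k-means loop just described follows; the vector quantized refinement is not reproduced here.

    import numpy as np

    def k_means(D, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = D[rng.choice(len(D), size=k, replace=False)]  # random init
        for _ in range(max_iter):
            # Assign every object to its nearest centroid (Euclidean distance).
            dists = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # The recomputed mean of each cluster becomes its new center.
            new_centroids = np.array([
                D[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):  # centers stopped moving
                break
            centroids = new_centroids
        sse = float(((D - centroids[labels]) ** 2).sum())  # E of Eq. (5)
        return centroids, labels, sse

    centroids, labels, sse = k_means(np.random.default_rng(1).random((50, 3)), k=4)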

3.2.2 Probabilistic Model-Based Expectation-Maximization Algorithm

The EM algorithm is a clustering technique that fits the given data to a mathematical model. It can be seen as an extension of k-means clustering: in k-means, each object is assigned to a cluster based on a distance measure, whereas in EM, each object is assigned to a cluster based on its probability of membership. The EM algorithm [10] is as follows:

  1. Expectation step: each object $o_i$ is assigned to a cluster $C_k$ with the probability

    (6) $P(o_i \in C_k) = p(C_k \mid o_i)$
    (7) $\qquad = \frac{p(C_k)\, p(o_i \mid C_k)}{p(o_i)}$,

    where $p(o_i \mid C_k) = N(m_k, E_k)(o_i)$ follows a normal distribution with mean $m_k$ and covariance $E_k$. This step calculates the probability of cluster membership of object $o_i$, i.e. its expected cluster membership.

  2. Maximization step: re-estimate the model parameters:

    (8) $m_k = \frac{1}{n} \sum_{i=1}^{n} \frac{o_i\, P(o_i \in C_k)}{\sum_j P(o_i \in C_j)}$.

    This step maximizes the likelihood of the distributions given the data.

Of the two clustering techniques, EM clustering improves the effort estimation accuracy more under the selected classifier, the deep structured multilayer perceptron. So EM clusters are used in the next step, where optimization is performed to generate optimal solutions.
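
As a sketch (ours, with assumed data and component count), scikit-learn's GaussianMixture performs exactly this expectation/maximization iteration:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    X = np.random.default_rng(0).random((80, 5))  # illustrative project vectors

    # GaussianMixture iterates the E-step (6)-(7) and M-step (8) internally.
    gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
    gmm.fit(X)

    memberships = gmm.predict_proba(X)        # P(o_i in C_k) for every object
    hard_labels = memberships.argmax(axis=1)  # cluster with highest membership
    means = gmm.means_                        # re-estimated cluster means m_k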

3.3 Perform Optimization

Now we have good clusters of a dataset which can improve estimation accuracy. These clusters of data are used for optimization. Here we use fuzzy analogy and firefly optimization.

3.3.1 Original Firefly Algorithm

The firefly algorithm [28], [29] was developed by Yang. It is a nature-inspired algorithm based on the flashing light behavior of real fireflies, which attract one another through the light intensity produced by bioluminescence: fireflies with weaker flashing light are attracted toward fireflies with stronger flashing light. It is based on three main principles:

  1. All fireflies are unisex. Attractions of fireflies are gender independent.

  2. The attractiveness of a firefly is proportional to its brightness, i.e. the less bright firefly moves toward the brighter one.

  3. The brightness of the firefly is determined by the objective function.

The main feature of the firefly is its attractiveness β, which varies with the distance r between fireflies. It is defined as follows:

(9) $\beta(r) = \beta_0 e^{-\gamma r^m}$,

where $\beta_0$ is the attractiveness at $r = 0$, $\gamma$ is the light absorption coefficient and $r$ is the distance between two fireflies $x_i$ and $x_j$, defined as the Cartesian distance

(10) $r_{ij} = \|x_i - x_j\| = \sqrt{\sum_{k=1}^{d} (x_{i,k} - x_{j,k})^2}$,

where d denotes the number of dimensions.

The movement of firefly i is updated with the help of the following equation:

(11) $x_i = x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha \left(\mathrm{rand} - \tfrac{1}{2}\right)$,

where $x_i$ is the current position of firefly i, $\beta_0 e^{-\gamma r_{ij}^2}$ is the firefly's attractiveness, $\alpha$ is the randomization parameter and rand is a random number drawn uniformly between 0 and 1.

Firefly algorithm:

          1. Define the objective function f(x), x = (x1, x2, …, xd).
          2. Generate the initial population of fireflies xi (i = 1, 2, …, n).
          3. Set the light intensity Ii at xi by f(xi).
          4. Define the light absorption coefficient γ.
          5. While (t < MaxGenerations)
                  For i = 1 to n fireflies
                         For j = 1 to n fireflies
                                If (Ij > Ii)
                                       Move firefly i toward j.
                                       Vary attractiveness with distance r via exp[−γr^m].
                                       Evaluate new solutions and update light intensity.
                                End if.
                         End for j.
                  End for i.
                  Rank all fireflies and find the current global best set of fireflies.
             End while.
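
The listing below is a direct Python transcription of this pseudocode (ours; parameter values are illustrative). It minimizes the objective, so the brighter firefly is the one with the lower objective value.

    import numpy as np

    def firefly_minimize(f, dim, n=25, max_gen=100,
                         beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.random((n, dim))                     # initial population
        intensity = np.array([f(xi) for xi in x])    # light intensity I_i = f(x_i)
        for _ in range(max_gen):
            for i in range(n):
                for j in range(n):
                    if intensity[j] < intensity[i]:  # j is brighter (lower f)
                        r2 = np.sum((x[i] - x[j]) ** 2)     # distance, Eq. (10)
                        beta = beta0 * np.exp(-gamma * r2)  # attractiveness, Eq. (9)
                        # Move firefly i toward j with randomization, Eq. (11).
                        x[i] = x[i] + beta * (x[j] - x[i]) \
                               + alpha * (rng.random(dim) - 0.5)
                        intensity[i] = f(x[i])       # update light intensity
        best = int(np.argmin(intensity))             # rank: current global best
        return x[best], float(intensity[best])

    # Example: minimize the sphere function.
    best_x, best_f = firefly_minimize(lambda v: float(np.sum(v ** 2)), dim=3)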

3.3.2 Fuzzy Analogy and Firefly Optimization

Fuzzy analogy is simply analogy-based reasoning built on fuzzy logic. In analogy-based effort estimation, similar projects are identified from the historical dataset, and these identified projects are used for effort estimation, either by collecting opinions from experts or by applying some mathematical model to the similar projects. It consists of a case identification process, a case retrieval process and a case adaptation process.

There may be many instances in the historical data, so finding matching cases is difficult. In the fuzzy analogy approach, all data are converted into fuzzy sets by applying fuzzy logic: every variable is converted to a linguistic variable using membership functions, so categorical variables can be handled efficiently. Once the fuzzy datasets are ready, our proposed work generates fuzzy rules for each fuzzy dataset. From those fuzzy rules, optimal rules are derived with the help of the firefly optimization algorithm. In our work, three initial sets of solutions are formed from sets of flies (fuzzy rules), i.e. each solution consists of a set of flies. For each fly, a fitness value is computed; here the fitness value is the mean magnitude of relative error (MMRE). For each solution, the fitness values of all its flies are summed, and the solutions are ranked by the minimum of that sum. The solution with the minimum MMRE value is set aside, the remaining solutions are updated with other sets of rules, and this process is repeated until the best optimal solutions are obtained.

Once we reach optimal solutions, the next step in the fuzzy analogy is the identification of similar cases. This is achieved by finding the distance between projects p1 and p2 by comparing each of their individual attributes.

The next step in the fuzzy analogy is case adaptation. In this step, the estimate of the new project is derived from the effort values of similar projects.
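
The sketch below illustrates these two steps under strong simplifying assumptions (our construction, not the authors' implementation): triangular membership functions fuzzify each attribute, similarity is aggregated max-min per attribute, and case adaptation weights the known efforts by similarity; the fuzzy rule generation and its coupling to the firefly search are omitted.

    import numpy as np

    def tri_membership(x, a, b, c):
        # Degree to which x belongs to the triangular fuzzy set (a, b, c).
        return max(min((x - a) / (b - a + 1e-12), (c - x) / (c - b + 1e-12)), 0.0)

    def fuzzy_similarity(p1, p2, attr_sets):
        # Per attribute: best max-min overlap of memberships over its fuzzy sets;
        # overall similarity: the most conservative (minimum) attribute score.
        scores = []
        for v1, v2, sets in zip(p1, p2, attr_sets):
            scores.append(max(min(tri_membership(v1, *s), tri_membership(v2, *s))
                              for s in sets))
        return min(scores)

    def adapt_effort(similarities, efforts):
        # Case adaptation: similarity-weighted mean of the analogues' efforts.
        w = np.asarray(similarities, float)
        return float(np.sum(w * np.asarray(efforts)) / (np.sum(w) + 1e-12))

    # Toy example: one attribute with linguistic terms "low" and "high".
    sets = [[(0.0, 0.0, 0.5), (0.5, 1.0, 1.0)]]
    s1 = fuzzy_similarity([0.8], [0.9], sets)
    s2 = fuzzy_similarity([0.8], [0.2], sets)
    estimate = adapt_effort([s1, s2], [100.0, 40.0])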

4 Results and Discussions

For assessing the performance of the k-means clustering and EM algorithms, eight datasets have been selected from the PROMISE data repository: the Cocomo81, Cocomonasa60, Cocomonasa93, DESHARNAIS, ALBRECHT, Kemerer, Miyazaki1 and MAXWELL datasets. Cocomo81 has 63 instances and 17 attributes (all numeric: 15 effort multipliers, one for lines of code (LOC) and one for actual development effort), with no missing attribute values. Cocomonasa60 has 60 instances and 17 attributes (15 discrete, in the range very low to extra high). Cocomonasa93 has 93 instances and 24 attributes, and DESHARNAIS has 81 instances and 12 attributes. ALBRECHT has 24 instances and 8 attributes, Kemerer has 15 instances and 8 attributes, Miyazaki1 has 48 instances and 9 attributes, and MAXWELL has 62 instances and 27 attributes. Among these datasets, ALBRECHT and Kemerer are FPA-based and Miyazaki1 is a COBOL dataset.

Parameters for validation:

  1. Correlation coefficient: the correlation coefficient tells how strongly the actual and predicted values are related. Its value ranges from −1 to 1, where 0 indicates no relation, 1 a very strong positive linear relation and −1 an inverse linear relation.

  2. Mean magnitude of relative error (MMRE): There are many measures for assessing the accuracy of effort prediction models, but the most commonly used is the MMRE.

    The MMRE can be measured by the following formula:

    (12) $\mathrm{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{MRE}_i$,

    where MRE is the magnitude of relative error,

    (13) $\mathrm{MRE} = \frac{|\mathit{acteffort} - \mathit{esteffort}|}{|\mathit{acteffort}|}$.

    MMRE ≤ 0.25 is the acceptable range [8].

  3. Prediction (PRED): This is another measure of estimation accuracy [14]:

    (14) $\mathrm{PRED}(0.25) = \frac{k}{n}$,

    where k is the number of observations whose MRE is less than or equal to 0.25 and n is the total number of observations. (A short code sketch computing MMRE and PRED follows this list.)

  4. Classification accuracy: the percentage of projects classified correctly out of the total number of projects in the dataset. A higher value indicates better classification.

  5. Clustering accuracy: the percentage of projects grouped correctly out of the total number of projects in the dataset. A higher value indicates better clustering.
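
The following minimal helpers (ours, with made-up example values) compute the error measures defined above from actual and estimated efforts.

    import numpy as np

    def mre(actual, estimated):
        # Magnitude of relative error per project, Eq. (13).
        actual = np.asarray(actual, float)
        estimated = np.asarray(estimated, float)
        return np.abs(actual - estimated) / np.abs(actual)

    def mmre(actual, estimated):
        # Mean magnitude of relative error, Eq. (12); <= 0.25 is acceptable [8].
        return float(mre(actual, estimated).mean())

    def pred(actual, estimated, threshold=0.25):
        # PRED(0.25): share of estimates whose MRE is at most 0.25, Eq. (14).
        return float((mre(actual, estimated) <= threshold).mean())

    actual = [120.0, 80.0, 200.0, 45.0]
    estimated = [110.0, 95.0, 190.0, 60.0]
    print(mmre(actual, estimated), pred(actual, estimated))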

Table 1 shows the results of the four validation parameters for multivariate linear regression effort estimation.

Table 1: Validation Parameters’ Values for Multivariate Linear Regression Effort Estimation.

Dataset | Correlation coefficient | Classification accuracy (%) | MMRE | Prediction (%)
Cocomo81 | 0.706 | 70.6 | 0.265 | 59.3
Cocomonasa60 | 0.716 | 71.6 | 0.255 | 60.3
CocomoNasa93 | 0.727 | 72.7 | 0.245 | 61.3
Desharnais | 0.737 | 73.7 | 0.235 | 63.3
ALBRECHT FPA | 0.91 | 75.2 | 0.23 | 61.5
Kemerer FPA | 0.37 | 40.2 | 0.55 | 30.2
Miyazaki1 COBOL | 0.05 | 20 | 0.85 | 22.3
MAXWELL | 0.81 | 65.4 | 0.20 | 69.3

From Table 1, it is noted that higher correlation coefficient values yield better prediction; as the correlation coefficient improves, the classification accuracy also increases.

Table 2 shows the results of the four validation parameters for deep structured multilayer perceptron effort estimation.

Table 2: Validation Parameters’ Values for Deep Structured Multilayer Perceptron Effort Estimation.

Dataset | Correlation coefficient | Classification accuracy (%) | MMRE | Prediction (%)
Cocomo81 | 0.767 | 70.6 | 0.205 | 66.3
Cocomonasa60 | 0.777 | 71.6 | 0.195 | 68.3
CocomoNasa93 | 0.787 | 72.7 | 0.185 | 69.3
Desharnais | 0.797 | 73.7 | 0.175 | 71.9
ALBRECHT FPA | 0.75 | 70.3 | 0.232 | 65.88
Kemerer FPA | 0.35 | 34.4 | 0.595 | 30.23
Miyazaki1 COBOL | 0.96 | 92.6 | 0.15 | 86.8
MAXWELL | 0.76 | 73.1 | 0.195 | 69.4

From Table 2, it is again noted that the higher the correlation coefficient value, the better the prediction and classification accuracy. Comparing the two techniques on MMRE and prediction, the deep structured multilayer perceptron performs better, so in the next step we use the deep structured multilayer perceptron classifier as the estimation model on the clustered data.

In Table 3, the values of validation parameters for deep structured multilayer perceptron effort estimation using vector quantized k-means clusters are tabulated.

Table 3: Deep Structured Multilayer Perceptron Effort Estimation Using Vector Quantized k-Means Clusters.

Dataset | Clustering accuracy (%) | Classification accuracy (%) | MMRE | Prediction (%)
Cocomo81 | 71.3 | 75.7 | 0.215 | 64
Cocomonasa60 | 72.9 | 77.2 | 0.205 | 66
CocomoNasa93 | 74.5 | 79.2 | 0.195 | 67.9
Desharnais | 76.3 | 80.7 | 0.185 | 69.9
ALBRECHT FPA | 80 | 75.1 | 0.191 | 69.21
Kemerer FPA | 48 | 39.91 | 0.43 | 43.66
Miyazaki1 COBOL | 98 | 94.72 | 0.13 | 90.1
MAXWELL | 79 | 76.01 | 0.15 | 74.44

Table 4 shows the values of validation parameters for deep structured multilayer perceptron effort estimation using probabilistic model-based EM clusters.

Table 4: Deep Structured Multilayer Perceptron Effort Estimation Using Probabilistic Model-Based Expectation-Maximization (EM) Clusters.

Dataset | Clustering accuracy (%) | Classification accuracy (%) | MMRE | Prediction (%)
Cocomo81 | 82.2 | 85.3 | 0.165 | 72.9
Cocomonasa60 | 85 | 87.8 | 0.155 | 74.9
CocomoNasa93 | 87.6 | 91 | 0.145 | 75.8
Desharnais | 89.7 | 94 | 0.135 | 77.8
ALBRECHT FPA | 83 | 78.2 | 0.16 | 75.12
Kemerer FPA | 53 | 42.12 | 0.4 | 48.22
Miyazaki1 COBOL | 99 | 96.48 | 0.10 | 94.12
MAXWELL | 80 | 76.77 | 0.14 | 78.64

Comparing the MMRE and prediction values in Tables 3 and 4, probabilistic model-based EM clusters give better prediction accuracy and lower MMRE values than vector quantized k-means clusters. So probabilistic model-based EM clusters are used with firefly optimization and fuzzy analogy for effort estimation; the result of this approach is shown in Table 5.

Table 5: Effort Estimation Using Expectation-Maximization (EM) Clusters, Firefly Optimization and Fuzzy Analogy.

Dataset | Clustering accuracy (%) | Classification accuracy (%) | MMRE | Prediction (%)
Cocomo81 | 82.2 | 80.7 | 0.125 | 78.8
Cocomonasa60 | 85 | 81.7 | 0.115 | 79.8
CocomoNasa93 | 87.6 | 83.3 | 0.105 | 81.8
Desharnais | 89.7 | 84.3 | 0.095 | 84.6
ALBRECHT FPA | 88 | 81.12 | 0.10 | 85.88
Kemerer FPA | 69 | 59.66 | 0.23 | 55.01
Miyazaki1 COBOL | 99.9 | 98.88 | 0.010 | 97.76
MAXWELL | 83 | 79.34 | 0.10 | 83.44

From Table 5, it is evident that the accuracy measures, i.e. the MMRE and prediction values, are much improved over the methods used in steps 1 and 2 of the proposed approach. Table 6 compares the performance measures of our proposed method with the two existing methods (MDELP and COA-FIS).

Table 6: Comparison of Proposed Work with Two Existing Methods.

Dataset | MDELP MMRE | MDELP Prediction (%) | COA-FIS MMRE | COA-FIS Prediction (%) | Proposed MMRE | Proposed Prediction (%)
Cocomo81 | 0.325 | 50.5 | 0.245 | 61.2 | 0.125 | 78.8
Cocomonasa60 | 0.315 | 52.4 | 0.235 | 62.2 | 0.115 | 79.8
CocomoNasa93 | 0.305 | 53.4 | 0.225 | 64.2 | 0.105 | 81.8
Desharnais | 0.295 | 55.4 | 0.215 | 65.3 | 0.095 | 84.6
ALBRECHT FPA | 0.243 | 55.25 | 0.241 | 62.6 | 0.10 | 85.88
Kemerer FPA | 0.29 | 44.4 | 0.331 | 46.7 | 0.23 | 55.01
Miyazaki1 COBOL | 0.29 | 56.6 | 0.2 | 66.4 | 0.010 | 97.76
MAXWELL | 0.20 | 58.14 | 0.19 | 69 | 0.10 | 83.44

Figures 3 and 4 graphically show that the MMRE values of the proposed method for the four selected datasets are lower than those of the existing methods, and that the prediction values increase compared with both methods. Hence, the proposed method improves the accuracy of effort estimation.

Figure 3: MMRE Comparison between the Existing and Proposed Method.

Figure 4: Prediction Comparison between the Existing and Proposed Method.

From our experiments, it is evident that the MMRE values of the Cocomo81, Cocomonasa60, CocomoNasa93 and Desharnais datasets decreased by 62%, 63%, 66% and 68%, respectively, when compared with the existing MDELP, and by 49%, 51%, 53% and 56%, respectively, when compared with the existing COA-FIS. Also, the MMRE values of the ALBRECHT, Kemerer, Miyazaki1 and MAXWELL datasets decreased by 59%, 21%, 97% and 50%, respectively, when compared with the existing MDELP, and by 58%, 35%, 95% and 45%, respectively, when compared with the existing COA-FIS.

Similarly, the prediction values of Cocomo81, Cocomonasa60, CocomoNasa93 and Desharnais datasets increased by 56%, 52%, 53% and 53%, respectively, when compared with the existing MDELP and 29%, 28%, 27% and 30%, respectively, when compared with the existing COA-FIS. Also, the prediction values of ALBRECHT, Kemerer, Miyazaki1 and MAXWELL datasets increased by 55%, 24%, 73% and 44%, respectively, when compared with the existing MDELP and 37%, 18%, 47% and 21%, respectively, when compared with the existing COA-FIS.

5 Conclusion

Consistent with the results of several researchers, it is clear that no single approach suits software project effort estimation for all domains and all kinds of applications. It is therefore indispensable to use prior project experience to estimate the effort of the current project, and analogy-based effort estimation is one such approach; it can be combined with machine learning techniques to derive better analogies. In this paper, we emphasized two learning approaches, classification and clustering. Of the two clustering methods, EM clusters outperform vector quantized k-means clusters in terms of MMRE and prediction values, so EM clusters were fed to fuzzy analogy and firefly optimization to obtain the optimal solutions, from which the effort of a new project is derived with good accuracy. Analyzing which optimization technique suits which domain of datasets is left for future work, and different clustering and classification techniques can also be considered. Moreover, it could be beneficial to apply data pre-processing techniques before the data are used for clustering and classification.

Bibliography

[1] A. J. Albrecht and J. A. Gaffney, Software function, source lines of codes, and development effort prediction: a software science validation, IEEE Trans. Softw. Eng. 9 (1983), 639–648. doi:10.1109/TSE.1983.235271.

[2] R. de A. Araujo, A. L. I. Oliveira and S. Meira, A class of hybrid multilayer perceptrons for software development effort estimation problems, Expert Syst. Appl. 90 (2017), 1–12. doi:10.1016/j.eswa.2017.07.050.

[3] M. Azzeh, A replicated assessment and comparison of adaptation techniques for analogy-based effort estimation, Empir. Softw. Eng. 17 (2012), 90–127. doi:10.1007/s10664-011-9176-6.

[4] M. Azzeh and A. B. Nassif, Analogy-based effort estimation: a new method to discover set of analogies from dataset characteristics, IET Softw. 9 (2015), 39–50. doi:10.1049/iet-sen.2013.0165.

[5] B. W. Boehm, Software engineering economics, Prentice Hall, Englewood Cliffs, NJ, 1981.

[6] B. W. Boehm and R. Valerdi, Achievements and challenges in cocomo-based software resource estimation, IEEE Softw. 25 (2008), 74–83. doi:10.1109/MS.2008.133.

[7] E. Borandag, F. Yucalar and S. Z. Erdogan, A case study for the software size estimation through MK II FPA and FP methods, Int. J. Comput. Appl. Technol. 53 (2016), 309–314. doi:10.1504/IJCAT.2016.076777.

[8] S. D. Conte, H. E. Dunsmore and V. Y. Shen, Software engineering metrics and models, Benjamin-Cummings Publishing, Redwood City, 1986.

[9] Deep learning via multilayer perceptron classifier, https://dzone.com/articles/deep-learning-via-multilayer-perceptron-classifier.

[10] J. Han and M. Kamber, Data mining concepts and techniques, 2nd ed., Elsevier, Amsterdam, The Netherlands, reprinted 2008.

[11] S. Haykin, Neural networks: a comprehensive foundation, 2nd ed., Prentice Hall, New York, 1998.

[12] M. Humayun and C. Gang, Estimating effort in global software development projects using machine learning techniques, Int. J. Inform. Edu. Technol. 2 (2012), 208–211. doi:10.7763/IJIET.2012.V2.111.

[13] C. Jones, Estimating software costs: bringing realism to estimating, 2nd ed., McGraw-Hill, New York, 2007.

[14] M. Jørgensen, Experience with the accuracy of software maintenance task effort prediction models, IEEE Trans. Softw. Eng. 21 (1995), 674–681. doi:10.1109/32.403791.

[15] A. Kaushik, S. Verma, H. J. Singh and G. Chhabra, Software cost optimization integrating fuzzy system and COA-cuckoo optimization algorithm, Int. J. Syst. Assur. Eng. Manage. 8 (2017), 1461–1471. doi:10.1007/s13198-017-0615-7.

[16] J. Keung, B. Kitchenham and D. R. Jeffery, Analogy-X: providing statistical inference to analogy-based software cost estimation, IEEE Trans. Softw. Eng. 34 (2008), 471–484. doi:10.1109/TSE.2008.34.

[17] B. V. Khatibi, D. N. A. Jawawi, S. Z. M. Hashim and E. Khatibi, Increasing the accuracy of software development effort estimation using projects clustering, IET Softw. 6 (2012), 461–473. doi:10.1049/iet-sen.2011.0210.

[18] B. V. Khatibi, D. N. A. Jawawi and E. Khatibi, Increasing the accuracy of analogy based software development effort estimation using neural networks, Int. J. Comput. Commun. Eng. 2 (2013), 78–81. doi:10.7763/IJCCE.2013.V2.142.

[19] E. Kocaguneli, T. Menzies and A. Bener, Exploiting the essential assumptions of analogy-based effort estimation, IEEE Trans. Softw. Eng. 38 (2012), 425–438. doi:10.1109/TSE.2011.27.

[20] J. Z. Li, G. Ruhe, A. Al-Emran and M. M. Ritcher, A flexible method for software effort estimation by analogy, Empir. Softw. Eng. 12 (2007), 65–106. doi:10.1007/s10664-006-7552-4.

[21] S. Malathi and S. Sridhar, Estimation of effort in software cost analysis for heterogeneous dataset using fuzzy analogy, Int. J. Comput. Sci. Inform. Security 10 (2012), arXiv:1211.1136.

[22] E. A. Nelson, Management handbook for the estimation of computer programming costs, System Developer Corp., Santa Monica, CA, USA, 1966.

[23] Prabhakar and M. Dutta, Prediction of software effort using artificial neural network and support vector machine, Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3 (2013), 40–46.

[24] L. H. Putnam, A general empirical solution to the macro software sizing and estimating problem, IEEE Trans. Softw. Eng. 4 (1978), 345–361. doi:10.1109/TSE.1978.231521.

[25] S. K. Sarangi and V. Jaglan, Performance comparison of machine learning algorithms on integration of clustering and classification techniques, Int. J. Emerg. Technol. Comput. Appl. Sci. (2013), 251–257.

[26] S. M. Satapathy, M. Kumar and S. K. Rath, Fuzzy-class point approach for software effort estimation using various adaptive regression methods, CSI Trans. ICT 1 (2013), 367–380. doi:10.1007/s40012-013-0035-z.

[27] M. Shepperd and C. Schofield, Estimating software project effort using analogies, IEEE Trans. Softw. Eng. 23 (1997), 736–743. doi:10.1109/32.637387.

[28] X. S. Yang, Nature-inspired metaheuristic algorithms, Luniver Press, London, 2008.

[29] X. S. Yang, Firefly algorithms for multimodal optimization, in: Stochastic algorithms: foundations and applications, SAGA 2009, Lecture Notes in Computer Science, vol. 5792, pp. 169–178, Springer, Berlin, Heidelberg, 2009. doi:10.1007/978-3-642-04944-6_14.

[30] F. Yücalar, D. Kilinc, E. Borandag and A. Ozcift, Regression analysis based software effort estimation method, Int. J. Softw. Eng. Knowl. Eng. 26 (2016), 807–826. doi:10.1142/S0218194016500261.

Received: 2019-01-15
Accepted: 2019-05-22
Published Online: 2019-06-27

©2020 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 Public License.
