Circular convolution-based feature extraction algorithm for classification of high-dimensional datasets

Article, Open Access

Rupali Tajanpure and Akkalakshmi Muddana

Published/Copyright: October 22, 2021

Abstract

High-dimensional data analysis has become one of the most challenging tasks today, and dimensionality reduction plays an important role in it. Dimensionality reduction focuses on the data features that have a proven impact on accuracy, execution time, and space requirements. In this study, a dimensionality reduction method based on the convolution of input features is proposed. Experiments are carried out on nine minimally preprocessed benchmark datasets. Results show that the proposed method reduces the original dimensionality by 38% on average. The algorithm's accuracy is tested with the decision tree (DT), support vector machine (SVM), and K-nearest neighbor (KNN) classifiers and compared against the existing principal component analysis (PCA) algorithm. The average increase in accuracy (Δ) is 8.06 for DT, 5.80 for SVM, and 18.80 for KNN. The key characteristic of the proposed model is that it reduces attributes, leading to less computation time without loss of classifier accuracy.

1 Introduction

1.1 Overview

The data generated every day by Internet-based applications are multidimensional and originate from various sources in different forms [46]. Analyzing such extensive data is a cumbersome task. The accuracy and speed of data analysis depend on the features of the data involved. Features describe instances of data; they can be constructed, discretized, transformed, or selected from massive data to classify the data more accurately. There are three main kinds of features: categorical, ordinal, and quantitative. Feature transformation improves the usefulness of features by changing, removing, or adding information, and it may map the original set of features to a new set with fewer features. Selecting a subset of a given set of features speeds up learning and helps protect against overfitting [1].

1.2 Dimensionality reduction

Different features of data carry information about the target. Intuitively, the more features characterize the data, the more information they carry and the better data analytics should perform. But this is not always true: irrelevant or redundant features contribute nothing to data classification, a problem known as the curse of dimensionality [33,48]. This is where data reduction comes in. Data reduction removes irrelevant or redundant data before analysis, and data-reduction algorithms should shrink the data without losing its properties [7]. Different data-reduction techniques exist, such as dimensionality reduction, data transformation, and numerosity reduction. Among these, dimensionality-reduction methods such as principal component analysis (PCA) and discrete wavelet transforms are widespread [2,7,47].

The feature selection technique selects the crucial features of the original data based on some criteria. Feature selection tries to find the best subset among 2^n competing candidate subsets by applying some evaluation criterion. Because searching for the single best subset is exhaustive and costly, heuristic methods need a stopping criterion [28]. The necessary steps of a feature selection algorithm are as follows:

1. generate the next candidate feature subset,

2. evaluate the candidate feature subset using some evaluation function,

3. apply some stopping criteria, and

4. validate the selected candidate feature subset [38,39].

In the literature, feature selection is categorized along different strategies. It can be supervised, semi-supervised, or unsupervised, based on the use of training data. According to the relationship with the learning method, feature selection is classified into filter, wrapper, and hybrid models. Proper feature selection algorithms can improve learning accuracy and reduce the time required to analyze data [3,28]. The filter method uses the relationship between features and the class label; techniques used for it include information gain, chi-square, and ReliefF. Wrapper methods work on the relevance of features and optimal feature subset selection. The embedded/hybrid approach uses a learning algorithm that selects features during training [4,5,6]. These algorithms work well, and the classifiers used play a dominant role in assessing feature reduction algorithms. Feature extraction transforms original data into features with strong pattern-recognition ability. PCA is one of the well-known feature extraction techniques; it generates new features as linear combinations of the initial features [7]. A convolutional neural network (CNN) is used to reduce the size of an image without losing valuable features, which makes image processing easier. Here a kernel filter is convolved with the input features to generate the feature map, and the output features are local and short term. The authors of [51] proposed a CNN-based gated recurrent unit model for human activity recognition.

1.3 Motivation

Feature selection, feature extraction, and feature optimization are different forms of dimensionality reduction. Dimensionality reduction is an active research topic because it plays a vital role in analyzing data in important fields such as sentiment analysis [8]. Much research applies machine learning algorithms and different word embedding techniques to sentiment analysis in a wide range of applications, such as massive open online course (MOOC) assessment, product reviews, and probable question topic extraction [9,10,11]. Turkish-language and bibliometric data analysis has also been done using sentiment analysis with feature selection techniques [12,13]. Humanoid robots have a wide application area in today's world, and feature selection plays a vital role in robotics too. The real challenge there is push recovery for the humanoid robot and the non-linearity associated with motion. Human motion study is very important for identifying neurological disorders and gait abnormalities [49,51]. Human gait is unique to every person, and the human walk is described by different joint trajectories [14,15,16]. Push recovery data classification has also been done in the literature using deep learning techniques; features play an essential role in developing computing modules for push and gait recovery [17,18].

Feature reduction is also necessary for natural language processing. Machines lack the human capability to understand language with its proper meaning. In natural language processing, research has been carried out along different dimensions by many authors: sentiment analysis of comments and reviews on social media or websites [19], context-based queries [20], teacher assessment reviews [21], and text classification using keyword extraction [22,23,24]. Thus, feature reduction has become an important part of widely applicable data analytics. Applying feature reduction directly impacts classifier accuracy, computation time, and space requirements.

1.4 Research contribution

This study contributes a feature extraction method that, for the first time in the literature, uses the convolution technique for feature reduction.

The highlights of this research are as follows:

  1. The proposed method shows a significant reduction in features.

  2. We are modeling the convolution technique to reduce the features.

3. Performance analysis: testing is done on benchmark datasets of different dimensionalities with different classifiers. All combinations give better results than the existing method in terms of accuracy and time.

  4. Due to notable reduction in features, reduction in computation time and space is observed.

1.5 Organization of the article

The rest of the article is organized as follows. The second section surveys the literature in the dimensionality reduction domain, including work done in related application domains. The third section presents the proposed methodology and discusses the detailed algorithms. The fourth section describes the experimental setup. The fifth section elaborates on the results obtained with different classifiers and their analysis. The last section presents concluding remarks and future directions of the research.

2 Literature survey

Heart disease is one of the leading causes of death, responsible for 7.4 million deaths worldwide in 2015 [25]; cancer, kidney disease, and hepatitis are other significant killers. In the medical field, disease detection is challenging because the features under consideration may be irrelevant or redundant, reducing classification accuracy. Eliminating such irrelevant and redundant features is essential to reduce classification effort, reduce the risk of overfitting, and improve classification accuracy. Reducing the feature set lowers classification time, although careless reduction can also lower accuracy. In the literature, preprocessing steps before the actual data classification include feature reduction, feature selection, feature extraction, and feature optimization; these are popular techniques for dimensionality reduction [26].

Vipin Kumar et al. discussed different feature selection approaches, such as filter, wrapper, and embedded methods, and their real-world applications. Features may be highly dependent on one another, or there may simply be too many of them, so researchers in different areas frequently use feature reduction techniques. The authors describe application areas of feature selection such as remote sensing, text categorization, intrusion detection, and image retrieval, and mention challenges such as large-dimensional data, scalability, and stability [27]. Vijay et al. created a database of 50 subjects for human gait analysis [49]. Each instance is represented by 24 attributes and preprocessed using a Kalman filter. Different combinations of deep learning and hybrid deep learning classifiers were used in the experiments, which showed a significant increase in accuracy (99.34%). The authors also worked on a biped robot, collecting data on different walking styles through inertial measurement unit sensors and analyzing the walking patterns [50].

Some of the nature-inspired algorithms for feature selection, such as the binary bat algorithm, particle swarm optimization, and modified cuckoo search, are discussed by Shrivastava et al. [28,29]. An integrated filter and wrapper method with a sequential search procedure improves classifier performance while tackling the overfitting problem and the chance of getting stuck in a local optimum [30]. Xie and Wu proposed a feature selection algorithm based on association rules, which discovers class-attribute features according to association analysis theory; however, its time complexity is relatively high due to the Apriori algorithm [31]. Alessio Ferone proposed a novel feature selection approach based on rough set theory [32]. Jinghua Liu et al. proposed a feature selection method based on the distinguishing ability of features, using the maximum nearest neighbor concept to discriminate the nearest neighbors of samples and evaluate feature quality [33]. Tajanpure and Jena [38] put forth a multistage classifier system in which features are decimated according to their level of processing need: first-level features are processed first, and further-level processing decisions are taken according to the first classifier's output. Dua et al. worked on human activity recognition with a proposed deep neural network combining a CNN and a gated recurrent unit, which performs both feature extraction and classification [51].

Saul Solorio-Fernandez et al. proposed a new unsupervised spectral feature selection method. In many practical problems, the dataset under study is described by both numerical and nonnumerical features, that is, a mixed dataset. The proposed spectral feature selection method uses a kernel and a new spectrum-based feature evaluation measure to decide the relevance of features; K-nearest neighbor (KNN), Naïve Bayes, and support vector machine (SVM) classifiers are used to measure the proposed algorithm's performance, and their accuracies are compared [39]. Much research has been done on disease diagnosis systems based on different approaches, such as learning vector quantization and artificial neural networks [40] and classification algorithms [41,42]. Researchers have developed many dimensionality reduction algorithms for different applications, each with advantages and limitations such as overfitting or high time complexity. The common basis for evaluating these techniques is classifier accuracy, along with parameters such as specificity, sensitivity, and F-measure [43].

Wei Song proposed an effective content-based feature selection approach to improve the clustering performance of genetic algorithms. Conventional genetic algorithms suffer from slow learning and local minima due to the high-dimensional exploration space. The approach combines a parametric and a nonparametric algorithm to adjust the genetic algorithm operators properly [44].

In the brute-force feature selection method, all possible combinations of the input features are evaluated to find the best subset. Here the computational cost is high, with the considerable danger of overfitting. An important aspect of feature selection techniques is the evaluation of a candidate feature subset and searching through the feature space. If there exist at least two instances with the same feature values but with different class labels, the feature subset is classified as inconsistent [45].

3 Proposed feature reduction system design

Convolution is a way of converting two sequences into another sequence. In digital signal processing, each value in the input sequence is viewed as a scaled and shifted unit impulse or delta function (δ(n)), and the output of convolution is expressed as a sum of shifted and scaled versions of these impulses. It follows that the dimensions that are dominant in the input remain dominant in the output, so output accuracy is preserved despite feature reduction [52]. This research proposes a feature extraction method based on convolution. Convolution is one of the basic operations on signals, and it has two forms: linear convolution (LC) and circular convolution (CC). LC gives the linearly overlapping result of two sequences, that is, the output of a system with transformation function x2(n) triggered by input x1(n). LC relates an output sequence to a given input sequence and impulse response, as shown in equation (1), and is computed over all relevant values of n, from −∞ to +∞. In mathematical form, with k ranging over all integers, convolution is expressed [35] as

(1) $Y(n) = x_1(n) * x_2(n) = \sum_{k=-\infty}^{+\infty} x_1(k)\, x_2(n-k),$

whereas circular convolution gives the output when the two sequences are circularly overlapped. Table 1 compares LC and CC.

Table 1

Comparison between LC and CC

 | LC | CC
Shifting of samples | Linear shifting | Circular shifting
No. of samples in the convolution result for inputs x1(n) and x2(n) | L = length(x1(n)) + length(x2(n)) − 1 | N = nearest 2^n to max(length(x1(n)), length(x2(n)))

One can get the output of CC to be the same as that of LC [34] if we choose

(2) $N = L = \mathrm{length}(x_1(n)) + \mathrm{length}(x_2(n)) - 1.$

Here we focus on CC, since more features can be reduced with it, as shown in the second row of Table 1: LC removes only one feature per application, whereas CC takes N to be the 2^n value nearest to the size of the longer of the two input sequences [34]. To give both input sequences the same number of values, that is, N, the sequences are zero-padded as needed.

CC is one of the important properties of the discrete Fourier transform (DFT). Equation (3) gives its mathematical form. If Y(L) is the output sequence, x1(n) and x2(n) are the input sequences with n values each, and N is the 2^n value nearest to max(length(x1(n)), length(x2(n))), then CC is expressed [35] as

(3) $Y(L) = \sum_{n=0}^{N-1} x_1(n)\, x_2\left((L-n)\right)_N \quad \text{for } L = 0, 1, 2, \ldots, N-1.$

The modulo-N index ((L − n))_N in this equation is what makes the convolution circular. CC is an important property of the DFT: multiplying the DFTs of two sequences corresponds to the CC of the two sequences in the time domain [17].
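As a quick numerical illustration of this property (the sequence values are our example, not from the paper; cconv requires the Signal Processing Toolbox):

```matlab
% Circular convolution computed two ways: directly, and via DFT multiplication.
x1 = [1 2 3 4];                           % example sequences (our choice)
x2 = [1 0 1 0];
y_time = cconv(x1, x2, 4);                % 4-point circular convolution
y_dft  = real(ifft(fft(x1) .* fft(x2)));  % multiply DFTs, transform back
% max(abs(y_time - y_dft)) is at floating-point noise level; both give [4 6 4 6]
```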

Since the proposed algorithm uses CC, we consider different methods of computing it.

Let x1(n) and x2(n) be two sequences with m and n samples, respectively, whose convolution yields the output sequence y(l) with l samples.

3.1 Method 1: CC by linear convolution equivalence (LCE) method

One can make the result of CC equivalent to that of LC by zero-padding both input sequences to (m + n − 1) elements. By the principle of LC, the output sequence contains one value fewer than the sum of the numbers of values in the input sequences. This is the basis of the proposed attribute reduction concept.

3.2 Method 2: CC by DFT/inverse discrete Fourier transform (IDFT) method

To find the CC of two sequences of lengths m and n, the condition m = n must be met, and the common length must equal the nearest upper 2^n value. To meet this condition, zero padding is applied to the input sequences, and then the CC of the padded sequences is computed.

The architecture of the proposed system based on CC is as follows:

As shown in Figure 1, the input dataset first undergoes data preprocessing. The normalized data are then divided into two groups, each containing a number of elements close to the nearest 2^n. These two sets of features act as inputs to the CC, which extracts a reduced set of features at the output. Afterwards, this reduced set of features is tested with a classifier to judge the proposed feature reduction algorithm's performance.

Figure 1: Proposed feature reduction system using FrbyCC.
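To make the pipeline of Figure 1 concrete, a minimal MATLAB sketch is given below; it is an assumption-laden illustration, not the authors' code. X (preprocessed data, one instance per row), y (class labels), and the frbycc routine sketched after Algorithm 2 are placeholders, and fitctree/crossval/kfoldLoss come from the Statistics and Machine Learning Toolbox:

```matlab
% Reduce every instance with circular convolution, then evaluate a DT classifier.
numInstances = size(X, 1);
width = numel(frbycc(X(1, :)));      % reduced feature count N
Yr = zeros(numInstances, width);
for i = 1:numInstances
    Yr(i, :) = frbycc(X(i, :));      % reduced feature vector per instance
end
mdl = fitctree(Yr, y);               % decision tree on the reduced features
cv  = crossval(mdl, 'KFold', 10);    % tenfold cross-validation, as in Section 4.1
acc = 1 - kfoldLoss(cv);             % estimated classification accuracy
```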

According to the above-mentioned method 1 and method 2 for finding CC, there are two variants of the FrbyCC method.

3.2.1 LCE method (FrbyLCE)

In this method, zero padding is applied to x1(n) and x2(n) so that each sequence contains L (L = m + n − 1) elements. The method reduces one feature at a time, so FrbyLCE must be applied repeatedly to the input to reduce multiple attributes.

Algorithm 1

Feature reduction using the LC equivalence method.

Input: F(n) – set of input features.

Output: Y(L) – reduced set of features.

Begin

  1. Divide the input features F(n) into two parts x1(n1) and x2(n2), with n1 ≈ n2.

  2. Find the number of elements in the output, that is,

    L = n1 + n2 − 1.

  3. Perform zero padding:

    x1(L) = [x1(n1), zeropadding(1, L − n1)]

    x2(L) = [x2(n2), zeropadding(1, L − n2)]

  4. Find the CC of x1(L) and x2(L):

      X1(K) = DFT(x1(L)),

      X2(K) = DFT(x2(L)),

      Y(K) = X1(K) ∗ X2(K) (pointwise multiplication), and

      Y(L) = IDFT(Y(K))

End
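To make Algorithm 1 concrete, here is a minimal MATLAB sketch; the function name frbylce and the roughly even split of F(n) are our assumptions, not code published with the paper:

```matlab
function y = frbylce(f)
% FRBYLCE  One application removes exactly one feature: a row vector f of
% numel(f) feature values yields numel(f)-1 outputs. Sketch of Algorithm 1.
    n1 = ceil(numel(f) / 2);
    x1 = f(1:n1);                         % first half of the feature vector
    x2 = f(n1+1:end);                     % second half
    L  = numel(x1) + numel(x2) - 1;       % linear-convolution output length
    x1 = [x1, zeros(1, L - numel(x1))];   % zero-pad both inputs to L points
    x2 = [x2, zeros(1, L - numel(x2))];
    y  = real(ifft(fft(x1) .* fft(x2)));  % L-point CC equals the LC here
end
```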

3.2.2 DFT/IDFT circular convolution method (FrbyCC)

Here, zero padding is applied depending on the number of points/samples N expected in the DFT. The number of DFT points can be 2, 4, 8, 16, 32, 64, …

Algorithm 2

Feature reduction using the DFT and IDFT CC method.

Input: F(n) – set of input features.

Output: Y(L) – reduced set of features.

Begin

  1. Divide the input features F(n) into two parts x1(n1) and x2(n2), with n1 ≈ n2.

  2. Select the number of elements in the output, that is, N:

    N = nearest higher DFT size of the form 2^n, that is, 2, 4, 8, 16, 32, 64, …

  3. Perform zero padding:

    x1(n) = [x1(n1), zeropadding(1, N − n1)] and

    x2(n) = [x2(n2), zeropadding(1, N − n2)].

  4. Find the CC of x1(n) and x2(n):

      X1(K) = DFT(x1(n)),

      X2(K) = DFT(x2(n)),

      Y(K) = X1(K) ∗ X2(K) (pointwise multiplication), and

      Y(L) = IDFT(Y(K))

End
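A corresponding MATLAB sketch of Algorithm 2 is given below. Following the algorithm's "nearest higher DFT point" wording, it pads both halves up to the next power of two via nextpow2; the function name frbycc is ours:

```matlab
function y = frbycc(f)
% FRBYCC  Reduce a row vector f of numel(f) features to
% N = 2^nextpow2(ceil(numel(f)/2)) features. Sketch of Algorithm 2
% under the "nearest higher" 2^n reading.
    n1 = ceil(numel(f) / 2);
    x1 = f(1:n1);                         % first half of the feature vector
    x2 = f(n1+1:end);                     % second half
    N  = 2^nextpow2(n1);                  % next power of two >= the longer half
    x1 = [x1, zeros(1, N - numel(x1))];   % zero-pad both halves to N points
    x2 = [x2, zeros(1, N - numel(x2))];
    y  = real(ifft(fft(x1) .* fft(x2)));  % N-point circular convolution
end
```

For the Parkinson's disease dataset (754 features), each half holds 377 values and 2^nextpow2(377) = 512, matching Table 3. Note that Table 1 says "nearest 2^n" while Algorithm 2 says "nearest higher"; some rows of Table 3 (e.g., Arrhythmia, 280 reduced to 128) are consistent with rounding down instead, so the exact rounding rule may differ per dataset.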

4 Experimental setup

The experiments are carried out on a laptop with an Intel Core i5-7200U CPU @ 2.50 GHz, 8 GB RAM, and a 64-bit Windows 10 operating system. The feature reduction and classification algorithms are implemented in MATLAB R2015b.

4.1 Dataset description and preprocessing

Nine datasets from the UC Irvine Machine Learning Repository are used to evaluate the FrbyCC algorithm [36]. The nine datasets form a good mix of dimensionalities: fewer than 50 features, around 200 features, and more than 700 features. The datasets are preprocessed first: missing values are replaced by the average of the column values, and normalization is applied (a sketch of this preprocessing follows Table 2). FrbyLCE is assessed on the Cleveland heart disease dataset, whereas FrbyCC is evaluated on the large-attribute datasets; large-attribute datasets are selected to show the reduction in features most clearly. The output of FrbyCC is given to different classifiers for evaluation, with tenfold cross-validation applied to evaluate classifier performance. Table 2 shows the basic information of the datasets.

Table 2

Datasets under consideration

Dataset | Feature reduction algorithm | No. of tuples | No. of features
Parkinson's disease | FrbyCC | 756 | 754
Arrhythmia | FrbyCC | 452 | 280
Internet ads | FrbyCC | 3,279 | 782
QSAR androgen receptor dataset | FrbyCC | 1,687 | 1,024
Self-care activities dataset (SCADI) | FrbyCC | 70 | 206
Heart disease | FrbyLCE | 303 | 13
Kidney | FrbyCC | 402 | 25
Hepatitis | FrbyCC | 132 | 19
Breast cancer | FrbyCC | 699 | 9
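The preprocessing mentioned above can be sketched in MATLAB as follows; min-max scaling into [0, 1] is our assumption, since the paper does not name the normalization used:

```matlab
% Hypothetical preprocessing: column-average imputation, then min-max scaling.
% X holds the raw numeric data, one tuple per row and one feature per column.
for j = 1:size(X, 2)
    col = X(:, j);
    col(isnan(col)) = mean(col(~isnan(col)));  % fill missing with column average
    span = max(col) - min(col);
    if span > 0
        col = (col - min(col)) / span;         % scale the column into [0, 1]
    end
    X(:, j) = col;
end
```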

5 Experimental evaluation

The proposed algorithm is applied to each dataset for feature reduction. Percent feature reduction is calculated to quantify the effectiveness of the proposed algorithm: equation (4) defines it as the ratio of the number of dimensions removed by the proposed algorithm to the number of original dimensions in the dataset.

(4) $\mathrm{PFR}_{\mathrm{FrbyCC}} = \dfrac{f_{\mathrm{Original}} - f_{\mathrm{FrbyCC}}}{f_{\mathrm{Original}}}.$
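For example, for the Parkinson's disease dataset in Table 3, PFR_FrbyCC = (754 − 512)/754 = 0.3210, that is, a 32.10% reduction.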

The reduced dataset is classified using the SVM, KNN, and decision tree (DT) classifiers. A summary of the effect of the proposed feature reduction algorithm on different datasets is given in Table 3.

Table 3

Datasets under consideration and impact of FrbyCC method

Dataset under consideration | No. of tuples | No. of features | Extracted features with FrbyCC | Percent feature reduction
Parkinson's disease | 756 | 754 | 512 | 32.10
Arrhythmia | 452 | 280 | 128 | 54.29
Internet ads | 3,279 | 782 | 512 | 34.53
QSAR | 1,687 | 1,025 | 512 | 50.05
SCADI | 70 | 206 | 64 | 68.93
Kidney | 402 | 25 | 16 | 36
Heart dataset | 303 | 13 | 8 | 38.46
Hepatitis | 132 | 19 | 16 | 15.78
Breast cancer | 699 | 9 | 8 | 11.11

Table 3 shows that the FrbyCC algorithm achieves a >30% reduction for large-attribute datasets. Feature reduction also reduces the storage space and execution time of an algorithm, and the benefit is greatest for datasets containing a large number of attributes. Metrics such as accuracy, specificity, sensitivity, and F-measure are used to assess imbalanced datasets [37]. The proposed system is evaluated using the most frequently used metric, accuracy, on the different datasets.

The FrbyLCE method removes one attribute per execution of the routine, so reducing five features requires applying FrbyLCE five times, with the output of each stage fed as input to the next, as the loop below illustrates. Figure 2 shows a two-stage FrbyLCE cascade.
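In MATLAB terms, the cascade is a loop over the hypothetical frbylce routine sketched after Algorithm 1:

```matlab
f = features;        % e.g., the 13 Cleveland heart attributes of one instance
for stage = 1:5
    f = frbylce(f);  % each stage removes one attribute: 13 -> 12 -> ... -> 8
end
```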

Figure 2: Feature reduction in a two-stage FrbyLCE method.

After the desired reduction in attributes, a classifier evaluates the results. Two classifiers, DT and SVM, are used to evaluate the proposed method. Table 4 shows the results of the FrbyLCE method with the DT classifier.

Table 4

Results of FrbyLCE method with DT classifier on Cleveland dataset

Algorithm (DT) | No. of features for evaluation | Accuracy (%) | Sensitivity | Specificity | Precision | Recall | F-measure | Confusion matrix
Without FrbyLCE | 13 | 86.13 | 85.92 | 86.31 | 83.45 | 85.92 | 84.67 | [145 19; 23 116]
FrbyLCE reduction by one feature | 12 | 89.769 | 91.538 | 88.439 | 85.612 | 91.538 | 88.476 | [153 11; 20 119]
FrbyLCE reduction by two features | 11 | 90.099 | 93.6 | 87.64 | 84.173 | 93.6 | 88.636 | [156 8; 22 117]
FrbyLCE reduction by three features | 10 | 91.419 | 90.647 | 92.073 | 90.647 | 90.647 | 90.647 | [151 13; 13 126]
FrbyLCE reduction by four features | 9 | 90.429 | 88.194 | 92.453 | 91.367 | 88.194 | 89.753 | [147 17; 12 127]
FrbyLCE reduction by five features | 8 | 90.429 | 92.308 | 89.017 | 86.331 | 92.308 | 89.219 | [154 10; 19 120]

Comparing Tables 4 and 5, it is clear that the FrbyLCE method improves the accuracy of the DT classifier. The LCE method combines the two sets of input attributes to produce a new set of attributes, on which the DT classifier classifies more correctly. Hence, as features are reduced one by one, the DT classifier's accuracy increases, and from stage III onward it remains roughly constant for further reductions, as observed in Table 4. Table 5 also shows that the SVM classifier cannot select good decision boundaries, because the features generated by FrbyLCE are combinations of two sets of input attributes; here accuracy decreases as the number of attributes is reduced.

Table 5

Results of FrbyLCE method with SVM classifier on Cleveland dataset

Algorithm (SVM) | No. of features for evaluation | Accuracy (%) | Sensitivity | Specificity | Precision | Recall | F-measure | Confusion matrix
Without FrbyLCE | 13 | 93.72 | 93.47 | 93.93 | 92.80 | 93.47 | 93.14 | [155 9; 10 129]
FrbyLCE reduction by one feature | 12 | 93.06 | 94.03 | 92.308 | 90.647 | 94.03 | 92.308 | [156 8; 13 126]
FrbyLCE reduction by two features | 11 | 88.45 | 91.935 | 86.034 | 82.014 | 91.935 | 86.692 | [154 10; 25 114]
FrbyLCE reduction by three features | 10 | 73.59 | 86.42 | 68.919 | 50.36 | 86.42 | 63.636 | [153 11; 69 70]
FrbyLCE reduction by four features | 9 | 79.21 | 75.676 | 82.581 | 80.576 | 75.676 | 78.049 | [128 36; 27 112]
FrbyLCE reduction by five features | 8 | 78.54 | 75.342 | 81.529 | 79.137 | 75.342 | 77.19 | [128 36; 29 110]

Table 6 shows the accuracy of FrbyCC evaluated with the DT, KNN, and SVM classifiers. The results are taken on nine benchmark datasets from the UCI repository, with accuracy as the performance measure, and are compared with the existing PCA feature reduction algorithm. If the accuracy of a system is denoted by A, the accuracy improvement with the proposed FrbyCC method is denoted by Δ [53] and calculated as shown in equation (5),

(5) $\Delta_{\mathrm{DT}} = \dfrac{A_{\mathrm{FrbyCC}} - A_{\mathrm{PCA}}}{A_{\mathrm{PCA}}} \times 100.$
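For instance, for the Parkinson's disease dataset with the DT classifier, Δ_DT = (79.76 − 72.2)/72.2 × 100 = 10.47, as reported in Table 6.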

Table 6

Accuracy of FrbyCC and PCA method with DT, KNN, and SVM classifier

Sr. No. | Dataset | Original features | Extracted with FrbyCC | DT: PCA | DT: FrbyCC | ΔDT | SVM: PCA | SVM: FrbyCC | ΔSVM | KNN: PCA | KNN: FrbyCC | ΔKNN
1 | Parkinson's disease | 754 | 512 | 72.2 | 79.76 | 10.47 | 74.6 | 74.6 | 0.00 | 74.34 | 82.9 | 11.51
2 | Arrhythmia | 280 | 128 | 61.28 | 62.39 | 1.81 | 54.2 | 54.2 | 0.00 | 54.2 | 60.18 | 11.03
3 | Internet ads | 782 | 512 | 90.11 | 100 | 10.98 | 91 | 90.88 | −0.13 | 46.35 | 97.62 | 110.61
4 | QSAR | 1,025 | 512 | 85.36 | 85.54 | 0.21 | 88.14 | 88.14 | 0.00 | 88.2 | 90.1 | 2.15
5 | SCADI | 206 | 64 | 95.71 | 98.57 | 2.99 | 77.14 | 77.14 | 0.00 | 77.14 | 87.14 | 12.96
6 | Kidney | 25 | 16 | 97.25 | 97.25 | 0.00 | 88.25 | 97 | 9.92 | 87.5 | 95.75 | 9.43
7 | Heart dataset | 13 | 8 | 94.71 | 94.73 | 0.02 | 57.1 | 79.21 | 38.72 | 79.21 | 84.48 | 6.65
8 | Hepatitis | 19 | 16 | 64.88 | 94.69 | 45.95 | 98.48 | 98.69 | 0.21 | 63.64 | 65.15 | 2.37
9 | Breast cancer | 9 | 8 | 94.85 | 94.92 | 0.07 | 93.42 | 96.71 | 3.52 | 94.42 | 96.71 | 2.43
Average Δ | | | | | | 8.06 | | | 5.80 | | | 18.80

Table 6 shows that classification performance improves by an average of 8.06% with the DT classifier, 5.80% with the SVM classifier, and 18.80% with the KNN classifier. The accuracy improvement is due purely to the feature extraction done by the FrbyCC algorithm: FrbyCC multiplies features and adds the products, so the importance of the input features is retained in the extracted features. The extracted features are the convolved output of the input features, with the important features influencing the result, so despite feature reduction the impact of the important features remains in the output. The variation in improvement across classifiers reflects the behavior of each classifier on the respective dataset. These results show that the FrbyCC method is very effective in terms of accuracy, and it reduces many features in a single application on the dataset under consideration.

Table 6 shows the results with tenfold cross-validation. Observations show no drop in the accuracy of the classifier, despite feature reduction. In comparison with PCA, the FrbyCC method gives improved accuracy.

Comparing FrbyLCE with FrbyCC, FrbyCC is the more effective method: it reduces a significant number of features in one application, whereas FrbyLCE must be applied in cascade n times to remove n features. To verify accuracy across datasets, FrbyCC is applied to the Parkinson's, Arrhythmia, Internet ads, QSAR, and SCADI datasets and compared with the PCA algorithm. As seen in Table 3, these first five datasets are high dimensional, and FrbyCC reduces their features by up to 68.93%. The remaining datasets have fewer dimensions; our focus is on high-dimensional datasets, where the reduction in features decreases execution time and storage space. The best reduction is observed when half of the total number of features lies near a power of two (2^n).

Figure 3 shows each dataset's accuracy with DT, KNN, and SVM; the classifiers behave differently on each dataset, with improved accuracy overall.

It is observed from Figure 3 that DT and KNN work well, showing increased accuracy with the FrbyCC method compared to PCA, while SVM gives accuracy in line with PCA. Since PCA, like FrbyCC, is a feature extraction technique based on linear combinations of input attributes, both algorithms achieve similar accuracy with SVM.

Figure 3: Comparison of accuracy of FrbyCC and PCA with DT, KNN, and SVM classifiers.

In MATLAB, time is measured using the tic and toc functions, which report execution time in seconds. Table 7 shows that the execution time of the FrbyCC algorithm is less than that of the PCA algorithm, as FrbyCC is implemented using a DFT algorithm. Figures 4–6 show the execution time comparison of the FrbyCC and PCA feature reduction algorithms for the different classifiers.
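As a minimal sketch of that measurement (tic and toc are standard MATLAB; frbycc is the hypothetical routine sketched in Section 3.2.2):

```matlab
tic;                     % start the stopwatch
y = frbycc(X(1, :));     % work under test: reduce one instance, for illustration
elapsed = toc;           % elapsed wall-clock time in seconds
```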

Table 7

Execution time comparison of FrbyCC and PCA method with DT, KNN, and SVM classifier (time measured in seconds)

Name of dataset | FrbyCC: DT time | FrbyCC: KNN time | FrbyCC: SVM time | PCA: DT time | PCA: KNN time | PCA: SVM time
Parkinson's disease | 5.84 | 2.04 | 3.42 | 9.57 | 2.23 | 3.62
Arrhythmia | 2.76 | 1.39 | 2.17 | 3.8 | 1.44 | 2.37
Internet ads | 2.68 | 12.69 | 25.21 | 7.62 | 16.71 | 45.79
QSAR | 8.83 | 5.25 | 13.22 | 9.35 | 8.03 | 24.69
SCADI | 0.23 | 1.54 | 1.79 | 0.26 | 1.61 | 1.94
Figure 4: Comparison of the execution time of FrbyCC and PCA with DT classifier.

Figure 5: Comparison of the execution time of FrbyCC and PCA with KNN classifier.

Figure 6: Comparison of the execution time of FrbyCC and PCA with SVM classifier.

FrbyCC is also tested with the Naïve Bayes classifier, a probabilistic classifier that relies entirely on the assumption of independence between input features. Because the FrbyCC algorithm extracts the reduced features by overlapping the original features, this independence assumption is violated, and Naïve Bayes shows poor accuracy on the FrbyCC output.

6 Conclusion

Feature reduction and accuracy are crucial concerns in data classification, and high dimensionality is the main issue. This research addresses the problem of handling high-dimensional data. We proposed a feature extraction method based on convolution that reduces features without loss of classification accuracy. The first technique described in this article, feature reduction by LCE, reduces one feature per application and works well with the DT algorithm. The second method, FrbyCC, proves very effective for dimensionality reduction: experiments show that it works well with DT and KNN, while for SVM the accuracy is in line with PCA for most of the datasets. The feature reduction achieved on each dataset depends on the proximity of its dimensionality to a 2^n value. The average increase in accuracy (Δ) achieved with DT, SVM, and KNN is 8.06, 5.80, and 18.80, respectively, on the benchmark datasets. The proposed algorithm also reduces execution time through the use of the DFT/IDFT [52]. Overall, the FrbyCC algorithm delivers feature reduction while retaining accuracy and lowering storage space and execution time.

1. Conflict of interest: The authors declare no conflict of interest.

  2. Future research direction: The future scope of this research is to explore and evaluate the proposed algorithm using deep learning and hybrid deep learning techniques.

References

[1] Flach P. Machine learning: the art and science of algorithms that make sense of data. Cambridge: Cambridge University Press; 2012. doi:10.1017/CBO9780511973000.

[2] Han J, Kamber M. Data mining: concepts and techniques. 3rd edn. Waltham: Morgan Kaufmann Publishers; 2006.

[3] Cai J, Luo J, Wang S, Yang S. Feature selection in machine learning: a new perspective. Neurocomputing. 2018;300:70–9. doi:10.1016/j.neucom.2017.11.077.

[4] Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23(19):2507–17. doi:10.1093/bioinformatics/btm344.

[5] Lu Q, Li X, Dong Y. Structure preserving unsupervised feature selection. Neurocomputing. 2018;301:36–45. doi:10.1016/j.neucom.2018.04.001.

[6] Jain D, Singh V. Feature selection and classification systems for chronic disease prediction: a review. Egypt Inform J. 2018;19:179–89. doi:10.1016/j.eij.2018.03.002.

[7] Keerthi Vasan K, Surendiran B. Dimensionality reduction using principal component analysis for network intrusion detection. Perspect Sci. 2016;8:510–2. doi:10.1016/j.pisc.2016.05.010.

[8] Onan A, Korukoglu S. A feature selection model based on genetic rank aggregation for text sentiment classification. J Inf Sci. 2015;43(1):25–38. doi:10.1177/0165551515613226.

[9] Onan A. Sentiment analysis on massive open online course evaluations: a text mining and deep learning approach. Comput Appl Eng Educ. 2020;29:572–89. doi:10.1002/cae.22253.

[10] Onan A. Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks. Concurr Comput Pract Exp. 2020. doi:10.1002/cpe.5909.

[11] Onan A, Tocoglu MA. Weighted word embeddings and clustering-based identification of question topics in MOOC discussion forum posts. Comput Appl Eng Educ. 2020;29:675–89. doi:10.1002/cae.22252.

[12] Onan A. Sentiment analysis in Turkish based on weighted word embeddings. 2020 28th Signal Processing and Communications Applications Conference (SIU), Gaziantep, Turkey: IEEE; 2020. p. 1–4. doi:10.1109/SIU49456.2020.9302182.

[13] Onan A. Two-stage topic extraction model for bibliometric data analysis based on word embeddings and clustering. IEEE Access. 2019;7:145614–33. doi:10.1109/ACCESS.2019.2945911.

[14] Semwal VB, Singha J, Sharma P, Chauhan A, Behera B. An optimized feature selection technique based on incremental feature analysis for bio-metric gait data classification. Multimed Tools Appl. 2017;76:24457–75. doi:10.1007/s11042-016-4110-y.

[15] Gupta A, Semwal VB. Multiple task human gait analysis and identification: ensemble learning approach. In: Mohanty SN, editor. Emotion and information processing. Cham: Springer; 2020. doi:10.1007/978-3-030-48849-9_12.

[16] Raj M, Semwal VB, Nandi GC. Bidirectional association of joint angle trajectories for humanoid locomotion: the restricted Boltzmann machine approach. Neural Comput Appl. 2018;30:1747–55. doi:10.1007/s00521-016-2744-3.

[17] Semwal VB, Mondal K, Nandi GC. Robust and accurate feature selection for humanoid push recovery and classification: deep learning approach. Neural Comput Appl. 2017;28:565–74. doi:10.1007/s00521-015-2089-3.

[18] Semwal VB, Gaud N, Nandi GC. Human gait state prediction using cellular automata and classification using ELM. In: Tanveer M, Pachori R, editors. Machine intelligence and signal analysis. Advances in Intelligent Systems and Computing. Vol. 748. Singapore: Springer; 2019. doi:10.1007/978-981-13-0923-6_12.

[19] Onan A, Toçoğlu MA. A term weighted neural language model and stacked bidirectional LSTM based framework for sarcasm identification. IEEE Access. 2021;9:7701–22. doi:10.1109/ACCESS.2021.3049734.

[20] Singh U, Kedas S, Prasanth S, Kumar A, Semwal VB, Tikkiwal VA. Design of a recurrent neural network model for machine reading comprehension. Proc Comput Sci. 2020;167:1791–800. doi:10.1016/j.procs.2020.03.388.

[21] Onan A. Mining opinions from instructor evaluation reviews: a deep learning approach. Comput Appl Eng Educ. 2020;28:117–38. doi:10.1002/cae.22179.

[22] Onan A, Korukoglu S, Bulut H. Ensemble of keyword extraction methods and classifiers in text classification. Expert Syst Appl. 2016;57:232–47. doi:10.1016/j.eswa.2016.03.045.

[23] Onan A, Korukoğlu S, Bulut H. A hybrid ensemble pruning approach based on consensus clustering and multi-objective evolutionary algorithm for sentiment classification. Inf Process Manag. 2017;53(4):814–33. doi:10.1016/j.ipm.2017.02.008.

[24] Kontonatsios G, Spencer S, Matthew P, Korkontzelos I. Using a neural network-based feature extraction method to facilitate citation screening for systematic reviews. Expert Syst Appl X. 2020;6:100030. doi:10.1016/j.eswax.2020.100030.

[25] World Health Organization (WHO). Key facts about heart disease: cardiovascular disease; June 2017 [Online]. Available: http://www.who.int/mediacentre/factsheets/fs317/en/.

[26] Vivekanandan T, Ch Sriman Narayana Iyengar N. Optimal feature selection using a modified differential evolution algorithm and its effectiveness for prediction of heart disease. Comput Biol Med. 2017;90:125–36. doi:10.1016/j.compbiomed.2017.09.011.

[27] Kumar V. Feature selection: a literature review. Smart Comput Rev. 2014;4:211–29. doi:10.6029/smartcr.2014.03.007.

[28] Shrivastava P, Shukla A, Vepakomma P, Bhansali N, Verma K. A survey of nature-inspired algorithms for feature selection to identify Parkinson's disease. Comput Methods Prog Biomed. 2017;139:171–9. doi:10.1016/j.cmpb.2016.07.029.

[29] Sudarson J, Balasaheb T. Improved artificial neural network (ANN) with aid of artificial bee colony (ABC) for medical data classification. Int J Bus Intell Data Min. 2017;1:1. doi:10.1504/IJBIDM.2017.10010713.

[30] Peng Y, Wu Z, Jiang J. A novel feature selection approach for biomedical data classification. J Biomed Inform. 2010;43:15–23. doi:10.1016/j.jbi.2009.07.008.

[31] Xie J, Wu J. Feature selection algorithm based on association rule mining method. Eighth IEEE/ACIS ICCIS; 2009. doi:10.1109/ICIS.2009.103.

[32] Ferone A. Feature selection based on the composition of rough sets induced by feature granulation. Int J Approx Reason. 2018;101:276–92. doi:10.1016/j.ijar.2018.07.011.

[33] Liu J, Lin Y. Feature selection based on the quality of information. Neurocomputing. 2017;225:11–22. doi:10.1016/j.neucom.2016.11.001.

[34] Oppenheim AV, Schafer RW. Digital signal processing. 1st edn. Pearson; 1975. doi:10.21236/ADA110902.

[35] Proakis JG, Manolakis DK. Digital signal processing: principles, algorithms, and applications. 3rd edn. South Asia: Pearson Publications; 1996.

[36] Dua D, Graff C. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science; 2019. http://archive.ics.uci.edu/ml.

[37] He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2009;21(9). doi:10.1109/TKDE.2008.239.

[38] Tajanpure RR, Jena S. Diagnosis of disease using feature decimation with multiple classifier system. In: Dash S, Das S, Panigrahi B, editors. International Conference on Intelligent Computing and Applications. Advances in Intelligent Systems and Computing. Vol. 632. Singapore: Springer; 2018. doi:10.1007/978-981-10-5520-1_7.

[39] Solorio-Fernández S, Martínez-Trinidad J, Carrasco-Ochoa JA. A new unsupervised spectral feature selection method for mixed data: a filter approach. Pattern Recognit. 2017;72:314–26. doi:10.1016/j.patcog.2017.07.020.

[40] Alkım E, Gürbüz E, Kılıç E. A fast and adaptive automated disease diagnosis method with an innovative neural network model. Neural Netw. 2012;33:88–96. doi:10.1016/j.neunet.2012.04.010.

[41] Jain D, Singh V. Feature selection and classification systems for chronic disease prediction: a review. Egypt Inform J. 2018;19:179–89. doi:10.1016/j.eij.2018.03.002.

[42] Cura T. Use of support vector machines with a parallel local search algorithm for data classification and feature selection. Expert Syst Appl. 2020;145:113133. doi:10.1016/j.eswa.2019.113133.

[43] Yasmin G, Das AK, Nayak J, Pelusi D, Ding W. Graph based feature selection investigating boundary region of rough set for language identification. Expert Syst Appl. 2020;158:113575. doi:10.1016/j.eswa.2020.113575.

[44] Song W, Wang ST, Li CH. Parametric and nonparametric evolutionary computing with a content-based feature selection approach for parallel categorization. Expert Syst Appl. 2009;36:11934–43. doi:10.1016/j.eswa.2009.03.068.

[45] Dash M, Liu H. Consistency-based search in feature selection. Artif Intell. 2003;151:155–76. doi:10.1016/S0004-3702(03)00079-1.

[46] Weitschek E, Felici G, Bertolazzi P. Clinical data mining: problems, pitfalls and solutions. 2013 24th International Workshop on Database and Expert Systems Applications. IEEE; 2013. doi:10.1109/DEXA.2013.42.

[47] Dash M, Liu H. Consistency-based search in feature selection. Artif Intell. 2003;151:155–76. doi:10.1016/S0004-3702(03)00079-1.

[48] Ferone A. Feature selection based on composition of rough sets induced by feature granulation. Int J Approx Reason. 2018;101:276–92. doi:10.1016/j.ijar.2018.07.011.

[49] Vijay Bhaskar S, Gupta A, Lalwani P. An optimized hybrid deep learning model using ensemble learning approach for human walking activities recognition. J Supercomput. 2021;103:1–24.

[50] Vijay Bhaskar S, Neha G, Praveen L, Vishwanath B, Abhay Kumar A. Pattern identification of different human joints for different human walking styles using inertial measurement unit (IMU) sensor. Artif Intell Rev. 2021:1–21.

[51] Dua N, Singh SN, Semwal VB. Multi-input CNN-GRU based human activity recognition using wearable sensors. Computing. 2021;103:1461–78. doi:10.1007/s00607-021-00928-8.

[52] Smith SW. The scientist and engineer's guide to digital signal processing. San Diego, CA: California Technical Publishing; 1997.

[53] Tarle B. Integrating multiple methods to enhance medical data classification. Evol Syst. 2020;11:133–42. doi:10.1007/s12530-019-09272-x.

Received: 2020-06-22
Revised: 2021-08-29
Accepted: 2021-09-08
Published Online: 2021-10-22

© 2021 Rupali Tajanpure and Akkalakshmi Muddana, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
