
A New Feature Selection Method for Sentiment Analysis in Short Text

H. M. Keerthi Kumar and B. S. Harish
Published/Copyright: December 4, 2018

Abstract

In the recent internet era, micro-blogging sites produce an enormous amount of short textual information, which appears in the form of opinions or sentiments of users. Sentiment analysis in short text is a challenging task due to the use of informal language, misspellings, and shortened forms of words, which lead to high dimensionality and sparsity. In order to deal with these challenges, this paper proposes a novel, simple, and yet effective feature selection method that selects the frequently distributed features related to each class. The feature selection method is based on class-wise information, identifying the relevant features of each class. We evaluate the proposed feature selection method by comparing it with existing feature selection methods, namely chi-square (χ2), entropy, information gain, and mutual information. Performance is measured by the classification accuracy obtained from support vector machine, K nearest neighbors, and random forest classifiers on two publicly available datasets: the Stanford Twitter dataset and the Ravikiran Janardhana dataset. To demonstrate the effectiveness of the proposed feature selection method, we conducted extensive experiments with different feature subsets. The proposed feature selection method outperforms the existing feature selection methods in terms of classification accuracy on the Stanford Twitter dataset, and performs comparably to the other feature selection methods on most of the feature subsets of the Ravikiran Janardhana dataset.

MSC 2010: 68U15

1 Introduction

The popularity of micro-blog applications in the recent decade has generated an enormous amount of short textual information. Millions of users make use of micro-blogging sites to express their opinions or sentiments about products, topics, or events that take place in day-to-day life [39]. An opinion may be regarded as a statement in which the opinion holder makes a specific claim about a topic using a certain sentiment [8], [24]. Many marketing companies use micro-blog textual information to identify sentiments related to a product or an event [10], [58]. The information retrieved from micro-blogs involves at least two specific issues: first, the use of informal language, typical of electronic word-of-mouth, which leads to misspellings and slang words; second, the character limit, which encourages shortened words and sentences, making analysis difficult. The detection and analysis of sentiments in short texts is therefore an attractive topic for many researchers and practitioners, who aim to classify text into different polarities or classes.

Sentiment analysis is the process of automatically extracting opinions or emotions from text, especially from user-generated textual content. It is considered a classification task which classifies text into positive, negative, or neutral classes [4], [7], [16], [35], [55]. In order to create an automated system that performs effective sentiment analysis, several researchers [23], [32], [33], [34], [50] came up with two main approaches: semantic orientation [3], [50] and machine learning methods [33], [52]. Semantic orientation-based approaches for sentiment analysis encompass lexicon-based [47] and linguistic methods [46]. It has been claimed that lexicon-based and linguistic methods do not perform well on sentiment classification, due to the nature of opinionated text, which requires a deeper understanding of the text [48]. In addition to lexicon-based and linguistic methods, machine learning methods have been widely used for sentiment analysis [20]. In the literature [6], [14], [18], machine learning-based approaches yield better predictive performance for sentiment analysis than lexicon-based methods. Generally, sentiment analysis based on machine learning algorithms is performed in five steps: preprocessing, feature extraction and selection, representation, classification or clustering, and evaluation [17], [20]. Sentiment analysis on short texts must deal with the high dimensionality of the feature space, owing to the low occurrence rate of individual features across short texts. Most of the features are irrelevant and lead to poor classifier performance [36]. Therefore, selecting relevant features reduces the size of the feature space without sacrificing the performance of the sentiment classification.

In sentiment analysis, feature selection is a method to identify a subset of features to achieve three goals: first, to reduce computational cost; second, to avoid overfitting; and third, to enhance the classification accuracy of the model [54]. Feature selection methods can be broadly divided into three categories: filter methods, wrapper methods, and embedded methods [40]. Filter methods assess the optimal subset of features by looking only at the intrinsic properties of the data: a relevance score is calculated for each feature, low-scoring features are eliminated, and the remaining subset is presented to the classifier [42]. Wrapper methods evaluate subsets of features by detecting the possible interactions between the features and the learning model, i.e. the search is wrapped around the learning model to obtain an optimal feature subset [25]. Embedded methods combine aspects of filter and wrapper methods to select the optimal subset of features that increases the performance of the classifier. The most popular feature selection methods reported in the literature are chi-square (χ2), entropy, information gain (IG), and mutual information (MI). The selected features are then used for the subsequent training of the machine learning classifiers.
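As a concrete illustration of the filter pattern described above — score each feature against the class labels, then drop the low scorers — the following sketch applies a χ2 filter with scikit-learn on a made-up toy matrix. This is not the paper's setup (the authors' experiments were run in R); it only exemplifies the score-and-prune workflow shared by the conventional methods.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy term-frequency matrix: 4 short texts x 6 features (made-up counts).
X = np.array([
    [3, 0, 1, 0, 2, 0],
    [2, 1, 0, 0, 3, 0],
    [0, 2, 0, 3, 0, 1],
    [0, 3, 1, 2, 0, 2],
])
y = np.array([0, 0, 1, 1])  # class label of each short text

# Filter method: score every feature independently of the classifier,
# then keep only the k highest-scoring features.
selector = SelectKBest(score_func=chi2, k=3)
X_reduced = selector.fit_transform(X, y)

print(selector.scores_)                    # per-feature chi-square scores
print(selector.get_support(indices=True))  # indices of the 3 retained features
print(X_reduced.shape)                     # (4, 3): reduced feature space
```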

Conventional feature selection methods consider the distribution, between classes, of the short texts containing a feature. However, they do not take into account the frequency of the feature within the classes. A feature that is characteristic of a class should appear more frequently in short texts belonging to that class than in other classes. This motivated us to propose a new, simple yet effective feature selection method, which selects class-wise features by computing the relevant features of each class. To determine its efficacy, the proposed method is compared with conventional feature selection methods such as chi-square (χ2) [45], entropy [44], IG [57], and MI [5]. The proposed method is evaluated using the classification accuracy obtained from support vector machine (SVM), K nearest neighbors (KNN), and random forest (RF) classifiers on two publicly available datasets: the Stanford Twitter dataset [12] and the Ravikiran Janardhana dataset [37].

The remainder of this paper is organized as follows: Section 2 reviews related work on feature selection methods for sentiment analysis. Section 3 describes the methodology and the proposed work with an illustration. Section 4 presents the experimental results and discussion. Section 5 concludes the paper and outlines future work.

2 Related Work

Recently, micro-blog data like tweets, Facebook posts, and reviews have been growing at an unprecedented rate [15], [28]. The vast amount of user-generated short textual information has made micro-blogs the largest data source of public opinion. In micro-blogs, users make spelling mistakes and use slang words while expressing their views or opinions. Moreover, these short texts contain a large amount of noisy data like URLs, punctuation, and special symbols that need to be preprocessed. The major challenges of sentiment analysis on micro-blogs are limited text, slang terms, high dimensionality, and sparsity. The curse of dimensionality and sparsity are major concerns in sentiment analysis when noisy, irrelevant features are present in the feature space. In order to deal with these challenges, many researchers [13], [21], [22], [26], [27], [31], [43], [49], [51] explored various machine learning approaches, concentrated their studies on the curse of dimensionality, and used feature selection methods to reduce the high-dimensional feature space. Zhang et al. [56] proposed a feature selection method combining hidden topic analysis and entropy-based feature ranking. The method uses latent semantic analysis to find the latent structure of “topics” or “concepts” in a text corpus, and entropy-based feature selection to rank the features related to each topic. Maximum entropy classifiers are used to evaluate the performance while curtailing the feature space significantly.

Zheng et al. [59] explored the effects of feature selection on sentiment analysis of Chinese online reviews. N-char-grams and N-POS-grams are used to select potential sentiment features. The feature subsets are selected using an improved document frequency method, and feature weights are calculated with a Boolean weighting scheme. A chi-square test is carried out to assess the significance of the experimental results. The results suggest that, when taking N-char-grams as features, low-order N-char-grams achieve better performance than higher-order ones. Omar et al. [30] conducted a series of experimental comparisons of various feature selection methods for Arabic sentiment classification. The performance of feature selection methods such as IG, principal components analysis, Relief-F, Gini index, uncertainty, and chi-square was studied. Naive Bayes (NB), SVM, and KNN classifiers were used to classify Arabic documents into different polarities. The experimental results show that feature selection increases the performance of the classifiers, and that SVM performed best across all feature selection methods.

Agarwal and Mittal [2] conducted various experimental comparisons of prominent feature extraction methods for English review analysis. Features were extracted using unigrams, bi-grams, bi-tagged features, and dependency parse tree-based features. Further, IG and minimum redundancy maximum relevancy feature selection methods were used to eliminate noisy and irrelevant features from the feature vector. SVM and multinomial NB classifiers were used to classify each review document into the positive or negative class. The results showed that multinomial NB performs better than SVM in terms of accuracy and execution time for binary sentiment classification. Wu et al. [53] proposed an improved text feature selection method based on word frequency information. The method modifies the expected cross entropy algorithm in two respects: the frequency distribution within a category and the frequency distribution among different categories. The experimental results show that feature selection based on the occurrence of terms within different classes is essential for reducing the feature space and improving classifier performance.

In the literature, many researchers have developed various feature selection methods for sentiment analysis. The existing methods consider the distribution, between classes, of the short texts containing a feature, but they do not take into account the frequency of the feature within the classes. A feature that is characteristic of a class should appear more frequently in short texts belonging to that class than in other classes. The proposed method therefore selects the frequently distributed features related to each class, using class-wise information to identify the relevant features of each class. The proposed feature selection method is evaluated using the classification accuracy of three classifiers: SVM, KNN, and RF.

3 Proposed Methodology

This section presents a detailed description of the methodology used for sentiment classification. Section 3.1 describes the various preprocessing techniques used to eliminate less informative data from the dataset. Section 3.2 briefly describes the text representation used in the proposed method. Section 3.3 gives a detailed description of the proposed feature selection method, and Section 3.4 illustrates it with an example. Finally, Section 3.5 briefly describes the classifiers used to classify sentiments into positive, negative, and neutral classes.

3.1 Preprocessing

Preprocessing involves the elimination of trivial or less informative data that does not contribute to sentiment classification. We used eight preprocessing techniques to process the tweets, described below (a combined sketch follows the list):

In tweets, users often post a URL along with the text to provide supporting information, such as “http://bit.ly/IMXUM”, which does not contribute to sentiment analysis. Hence, URLs are replaced with white space.

Tweets often contain usernames (@), which indicate the user. Usernames do not contribute much to the sentiment of a tweet. Hence, we replace usernames with white space.

A hashtag (#) is associated with the particular topic or opinion expressed by the user in the tweet. We removed only the symbol “#”, retaining its contents.

Negations play a vital role in sentiment classification; the occurrence of a negative word, e.g. “not”, “n’t”, etc., changes the orientation of the text to a different polarity. Hence, negation handling is used to expand shortened terms such as “don’t”, “can’t”, and “n’t” to “do not”, “cannot”, and “not”.

Tweets often contain exaggerated terms such as “looovvvveee”, and it is necessary to normalize these words to make them more formal. Hence, character normalization is applied: any character repeated more than three times consecutively is replaced with a single character.

Punctuation symbols such as “,”, “ ’ ”, “$”, “?”, “!”, etc. do not contribute to the sentiment of tweets and are thus removed. Finally, stop-words are eliminated and stemming is applied to each tweet.
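Taken together, these eight steps amount to a single cleaning routine. The following Python sketch illustrates one plausible implementation; the stop-word list, stemmer, and negation dictionary are stand-in assumptions (here NLTK's English stop-words and Porter stemmer), since the paper does not specify the exact resources used.

```python
import re
from nltk.corpus import stopwords    # assumed stand-in stop-word list
from nltk.stem import PorterStemmer  # assumed stand-in stemmer

# nltk.download("stopwords") may be required once.
# Keep negation words, since negation handling has already expanded them.
STOPWORDS = set(stopwords.words("english")) - {"not", "no"}
STEMMER = PorterStemmer()
NEGATIONS = {"don't": "do not", "can't": "cannot", "won't": "will not", "n't": " not"}

def preprocess(tweet):
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # 1. URL -> white space
    text = re.sub(r"@\w+", " ", text)                    # 2. @username -> white space
    text = text.replace("#", "")                         # 3. drop "#", keep the topic word
    for short, full in NEGATIONS.items():                # 4. expand negations
        text = text.replace(short, full)
    text = re.sub(r"(.)\1{3,}", r"\1", text)             # 5. squeeze characters repeated more than three times
    text = re.sub(r"[^\w\s]", " ", text)                 # 6. strip punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]  # 7. stop-word removal
    return [STEMMER.stem(t) for t in tokens]             # 8. stemming

print(preprocess("@user I looovvvveee this!! can't wait #great http://bit.ly/IMXUM"))
```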

3.2 Representation

The preprocessed short texts are represented in a machine-understandable form. Preprocessed short texts are generally represented as vectors of terms using a bag of words [41] or n-grams (unigrams and bigrams) [38]. The work of [4] suggests that unigrams with term frequency (tf) perform well in sentiment analysis of micro-blogging data. Hence, we use a unigram representation model, which is similar to the bag-of-words model. Each word is considered a term, and the term frequency scheme is used to calculate the frequency of each term appearing in each short text. The term weights are calculated by the term frequency scheme, i.e. tf(ti) = number of times term ti appears in a short text, where ti represents a term (feature) present in the short text.
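For concreteness, a unigram term-frequency matrix of this kind can be built, for example, with scikit-learn's CountVectorizer; the two short texts below are invented for illustration, and this is not the authors' tooling (the paper's experiments were run in R).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two invented, already-preprocessed short texts.
short_texts = [
    "movie great act great",
    "movie bad plot weak",
]

# Unigram bag-of-words: cell (d, i) holds tf(t_i), the number of times
# term t_i appears in short text d.
vectorizer = CountVectorizer(ngram_range=(1, 1))
S = vectorizer.fit_transform(short_texts)

print(vectorizer.get_feature_names_out())  # the features f_1, ..., f_N
print(S.toarray())                         # e.g. "great" has tf = 2 in the first text
```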

3.3 The Proposed Features Selection Method

In this section, we propose a novel feature selection method based on class-wise information. The class-wise feature selection method comprises three steps. First, the short texts of each class are grouped. Second, the sum of the frequencies of each feature within the class is calculated; the obtained feature weights are sorted in descending order, and low-weighted features are eliminated by fixing a threshold value. Here, the threshold value, fixed empirically, indicates the number of features selected from each class. This process is repeated for each class. Finally, the feature subsets selected from each class are combined to obtain the overall feature set, which is used for the subsequent training of the classifiers.

Let there be j classes, each containing k short texts. The short texts are described by N-dimensional term frequency vectors (feature vectors). The term document matrix S of size (jk × N) is constructed such that each row represents a short text belonging to class Cj and each column represents a feature from F = {f1, f2, …, fN}.

First, we compute the sum of the frequencies of each feature fi within class Cj, i.e.

(1) ClassTermFrequency(Cj, fi) = Σ_{st=1}^{k} Frequency(Sst, fi)

where Frequency(Sst, fi) is the frequency of occurrence of feature fi in short text Sst, and k is the number of short texts in class Cj.

The resultant ClassTermFrequency(Cj, fi) matrix has size 1 × N for each class. Further, we sort the values of ClassTermFrequency in descending order to obtain the most frequently occurring terms within the class. A subset of N′ features is selected by fixing a threshold value, determined empirically. The selected features are F1 = {f1, f2, …, fN′}, where N′ < N and F1 is the feature set corresponding to one class. We repeat the procedure for each class and then apply the union operation to the feature sets obtained from each class, i.e.

(2) F′ = F1 ∪ F2 ∪ … ∪ Fj

The obtained F′ features from the above computation are used for the subsequent training of the classifiers.
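The following NumPy sketch is a minimal implementation of the three steps and of Eqs. (1) and (2), under the assumption that the term document matrix has already been built as in Section 3.2. It reproduces the toy example that Section 3.4 works through by hand, so its printed output can be checked against Tables 1–7.

```python
import numpy as np

def class_wise_select(S, labels, n_prime):
    """Proposed class-wise feature selection.

    S       : (num_texts x N) term-frequency matrix
    labels  : class label of each short text
    n_prime : threshold N' = number of features kept per class
    Returns the sorted indices of the selected feature set F'.
    """
    selected = set()
    for c in np.unique(labels):
        # Eq. (1): ClassTermFrequency(C_j, f_i) = sum of tf over texts in class C_j
        ctf = S[labels == c].sum(axis=0)
        # Sort descending and keep the top-N' features for this class
        top = np.argsort(-ctf, kind="stable")[:n_prime]
        selected.update(top.tolist())
    return sorted(selected)  # Eq. (2): F' = F_1 U F_2 U ... U F_j

# Term document matrix from Table 1 (rows = short texts, columns = t1..t15).
S = np.array([
    [2, 3, 4, 0, 1, 2, 3, 4, 2, 1, 3, 4, 1, 2, 1],
    [1, 4, 2, 1, 0, 2, 2, 1, 0, 1, 0, 0, 2, 1, 2],
    [1, 1, 0, 2, 3, 3, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    [4, 0, 1, 2, 3, 2, 1, 0, 0, 0, 1, 0, 0, 1, 0],
    [2, 3, 1, 0, 2, 3, 4, 1, 2, 0, 0, 0, 1, 0, 1],
    [1, 2, 3, 1, 4, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1],
])
labels = np.array([1, 1, 1, 2, 2, 2])

F_prime = class_wise_select(S, labels, n_prime=5)
print([f"t{i + 1}" for i in F_prime])  # ['t1', 't2', 't3', 't5', 't6', 't7', 't8']
```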

3.4 Illustration

In this section, the individual steps of the proposed method are illustrated in detail on a term document matrix S. Initially, the short texts are preprocessed using the various preprocessing techniques and represented using the unigram representation model with the term frequency tf(ti) scheme. Table 1 presents an example term document matrix S for k = 6, j = 2, and t = 15, where k denotes the number of short texts, j the number of classes, and t the number of terms (features).

Table 1:

Term Document Matrix (S).

Short text   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10  t11  t12  t13  t14  t15  Class
1            2   3   4   0   1   2   3   4   2   1    3    4    1    2    1    C1
2            1   4   2   1   0   2   2   1   0   1    0    0    2    1    2    C1
3            1   1   0   2   3   3   1   0   1   0    0    0    0    0    0    C1
4            4   0   1   2   3   2   1   0   0   0    1    0    0    1    0    C2
5            2   3   1   0   2   3   4   1   2   0    0    0    1    0    1    C2
6            1   2   3   1   4   1   0   1   1   0    0    1    1    0    1    C2

In the first step, the class-related short texts are grouped to compute the relevant features that contribute to each class. Short texts S1, S2, and S3 form the rows of the term document matrix related to class C1; similarly, short texts S4, S5, and S6 form the rows related to class C2.

In the next step, ClassTermFrequency is computed for each class.

The ClassTermFrequency gives the sum of the frequencies of the features appearing in each class. Tables 2 and 3 give the results of the computation. Each obtained matrix is of the form 1 × N, i.e. an N-dimensional feature vector.

Table 2:

Computation of ClassTermFrequency for Class 1.

Class t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15
1 4 8 6 3 4 7 6 5 3 2 3 4 3 3 3
Table 3:

Computation of ClassTermFrequency for Class 2.

Class t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15
2 7 5 5 3 9 6 5 2 3 0 1 1 2 1 2

Further, we arrange the computed ClassTermFrequency values in descending order of the feature weights associated with each class. Tables 4 and 5 show the results of the computations. The resultant matrices give the highly relevant features that contribute to each class.

Table 4:

Arranging ClassTermFrequency in Descending Order in Class 1.

Class t2 t6 t3 t7 t8 t1 t5 t12 t4 t9 t11 t13 t14 t15 t10
1 8 7 6 6 5 4 4 4 3 3 3 3 3 3 2
Table 5:

Arranging ClassTermFrequency in Descending Order in Class 2.

Class t5 t1 t6 t2 t3 t7 t4 t9 t8 t13 t15 t11 t12 t14 t10
2 9 7 6 5 5 5 3 3 2 2 2 1 1 1 0

In the next step, we select the threshold value N′ for each class. The threshold value is arrived at empirically by considering different values; for this example, N′ = 5 is observed to be the best value, where N′ < N. The resultant matrix is the relevant feature vector for each class. Tables 6 and 7 give the N′-dimensional feature vectors, where N′ < N. The selected features for class 1 are F1 = {t2, t6, t3, t7, t8} and for class 2 are F2 = {t5, t1, t6, t2, t3}.

Table 6:

Selecting N=5 for Class 1.

Class t2 t6 t3 t7 t8
1 8 7 6 6 5
Table 7:

Selecting N=5 for Class 2.

Class t5 t1 t6 t2 t3
2 9 7 6 5 5

Further, F′ is composed of the union of the first N′ selected features of each class, i.e. F′ = F1 ∪ F2. The selected features are F′ = {t1, t2, t3, t5, t6, t7, t8}.

Finally, the selected features F′ are used for the subsequent training of the classifiers.

3.5 Classification

In order to evaluate the performance of the proposed feature selection method, we used three classifiers: SVM, KNN, and RF. SVM is a widely used classifier in sentiment classification tasks and can effectively perform classification in high-dimensional feature spaces [29]. The KNN classifier assigns a class by majority vote of an object's neighbors, the object being assigned to the class most common among its K nearest neighbors; here “K” indicates the number of neighbors taken into account in determining the class [1]. RF operates by constructing a multitude of decision trees at training time and outputting the class based on the decisions of the individual trees [9].

4 Experimental Evaluation

In this section, we present the experimentation of the proposed method and compare the results with existing feature selection methods.

4.1 Dataset Description

The experimentation was conducted on two publicly available datasets: the Stanford Twitter Sentiment test dataset (Dataset 1) [12] and the Ravikiran Janardhana dataset (Dataset 2) [37]. Dataset 1 contains 498 tweets, labeled as 182 positive, 177 negative, and 139 neutral. The total number of features obtained after preprocessing (as explained in Section 3.1) is 1586. Dataset 2 consists of 9666 positive, 9667 negative, and 2271 neutral tweets, combined from the publicly available Twitter message datasets of [19] and [11]. Here, we randomly chose 6000 tweets from Dataset 2 as the overall data, taking equal proportions of positive, negative, and neutral tweets for experimentation. We obtained 10,349 features after the preprocessing (as explained in Section 3.1).

4.2 Experimental Setup

In this section, we compare the performance of the proposed feature selection method with the chi-square (χ2), entropy, IG, and MI feature selection methods. The experimentation is conducted under three splits of 50:50, 60:40, and 70:30 proportions of training and testing data, and 10-fold cross-validation is utilized. The feature selection methods are evaluated on the basis of the classification accuracy obtained from the SVM, KNN, and RF classifiers. The experiments were conducted using the R statistical computing environment, version 3.1.3. We used a linear-kernel SVM classifier, the basic form of SVM, to classify the text corpus into different polarities or classes. In KNN, the value of K is fixed empirically at 3, which gives higher accuracy than other values. RF constructs a multitude of decision trees at training time and aggregates their individual decisions; the number of trees (ntree = 100) is likewise fixed empirically, as it gives higher accuracy than other values.
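This configuration maps directly onto other toolkits as well; the sketch below mirrors the stated hyperparameters (linear-kernel SVM, K = 3 for KNN, ntree = 100 for RF) in scikit-learn on synthetic stand-in data, since the original experiments were run in R and the actual feature matrices are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic 3-class stand-in for a selected feature matrix and labels.
X, y = make_classification(n_samples=600, n_features=200, n_informative=30,
                           n_classes=3, random_state=42)

# 70:30 training vs. testing proportion, as in the third experimental split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=42)

classifiers = {
    "SVM (linear kernel)": SVC(kernel="linear"),
    "KNN (K = 3)": KNeighborsClassifier(n_neighbors=3),
    "RF (ntree = 100)": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    print(f"{name}: {accuracy_score(y_te, clf.predict(X_te)):.4f}")
```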

4.2.1 Dataset 1 (Stanford Twitter dataset – 498 tweets)

Before performing the classification task, the short texts are preprocessed. The total number of features obtained after preprocessing was 1586 distinct features. The proposed feature selection method was applied with threshold values of 100, 300, and 500 per class, chosen on the basis of empirical evaluation. The total number of features obtained is 220 for a threshold of 100, 678 for 300, and 1154 for 500. Increasing the threshold further recovers the original feature set, i.e. all 1586 features, while decreasing it leads to a very small feature set that does not yield good results. Hence, we restricted the threshold values to between 100 and 500. The feature subsets obtained at each threshold are compared with the chi-square (χ2), entropy, IG, and MI feature selection methods. The experimental results are presented in Table 8.

Table 8:

Classification Accuracy on Dataset 1.

Each cell shows SVM/KNN/RF classification accuracy (%).

50:50 split (all 1586 features: SVM 77.60, KNN 78.40, RF 77.20)

Features  Chi-square         Entropy            Information gain   Mutual information  Proposed method
220       72.00/72.80/70.80  69.33/66.40/76.53  71.73/69.60/79.20  71.73/68.53/78.93   76.40/79.20/78.40
678       74.00/67.20/72.40  71.73/61.60/75.20  69.60/57.33/77.33  68.53/58.40/75.73   80.00/80.00/81.60
1154      75.60/68.00/71.20  69.06/58.13/75.46  68.00/62.66/74.66  66.13/58.40/76.26   80.80/75.60/80.40

60:40 split (all 1586 features: SVM 79.00, KNN 81.00, RF 83.50)

Features  Chi-square         Entropy            Information gain   Mutual information  Proposed method
220       73.40/77.50/74.00  67.33/66.33/78.33  75.33/69.33/80.00  76.76/70.00/80.66   82.50/80.50/83.50
678       76.00/79.00/74.50  69.66/60.66/78.66  68.66/57.66/77.00  67.33/58.33/78.00   83.00/82.50/83.50
1154      80.50/71.00/81.00  66.66/58.33/78.00  67.33/63.66/77.66  65.66/58.66/76.33   82.00/80.00/83.00

70:30 split (all 1586 features: SVM 81.45, KNN 85.00, RF 83.50)

Features  Chi-square         Entropy            Information gain   Mutual information  Proposed method
220       74.83/81.45/75.49  72.00/65.77/78.33  74.66/70.22/80.88  73.77/70.66/80.00   78.80/84.10/90.00
678       82.11/84.10/82.11  66.22/61.77/78.66  69.33/59.11/77.33  68.00/59.11/78.66   81.45/85.43/87.41
1154      76.15/80.79/79.47  65.22/59.11/78.00  66.66/65.33/76.88  66.22/59.11/78.22   86.09/86.75/82.11

In the first set of experiments (50:50 split), the classification accuracy obtained for the original 1586 features is 77.60% using SVM, 78.40% using KNN, and 77.20% using RF. Table 8 presents the classification accuracy of the χ2, entropy, IG, MI, and proposed feature selection methods using the SVM, KNN, and RF classifiers on the various feature subsets. From these observations, it is noted that the RF classifier achieves the maximum accuracy of 81.60% for 678 features, outperforming the other two classifiers.

In the second set of experiments (60:40 split), the proposed feature selection method with the RF classifier exhibits the same classification accuracy of 83.50% for both 220 and 678 features. As Table 8 shows, for 220 features the classification accuracy of the RF classifier with the proposed feature selection method increases by 9.5, 5.17, 3.5, and 2.84 percentage points compared to χ2, entropy, IG, and MI, respectively. For 678 features, the increase is 9, 4.84, 6.5, and 5.5 percentage points over χ2, entropy, IG, and MI, respectively. Hence, the RF classifier performs best in the second set of experiments.

In the third set of experiments (70:30 split), the classification accuracy obtained for the original features is 81.45% using SVM, 85.00% using KNN, and 83.50% using RF. From Table 8, we can observe that the proposed feature selection method achieves maximum classification accuracies of 86.09% and 86.75% for 1154 features using the SVM and KNN classifiers, respectively. With the RF classifier, the proposed method achieves a better classification accuracy of 90% for 220 features compared to the other classifiers.

4.2.2 Dataset 2 (Ravikiran Janardhana dataset)

Initially, the short texts were preprocessed using the various preprocessing techniques and represented using the unigram model. The total number of features obtained after preprocessing is 10,349 distinct features. We applied the proposed feature selection method with threshold values of 3500, 4000, 4500, 5000, and 5500 per class, chosen on the basis of empirical evaluation. The total number of features obtained was 7200, 7922, 8649, 9359, and 10,084, respectively, for these thresholds. Increasing the threshold further recovers the original feature set, i.e. all 10,349 features, while decreasing it yields very small feature sets that do not give good results. Hence, we restricted the threshold values to between 3500 and 5500. The obtained feature sets were compared against the chi-square (χ2), entropy, IG, and MI feature selection methods. The experimental results are tabulated in Table 9.

Table 9:

Classification Accuracy on Dataset 2.

Each cell shows SVM/KNN/RF classification accuracy (%).

50:50 split (all 10,349 features: SVM 74.30, KNN 62.90, RF 78.70)

Features  Chi-square         Entropy            Information gain   Mutual information  Proposed method
7200      68.90/63.70/77.95  74.10/65.00/78.55  70.51/58.84/78.97  76.57/65.42/79.35   74.35/68.33/78.53
7922      73.40/60.00/77.82  74.00/64.90/78.17  70.62/67.28/79.13  76.60/65.33/79.77   75.00/64.95/78.86
8649      71.60/60.90/77.86  74.60/64.70/78.64  71.37/67.42/79.00  76.35/64.60/80.26   74.88/68.30/78.71
9359      74.00/62.40/72.60  74.40/64.60/78.43  70.22/64.22/79.40  76.00/64.06/79.71   74.46/65.00/78.93
10,084    74.00/62.80/73.86  74.20/63.10/78.48  75.64/63.86/78.37  76.02/64.04/79.15   74.90/64.70/79.46

60:40 split (all 10,349 features: SVM 74.70, KNN 65.00, RF 84.36)

Features  Chi-square         Entropy            Information gain   Mutual information  Proposed method
7200      74.50/64.20/78.00  74.60/66.10/84.25  78.47/70.11/84.69  76.91/66.77/84.52   75.40/63.90/84.63
7922      74.80/60.90/77.52  75.10/66.00/84.36  77.16/69.83/84.66  76.33/66.66/85.00   74.80/66.10/84.61
8649      74.50/62.30/78.19  75.20/65.70/84.63  77.08/69.72/84.33  77.02/66.55/85.08   74.83/65.70/84.55
9359      74.60/64.20/73.69  75.30/65.60/84.18  76.08/66.11/84.44  76.02/66.08/84.75   74.80/65.60/84.72
10,084    72.60/64.90/77.80  75.20/65.00/84.22  75.50/65.44/84.50  75.50/65.61/84.55   75.02/65.00/84.69

70:30 split (all 10,349 features: SVM 74.90, KNN 66.30, RF 84.63)

Features  Chi-square         Entropy            Information gain   Mutual information  Proposed method
7200      76.10/64.90/78.37  75.90/71.10/84.37  78.03/68.44/85.51  77.37/71.77/58.29   73.81/68.14/85.44
7922      74.50/61.10/78.55  75.40/71.10/85.11  77.70/70.85/85.14  77.74/71.66/85.66   75.92/71.30/85.44
8649      74.90/63.00/74.40  75.20/70.80/84.40  76.70/70.51/85.22  77.18/71.07/85.07   75.50/71.00/85.14
9359      75.90/65.40/74.40  75.70/70.30/84.96  77.25/67.66/85.29  77.51/67.59/85.14   75.20/70.51/84.92
10,084    74.80/66.20/77.22  75.60/70.00/84.07  76.55/66.74/84.88  76.07/67.00/85.37   75.51/70.03/84.81

In the first set of experiments (50:50 split), the classification accuracy obtained for the original features is 74.30% using SVM, 62.90% using KNN, and 78.70% using RF. Table 9 presents the classification accuracy using the SVM, KNN, and RF classifiers. From Table 9, it can be observed that the proposed feature selection method achieves 68.33% classification accuracy for 7200 features using the KNN classifier. On the other hand, the RF classifier achieves a maximum accuracy of 79.46% for 10,084 features, an increase of 5.6, 0.98, 1.09, and 0.31 percentage points over the chi-square (χ2), entropy, IG, and MI feature selection methods, respectively. From these observations, it is noted that the RF classifier achieves better results than the other two classifiers.

In the second set of experiments (60:40 split), the classification accuracy obtained for the original features is 74.70% using SVM, 65.00% using KNN, and 84.36% using RF. From Table 9, it is evident that the IG feature selection method achieves the maximum accuracy for 7200 features. However, the proposed feature selection method achieves 84.69% for 10,084 features using the RF classifier, an increase of 6.89 percentage points over χ2, 0.47 over entropy, 0.19 over IG, and 0.14 over MI.

In the third set of experiments (70:30 split), Table 9 shows that the IG feature selection method gives better results for 7200 features using the SVM and RF classifiers. However, the proposed feature selection method achieves a classification accuracy of 70.51% for 9359 features using the KNN classifier, an increase of 5.11, 0.21, 2.85, and 2.92 percentage points over the χ2, entropy, IG, and MI feature selection methods, respectively. With the RF classifier, the proposed feature selection method performs comparably to the IG and MI feature selection methods in terms of classification accuracy on most of the feature subsets.

4.3 Discussion

It is evident from Table 8 that the proposed feature selection method performs better than the chi-square (χ2), entropy, IG, and MI feature selection methods using the SVM, KNN, and RF classifiers on the Stanford Twitter Sentiment dataset. From Table 8, we can also infer that, in terms of classification accuracy, RF performed better than the other classifiers. The proposed feature selection method was likewise evaluated on the Ravikiran Janardhana dataset. The proposed method considers the frequency of the features distributed within each class rather than the distribution of features between the classes. The χ2 score, in contrast, is calculated from a test of independence between a term and a class, ignoring the term's frequency within the class; thus, the proposed feature selection method performs better than chi-square feature selection on both datasets. Entropy measures the uncertainty of a distribution, expressing the average amount of information contained in a text; since the proposed method selects the most relevant features by considering the features corresponding to each class, it also performs better than the entropy feature selection method on both datasets.

On the other hand, the IG and MI scores are calculated from the probabilities of term or feature occurrences in the classes. In IG, the scores are computed from the conditional probability of a class given a term, together with the entropy; IG considers the presence or absence of a term or feature in a given input text. Dataset 1 contains far fewer features than Dataset 2, whereas the proposed method depends purely on the frequency of the features distributed within the classes. Therefore, on Dataset 2 the proposed feature selection method performs reasonably well compared to IG on most of the feature subsets, more so than on Dataset 1. Similarly, MI is strongly influenced by the marginal probabilities of the features, as it measures the dependencies between random terms or features. As Table 9 shows, Dataset 2 contains a higher number of features than Dataset 1, and the proposed method relies on feature frequencies to select discriminative features rather than on the probabilities of two random features. Therefore, the proposed feature selection method performs competitively with MI on most of the feature subsets of Dataset 2. The overall results show that the proposed feature selection method outperforms the other feature selection methods in terms of classification accuracy on Dataset 1. On Dataset 2, it significantly outperforms the chi-square and entropy feature selection methods, and achieves competitive results with respect to IG and MI on most of the feature subsets.

5 Conclusion and Future Work

Sentiment analysis on short text is a recent and active area of research. Short text poses several challenges that need to be addressed, i.e. the use of informal language, misspellings, and shortened forms of words, which lead to high dimensionality and sparsity. To deal with these challenges, we proposed in this paper a novel, simple, and yet effective feature selection method based on the frequently distributed features related to each class. The experimental results of the proposed feature selection method are compared with the chi-square (χ2), entropy, IG, and MI feature selection methods using the SVM, KNN, and RF classifiers on two publicly available datasets. The experimental results show that the proposed feature selection method outperforms the other feature selection methods in terms of classification accuracy on Dataset 1, and performs comparably to the IG and MI feature selection methods on most of the feature subsets of Dataset 2.

In future work, we intend to incorporate (a) statistical methods for calculating the threshold values and (b) n-gram representations (bigrams and trigrams) into the proposed feature selection method, using different classifiers, which could further enhance the classification performance.

Acknowledgments

H. M. Keerthi Kumar has been financially supported by UGC under Rajiv Gandhi National Fellowship (RGNF) Letter no. F1-17.1/2016-17/RGNF-2015-17-SC-KAR-6370/(SA-III Website), JSSRF (University of Mysore), Karnataka, India.

Bibliography

[1] D. A. Adeniyi, Z. Wei and Y. Yongquan, Automated web usage data mining and recommendation system using K-nearest neighbor (KNN) classification method, Appl. Comput. Inform. 12 (2016), 90–108. doi: 10.1016/j.aci.2014.10.001.

[2] B. Agarwal and N. Mittal, Prominent feature extraction for review analysis: an empirical study, J. Exp. Theor. Artif. Intell. 28 (2016), 485–498. doi: 10.1080/0952813X.2014.977830.

[3] B. Agarwal and N. Mittal, Semantic orientation-based approach for sentiment analysis, in: Prominent Feature Extraction for Sentiment Analysis, pp. 77–88, Springer, Cham, Switzerland, 2016. doi: 10.1007/978-3-319-25343-5_6.

[4] A. Agarwal, B. Xie, I. Vovsha, O. Rambow and R. Passonneau, Sentiment analysis of twitter data, in: Proceedings of the Workshop on Languages in Social Media, pp. 30–38, Association for Computational Linguistics, Portland, Oregon, 2011.

[5] D. Agnihotri, K. Verma and P. Tripathi, Variable Global Feature Selection Scheme for automatic classification of text documents, Expert Syst. Appl. 81 (2017), 268–281. doi: 10.1016/j.eswa.2017.03.057.

[6] A. Al-Saffar, S. Awang, H. Tao, N. Omar, W. Al-Saiagh and M. Al-bared, Malay sentiment analysis based on combined classification approaches and Senti-lexicon algorithm, PLoS One 13 (2018), e0194852. doi: 10.1371/journal.pone.0194852.

[7] R. K. Amplayo and M. Song, An adaptable fine-grained sentiment analysis for summarization of multiple short online reviews, Data Knowl. Eng. 110 (2017), 54–67. doi: 10.1016/j.datak.2017.03.009.

[8] M. R. Bouadjenek, H. Hacid and M. Bouzeghoub, Social networks and information retrieval, how are they converging? A survey, a taxonomy and an analysis of social information retrieval approaches and platforms, Inform. Syst. 56 (2016), 1–18. doi: 10.1016/j.is.2015.07.008.

[9] A. Bouaziz, C. Dartigues-Pallez, C. da Costa Pereira, F. Precioso and P. Lloret, Short text classification using semantic random forest, in: International Conference on Data Warehousing and Knowledge Discovery, pp. 288–299, Springer, Cham, Switzerland, 2014. doi: 10.1007/978-3-319-10160-6_26.

[10] M. S. Checkley, D. Añón Higón and H. Alles, The hasty wisdom of the mob: how market sentiment predicts stock market behavior, Expert Syst. Appl. 77 (2017), 256–263. doi: 10.1016/j.eswa.2017.01.029.

[11] Sanders-Twitter Sentiment Corpus, http://www.sananalytics.com/lab/twitter-sentiment/sanders-twitter-0.2.zip. Accessed 10 October 2017.

[12] Sentiment140 corpus [Dataset], http://help.sentiment140.com/for-students/. Accessed 10 November 2018.

[13] M. del Pilar Salas-Zarate, M. A. Paredes-Valverde, J. Limon, D. A. Tlapa and Y. A. Báez, Sentiment classification of Spanish reviews: an approach based on feature selection and machine learning methods, J. Univers. Comput. Sci. 22 (2016), 691–708.

[14] M. D. Devika, C. Sunitha and A. Ganesh, Sentiment analysis: a comparative study on different approaches, Procedia Comput. Sci. 87 (2016), 44–49. doi: 10.1016/j.procs.2016.05.124.

[15] C. Francalanci and A. Hussain, Influence-based Twitter browsing with NavigTweet, Inform. Syst. 64 (2017), 119–131. doi: 10.1016/j.is.2016.07.012.

[16] G. Ganu, Y. Kakodkar and A. Marian, Improving the quality of predictions using textual information in online user reviews, Inform. Syst. 38 (2013), 1–15. doi: 10.1016/j.is.2012.03.001.

[17] G. Gautam and D. Yadav, Sentiment analysis of twitter data using machine learning approaches and semantic analysis, in: Contemporary Computing (IC3), 2014 Seventh International Conference on, pp. 437–442, IEEE, Noida, India, 2014. doi: 10.1109/IC3.2014.6897213.

[18] G. Gezici, B. Yanıkoğlu, D. Tapucu and Y. Saygın, New features for sentiment analysis: do sentences matter? in: CEUR Workshop Proceedings, Bristol, UK, 2012.

[19] A. Go, R. Bhayani and L. Huang, Twitter sentiment classification using distant supervision, CS224N Project Report, Stanford 1 (2009), 12.

[20] E. Haddi, X. Liu and Y. Shi, The role of text pre-processing in sentiment analysis, Procedia Comput. Sci. 17 (2013), 26–32. doi: 10.1016/j.procs.2013.05.005.

[21] B. S. Harish and M. B. Revanasiddappa, A comprehensive survey on various feature selection methods to categorize text documents, Int. J. Comput. Appl. 164 (2017), 1–7. doi: 10.5120/ijca2017913711.

[22] C. Huang, J. Zhu, Y. Liang, M. Yang, G. Pui, C. Fung and J. Luo, An efficient automatic multiple objectives optimization feature selection strategy for internet text classification, Int. J. Mach. Learn. Cyb. 9 (2018), 1–13. doi: 10.1007/s13042-018-0793-x.

[23] C. Hung, Word of mouth quality classification based on contextual sentiment lexicons, Inform. Process. Manag. 53 (2017), 751–763. doi: 10.1016/j.ipm.2017.02.007.

[24] S.-M. Kim and E. Hovy, Determining the sentiment of opinions, in: Proceedings of the 20th International Conference on Computational Linguistics, p. 1367, Association for Computational Linguistics, Geneva, Switzerland, 2004. doi: 10.3115/1220355.1220555.

[25] R. Kohavi and G. H. John, Wrappers for feature subset selection, Artif. Intell. 97 (1997), 273–324. doi: 10.1016/S0004-3702(97)00043-X.

[26] S. Kübler, C. Liu and Z. A. Sayyed, To use or not to use: feature selection for sentiment analysis of highly imbalanced data, Nat. Lang. Eng. 24 (2018), 3–37. doi: 10.1017/S1351324917000298.

[27] A. Kumar and R. Khorwal, Firefly algorithm for feature selection in sentiment analysis, in: Computational Intelligence in Data Mining, pp. 693–703, Springer, Singapore, 2017. doi: 10.1007/978-981-10-3874-7_66.

[28] B. Li, K. C. C. Chan, C. Ou and S. Ruifeng, Discovering public sentiment in social media for predicting stock movement of publicly listed companies, Inform. Syst. 69 (2017), 81–92. doi: 10.1016/j.is.2016.10.001.

[29] H. Liu and L. Yu, Toward integrating feature selection algorithms for classification and clustering, IEEE Trans. Knowl. Data Eng. 17 (2005), 491–502. doi: 10.1109/TKDE.2005.66.

[30] N. Omar, M. Albared, T. Al-Moslmi and A. Al-Shabi, A comparative study of feature selection and machine learning algorithms for Arabic sentiment classification, in: Asia Information Retrieval Symposium, pp. 429–443, Springer, Cham, Switzerland, 2014. doi: 10.1007/978-3-319-12844-3_37.

[31] A. Onan and S. Korukoğlu, A feature selection model based on genetic rank aggregation for text sentiment classification, J. Inf. Sci. 43 (2017), 25–38. doi: 10.1177/0165551515613226.

[32] A. C. Pandey, D. S. Rajpoot and M. Saraswat, Twitter sentiment analysis using hybrid cuckoo search method, Inform. Process. Manag. 53 (2017), 764–779. doi: 10.1016/j.ipm.2017.02.004.

[33] B. Pang, L. Lee and S. Vaithyanathan, Thumbs up?: sentiment classification using machine learning techniques, in: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing – Volume 10, pp. 79–86, Association for Computational Linguistics, Philadelphia, 2002. doi: 10.3115/1118693.1118704.

[34] I. Penalver-Martinez, F. Garcia-Sanchez, R. Valencia-Garcia, M. A. Rodriguez-Garcia, V. Moreno, A. Fraga and J. L. Sanchez-Cervantes, Feature-based opinion mining through ontologies, Expert Syst. Appl. 41 (2014), 5995–6008. doi: 10.1016/j.eswa.2014.03.022.

[35] D.-H. Pham and A.-C. Le, Learning multiple layers of knowledge representation for aspect based sentiment analysis, Data Knowl. Eng. 114 (2017), 26–39. doi: 10.1016/j.datak.2017.06.001.

[36] R. H. W. Pinheiro, G. D. C. Cavalcanti, R. F. Correa and T. I. Ren, A global-ranking local feature selection method for text categorization, Expert Syst. Appl. 39 (2012), 12851–12857. doi: 10.1016/j.eswa.2012.05.008.

[37] J. Ravikiran, Twitter sentiment analysis and opinion mining, Data Mining Project Report, 2010.

[38] Y. Ren, R. Wang and D. Ji, A topic-enhanced word embedding for Twitter sentiment classification, Inform. Sci. 369 (2016), 188–198. doi: 10.1016/j.ins.2016.06.040.

[39] F. Riquelme and P. González-Cantergiani, Measuring user influence on Twitter: a survey, Inform. Process. Manag. 52 (2016), 949–975. doi: 10.1016/j.ipm.2016.04.003.

[40] Y. Saeys, I. Inza and P. Larrañaga, A review of feature selection techniques in bioinformatics, Bioinformatics 23 (2007), 2507–2517. doi: 10.1093/bioinformatics/btm344.

[41] G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, Inform. Process. Manag. 24 (1988), 513–523. doi: 10.1016/0306-4573(88)90021-0.

[42] N. Sánchez-Maroño, A. Alonso-Betanzos and M. Tombilla-Sanromán, Filter methods for feature selection – a comparative study, in: Intelligent Data Engineering and Automated Learning – IDEAL 2007, pp. 178–187, 2007. doi: 10.1007/978-3-540-77226-2_19.

[43] R. Shahid, S. T. Javed and K. Zafar, Feature selection based classification of sentiment analysis using Biogeography optimization algorithm, in: Innovations in Electrical Engineering and Computational Technologies (ICIEECT), 2017 International Conference on, pp. 1–5, IEEE, Karachi, Pakistan, 2017. doi: 10.1109/ICIEECT.2017.7916549.

[44] C. E. Shannon, A mathematical theory of communication, ACM SIGMOBILE Mob. Comput. Commun. Rev. 5 (2001), 3–55. doi: 10.1145/584091.584093.

[45] F. Song, S. Liu and J. Yang, A comparative study on text representation schemes in text categorization, Pattern Anal. Appl. 8 (2005), 199–209. doi: 10.1007/s10044-005-0256-3.

[46] M. Taboada, Sentiment analysis: an overview from linguistics, Annu. Rev. Linguist. 2 (2016), 325–347. doi: 10.1146/annurev-linguistics-011415-040518.

[47] M. Taboada, J. Brooke, M. Tofiloski, K. Voll and M. Stede, Lexicon-based methods for sentiment analysis, Comput. Linguist. 37 (2011), 267–307. doi: 10.1162/COLI_a_00049.

[48] M. Thelwall, K. Buckley and G. Paltoglou, Sentiment in Twitter events, J. Assoc. Inform. Sci. Technol. 62 (2011), 406–418. doi: 10.1002/asi.21462.

[49] A. Tommasel and D. Godoy, A social-aware online short-text feature selection technique for social media, Inform. Fusion 40 (2018), 1–17. doi: 10.1016/j.inffus.2017.05.003.

[50] P. D. Turney and M. L. Littman, Measuring praise and criticism: inference of semantic orientation from association, ACM Trans. Inform. Syst. (TOIS) 21 (2003), 315–346. doi: 10.1145/944012.944013.

[51] A. K. Uysal and Y. L. Murphey, Sentiment classification: feature selection based approaches versus deep learning, in: Computer and Information Technology (CIT), 2017 IEEE International Conference on, pp. 23–30, IEEE, Helsinki, Finland, 2017. doi: 10.1109/CIT.2017.53.

[52] D. Vilares, M. A. Alonso and C. Gómez-Rodríguez, Supervised sentiment analysis in multilingual environments, Inform. Process. Manag. 53 (2017), 595–607. doi: 10.1016/j.ipm.2017.01.004.

[53] G. Wu, L. Wang, N. Zhao and H. Lin, Improved expected cross entropy method for text feature selection, in: Computer Science and Mechanical Automation (CSMA), 2015 International Conference on, pp. 49–54, IEEE, Hangzhou, China, 2015. doi: 10.1109/CSMA.2015.17.

[54] A. Yousefpour, R. Ibrahim and H. N. Abdel Hamed, Ordinal-based and frequency-based integration of feature selection methods for sentiment analysis, Expert Syst. Appl. 75 (2017), 80–93. doi: 10.1016/j.eswa.2017.01.009.

[55] N. Zainuddin and A. Selamat, Sentiment analysis using support vector machine, in: Computer, Communications, and Control Technology (I4CT), 2014 International Conference on, pp. 333–337, IEEE, Langkawi, Malaysia, 2014. doi: 10.1109/I4CT.2014.6914200.

[56] Z. Zhang, X.-H. Phan and S. Horiguchi, An efficient feature selection using hidden topic in text categorization, in: Advanced Information Networking and Applications – Workshops, 2008. AINAW 2008. 22nd International Conference on, pp. 1223–1228, IEEE, Okinawa, Japan, 2008. doi: 10.1109/WAINA.2008.137.

[57] D. M. Zhang, S. Li, C. Zhu, X. Niu and L. Song, A comparison study of multi-class sentiment classification for Chinese reviews, in: Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on, 5, pp. 2433–2436, IEEE, Yantai, China, 2010. doi: 10.1109/FSKD.2010.5569300.

[58] B. Zhao, Z. Zhang, W. Qian and A. Zhou, Identification of collective viewpoints on microblogs, Data Knowl. Eng. 87 (2013), 374–393. doi: 10.1016/j.datak.2013.05.003.

[59] L. Zheng, H. Wang and S. Gao, Sentimental feature selection for sentiment analysis of Chinese online reviews, Int. J. Mach. Learn. Cyb. 9 (2015), 1–10. doi: 10.1007/s13042-015-0347-4.

Received: 2018-03-19
Published Online: 2018-12-04

©2020 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 Public License.
