Article Open Access

Polarity Analysis of Customer Reviews Based on Part-of-Speech Subcategory

  • Ayman S. Ghabayen and Basem H. Ahmed
Published/Copyright: August 15, 2019

Abstract

Nowadays, sentiment analysis is a method used to analyze the sentiment of the feedback given by a user in an online document, such as a blog, comment, or review, and to classify it as negative, positive, or neutral. The classification process relies upon the analysis of the polarity features of the natural language text given by users. Polarity analysis is an important subtask in sentiment analysis; however, detecting the correct polarity remains a major issue. Different researchers have utilized different polarity features, such as the standard part-of-speech (POS) tags: adjectives, adverbs, verbs, and nouns. However, there seems to be a lack of research focusing on the subcategories of these tags. The aim of this research was to propose a method that better recognizes the polarity of natural language text by utilizing different polarity features based on the standard POS categories and their subcategory combinations in order to explore the specific polarity of text. Several experiments were conducted to examine and compare the efficacy of the proposed method in terms of F-measure, recall, and precision using an Amazon dataset. The results showed that JJ + NN + VB + RB + VBP + RP, a POS subcategory combination, obtained better accuracy than the baseline approaches, improving on them by 4.4% in terms of F-measure.

1 Introduction

Recent years have brought significant growth in social media websites across the Internet. Users generate huge amounts of data, not only through discussions and personal notes but also, on a mass scale, by sharing what they think and feel about products, services, issues, events, and policies on e-commerce websites. Such a contribution is called a user opinion or review, and it reflects the mood of the writer or the attitude of the speaker. User-generated data have therefore become a very important source for business intelligence decision-making processes, helping organizations improve their products. In addition, reading the reviews of other consumers can help prospective buyers make up their minds about particular products before deciding to purchase them. The positive or negative feeling expressed by people is known as sentiment.

Sentiment analysis (SA), also known as opinion mining, is a natural language processing approach that analyzes people’s attitudes toward a specific product or topic [15]. SA aims at the automatic analysis of an online document, such as a blog, comment, review, or news item, to extract its overall sentiment and summarize it as positive, negative, or neutral [15]. SA can be used in different areas, such as predicting election results [25], providing companies and organizations with information about their products [8], automating product summaries or outlines from reviews, or even predicting the stock market from online purchasing sites [8]. SA can be viewed as a classification process, as demonstrated in Figure 1.

Figure 1: Sentiment Analysis Process [13].

According to Refs. [10], [13], there are three levels on which SA can be conducted:

  1. Document level: This approach considers the entire document (e.g. a comment or review) as the basic information unit and then classifies it as positive, negative, or neutral. However, in some cases the results given by this approach are imprecise; for example, a document classified as positive toward a particular item does not indicate that the author has only positive opinions about all features of that item. Similarly, a document classified as negative toward an item does not indicate that the author is completely negative about all its features. Typically, authors convey both positive and negative sentiments about a particular item and its features.

  2. Sentence level: This approach attempts to establish the opinion expressed in each sentence by breaking the entire document into sentences, with each sentence handled as a separate information unit. It is first recognized whether a sentence is subjective or objective, and then it is decided whether the sentence conveys a positive or a negative opinion.

  3. Aspect level: This approach performs a fine-grained analysis to identify the relevant aspects and entities of a particular item and the sentiment/polarity expressed toward each aspect. Here, an aspect refers to a feature of a particular item, e.g. the battery life of a mobile phone.

The fundamental problem of SA is sentiment polarity categorization [10], [11], [13], [17]. Much research in SA has been conducted to classify a given textual content with respect to the opinions expressed in it, utilizing different polarity features such as part-of-speech (POS) tags [4], [6], [18], [21], [24]. The aim of this study was to present lexicon-based methods for sentiment polarity categorization based on polarity features, such as adjectives, adverbs, verbs, and nouns, together with combinations among these features and all their subfeatures. The rest of this paper is structured as follows. Section 2 presents the related work and the existing methods and techniques used in SA. Section 3 explains the proposed methods, the different processing steps performed, the proposed framework, the dataset, and the evaluation metrics used. Section 4 presents and discusses the results of the proposed methods. Lastly, Section 5 gives the conclusion and our recommendations for future work in this field.

2 Related Works

Generally, one of the fundamental problems in recent SA research is the categorization of sentiments. This process relies on assigning opinion polarity to the words and phrases that express sentiments in order to decide the subjectivity/objectivity orientation of a document, such as positive or negative (or neutral) [4]. Sentiment classification techniques described in the literature can be roughly divided into two approaches: supervised and unsupervised SA [16]. This division relates to the methodology whereby the application makes the sentiment classification. Supervised learning applies machine learning methods, such as support vector machine, maximum entropy, k-nearest neighbor, naïve Bayes, decision tree, and artificial neural network [2], [5], [16], [18]. The supervised learning approach uses manually annotated, labeled training data and testing data to resolve the SA task. In addition, it uses a set of linguistic and/or syntactic feature vectors extracted directly from the original feedback sentences in order to make a classification decision [5], [11].

In contrast, the unsupervised approach does not need labeled training data; the methodology in this research follows the unsupervised approach. The main approaches in this methodology are linguistics based and lexicon based [7]. The lexicon-based approach involves the statistical calculation of sentiments from the semantic orientation of the words or phrases that occur in the text [22]. The basic assumption underlying lexicon-based approaches is that the most essential indications of sentiment in natural language text are the words that express sentiments, also called opinion words. This approach requires a pre-compiled dictionary of positive and negative terms. SentiWordNet is a well-known lexical resource that was explicitly developed for supporting sentiment classification and analysis applications [24]. The linguistics-based approach to SA does not simply divide natural language text into its constituent words and sentences; it also identifies their syntactic constructions in order to locate the syntactic POS categories, such as adjective or verb phrases, that are most likely to express an opinion. The sentiment classification of a textual content considers the polarity score of each word in the text. For instance, if a word is matched with a positive sentiment score in the lexicon, the positive polarity score of the content is increased. Subsequently, if the positive polarity score is larger than the negative one, the content is considered positive; otherwise, the content is considered negative.
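As a rough illustration of this lexicon-based scoring scheme, the following Python sketch accumulates positive and negative word scores and compares their sums; the `lexicon` dictionary and its example entries are hypothetical stand-ins for a real resource such as SentiWordNet.

```python
# Minimal sketch of lexicon-based polarity scoring as described above.
# `lexicon` is a hypothetical pre-compiled dictionary mapping a word to a
# (positive_score, negative_score) pair; it stands in for a resource such
# as SentiWordNet.
def classify_text(tokens, lexicon):
    pos_total, neg_total = 0.0, 0.0
    for word in tokens:
        if word in lexicon:
            pos_score, neg_score = lexicon[word]
            pos_total += pos_score
            neg_total += neg_score
    # The text is positive when its accumulated positive score exceeds
    # the negative one; otherwise it is treated as negative.
    return "positive" if pos_total > neg_total else "negative"

print(classify_text(["great", "battery", "terrible", "screen"],
                    {"great": (0.75, 0.0), "terrible": (0.0, 0.625)}))
```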

A large volume of research has focused on finding commonly used terms that express sentiment in online reviews using the lexicon-based approach and natural language processing [1], [4], [21]. Thet et al. [23] suggested a linguistic approach for SA of discussion posts on conversation forums, working at the clause level of opinion analysis. They employed SentiWordNet to obtain prior word sentiment scores, with a domain-specific lexicon (movie review domain) built for this purpose. They then identified the sentiment score of each clause by inspecting the syntactic dependencies of words, relying on syntactic dependency trees and pattern rules. Sarkar et al. [18] proposed an SA approach deploying linguistic features, such as adverb-adjective-noun-verb combinations, at the document level, where a set of well-identified axioms was employed to compute the SA function value. Bethard et al. [1] extracted opinions at the sentence level by utilizing the combination of adverbs and adjectives. Chesley et al. [4] utilized linguistic features, such as adjectives and verbs, in order to classify blog sentiment; the Wikipedia dictionary was utilized for determining the polarity scores of the blog content.

Although plenty of work has been done in SA using natural language POS features, such as adjectives, adverbs, verbs, and nouns, little research so far has considered all the subcategories of POS tags. For example, the adjective category is syntactically subdivided into comparative and superlative forms. Table 1 lists a summary of related methods.

Table 1: Summary of the Reported Methods.

Method Description Used by
Linguistics based This method identifies the syntactic constructions of a natural language text in order to locate the syntactic POS phrases that are most likely to express an opinion Sarkar et al. [18], Bethard et al. [1], Chesley et al. [4], Thet et al. [23]
Lexicon based This method involves statistical calculation of sentiments from the semantic orientation of words or phrases that occur in a text Tomar and Sharma [24], Taboada et al. [22]

This study focuses on the feature evaluation of the standard POS categories (adjectives, adverbs, verbs, and nouns) and all their subcategories, which indicate the property and informativeness of a word in a natural language text. Then, a comprehensive study of combinations between these subfeatures is presented to show the strength of such features in sentiment polarity detection.

3 Materials and Methods

This section comprises two main subsections. The first subsection presents the proposed approach to better recognize the polarity of natural language text, with details of the processing, which comprises seven steps. The second subsection presents the dataset and the evaluation metrics used to evaluate the proposed approach.

3.1 Proposed Approaches

The aim of this research is to propose methods to improve the classification efficiency of identifying the sentiment expressed in a text by using the SentiWordNet lexicon and natural language POS. Figure 2 presents the proposed framework for sentiment classification, which consists of several steps, such as data pre-processing, sentiment tokenization, POS tagging, sentiment negation, term score, feature selection, sentiment classification, accuracy metric, and result comparison.

Figure 2: Sentiment Classification Framework.

3.1.1 Data Pre-processing

The first step is collecting the reviews from the Amazon dataset. Before classifying a text, it is important to process it. The second step therefore focuses on pre-processing and cleaning redundant data in the dataset. First, punctuation standardization is performed so that writing rules are respected. Thereafter, all non-alphabetical characters, such as numbers and emoticons (e.g. smileys), as well as periods, hyphens, and apostrophes, are removed from each review. Then, all words in the review are converted into lowercase letters. The resulting words and phrases are used for further processing.
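The following Python sketch illustrates this pre-processing step (removal of non-alphabetical characters and lowercasing); the regular expressions are illustrative assumptions, not the authors' exact implementation.

```python
import re

# Minimal pre-processing sketch following the steps described above:
# strip non-alphabetical characters (numbers, emoticons, periods, hyphens,
# apostrophes) and lowercase the text.
def preprocess(review_text):
    cleaned = re.sub(r"[^A-Za-z\s]", " ", review_text)  # drop non-alphabetical characters
    cleaned = re.sub(r"\s+", " ", cleaned).strip()      # collapse repeated whitespace
    return cleaned.lower()                              # lowercase every word

print(preprocess("Great camera!!! Battery lasts 10 hrs :-)"))
# -> "great camera battery lasts hrs"
```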

3.1.2 Sentiment Tokenization

In this step, the review text is tokenized by breaking it up into sentences (based on the use of periods) and each sentence into words. After the text is separated into tokens, the next step is typically a morphosyntactic analysis that recognizes characteristics of each token, for example its grammatical or lexical category. This analysis is well known as POS tagging.
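A minimal tokenization sketch, using NLTK's sentence and word tokenizers as an illustrative choice (the paper does not prescribe a specific tokenizer):

```python
from nltk.tokenize import sent_tokenize, word_tokenize  # requires nltk.download('punkt')

# Split a review into sentences (period-based) and each sentence into word tokens.
review = "the camera is great. the battery is not good."
sentences = sent_tokenize(review)
tokens = [word_tokenize(sentence) for sentence in sentences]
print(tokens)
```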

The steps of the sentiment classification framework are presented in further detail in the following subsections.

3.1.3 POS Tagger

A POS tagger is a linguistic tool in natural language processing that reads textual content in a given language as input and assigns a POS tag to every single word in a sentence, such as noun, verb, adjective, etc. Generally, there are eight parts of speech in the English language: adjective, noun, verb, pronoun, adverb, preposition, conjunction, and interjection. The POS tagger employed in this research is the Stanford POS tagger [12]. The tagger provides 36 different tags and can therefore recognize more detailed syntactic categories than the basic eight. Table 2 presents the list of POS tags considered in this work. POS tags indicate the informativeness of a word; thus, they can be used to calculate the term scores in the classification process.

Table 2: POS Tags and Abbreviations.

POS tags Definition SentiWordNet abbr.
NN Noun, singular or mass N
NNP Proper noun, singular N
NNPS Proper noun, plural N
NNS Noun, plural N
VB Verb, base form V
VBD Verb, past tense V
VBG Verb, gerund/present participle V
VBN Verb, past participle V
VBP Verb, non-third person singular present V
VBZ Verb, third person singular present V
RB Adverb R
RBR Adverb, comparative R
RBS Adverb, superlative R
JJ Adjective A
JJR Adjective, comparative A
JJS Adjective, superlative A
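The following sketch illustrates the POS tagging step of Section 3.1.3. The paper uses the Stanford POS tagger [12]; NLTK's default tagger is used here only as a stand-in, since it also emits the Penn Treebank tags listed in Table 2.

```python
import nltk  # requires nltk.download('averaged_perceptron_tagger')

# Assign a Penn Treebank POS tag (NN, JJ, RB, VB, ...) to every token.
tokens = ["the", "battery", "lasts", "surprisingly", "long"]
print(nltk.pos_tag(tokens))
# e.g. [('the', 'DT'), ('battery', 'NN'), ('lasts', 'VBZ'),
#       ('surprisingly', 'RB'), ('long', 'RB')]
```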

3.1.4 Negation

Negation inverts the current polarity of the words paired with it in a sentence. Negation words, such as “not,” “can’t,” and “don’t,” act as a polarity inverter of the word that is paired with them. For example, whenever a negation word is paired with a positive word, the polarity turns negative; conversely, when a negation word is paired with a negative word, the polarity turns positive. Accordingly, negation words must be appropriately treated in sentiment classification. This process adapts the work performed by Pang et al. [16]. For the negation handling process, a list of negation words is compiled and each sentence in the review is checked for the occurrence of a negation word. If a negation word is found, the negation process is applied. The set of negation words is defined as follows:

(1) $\mathrm{NegW} = \{\text{set of negation words}\}.$

If a negation word is found in the sentence, the positive score and the negative score of the opinion terms in that sentence are swapped. For each term in the sentence, we apply the following equation:

(2) $\mathrm{Score}(\mathrm{term}_x) = \begin{cases} \mathrm{Pos.Score}(\mathrm{term}_x) = \mathrm{Neg.Score}(\mathrm{term}_x) \\ \mathrm{Neg.Score}(\mathrm{term}_x) = \mathrm{Pos.Score}(\mathrm{term}_x) \end{cases} \quad \mathrm{term}_y \in \mathrm{NegW},$

where $\mathrm{term}_x$ is any term in the tokenized sentence and $\mathrm{term}_y$ is a negation word occurring in that sentence.
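A minimal sketch of this negation handling, assuming an illustrative negation word list and a per-sentence dictionary of (positive, negative) term scores:

```python
# Sketch of the negation handling in Eqs. (1)-(2): if a negation word occurs
# in a sentence, the positive and negative scores of every opinion term in
# that sentence are swapped. The negation list and score container are
# illustrative assumptions.
NEGATION_WORDS = {"not", "no", "never", "can't", "don't", "won't"}

def apply_negation(sentence_tokens, scores):
    """scores: dict mapping term -> (pos_score, neg_score)."""
    if NEGATION_WORDS.isdisjoint(sentence_tokens):
        return scores  # no negation word: leave scores unchanged
    # swap positive and negative scores for every term in the sentence
    return {term: (neg, pos) for term, (pos, neg) in scores.items()}

print(apply_negation(["the", "camera", "is", "not", "good"],
                     {"good": (0.75, 0.0)}))
# -> {'good': (0.0, 0.75)}
```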

3.1.5 Term Score

This process takes the review’s corresponding feature vector, which represents the POS tags, and produces a set of sentiment scores. Each sentiment-carrying word within the input review is assigned a term score from the sentiment lexicon, which is the most essential resource for most SA methods. In this research, SentiWordNet is used; it annotates the synsets of WordNet with “positive,” “negative,” and “neutral” notations. Each set of terms sharing the same meaning (a synset) is linked with three numerical scores, Pos.Score(term_x), Neg.Score(term_x), and Obj.Score(term_x), for positive, negative, and neutral, respectively. Each score lies between 0 and 1, indicating the negative or positive bias of the term, and the three scores sum to 1 according to the following formula:

(3) $\mathrm{Pos.Score}(\mathrm{term}_x) + \mathrm{Neg.Score}(\mathrm{term}_x) + \mathrm{Obj.Score}(\mathrm{term}_x) = 1.$

If a word is not found in the SentiWordNet lexicon, the term score is calculated using the WordNet lexicon by collecting the corresponding synonym sets (synsets) of the target word based on its POS tag. For example, if the target word is an adjective, all the synonym sets tagged with an adjective POS in WordNet are collected. We believe that this procedure might improve accuracy, as it could possibly overcome the ambiguity and variety of the vocabulary. In this case, the term score is determined by the maximum absolute value of the difference between the positive and negative scores over the collected synsets. The term score for such a word is calculated as follows: let $S_x$ be the set of synsets (synonyms) of the term; then

(4) $\mathrm{Score}(\mathrm{term}_x) = \max_{\mathrm{synset} \in S_x} \left| \mathrm{Pos.Score}(\mathrm{synset}) - \mathrm{Neg.Score}(\mathrm{synset}) \right|,$

where Pos.Score and Neg.Score are the positive and negative scores of the synonym synsets collected for the term.
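A simplified sketch in the spirit of Eqs. (3) and (4), using NLTK's interface to SentiWordNet (an implementation assumption; the paper does not name a specific programming interface):

```python
from nltk.corpus import sentiwordnet as swn  # requires nltk.download('sentiwordnet') and 'wordnet'

# Take the maximum absolute difference between positive and negative scores
# over the synsets of a word for a given SentiWordNet POS letter from
# Table 2 ('a' adjective, 'n' noun, 'v' verb, 'r' adverb).
def term_score(word, pos_letter):
    synsets = list(swn.senti_synsets(word, pos_letter))
    if not synsets:
        return 0.0  # word not covered by the lexicon
    return max(abs(s.pos_score() - s.neg_score()) for s in synsets)

print(term_score("good", "a"))
```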

3.1.6 Feature Selection

This study focuses on feature extraction restricted to different types of POS (adjectives, adverbs, verbs, and nouns). The feature selection process receives the tokenized, tagged terms, each with a term score corresponding to one of the POS tags shown in Table 2. Then, to compare the effect of each POS tag on sentiment polarity, each tag feature is selected separately, and combinations of the best-performing POS tags are generated as new feature sets.
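A minimal sketch of this feature selection step, assuming tagged and scored tokens and taking the JJ + NN + VB + RB + VBP + RP combination of Table 5 as an example feature set:

```python
# Keep only the (token, tag, score) triples whose POS tag belongs to the
# feature set under evaluation; the example set below is one of the
# combinations studied in the paper (Table 5, set no. 3).
FEATURE_SET = {"JJ", "NN", "VB", "RB", "VBP", "RP"}

def select_features(tagged_scored_tokens, feature_set=FEATURE_SET):
    """tagged_scored_tokens: iterable of (token, penn_tag, term_score)."""
    return [(tok, tag, score)
            for tok, tag, score in tagged_scored_tokens
            if tag in feature_set]
```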

3.1.7 Sentiment Classification

Sentiment classification relies on the selected feature set that shows a user’s opinion. Each review comprises a variety of opinion words of variable sequence. The selected feature set consists of different grammatical words, as mentioned in Table 2. In this study, the sentiment classification approach is the unsupervised lexicon-based approach. The sentiment classification of review R is calculated by the difference between the summations of its positive and negative term scores, as follows:

(5) $\mathrm{SentiScore}(R) = \dfrac{\sum_{pos=1}^{p} \mathrm{Score}(\mathrm{term}_{pos}) - \sum_{neg=1}^{n} \mathrm{Score}(\mathrm{term}_{neg})}{n + p},$

where p denotes the total number of positive terms and n denotes the total number of negative terms. A longer review R may contain more terms that are regarded as positive or negative. Therefore, in order to compare the sentiment polarity of reviews of different lengths, the SentiScore is normalized by dividing it by the number of sentiment terms in R, with the intention of dampening the impact of the review size on its score. The normalized SentiScore values lie within the interval [−1, 1]. Thus, a review is classified as positive if its normalized SentiScore is positive; alternatively, it is classified as negative if its normalized SentiScore is negative.
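A minimal sketch of Eq. (5) and the resulting classification rule, assuming the positive and negative term scores of a review have already been collected:

```python
# Normalized SentiScore of Eq. (5): sum of positive term scores minus sum of
# negative term scores, divided by the total number of sentiment terms.
def senti_score(positive_scores, negative_scores):
    p, n = len(positive_scores), len(negative_scores)
    if p + n == 0:
        return 0.0  # no sentiment-bearing terms found
    return (sum(positive_scores) - sum(negative_scores)) / (n + p)

def classify_review(positive_scores, negative_scores):
    return "positive" if senti_score(positive_scores, negative_scores) > 0 else "negative"

print(classify_review([0.75, 0.5], [0.25]))  # -> "positive"
```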

3.2 Dataset and Evaluation Matrices

The proposed methods were examined on the Amazon reviews dataset released by the Association for Computational Linguistics [3]. Each review consists of the user comments (opinion text), a reviewer name and location, a product name, a review title and date, and a numerical rating on a scale from 1 to 5 stars. A low rating (1 star) indicates an extremely negative opinion, and a very high rating (5 stars) reflects an extremely positive opinion of the product. To prepare the dataset, all reviews with a user rating greater than 3 were labeled as positive and those with a rating lower than 3 were labeled as negative. Reviews with a rating of 3, which are considered neutral, were ignored because they lie near the boundary of a binary classifier, under the assumption that there is less to learn from neutral texts than from texts with a clear positive or negative sentiment [20].
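The labeling rule can be summarized by the following sketch:

```python
# Labeling rule used to prepare the Amazon dataset: ratings above 3 stars are
# positive, below 3 stars negative, and 3-star (neutral) reviews are dropped.
def label_review(star_rating):
    if star_rating > 3:
        return "positive"
    if star_rating < 3:
        return "negative"
    return None  # neutral reviews are ignored

print([label_review(r) for r in (5, 3, 1)])  # -> ['positive', None, 'negative']
```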

The dataset was pre-processed and cleaned from its raw form. For instance, HTML tags were eliminated using an HTML parser, and all punctuation signs and numbers were removed. Experiments were conducted on a dataset of 21,972 positive and 16,576 negative reviews, a total of 38,548 reviews. The experimental dataset is a benchmark dataset that consists of 26 different product types from different domains. Table 3 presents a summary of the dataset used.

Table 3: Amazon Product Dataset Summary.

Domain Positive Negative
Apparel 1000 1000
Automotive 584 152
Baby 1000 900
Beauty 1000 493
Books 1000 1000
Camera & photo 1000 999
Cell phones & service 639 384
Computer & video games 1000 458
DVD 1000 1000
Electronics 1000 1000
Gourmet food 1000 208
Grocery 1000 352
Health & personal care 1000 1000
Jewelry & watches 1000 292
Kitchen & housewares 1000 1000
Magazines 1000 970
Music 1000 1000
Musical instruments 284 48
Office products 367 64
Outdoor living 1000 327
Software 1000 915
Sports & outdoors 1000 1000
Tools & hardware 98 14
Toys & games 1000 1000
Video 1000 1000
Total no. 21,972 16,576

Most SA algorithms categorize data into positive, neutral, and negative. Hence, the rule of thumb is to measure performance by examining whether the system categorizes the data in accordance with the intuition of the user. In order to examine the effectiveness of the proposed methodology, the well-known evaluation metrics precision, recall, and F-measure are used. These metrics assess correct classification in terms of true positives and true negatives, considering positive and negative reviews, respectively.
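A minimal sketch of these metrics, including the weighted F-measure (WF) reported later in Tables 4 and 6; weighting PF and NF by class support is an assumption here, as the paper only states that WF is a weighted average of the two:

```python
# Per-class precision, recall and F-measure, computed from true positives,
# false positives and false negatives with respect to that class.
def precision_recall_f(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Weighted F-measure: average of the positive and negative F-measures,
# weighted by the number of reviews in each class (assumed weighting).
def weighted_f(f_pos, n_pos, f_neg, n_neg):
    return (f_pos * n_pos + f_neg * n_neg) / (n_pos + n_neg)
```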

4 Results and Discussion

In order to assess the performance of our proposed methods, we compared our approach with two lexicon-based classifiers. The first baseline (baseline1) is based on the idea that the polarity of a text can be given by the sum of the individual polarity values of each word or phrase present in the text, where each term is associated with numerical scores indicating positive and negative sentiment information. The total polarity of the text is considered positive if the sum of the positive word polarities is greater than the sum of the negative word polarities; otherwise, the total polarity of the text is considered negative [9], [19]. The second baseline (baseline2) counts the number of positive and negative word scores to determine the sentiment polarity of a text. In this approach, the total polarity of the text is considered positive if the count of positive word polarities is greater than the count of negative word polarities; otherwise, the total polarity of the text is considered negative [9], [14].
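A minimal sketch of the two baselines, assuming each sentiment-bearing word has already been assigned a (positive, negative) score pair:

```python
# baseline1: compare the summed positive and negative word scores.
def baseline1(scored_terms):
    pos_sum = sum(p for p, _ in scored_terms)
    neg_sum = sum(n for _, n in scored_terms)
    return "positive" if pos_sum > neg_sum else "negative"

# baseline2: compare the counts of positive and negative words.
def baseline2(scored_terms):
    pos_count = sum(1 for p, n in scored_terms if p > n)
    neg_count = sum(1 for p, n in scored_terms if n > p)
    return "positive" if pos_count > neg_count else "negative"
```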

Table 4 reports the results of each individual method. Performance was evaluated by the recall, precision, and F-measure metrics per method, where PR is the positive recall, PP the positive precision, PF the positive F-measure, NR the negative recall, NP the negative precision, and NF the negative F-measure. WF is the weighted F-measure, i.e. the weighted average of the PF and NF scores.

Table 4: Experimental Results for POS Subcategories.

Exp. PR PP PF NR NP NF WF
Baseline1 72.6 71.8 72.2 53.3 52.8 53.1 64
Baseline2 76.9 72.1 74.5 35.5 32.6 34 57.1
Adjective 73.7 69.9 71.8 57 53 55 64.6
JJ 70.4 66.5 68.4 57.7 53.5 55.6 62.9
JJS 97 12.6 22.4 38.8 3.4 6.3 15.5
JJR 0 0 0 0 0 0 0
Adverb 60.1 53.2 56.5 60.3 55.7 58 57.2
RB 58.3 50.8 54.3 62 56.4 59.1 56.4
RP 68 18.5 29.1 39.3 13.1 19.7 25.1
RBR 100 5.3 10.1 0.2 0.1 0.2 5.9
RBS 100 0.7 1.4 0 0 0 0.8
Noun 63.2 62 62.6 41.9 41.2 41.6 53.6
NN 66.6 63.1 64.9 38.8 36.9 37.9 53.3
NNS 49.5 9.2 15.6 46.3 8.9 15 15.4
NNP 31.4 2.7 5 65 5.4 10 7.2
NNPS 12.5 0.1 0.2 60 0.1 0.2 0.2
Verb 68.2 62.3 65.2 37.1 34.7 35.9 52.7
VB 65.9 53.2 58.9 33.3 29.2 31.2 47
VBP 60.3 44.6 51.3 48.7 35.7 41.2 47
VBD 71.5 9.4 16.7 33.9 5.2 9.1 13.5
VBN 73.2 6.7 12.3 28.1 2.6 4.8 9.1
VBZ 39.9 0.4 0.8 52.4 0.6 1.2 1
VBG 38.5 0.2 0.4 55 0.3 0.6 0.5

The results illustrate the accuracy of the different POS tag methods. First, we observed the classification accuracy of the standard POS tags, using adjectives, adverbs, verbs, and nouns only as the scoring source. The classification method using adjectives achieved the highest accuracy of 64.6%. Second, we compared the results of the subcategories. The results showed that JJ, RB, VBP, and NN achieved the best results among the subcategories of the standard adjective, adverb, verb, and noun POS tags, respectively.

Table 5 presents POS subcategory combinations, where the subcategories that obtained the highest accuracy within each standard POS tag are selected for the combinations. For example, in the adjective category, JJ obtains the highest accuracy compared to the other adjective subcategories; similarly NN for nouns, VB for verbs, and RB for adverbs. Moreover, the subcategories that obtained the next highest accuracy within each standard POS tag are added to the combinations in order to explore their effect on classification accuracy. For example, the verb subcategory VBP obtains 47% in terms of WF score; therefore, it is selected for the subcategory combinations. The same procedure is followed for RP, JJS, and VBD in the combination process.

Table 5: Subcategory Combinations.

Set no. Feature set combinations
1. JJ, NN, VB, RB
2. JJ, NN, VB, RB, VBP
3. JJ, NN, VB, RB, VBP, RP
4. JJ, NN, VB, RB, VBP, RP, JJS
5. JJ, NN, VB, RB, VBP, RP, JJS, VBD
6. JJ, NN, VB, RB, VBP, RP, JJS, VBD, NNS
7. JJ, NN, VB, RB, VBP, RP, JJS, VBD, NNS, VBN

Table 6 depicts the WF measure for the POS subcategory combinations. The results show that the JJ + NN + VB + RB combination obtained a better accuracy of 67.5% compared to the baseline1 approach, which achieved 64.1%. Next, we compared the results of the other POS combinations in order to examine the effectiveness of each POS category combination. Adding the VBP feature to the previous combination improves accuracy by 1%. However, the other POS feature combinations do not show a significant improvement in classification accuracy; for example, the RP tag does not yield any accuracy improvement, while the JJS tag shows a small improvement of 0.1%, which cannot be considered significant. Finally, the results illustrate that adding the VBD, NNS, and VBN POS features to the combination has a negative effect on classification accuracy.

Table 6: Experimental Results for POS Subcategory Combinations.

Exp./Metrics PR PP PF NR NP NF WF
Baseline1 72.6 71.9 72.3 53.3 52.8 53.1 64.1
Baseline2 76.9 72.1 74.5 35.5 32.6 34 57.1
1 75.5 74.7 75.1 57.6 56.9 57.3 67.5
2 75.5 75.3 75.4 59.4 59.2 59.3 68.5
3 77.6 76.9 77.3 57.1 56.5 56.8 68.5
4 77.6 76.9 77.3 57.2 56.6 56.9 68.6
5 77.6 76.9 77.3 56.7 56.1 56.4 68.4
6 77.6 76.9 77.3 56.7 56.1 56.4 68.4
7 78 77.3 77.7 56.1 55.5 55.8 68.3

5 Conclusion

In this paper, we presented lexicon-based methods for SA that better recognize the polarity of natural language text by utilizing different polarity features with the standard POS tags, such as adjectives, adverbs, verbs, and nouns, and by examining combinations of their subcategories. An experimental study was conducted on the Amazon dataset to explore the specific polarity of text. We examined different polarity features using the standard POS tags and combinations of their subcategories. The experimental results indicated that the JJ + NN + VB + RB + VBP + RP combination achieved a 4.4% enhancement compared with baseline1. This result is very promising compared to the other feature combinations and the baseline approaches. In the future, we plan to improve the presented work by considering the semantic definition of each word. We could also employ different general inquirer dictionaries for further categorization.

Bibliography

[1] S. Bethard, H. Yu, A. Thornton, V. Hatzivassiloglou and D. Jurafsky, Automatic extraction of opinion propositions and their holders, in: 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text, vol. 2224, 2004.

[2] R. Bhargava, S. Arora and Y. Sharma, Neural network-based architecture for sentiment analysis in Indian languages, J. Intell. Syst. 28 (2018), 361–375. doi:10.1515/jisys-2017-0398.

[3] J. Blitzer, M. Dredze and F. Pereira, Biographies, Bollywood, boom-boxes and blenders: domain adaptation for sentiment classification, ACL 7 (2007), 440–447.

[4] P. Chesley, B. Vincent, L. Xu and R. K. Srihari, Using verbs and adjectives to automatically classify blog sentiment, in: Computational Approaches to Analyzing Weblogs: Papers from the 2006 Spring Symposium, N. Nicolov, F. Salvetti, M. Liberman, and J. H. Martin, eds., AAAI Press, Menlo Park, CA, 27–29, Technical Report SS-06-03, vol. 580, no. 263, p. 233, 2006.

[5] M. D. Devika, C. Sunitha and A. Ganesh, Sentiment analysis: a comparative study on different approaches, Proc. Comput. Sci. 87 (2016), 44–49. doi:10.1016/j.procs.2016.05.124.

[6] X. Fang and J. Zhan, Sentiment analysis using product review data, J. Big Data 2 (2015), 5. doi:10.1186/s40537-015-0015-2.

[7] R. Feldman, Techniques and applications for sentiment analysis, Commun. ACM 56 (2013), 82–89. doi:10.1145/2436256.2436274.

[8] M. Ghiassi, J. Skinner and D. Zimbra, Twitter brand sentiment analysis: a hybrid system using n-gram analysis and dynamic artificial neural network, Expert Syst. Appl. 40 (2013), 6266–6282. doi:10.1016/j.eswa.2013.05.057.

[9] C. S. Khoo and S. B. Johnkhan, Lexicon-based sentiment analysis: comparative evaluation of six sentiment lexicons, J. Inform. Sci. 44 (2017), 491–511. doi:10.1177/0165551517703514.

[10] B. Liu, Sentiment analysis and opinion mining, Synthesis Lect. Hum. Lang. Technol. 5 (2012), 1–167. doi:10.1007/978-3-642-19460-3_11.

[11] Y. Liu, J.-W. Bi and Z.-P. Fan, Multi-class sentiment classification: the experimental comparisons of feature selection and machine learning algorithms, Expert Syst. Appl. 80 (2017), 323–339. doi:10.1016/j.eswa.2017.03.042.

[12] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard and D. McClosky, The Stanford CoreNLP natural language processing toolkit, in: ACL (System Demonstrations), Stanford, pp. 55–60, 2014. doi:10.3115/v1/P14-5010.

[13] W. Medhat, A. Hassan and H. Korashy, Sentiment analysis algorithms and applications: a survey, Ain Shams Eng. J. 5 (2014), 1093–1113. doi:10.1016/j.asej.2014.04.011.

[14] B. Ohana and B. Tierney, Sentiment classification of reviews using SentiWordNet, in: IT&T Conference, Dublin Institute of Technology, Dublin, Ireland, 22nd–23rd October, 2009.

[15] B. Pang and L. Lee, Opinion mining and sentiment analysis, Found. Trends Inform. Retriev. 2 (2008), 1–135. doi:10.1561/9781601981516.

[16] B. Pang, L. Lee and S. Vaithyanathan, Thumbs up?: sentiment classification using machine learning techniques, in: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10, 2002. doi:10.3115/1118693.1118704.

[17] K. Ravi and V. Ravi, A survey on opinion mining and sentiment analysis: tasks, approaches and applications, Knowl.-Based Syst. 89 (2015), 14–46. doi:10.1016/j.knosys.2015.06.015.

[18] S. Sarkar, P. Mallick and T. K. Mitra, A novel machine learning approach for sentiment analysis based on Adverb-Adjective-Noun-Verb (AANV) combinations, Int. J. Recent Trends Eng. Technol. 7 (2012).

[19] A. Sharma, A. Sharma, R. K. Singh and M. D. Upadhayay, Hybrid classifier for sentiment analysis using effective pipelining, Int. Res. J. Eng. Technol. (IRJET) 4 (2017), 2276–2281.

[20] F. Smarandache, M. Teodorescu and D. Gîfu, Neutrosophy, a sentiment analysis model, in: The 3rd Workshop on Social Media and the Web of Linked Data, Toronto, Ontario, Canada, ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), pp. 38–41, 2017.

[21] V. S. Subrahmanian and D. Reforgiato, AVA: adjective-verb-adverb combinations for sentiment analysis, IEEE Intell. Syst. 23 (2008), 43–50. doi:10.1109/MIS.2008.57.

[22] M. Taboada, J. Brooke, M. Tofiloski, K. Voll and M. Stede, Lexicon-based methods for sentiment analysis, Comput. Linguist. 37 (2011), 267–307. doi:10.1162/COLI_a_00049.

[23] T. T. Thet, J.-C. Na, C. S. Khoo and S. Shakthikumar, Sentiment analysis of movie reviews on discussion boards using a linguistic approach, in: Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, pp. 81–84, ACM, 2009. doi:10.1145/1651461.1651476.

[24] D. S. Tomar and P. Sharma, A text polarity analysis using SentiWordNet based an algorithm, Int. J. Comput. Sci. Inform. Technol. 7 (2016), 190–193.

[25] A. Tumasjan, T. O. Sprenger, P. G. Sandner and I. M. Welpe, Predicting elections with Twitter: what 140 characters reveal about political sentiment, in: Fourth International AAAI Conference on Weblogs and Social Media, vol. 10, no. 1, pp. 178–185, 2010. doi:10.1609/icwsm.v4i1.14009.

Received: 2018-08-31
Published Online: 2019-08-15

©2020 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 Public License.
