Question: Briefly discuss polarity classification methodologies for opinion mining.

This article is organized as follows: In Section 2, key elements of the polarity classification task are explained, and those works in this area that can be useful for the emotion-mining task are reviewed. In Section 3, a set of important resources, including lexicons and datasets that researchers need for a polarity classification task, are introduced. Section 4 reviews emotion theories in order to gain knowledge about basic emotions. A thorough survey of emotion-related research is given in Section 5. Section 6 is dedicated to introducing useful resources specific to emotion-mining work, and, finally, Section 7 summarizes and concludes the discussion.

2. POLARITY CLASSIFICATION METHODOLOGIES

Polarity classification is the task of classifying the opinion of a given text as falling under one of two opposing sentiment polarities, the most famous of which is "like" vs. "dislike" [Pang and Lee 2008]. Although much of the work in this area has been done on product and service reviews, which mostly hold positive or negative opinions, there are other problems where "like" or "dislike" are interpreted as other concepts, such as different political views [Pang and Lee 2008]. As stated in Section 1, different media can be used to express opinions, among which we only focus on text. For more information about other types of polarity classification, one can refer to Morency et al. [2011].

Automatic classification of polarity can be categorized with respect to various perspectives. In terms of granularity, it can be done at the document, sentence, or aspect level.

- Document level: In this category, the whole document, whether short or long, is the atomic unit of input to the problem, and the polarity of the whole document is the essence of the study. Document-level polarity classification accounts for most of the body of work in this area and is considered the simplest sentiment analysis task in the research community [Liu 2015].
At the same time, it is in wide demand, since most online data consists of documents such as reviews, blog posts, and comments. Document-level polarity classification is an essential requirement for studies such as social and psychological studies in social networks [Ortigosa et al. 2014; Gao et al. 2015a], consumer satisfaction [Kang and Park 2014], and analyzing patients in medical settings [Denecke and Deng 2015].

- Sentence level: The objective of this group of studies is to determine the polarity of a sentence. As noted in Neviarouskaya et al. [2007], a challenge at this level is the influence of the surrounding context on the sentence. For example, depending on the context in which it is used, the sentence "I can't really describe this product better than this" can be either positive or negative. Polarity classification of tweets, which has been extensively studied in recent years, is the most interesting application of sentence-level polarity classification.

- Aspect level: This category, also known as feature-based opinion mining, encompasses the study of discovering opinion polarities about a specific aspect of a product or service. For instance, opinions on restaurants can be about two aspects of quality, namely the food and the cleanliness of the restaurant. This category of works is highly useful for business owners and politicians to gain insights about aggregations of people's opinions regarding various features of their products and services, where document- or sentence-level classifications do not suffice. Extraction of aspects from text and polarity classification of the extracted aspects are the two major components of aspect-level polarity classification. The work of Hu and Liu [2004] is one of the earliest in this field. Further attempts mostly focused on enhancing only one of these components. For instance, one of the most important groups of works in this category is devoted to utilizing topic modeling in aspect extraction, such as the work of Lin and He [2009], Jo and Oh [2011], Mukherjee and Liu [2012], and Wang et al. [2016].

ACM Computing Surveys, Vol. 10, No. 2, Article 25, Publication date: May 2017. A. Yadollahi et al.

With respect to the nature of the data, there are two important modes of the problem. Some datasets benefit from being annotated by a human, while there are many unlabeled datasets of reviews and posts. Methods working with labeled data often show better results; nevertheless, they require manual labeling, which one might be unable to afford. In the following two subsections, we discuss previous methods on annotated and unannotated text data, respectively.

2.1. Works on Annotated Data

The algorithms that deal with labeled data are called "supervised methods."
Supervised methods apply machine-learning algorithms to a set of training data in order to predict the labels of unseen test data. They need an annotated dataset of texts for training, which creates a model to discriminate between polarities. In order to apply machine-learning methods, one should represent the text by means of descriptive features. After that, some technique should be used to train a polarity classifier. Most solutions introduced in the literature are general-purpose machine-learning techniques, while some of them are sentiment specific. Sebastiani [2002] was the first to apply general text categorization algorithms to the field of sentiment detection. Later, Pang et al. [2002] compared the performance of Support Vector Machine (SVM) and Naive Bayes classifiers on movie reviews.

Representation learning methods have shown promising classification results in various applications, one of which is polarity classification. Socher et al. [2013] utilize deep learning to train a Treebank sentiment classifier, Tang et al. [2014a] develop a deep learning Twitter sentiment model, dos Santos and Gatti [2014] apply Deep Convolutional Neural Networks to classifying short text, Tang et al. [2014b] develop neural networks to find continuous word representations along with the sentiment of the word, and Tang [2015] attempts to encapsulate features of a document using cascaded constituents and to learn the sentiment of documents. All these works attempt to find a representation of the polarity by applying various layers of hidden nodes, among which the first layer consists of the raw features of the text.

A fairly large part of the literature is dedicated to finding out the usefulness of many features and techniques in learning. The most common types of those features, which have also been applied in other areas of text mining, are as follows.

2.1.1. Presence-Based and Frequency-Based Features. The most common way to describe a piece of text is by using a binary vector in which each element corresponds to one term from a dictionary. The element at index i in the vector is set to 1 if the term i is present in the text and 0 otherwise. Likewise, one may describe the text as a vector representing the number of times individual terms have been repeated. The former is called the presence-based and the latter the frequency-based type of feature. Although term frequency is a popular feature in information retrieval, Pang et al. [2002] obtain better performance when using presence-based features.

2.1.2. Unigram and N-Gram Features. A unigram refers to one single word in a text, and an n-gram represents a group of adjacent words in a sentence, preserving the order. Although n-grams carry more information than unigram features, concerning the position of words in the sentence and their use as a group, whether they are more effective in increasing performance is a matter of some debate. For instance, Pang et al. [2002] report that unigrams are more effective than n-grams; however, some ...
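
The feature types described in Sections 2.1.1 and 2.1.2 can be illustrated with a small sketch. It is not taken from the surveyed works: the whitespace tokenizer, the two-document corpus, and the helper names (tokenize, ngrams, build_vocabulary, presence_vector, frequency_vector) are all assumptions made for illustration. It only shows how a presence-based (binary) vector differs from a frequency-based (count) vector, and how n-grams preserve word order.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the n-grams of a token list as tuples, preserving word order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(documents, n=1):
    """Map every n-gram seen in the corpus to a fixed vector index."""
    terms = sorted({g for doc in documents for g in ngrams(tokenize(doc), n)})
    return {term: index for index, term in enumerate(terms)}


def presence_vector(text, vocabulary, n=1):
    """Presence-based features: 1 if the term occurs in the text, else 0."""
    vector = [0] * len(vocabulary)
    for term in set(ngrams(tokenize(text), n)):
        if term in vocabulary:
            vector[vocabulary[term]] = 1
    return vector


def frequency_vector(text, vocabulary, n=1):
    """Frequency-based features: how many times each term occurs."""
    vector = [0] * len(vocabulary)
    for term, count in Counter(ngrams(tokenize(text), n)).items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector


if __name__ == "__main__":
    # Hypothetical toy corpus, used only to fix a vocabulary.
    corpus = ["good good movie", "bad movie"]
    vocab = build_vocabulary(corpus)  # unigram indices: bad=0, good=1, movie=2
    print(presence_vector("good good movie", vocab))   # [0, 1, 1]
    print(frequency_vector("good good movie", vocab))  # [0, 2, 1]
    print(ngrams(tokenize("not good movie"), 2))       # bigrams keep word order
```

In this toy setup the repeated word "good" contributes 2 to the frequency vector but only 1 to the presence vector. The bigram ("not", "good") keeps the negation together with the word it modifies, which is the extra positional information that n-grams carry over unigrams.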
