2015 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT)

Twitter Sentiment Analysis: A Case Study in the Automotive Industry

Sarah E. Shukri
Business Information Technology Department, The University of Jordan, Amman, Jordan
Sar814l 19?@fgs.ju.edu.jo

Rawan I. Yaghi
Business Information Technology Department, The University of Jordan, Amman, Jordan
Roa8141203@fgs.ju.edu.jo

Abstract

Sentiment analysis is one of the fastest-growing research areas. It uses natural language processing, text mining, and computational linguistics to extract useful information that supports the decision-making process. In recent years, social media websites have spread widely and their user numbers have increased rapidly. The automotive industry is one of the largest economic sectors in the world, with more than 90 million cars and vehicles. It is highly competitive and requires that sellers and automotive companies carefully analyze and attend to consumers' opinions in order to achieve a competitive advantage in the market. Analysing consumers' opinions using social media data can be an effective way for automotive companies to enhance their marketing targets and objectives. In this paper, a sentiment analysis case study in the automotive industry is presented. Text mining and sentiment analysis are used to analyze unstructured tweets on Twitter and to extract the polarity and emotion classifications for automotive classes such as Mercedes, Audi, and BMW. The emotion classification results show that the "joy" category is better for BMW compared with Mercedes and Audi, while the "sadness" percentage is larger for Audi and Mercedes than for BMW. Furthermore, the polarity classification shows that BMW has 72% positive tweets, compared with 79% for Mercedes and 83% for Audi. In addition, the results show that BMW has 8% negative polarity, compared with 18% for Mercedes and 16% for Audi.
Ibrahim Aljarah
Business Information Technology Department, The University of Jordan, Amman, Jordan
i.aljarah@ju.edu.jo

Hamad Alsawalqah
Computer Information Systems Department, The University of Jordan, Amman, Jordan
h.sawalqah@ju.edu.jo

Keywords: Twitter; Sentiment Analysis; Automotive; Classification

I. INTRODUCTION

Others' opinions have always been an important piece of information for consumers when it is time to make a buying decision. Long before awareness of the World Wide Web became widespread, people often relied on their friends' recommendations and on specialized magazines or websites as their main sources of information. With the growth of the web over the last decade, social media now provides new tools to efficiently create and share useful information [1]. This has made it possible to find experiences and opinions almost everywhere (blogs, forums, social networks, news portals, content-sharing sites, etc.).

Research indicates that using social media sites is considered the best way to grow a business in terms of money, time, effort, and other resources [2]. Although these opinions are meant to be helpful, their massive availability and unstructured nature make it difficult for companies to benefit from them. To solve this issue, a number of techniques for analysing data generated by users on social media sites have been developed. Sentiment analysis, also known as opinion mining, is one such recent technique. Sentiment analysis uses natural language processing, text mining, and computational linguistics to extract useful information and knowledge from source data. The purpose of sentiment analysis is to classify the polarity of a source text as positive, neutral, or negative. Text mining is a crucial step in sentiment analysis, in which unstructured data are analysed and scored based on how closely they relate to a specific concept, so that they can be classified later based on the given score [3].
The automotive industry is one of the largest and most competitive economic sectors in the world. Due to the high competition, automotive companies are moving toward using social media sites to reach more customers and advertise their products in a considerably short time. Twitter is one of the fastest-growing social media websites in the world. It is a microblogging service that enables users to tweet on any topic, with a maximum length of 140 characters per tweet. As of June 2015¹, Twitter has more than 500 million users, of which more than 302 million are active users. With an average of 500 million tweets created daily, Twitter has become one of the greatest sources of information available on the Internet [4]. Thus, Twitter data can be very useful for automotive marketers because it can be used to mine consumers' opinions and reviews in the automotive industry using sentiment analysis. This can provide useful insights that help companies create a competitive advantage over their competitors.

¹ about.twitter.com/company

This research applies sentiment analysis to analyse people's opinions and reviews about three automotive companies: Mercedes, Audi, and BMW. To do so, tweets are extracted from Twitter and processed using text mining techniques. These tweets are then used in the sentiment analysis, which classifies tweets based on the sentiment expressed in the text [5]. At the end, tweets are classified into three categories: positive sentiment, negative sentiment, or neutral sentiment. Because, to the best of our knowledge, there have been very few attempts to apply sentiment analysis in the automotive industry [10, 11], the results of this research can provide further insights into the importance of analysing consumers' reviews and opinions in this industry.
The remainder of this paper is organized as follows: Section II presents the work related to this research. Section III presents the methodology. Section IV presents a demonstration of the method on the case study and discusses the results. Section V concludes the paper with a summary and an outlook on future research directions.

II. RELATED WORK

With the explosion of Web 2.0 platforms, social media sites have become a huge source of consumer voices. Capturing and analyzing public opinions from social media sites has recently enjoyed a huge burst of research activity. One of the resulting emerging fields is sentiment analysis [1, 5]. Since then, hundreds of papers have been published on the subject. Among these papers, we focus on those most related to the work presented in this paper.

In [6], the authors analyzed three of the most popular companies in the pizza industry using text mining. The authors studied information from social media sites about the users of those companies and their competitors. The goal was to help those companies improve their services and strategies in order to attract more customers. They found that social media sites play an important role in creating competitive advantage. The authors concluded that a good understanding and use of social media users' information can improve the relationship of companies with their users, improve their service levels, and improve the quality of their decisions.

Another work [?] presented a new approach to provide decision support for vehicle defect discovery. The authors applied several techniques, such as text mining and sentiment analysis, to popular social media communities. Their focus was on improving vehicle quality management by analyzing social media. They found that a good analysis of social media data can improve automotive quality management strategies.
In an attempt to overcome the challenges that developers may face while developing opinion mining tools, the authors in [3] developed a rule-based approach that can analyze the language used on social media sites.

In [9], we find a case study that applies sentiment analysis to Twitter. The authors presented a method to perform sentiment analysis and opinion mining using tweets. The first step in the presented method is collecting the corpus and preparing it for analysis, while the second is building a model that classifies the tweets by sentiment (positive, negative, and neutral) using the Naive Bayes (NB) algorithm.

Another work [10] introduced the J.D. Power and Associates (JDPA) Sentiment Corpus. The JDPA corpus consists of users' blog posts containing opinions about automobiles. Moreover, the authors presented statistics, including inter-annotator agreement, and catalogued components of sentiment that occur naturally.

The authors in [11] used sentiment analysis to analyze a data set of around 730,000 tweets published over a time frame of 19 weeks. Within this data set, they analyzed the tweets dealing with the corporate crisis of Toyota in 2010. Their focus was on the dynamics of discussions in social media, in order to reflect the sentiments within these discussions. The authors identified and investigated specific stages of communication, which they called "quiet stages" and "peaks".

III. METHODOLOGY

As the usage of social media sites grows and extends, companies can use these sites to assess their own state in the market as well as that of their competitors. This can be done by studying the data generated by users on these sites. Such data reveal users' opinions and comments about these companies' products or services. Thus, in this paper we study the automotive industry in social media and try to answer the following questions: What is the rate of using these companies' data by users?
What is the percentage of negative reviews and comments compared to the positive ones? Who is the leader in the automotive sector based on the polarity classifications of reviews and comments?

Although social media provides strong user engagement and leads to a very high level of communication between user and seller, some industries still do not engage in social media. The automotive industry represents a great example of engagement in social media, as published in the 2014 CMO Council report: about 1 out of 4 car buyers (23%) discussed other users' experiences and reviews before purchasing their car. 38% of car customers said that they will use social media for their next purchase. 84% of car customers use Facebook, and 24% of them used social media sites when purchasing their last car. Between October 2012 and April 2013, clicks on automotive ads on Facebook increased markedly, jumping from 16% to 39%².

In this paper, we first discuss the level of engagement in social media of these three automotive manufacturers. We extracted the engagement percentages from the Talkwalker API³. BMW, Mercedes, and Audi are among the largest automotive brands in Europe, so it is important to discuss the level of their engagement in social media. Figure 1 shows the engagement percentage on different social media sites.

² www.cmocouncil.org
³ www.talkwalker.com

As Figure 1 shows, BMW has the largest engagement percentage on Twitter, at 62%. Mercedes has the largest engagement percentage through online news, blogs, and other sources, with 18%, 6%, and 30%, respectively. Audi also has a higher engagement percentage on Twitter than Mercedes, at 59% (Audi) versus 47% (Mercedes).
Figure 1. Social Media Sites engagement percentage [bar chart; x-axis: social media sites; series: BMW, Mercedes, Audi]

A. Data collection

In this paper, we collected data from Twitter using the Twitter API. The corpus contains 3000 tweets, which were extracted using R⁴.

⁴ https://www.r-project.org/

B. Data pre-processing

Tweets are filtered so that only English-language tweets remain. The corpus covers three types of cars: Mercedes, Audi, and BMW. Each type is represented by 1000 tweets. The tweets are extracted with a search query that uses the "@" annotation followed by the car's type. To build a sound experiment, the dataset for each car type was extracted from Twitter pages and users. After that, we prepared the extracted datasets by removing unnecessary elements such as retweet and username symbols, hashtags, numbers, punctuation, stop words, extra whitespace, and HTML links. In this paper, we applied the following text mining pre-processing techniques:

- Tokenization: reads the text to be mined, removes all tabs and punctuation between words, and replaces them with a white space.
- Filtering: removes words such as stop words, extremely frequent words, and rarely occurring words.
- Lemmatization: transforms all verbs to the infinitive form and all nouns to the singular form.
- Stemming: returns all words to their basic forms by removing, for example, the plural 's' from nouns and the 'ing' from verbs.

C. Sentiment Analysis Models

We used the Naive Bayes (NB) classification algorithm to classify polarity and emotions in the sentiment analysis. The NB algorithm is simple, easy to implement, and efficient, with acceptable accuracy. Furthermore, two sentiment models are investigated, based on a polarity lexicon [13] and an emotions lexicon [14].

The NB algorithm is a simple probabilistic model that assumes all the data attributes are independent of one another given the class.
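As a rough illustration of the cleaning steps in Section III-B and of the independence assumption just stated, the following Python sketch cleans a tweet and classifies it with a small multinomial naive Bayes model. It is not the authors' implementation: the study used R together with the polarity and emotions lexicons [13], [14], whereas this sketch learns word counts from a few hand-labelled toy tweets taken from the sample tables. The stop-word list, the names `clean_tweet`, `train_nb`, `classify`, and `TOY_TWEETS`, and the add-one smoothing are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "but", "from", "in", "is", "my",
              "of", "one", "the", "this", "to", "was"}

# Toy training data (assumption): a few sample tweets with hand-assigned labels.
TOY_TWEETS = [
    ("Proud to own an Audi @audi", "positive"),
    ("@Audi Excellent SUV from Audi! Beautiful Car!", "positive"),
    ("@audi Sorry RPM but this is rubbish.", "negative"),
    ("@audi Probably one of my worst decisions", "negative"),
]


def clean_tweet(text):
    """Lowercase a tweet, strip noise, and return the remaining tokens."""
    text = text.lower()
    text = re.sub(r"\brt\b", " ", text)        # retweet marker
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"@\w+", " ", text)          # usernames
    text = re.sub(r"[^a-z\s]", " ", text)      # '#', digits, punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def train_nb(labelled_tweets):
    """Collect per-class tweet counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_tweets:
        tokens = clean_tweet(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log-posterior for the given tweet."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    tokens = clean_tweet(text)
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log P(C) + sum_i log P(x_i | C); the sum over words is the
        # independence assumption. Add-one smoothing avoids log(0).
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Calling `classify("some tweet text", train_nb(TOY_TWEETS))` returns one of the labels seen during training. Working in log space keeps the product of many small word probabilities numerically stable; the constant denominator P(X) is dropped because it does not affect which class scores highest.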
The probabilistic model uses Bayes' theorem to solve classification problems by computing the maximum posterior probability of the class label given the attribute set. Bayes' theorem is given by the following equation:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)} \qquad (1)$$

where C is a class label, X is the attribute set, and P(C) and P(X|C) are the prior probability of the class and the conditional probability of the attributes given the class, respectively.

The first sentiment model uses an NB classifier, which is trained on the training data set and makes use of Wiebe's polarity lexicon [13]. The training data set is annotated with three classes: positive, neutral, and negative tweets. The NB polarity classifier uses the polarity lexicon through matching criteria between the tweet words and the lexicon words. When the training process is finished and the model is well trained, the second step tests the model using the testing data set, which is not labeled. The testing process is used to assess the accuracy of the built model. The last step is to validate the model and extract the polarity percentages for the three categories: positive, negative, and neutral.

The second NB classifier is trained on the training data set and makes use of the Strapparava emotions lexicon [14]. The training data set is annotated with seven classes: anger, disgust, fear, joy, sadness, surprise, and unknown tweets. As in the polarity classification, the classifier relies on matching criteria between the tweet words and the emotions lexicon words.

IV. RESULTS

The tweets collected about BMW, Mercedes, and Audi contain the @BMW, @Mercedesbenz, and @Audi tags, respectively. Each tweet is analysed and classified as positive, negative, or neutral based on a query term and the polarity classification. Table I, Table II, and Table III contain some tweet samples about BMW, Mercedes, and Audi, respectively, together with their polarity classifications.
TABLE I: TWEETS' SAMPLES (BMW)

Tweet | Polarity Classification
Elegance and sportiness united in one vehicle: the new #BMW #series Coupe | Positive
sucnabadcarsamw mn_

TABLE II: TWEETS' SAMPLES (MERCEDES)

Tweet | Polarity Classification
@MercedesBenz Intelligent innovation and safety as never before. Preview of the future of the #EClass | Positive
Amazing @ MercedesBenz 300 SLR | Positive
@MercedesBenz That's not what we'd expect. Please contact your local Workshop so that our Technicians inspect the issue. | Negative

TABLE III: TWEETS' SAMPLES (AUDI)

Tweet | Polarity Classification
@audi Probably one of my worst decisions was buying an | Negative
Proud to own an Audi @audi | Positive
@audi Sorry RPM but this is rubbish. There is so much great motor sport happening and you dish up crap | Negative
@Audi Excellent SUV from Audi! Beautiful Car! | Positive

Figure 3 shows the emotion classification results for the three automotive companies. BMW emotion classifications are 79% labeled as "Unknown", 5% "Joy", 0.5% "Surprise", 9% "Sadness", 0% "Fear", 5.5% "Anger", and 1% "Disgust". Mercedes emotion categories are 56.6% labeled as "Unknown", 31.9% "Joy", 0.5% "Surprise", 4.1% "Sadness", 0.4% "Fear", 6.4% "Anger", and 0.1% "Disgust". Audi emotion categories are 63.2% labeled as "Unknown", 10% "Joy", 17.7% "Surprise", 5.1% "Sadness", 0.2% "Fear", 1.3% "Anger", and 2.4% "Disgust". These results give a good indicator for customers seeking to buy cars and help them make the right decision. We can note that the "joy" category was better for BMW compared with Mercedes and Audi. This may be due to the fact that positive reviews are not necessarily always "Joy"; other categories can also be considered positive when they have no negative implication.

Polarity classifications for BMW, Mercedes, and Audi are shown in Figure 2.
Figure 2 shows that BMW has 72% positive tweets, compared with 79% for Mercedes and 83% for Audi. Furthermore, the figure shows that BMW has 8% negative polarity, compared with 18% for Mercedes and 16% for Audi. This gives a good indication for customers seeking to buy cars from manufacturers that have good reviews and comments from users who own these cars, and it indicates to competitors that Audi is a strong competitor.

Fig 2. Polarity Classification for BMW, Mercedes, Audi [bar chart; x-axis: automotive class; y-axis: percentage (%); data labels for BMW, Mercedes, Audi: Positive 72%, 79%, 83%; Neutral 20%, 3%, 1%; Negative 8%, 18%, 16%]

Fig 3. Emotion Classifications for BMW, Mercedes, and Audi [bar chart; x-axis: emotion classifications (Unknown, Joy, Surprise, Sadness, Fear, Anger, Disgust); y-axis: percentage (%)]

V. CONCLUSION

Sentiment analysis is considered one of the most attractive fields to study and apply in various sectors. In this paper, sentiment analysis models are applied to three of the leading automotive companies to extract the polarity and emotions (opinions) of customers regarding each company, which is very useful information for marketing. The results showed that Audi's positive polarity was higher (83%) than that of the other companies. On the other hand, the negative polarity of Audi is less than all other companies. This means that, for example, offers on Audi's page would circulate to a higher number of satisfied people than on the pages of BMW and Mercedes.

Furthermore, the analysis results show that the percentage of positive reviews for Audi is the highest among the three companies, at 83%. In addition, Audi negative polarity is less than others with a percentage of 16%. We can conclude that Audi users are more satisfied compared with the other users. This will help users who are willing to buy a car to compare the three companies based on previous users' opinions.
In addition, the emotions classification results were consistent with the polarity classifications, and give more information about each polarity class