Question
Goal: The goal of this assignment is to help students familiarize themselves with the following Java programming concepts:
1. Input/Output to and from the terminal.
2. Storing data in a file and reading data from a file.
3. Creating object-oriented classes and methods to handle data.
4. Using data structures to store data in main memory (e.g. HashSet).
5. Working with character strings.
6. Using Javadoc comments and generating HTML documentation of the program.
Description: For this assignment you will create a program to classify a set of tweets as positive, negative, or neutral based on their sentiment. This process is known as sentiment analysis. More information about sentiment analysis can be found on Wikipedia and other sources. Although complex algorithms have been developed for sentiment analysis, in this assignment we will classify a tweet as positive, negative, or neutral simply by counting the number of positive and negative words that appear in it. These positive and negative words will be given to the program as input in two separate files, namely positive-words.txt and negative-words.txt. The set of tweets to be classified will also be given as input to the program in a CSV (comma-separated values) file: testdata.manual.2009.06.14.csv. The file contains 498 tweets extracted using the Twitter API. The tweets have been annotated (0 = negative, 2 = neutral, 4 = positive) and can be used to detect sentiment. The file contains the following 6 fields:
1. target: the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
2. ids: the ID of the tweet (e.g., 2087)
3. date: the date of the tweet (e.g., Sat May 16 23:58:44 UTC 2009)
4. flag: the query (e.g., lyx). If there is no query, this value is NO_QUERY.
5. user: the user who tweeted (e.g., robotickilldozr)
6. text: the text of the tweet
We only care about fields 1 and 6 in this file.
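Only field 1 (target) and field 6 (text) are needed, but the tweet text can itself contain commas, so splitting each line naively on ',' may break a record apart. The following is a minimal sketch of a hypothetical helper class, `TweetCsv` (not part of the assignment), that assumes one record per line and RFC 4180-style quoting: a field may be wrapped in double quotes, and a literal quote inside a quoted field is written as two quotes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical CSV helper for the tweet file (sketch). Assumes one record per
 * line and that fields may be wrapped in double quotes, with "" meaning a
 * literal quote inside a quoted field.
 */
final class TweetCsv {

    /** Splits one CSV line into its fields, honoring double-quoted fields. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote of the field
                    }
                } else {
                    current.append(c); // commas inside quotes are kept as text
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote of the field
            } else if (c == ',') {
                fields.add(current.toString()); // comma outside quotes ends a field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }

    /** Returns field 1, the real label (0, 2 or 4). */
    static int target(List<String> fields) {
        return Integer.parseInt(fields.get(0).trim());
    }

    /** Returns field 6, the tweet text. */
    static String text(List<String> fields) {
        return fields.get(5);
    }
}
```

A character-by-character scan is used instead of `String.split` so that quoted commas and escaped quotes are handled in one pass; if the real file turns out to use a different quoting convention, only `parseLine` needs to change.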
Your program should operate in the following manner:
1. When the program starts, it asks the user to provide the file paths of the positive words file, the negative words file, and the Twitter data file.
2. The program loads the positive and negative words and stores them in two separate lookup tables. The HashSet data structure can be used as a lookup table in Java, as it provides a fast way to check whether a word exists in it.
3. The program iterates over the tweets in the Twitter data file and counts the number of positive and negative words each tweet contains. If the tweet contains more positive than negative words, it is classified as positive, and vice versa. If no positive or negative words are found in the tweet, it is classified as neutral. If the same number of positive and negative words is found in the tweet, it is classified as negative.
4. After each tweet has been classified, the program prints to the command line the tweet itself, its real label, and its predicted label.
5. At the end, the program should also print how many tweets were correctly classified and how many were misclassified.
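Steps 1 to 5 can be sketched as a single class. This is one possible approach under stated assumptions, not a reference solution. The names `SentimentAnalyzer`, `loadWords`, `classify`, `parseCsv` and `label` are hypothetical. The word files are assumed to hold one word per line, with lines starting with ';' treated as comments. Tokens are produced by lowercasing, splitting on whitespace and trimming punctuation from both ends. All files are read as ISO-8859-1, so unexpected bytes never abort the run, at the cost of possibly garbled non-ASCII characters when tweets are printed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

/** Word-counting sentiment classifier (sketch). Labels: 0 = negative, 2 = neutral, 4 = positive. */
final class SentimentAnalyzer {

    /**
     * Loads one word per line into a HashSet lookup table.
     * Assumption: blank lines and lines starting with ';' are skipped as comments.
     */
    static Set<String> loadWords(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        // ISO-8859-1 decodes any byte sequence, so reading never fails on odd bytes.
        for (String line : Files.readAllLines(file, StandardCharsets.ISO_8859_1)) {
            String word = line.trim().toLowerCase();
            if (!word.isEmpty() && !word.startsWith(";")) {
                words.add(word);
            }
        }
        return words;
    }

    /** Applies the rules: no sentiment words -> 2; more positive -> 4; otherwise (incl. a tie) -> 0. */
    static int classify(String text, Set<String> positive, Set<String> negative) {
        int pos = 0;
        int neg = 0;
        // Assumed tokenization: lowercase, split on whitespace, trim punctuation at both ends.
        for (String raw : text.toLowerCase().split("\\s+")) {
            String token = raw.replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
            if (positive.contains(token)) {
                pos++;
            }
            if (negative.contains(token)) {
                neg++;
            }
        }
        if (pos == 0 && neg == 0) {
            return 2; // neutral: no sentiment words at all
        }
        return pos > neg ? 4 : 0; // a non-zero tie is not "greater", so it becomes negative
    }

    /** Splits a CSV line on commas outside double quotes, then unquotes each field. */
    static String[] parseCsv(String line) {
        // A comma is a separator only if an even number of quotes follows it.
        String[] fields = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i].trim();
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1).replace("\"\"", "\"");
            }
            fields[i] = f;
        }
        return fields;
    }

    /** Maps a numeric label to a readable name. */
    static String label(int code) {
        return code == 4 ? "positive" : code == 2 ? "neutral" : "negative";
    }

    public static void main(String[] args) throws IOException {
        Scanner in = new Scanner(System.in);
        System.out.print("Positive words file: ");
        Set<String> positive = loadWords(Paths.get(in.nextLine().trim()));
        System.out.print("Negative words file: ");
        Set<String> negative = loadWords(Paths.get(in.nextLine().trim()));
        System.out.print("Tweet CSV file: ");
        Path tweets = Paths.get(in.nextLine().trim());

        int correct = 0;
        int wrong = 0;
        for (String line : Files.readAllLines(tweets, StandardCharsets.ISO_8859_1)) {
            String[] f = parseCsv(line);
            if (f.length < 6 || !f[0].matches("[024]")) {
                continue; // skip empty or malformed lines
            }
            int actual = Integer.parseInt(f[0]);
            int predicted = classify(f[5], positive, negative);
            System.out.printf("%s%n  real: %s, predicted: %s%n", f[5], label(actual), label(predicted));
            if (actual == predicted) {
                correct++;
            } else {
                wrong++;
            }
        }
        System.out.printf("Correctly classified: %d%nMisclassified: %d%n", correct, wrong);
    }
}
```

Returning the same numeric codes as field 1 (0, 2, 4) lets the prediction be compared directly with the real label, and the tie rule needs no special case: when both counts are equal and non-zero, `pos > neg` is false and the tweet falls through to negative.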