Introductory Java Computer Science Question:

Java Coding Question

This program is about file processing. You will read a file containing all tweets sent by the President since 5/1/20178 and collect and display statistical information about them. The tweets are to be examined according to a couple of categories. In particular, the AP folks are interested in isolating what percentage of the President's tweets contain inflammatory or "negative" keywords versus the number of tweets that are complimentary, affirmative or contain "positive" keywords. In addition to this information, the company is also interested in knowing which tweet was re-tweeted the most. If there's a tie for the most re-tweeted, output the one that came last in the file.

For purposes of clarification, an affirming or "positive tweet" is a tweet line with at least one positive keyword. An inflammatory or "negative tweet" is a tweet line with at least one negative keyword. A tweet line in which both a positive keyword and a negative keyword appear at least once is counted as a positive tweet, as a negative tweet, and additionally in a category called "both positive and negative tweet".

As you learned with the previous tweet analyzer program, it's important to have a clear and consistent definition for a "word". In this exercise, the company wants to be sure to catch all variants of the keywords. For instance, if the keyword is "fake", the company would like to catch any words that have "fake" in them, including "faker", "#fake", "fake!!!", and "(fake)". The company realizes that this definition may be overly broad and will pick up a few false readings, but they are willing to accept this because it will be a small number. They would, however, like you to avoid flagging the word "sad" in "ambassador", since they believe that particular keyword may show up frequently.

Specs and Decomposition:

As usual, we are giving you a decomposition that you must use. You are welcome to create additional methods as needed. Since you have now been trained in how to properly test your methods, the AP expects you will carefully test all methods and deliver to them a fully-functioning product.

Required constants - You must declare the following constants:

public static final String FILE = "trumpRecentTweets.csv";
public static final String POSITIVE = "wonderful good great fantastic super fabulous "
        + "amazing incredible tremendous winning best";
public static final String NEGATIVE = "fake illegal joke traitor criminal hoax overrated "
        + "dishonest worst loser bad idiot sad";

FILE is the name of the file containing the tweets. POSITIVE and NEGATIVE contain appropriate keywords to be on the lookout for in the tweets. [instructor note: Yes, we realize the above are not exhaustive lists, but please use them exactly as defined since the unit tests are based on these words only. Once you have submitted your work, you are free to play around on your own to analyze other words in the input file.]

The CSV file - You can find the CSV file located on Brightspace under assignments PA09. It is called trumpRecentTweets.csv. You can copy that file into the location in IntelliJ where you keep your input files.

Processing the input file: As you start processing each line of the CSV file, you'll notice that every line of the input file (i.e., each record) actually contains several pieces of information separated by commas. This means you'll have to "parse" this information a bit to isolate the information you need -- the tweet and the number of retweets this tweet received. Start thinking about how you can accurately do that and watch this page for hints early next week.

public static File getInputFile(Scanner console) - This method will prompt the user for the path where the input file is located. In IntelliJ, the path may be iofiles/ as discussed in lecture. In Zybooks, the path is not needed so it will be the empty string when you are testing in development mode. Validation: Your method should first verify that the file exists -- if it does not exist, you should re-prompt for the path to the input file. Once you have determined the file exists, your program will display the file length (see sample run).
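A minimal sketch of the prompt-and-validate loop described above. The class name InputFileSketch is hypothetical, and returning null when the console runs out of input is an assumption added so the loop cannot spin forever; the spec does not cover that case. The exact line layout of the success message is also a guess, since the sample run does not preserve line breaks.

```java
import java.io.File;
import java.util.Scanner;

class InputFileSketch {
    public static final String FILE = "trumpRecentTweets.csv";

    // Re-prompts until FILE exists under the entered path, then reports its size.
    // Assumption (not in the spec): returns null if the console has no more input.
    public static File getInputFile(Scanner console) {
        while (true) {
            System.out.print("Enter path for file " + FILE + "? ");
            if (!console.hasNextLine()) {
                return null;
            }
            // An empty line means "no path", so the bare file name is used.
            String path = console.nextLine().trim();
            File candidate = new File(path + FILE);
            if (candidate.isFile()) {
                System.out.println("File " + FILE + " successfully found. Filesize: "
                        + candidate.length() + " bytes");
                return candidate;
            }
            // File not found: fall through and prompt again.
        }
    }

    public static void main(String[] args) {
        File input = getInputFile(new Scanner(System.in));
        System.out.println(input == null ? "No input file selected." : "Using " + input.getPath());
    }
}
```

Because the path and the file name are simply concatenated, the user is expected to type the trailing slash (as in iofiles/), and an empty line yields the bare file name.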

public static int startAt (String tweet) - This method returns the position of the start of the tweet and returns -1 if this tweet is not thought to be a valid tweet (i.e., does not contain any commas).
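The description leaves "the start of the tweet" open to interpretation. One possible reading, sketched below, is that the tweet text begins at the first comma of the record; this is an assumption, loosely supported by the sample run, where the most retweeted string begins with a comma. The class name StartAtSketch and the sample record in main are made up for illustration, and the real CSV layout may differ.

```java
class StartAtSketch {
    // Assumption: the tweet starts at the first comma in the record.
    // indexOf already returns -1 when there is no comma at all.
    public static int startAt(String tweet) {
        return tweet.indexOf(',');
    }

    public static void main(String[] args) {
        // Hypothetical record; the field layout is illustrative only.
        String record = "Twitter for iPhone,RT hello,07-10-2018 12:00:00,5";
        System.out.println(startAt(record));
        System.out.println(startAt("no commas here"));
    }
}
```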

public static int searchTweet(String tweet, String keywords) - This method will return the number of occurrences where one of the words from a given keyword list appears in a tweet. [instructor note: We realize we are returning an "int" here rather than a boolean. This is because the client anticipates in the future they will want to display a total number of occurrences. Right now, it doesn't matter because all you need to know is that you can use this "int" result to determine whether an occurrence was found at all in the tweet. If you get a >0 back, you know a keyword was found. If you get a 0 back, no keyword was found]. You should make sure your method operates independently of whether a word is uppercase or lowercase (i.e., ignore case). Lastly, remember that the word you search for may appear anywhere in the tweet. Tip: Use a scanner assigned to the keywords string to go through each keyword and then look to see whether or not the tweet contains that keyword.
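The tip above can be sketched roughly as follows. This is one possible approach, not the reference solution: it counts every substring match of every keyword, ignoring case, and skips a match of "sad" only when it sits inside "ambassador". The class name SearchTweetSketch and the helper isInsideAmbassador are hypothetical.

```java
import java.util.Scanner;

class SearchTweetSketch {
    // Counts substring matches of each keyword in the tweet, ignoring case.
    public static int searchTweet(String tweet, String keywords) {
        String text = tweet.toLowerCase();
        int count = 0;
        Scanner words = new Scanner(keywords);
        while (words.hasNext()) {
            String keyword = words.next().toLowerCase();
            int index = text.indexOf(keyword);
            while (index >= 0) {
                if (!isInsideAmbassador(text, keyword, index)) {
                    count++;
                }
                // Continue searching after the match just examined.
                index = text.indexOf(keyword, index + keyword.length());
            }
        }
        words.close();
        return count;
    }

    // "sad" occupies offsets 5..7 of "ambassador", so a match at index i lies
    // inside that word when "ambassador" starts at i - 5.
    private static boolean isInsideAmbassador(String text, String keyword, int index) {
        return keyword.equals("sad") && index >= 5 && text.startsWith("ambassador", index - 5);
    }

    public static void main(String[] args) {
        System.out.println(searchTweet("The ambassador was SAD about the #fake news", "fake sad"));
    }
}
```

Substring matching is what lets "faker", "#fake", and "(fake)" count as hits. It also means a keyword such as "bad" will match inside "badly", which is the kind of false reading the assignment says it is willing to accept.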

public static int getRetweetedCount(String tweet) - This method accepts a tweet record and locates and returns the integer that represents the retweet count for that tweet. Locating this is a little tricky since we can't depend on a count of how many commas separate the fields before the retweet count. This is because the tweet itself may contain commas. Instead, we will use an anchor in the line: the characters "-2018", which are part of the time/datestamp for the tweet. However, to avoid a problem where the "-2018" is actually part of a tweet rather than part of the timestamp, our anchor also requires that the characters immediately following the "-2018" be a blank space, two digits and a colon. If we find this sequence, we can be pretty confident that the retweet count comes after the next comma. Tip: You will need to convert this retweet count from a String to an int. You can do this using the parseInt method. You can do something like this: int retweetCount = Integer.parseInt(retweetedString);
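The anchor search might look something like the sketch below. It assumes the retweet count runs from the comma after the timestamp up to the next comma (or the end of the line), and it returns -1 when no anchor is found; neither detail is stated in the assignment. The class name RetweetCountSketch and the sample record are hypothetical.

```java
class RetweetCountSketch {
    private static final String ANCHOR = "-2018";

    // Finds "-2018" followed by a space, two digits and a colon, then parses
    // the field after the next comma.
    // Assumption: returns -1 if no such anchor (or no following comma) exists.
    public static int getRetweetedCount(String tweet) {
        int anchor = tweet.indexOf(ANCHOR);
        while (anchor >= 0) {
            int after = anchor + ANCHOR.length();
            if (after + 3 < tweet.length()
                    && tweet.charAt(after) == ' '
                    && Character.isDigit(tweet.charAt(after + 1))
                    && Character.isDigit(tweet.charAt(after + 2))
                    && tweet.charAt(after + 3) == ':') {
                int comma = tweet.indexOf(',', after);
                if (comma < 0) {
                    return -1;
                }
                int end = tweet.indexOf(',', comma + 1);
                if (end < 0) {
                    end = tweet.length();
                }
                // parseInt throws NumberFormatException if the field is not a number.
                return Integer.parseInt(tweet.substring(comma + 1, end).trim());
            }
            // This "-2018" was part of the tweet text; try the next one.
            anchor = tweet.indexOf(ANCHOR, anchor + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical record; the real field layout may differ.
        String record = "Twitter for iPhone,Back in -2018 we said hi, twice,07-10-2018 14:05:33,108867,true";
        System.out.println(getRetweetedCount(record));
    }
}
```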

public static void displayStats(int totalCnt, int posCnt, int negCnt, int mixCnt, int invalidCount, int maxRetweet, String maxRetweetString) - This method displays the various tweet stats accumulated during the running of the program. In order to make the columns line up appropriately in the output, you should use the following format for the positive/negative/both keyword lines: System.out.printf("%-35s %4d %8.2f%%%n", title, posCnt, (double) 100 * posCnt / totalCnt) where title contains the string to be printed.
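To see what the format string does, the sketch below builds one row as a String instead of printing it directly. The helper formatRow and the class name DisplayStatsSketch are hypothetical, and Locale.US is an addition so that the decimal point does not depend on the machine's locale; the assignment itself calls System.out.printf without a locale.

```java
import java.util.Locale;

class DisplayStatsSketch {
    // %-35s  left-justifies the title in 35 columns,
    // %4d    right-justifies the count in 4 columns,
    // %8.2f  right-justifies the percentage in 8 columns with 2 decimals,
    // %%     prints a literal percent sign, and %n ends the line.
    static String formatRow(String title, int count, int totalCnt) {
        return String.format(Locale.US, "%-35s %4d %8.2f%%%n",
                title, count, (double) 100 * count / totalCnt);
    }

    public static void main(String[] args) {
        int totalCnt = 1777; // totals taken from the sample run
        System.out.print(formatRow("\"Positive\" keywords:", 547, totalCnt));
        System.out.print(formatRow("\"Negative\" keywords:", 275, totalCnt));
        System.out.print(formatRow("Both \"positive\" and \"negative\":", 75, totalCnt));
    }
}
```

The cast in (double) 100 * count / totalCnt applies to 100 first, so the whole expression is evaluated in floating point and the percentage is not truncated by integer division.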

main - The main method is fairly straightforward. Open the file, read it one line at a time, analyze each line using the above described methods and track the results by counting tweets as requested. After you have finished processing the file, display the desired statistics (see sample run):
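The counting rules can be sketched in isolation from the file handling. The example below is a simplified illustration, not the required main: it assumes the per-line keyword hit counts and retweet counts have already been computed and stored in parallel arrays, and it leaves out invalid-tweet handling. The class name TallySketch and the method tally are hypothetical.

```java
class TallySketch {
    // Returns {positive, negative, both, indexOfMostRetweeted}.
    // Assumes the three arrays are parallel, one entry per tweet line.
    static int[] tally(int[] posHits, int[] negHits, int[] retweets) {
        int pos = 0;
        int neg = 0;
        int both = 0;
        int maxRetweet = -1;
        int maxIndex = -1;
        for (int i = 0; i < retweets.length; i++) {
            // A line with both kinds of keyword is counted in all three categories.
            if (posHits[i] > 0) {
                pos++;
            }
            if (negHits[i] > 0) {
                neg++;
            }
            if (posHits[i] > 0 && negHits[i] > 0) {
                both++;
            }
            // ">=" rather than ">" so that a tie is won by the later line.
            if (retweets[i] >= maxRetweet) {
                maxRetweet = retweets[i];
                maxIndex = i;
            }
        }
        return new int[] {pos, neg, both, maxIndex};
    }

    public static void main(String[] args) {
        int[] result = tally(new int[] {1, 0, 2, 0}, new int[] {0, 3, 1, 0}, new int[] {5, 9, 9, 1});
        System.out.println(java.util.Arrays.toString(result));
    }
}
```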

Important Submission Information

Since Zylabs does not need a path to the input file, it's important you remove any path information prior to submitting. In this program, since the path information is coming from the console, what you will need to do is put a single carriage return in the input area for Zylabs prior to running your program. That will signal to your program that no path is needed and then the filename will simply be passed onto Zylabs. You do not need to try to submit the input file to Zylabs since it already has a copy. You only need to submit your code.

Sample Run:

Enter path for file trumpRecentTweets.csv? iofiles/
File trumpRecentTweets.csv successfully found. Filesize: 458840 bytes
Total number of tweets analyzed: 1777
Invalid tweets found: 4
Trump Tweetalyzer Statistics
------------------------------------------------------
Category                               #         %
"Positive" keywords:                  547    30.78%
"Negative" keywords:                  275    15.48%
Both "positive" and "negative":        75     4.22%
Most retweets received:108867
Most retweeted string: ,RT @realDonaldTrump: They just didn't get it but they do now! https://t.co/9T50NupkDy,07-10

File can be found at:

https://openload.co/f/W8lkXQRe3Xc/trumpRecentTweets.csv
