Question

Posted on Aug 10, 2024

Methods to help with the rating prediction task:

- `public void createStopWordsSet (String inFile, String outFile)`

Read `stopwords.txt` as the input file (`inFile`), build a HashSet of stop words, and write the set to an output file (`outFile`) named `uniqueStopwords.txt`. The output file should contain one stop word per line and no duplicates.
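A minimal sketch of this step, assuming the stop words are kept in an instance field (the class name `StopWordsSketch`, the field name `stopWords`, and the getter are hypothetical, since the assignment's instance-variable section is not shown here). It also declares `throws IOException`, which the signature above does not; wrap the I/O in a try/catch if the signature must stay as given.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

class StopWordsSketch {
    // Hypothetical field name; the assignment's own instance variable is not shown.
    private final Set<String> stopWords = new HashSet<>();

    public void createStopWordsSet(String inFile, String outFile) throws IOException {
        // Read every line; adding to a HashSet silently drops duplicates.
        for (String line : Files.readAllLines(Paths.get(inFile))) {
            String word = line.trim();
            if (!word.isEmpty()) {
                stopWords.add(word);
            }
        }
        // Write one stop word per line. HashSet iteration order is unspecified,
        // so the output order may differ from the input order.
        Files.write(Paths.get(outFile), stopWords);
    }

    public Set<String> getStopWords() {
        return stopWords;
    }
}
```

Because a `HashSet` has no defined order, the lines of `uniqueStopwords.txt` may come out in any order; the requirement above only asks for one word per line and no duplicates.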

- `public void cleanData (String inFile, String outFile, boolean ratingIncluded)`

This method should read a raw input file and write a cleaned data file. For example:

| Input File Name | Output File Name |
| --- | --- |
| `rawReviewRatings.txt` | `cleanReviewRatings.txt` |
| `rawReviews.txt` | `cleanReviews.txt` |
| `rawReviewRatingsBig.txt` | `cleanReviewRatingsBig.txt` |
| `rawReviewsBig.txt` | `cleanReviewsBig.txt` |

Read the reviews line by line and apply the cleaning methods described above to each line. Make sure you call them in this order:

1. `splitLine()`
2. `splitAtHyphensAndQuotes()`
3. `removePunctuation()`
4. `removeWhiteSpaces()`
5. `removeEmptyWords()`
6. `removeSingleLetterWords()`
7. `toLowerCase()`
8. `removeStopWords()`

The cleaned files should have the same structure as the input files: if the input file has a rating followed by a review on each line, the output file should have the rating followed by the cleaned review. If the input file has only reviews, the output file should have only cleaned reviews. Use the boolean flag `ratingIncluded` to distinguish files that have ratings and reviews (`ratingIncluded = true`) from files that have only reviews (`ratingIncluded = false`).
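The per-line part of this method might be sketched as follows. This is an illustration under assumptions, not the assignment's helper methods: the eight steps are collapsed into inline operations in the same order, the rating is assumed to be the first whitespace-delimited token, `\p{Punct}` is assumed to cover the punctuation of interest, and the stop-word set is passed in as a parameter. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CleanLineSketch {
    static String cleanLine(String line, Set<String> stopWords, boolean ratingIncluded) {
        String rating = "";
        String review = line.trim();
        if (ratingIncluded) {
            // Assumption: the rating is the first whitespace-delimited token.
            String[] parts = review.split("\\s+", 2);
            rating = parts[0];
            review = parts.length > 1 ? parts[1] : "";
        }
        List<String> cleaned = new ArrayList<>();
        for (String token : review.split(" ")) {                // splitLine
            for (String piece : token.split("[-'\"]")) {         // splitAtHyphensAndQuotes
                String w = piece.replaceAll("\\p{Punct}", "");   // removePunctuation
                w = w.trim();                                    // removeWhiteSpaces
                if (w.isEmpty()) continue;                       // removeEmptyWords
                if (w.length() == 1) continue;                   // removeSingleLetterWords
                w = w.toLowerCase();                             // toLowerCase
                if (stopWords.contains(w)) continue;             // removeStopWords
                cleaned.add(w);
            }
        }
        String joined = String.join(" ", cleaned);
        if (!ratingIncluded) {
            return joined;
        }
        // Keep the input structure: rating first, then the cleaned review.
        return joined.isEmpty() ? rating : rating + " " + joined;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "is", "and"));
        System.out.println(cleanLine("2 \"The Lion King\" is awe-inspiring !", stop, true));
        System.out.println(cleanLine("The Lion King is fantastic !", stop, false));
    }
}
```

`cleanData` itself would then read `inFile` line by line, pass each line through such a routine, and write the result to `outFile`.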

- `public void updateHashMap(String inCleanFile)`

This method should take in the cleaned data file as the input and use it to update the HashMap as explained before in the instance variable section.

- `public void rateReviews (String inCleanFile, String outRatingsFile)`

Before calling this method, you should already have cleaned the review-only file given to you (for example, `rawReviews.txt` is cleaned into `cleanReviews.txt`). Using the HashMap built in the previous step, read each unrated review from the cleaned file (e.g. `cleanReviews.txt`) and predict a rating for it. The predicted rating for each review is written to an output file named `ratings.txt`.

How do we predict the ratings for the unrated reviews?

Rate each review by looking up each of its words in the HashMap that was updated previously. The rating for a review line is the average of the ratings of all the words in that review. A word that is not found in the HashMap is given a neutral rating of 2. An empty review (one that contains no words) is also given a neutral rating of 2.

Example for `cleanData`: if the input file `rawReviewRatings.txt` contains:

4 The Jungle Book is awesome!

2 "The Lion King" is awe-inspiring !

0 Jack and Jill is worst!

1 " Finding Dory" is good .

3 Zootopia is fantastic.

4 Jungle Book is fantastic.

3 Lion King is fantastic.

Then the output file `cleanReviewRatings.txt` should contain:

4 jungle book awesome

2 lion king awe inspiring

0 jack jill worst

1 finding dory good

3 zootopia fantastic

4 jungle book fantastic

3 lion king fantastic

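Example for `updateHashMap`: the HashMap built from the cleaned data file `cleanReviewRatings.txt` above should be as follows. In this example each value holds the sum of the ratings of the reviews containing the word, followed by the number of times the word occurs.

| Key | Values |
| --- | --- |
| king | [5, 2] |
| book | [8, 2] |
| inspiring | [2, 1] |
| finding | [1, 1] |
| good | [1, 1] |
| lion | [5, 2] |
| awe | [2, 1] |
| jack | [0, 1] |
| jill | [0, 1] |
| zootopia | [3, 1] |
| awesome | [4, 1] |
| fantastic | [10, 3] |
| worst | [0, 1] |
| jungle | [8, 2] |
| dory | [1, 1] |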
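The table above can be reproduced with a sketch like the following. It assumes each map value is a two-element `int[]` holding the rating sum and the occurrence count (inferred from the example, since the instance-variable section is not shown), that ratings are integers, and that the lines have already been read from the cleaned file. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordRatingsSketch {
    // word -> {sum of ratings, number of occurrences}; layout inferred from the example.
    static Map<String, int[]> buildMap(List<String> cleanLines) {
        Map<String, int[]> map = new HashMap<>();
        for (String line : cleanLines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                continue; // blank line or rating without words: nothing to record
            }
            int rating = Integer.parseInt(tokens[0]); // assumption: integer ratings
            for (int i = 1; i < tokens.length; i++) {
                int[] entry = map.computeIfAbsent(tokens[i], k -> new int[2]);
                entry[0] += rating; // accumulate the rating
                entry[1] += 1;      // count the occurrence
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, int[]> map = buildMap(Arrays.asList(
                "4 jungle book awesome",
                "2 lion king awe inspiring",
                "0 jack jill worst",
                "1 finding dory good",
                "3 zootopia fantastic",
                "4 jungle book fantastic",
                "3 lion king fantastic"));
        System.out.println(Arrays.toString(map.get("fantastic"))); // prints [10, 3]
    }
}
```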

Example for `rateReviews`: here is how the rating for the 2nd review in `cleanReviews.txt` (`lion king fantastic`) is computed. Look up each word of the review in the HashMap created before (see the `updateHashMap` example) and compute its average rating. The average rating of each word in this review is shown below:

lion: 5/2 = 2.5

king: 5/2 = 2.5

fantastic: 10/3 = 3.3333333

Rating for this line = (2.5 + 2.5 + 3.3333333) / 3 = 2.7777777

(We divide by 3 because this review contains 3 words in total.)

After rounding, rating for this line = 2.8

This line therefore gets a rating of 2.8, which is written to the `ratings.txt` file. Make sure to use floating-point division instead of integer division, since integer division truncates the fractional part. Once you have the final rating for a review, round it to one decimal place (do not simply truncate). In the example above, the floating-point division gives 2.7777777, which is written to `ratings.txt` as 2.8 after rounding. If the final rating is a whole number such as 2, write it as 2.0 in the output file.
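One way to get the rounding and the trailing `.0` right is sketched below. The helper name `formatRating` is hypothetical, and `Locale.ROOT` is used on the assumption that the output must always use a `.` as the decimal separator.

```java
import java.util.Locale;

class RoundingSketch {
    static String formatRating(double rating) {
        // Math.round rounds to the nearest whole number, so scale by 10 first
        // and divide by 10.0 (a double) to avoid integer division.
        double rounded = Math.round(rating * 10.0) / 10.0;
        // "%.1f" always prints one decimal place, so 2 is written as 2.0.
        return String.format(Locale.ROOT, "%.1f", rounded);
    }

    public static void main(String[] args) {
        System.out.println(5 / 2);                   // integer division: prints 2
        System.out.println(5 / 2.0);                 // floating-point division: prints 2.5
        System.out.println(formatRating(2.7777777)); // prints 2.8
        System.out.println(formatRating(2));         // prints 2.0
    }
}
```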

Another example: finding nemo great

finding: 1

nemo: 2 (because it is NOT found in the HashMap)

great: 2 (because it is also NOT found in the HashMap)

Rating for this line = (1 + 2 + 2) / 3 = 1.6666667

After rounding, rating for this line = 1.7
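The two worked examples can be checked with a small sketch of the averaging step. It assumes the map stores `{sum, count}` pairs as in the `updateHashMap` example and returns the unrounded average; the class name, method name, and `NEUTRAL` constant are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ReviewRaterSketch {
    static final double NEUTRAL = 2.0; // rating for unknown words and empty reviews

    static double rate(String cleanReview, Map<String, int[]> map) {
        String trimmed = cleanReview.trim();
        if (trimmed.isEmpty()) {
            return NEUTRAL; // empty review
        }
        String[] words = trimmed.split("\\s+");
        double sum = 0.0;
        for (String w : words) {
            int[] entry = map.get(w);
            // The cast forces floating-point division of sum by count.
            sum += (entry == null) ? NEUTRAL : (double) entry[0] / entry[1];
        }
        return sum / words.length;
    }

    public static void main(String[] args) {
        Map<String, int[]> map = new HashMap<>();
        map.put("lion", new int[] {5, 2});
        map.put("king", new int[] {5, 2});
        map.put("fantastic", new int[] {10, 3});
        map.put("finding", new int[] {1, 1});
        System.out.println(rate("lion king fantastic", map)); // about 2.78 before rounding
        System.out.println(rate("finding nemo great", map));  // about 1.67 before rounding
    }
}
```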

Write the corresponding ratings for all the reviews to an output file named ratings.txt.

For example, you have been given a file called `rawReviews.txt`. Cleaning this file and predicting the ratings should give the following result:

| `rawReviews.txt` | `cleanReviews.txt` | `ratings.txt` |
| --- | --- | --- |
| I like "The Jungle Book". | like jungle book | 3.3 |
| The Lion King is fantastic ! | lion king fantastic | 2.8 |
| Jack and Jill is bad. | jack jill bad | 0.7 |
| Finding Nemo is great! | finding nemo great | 1.7 |
| Zootopia is awesome ! | zootopia awesome | 3.5 |

The helper class `Words` attached to the question as a screenshot contains the cleaning methods referred to above. The transcription of the image was badly garbled, so details that could not be read are marked in the comments:

```java
package words;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;

/**
 * @author baderddine
 */
public class Words {

    // Put the words in an ArrayList
    public static ArrayList<String> splitLine(String sentence) {
        ArrayList<String> list = new ArrayList<>();
        String[] x = sentence.split(" ");
        for (int i = 0; i < x.length; i++) {
            list.add(x[i]);
        }
        return list;
    }

    public static ArrayList<String> splitHyphensAndQuotes(ArrayList<String> words) {
        ArrayList<String> list = new ArrayList<>();
        // Split at single quotes
        for (String y : words) {
            for (String part : y.split("'")) {
                list.add(part);
            }
        }
        // Returned list variable
        ArrayList<String> newList = new ArrayList<>();
        // Split at hyphens
        for (String y : list) {
            for (String part : y.split("-")) {
                newList.add(part);
            }
        }
        return newList;
    }

    public static ArrayList<String> removePunctuation(ArrayList<String> words) {
        ArrayList<String> list = new ArrayList<>();
        for (String x : words) {
            // The original chained one contains/replace pair per character; only
            // "%" and "&" are legible, so all ASCII punctuation is assumed here.
            list.add(x.replaceAll("\\p{Punct}", ""));
        }
        return list;
    }

    // The header of this method is illegible; the name follows the task description.
    public static ArrayList<String> removeWhiteSpaces(ArrayList<String> words) {
        ArrayList<String> list = new ArrayList<>();
        for (String x : words) {
            list.add(x.trim());
        }
        return list;
    }

    // Remove all empty words
    public static ArrayList<String> removeEmptyWords(ArrayList<String> words) {
        ArrayList<String> list = new ArrayList<>();
        for (String x : words) {
            if (!x.isEmpty()) {
                list.add(x);
            }
        }
        return list;
    }

    // Remove all single-letter words
    public static ArrayList<String> removeSingleLetterWords(ArrayList<String> words) {
        ArrayList<String> list = new ArrayList<>();
        for (String x : words) {
            if (x.length() > 1) {
                list.add(x);
            }
        }
        return list;
    }

    public static ArrayList<String> removeStopWords(ArrayList<String> words)
            throws FileNotFoundException, IOException {
        // The list we will return
        ArrayList<String> list = new ArrayList<>();
        // The stop words
        HashSet<String> stopWords = new HashSet<>();
        // You must change the path of your stop words file
        File file = new File("/home/baderdding/Bureau/stopWords.txt");
        // Read the file and put the stop words in the set
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String s;
            while ((s = br.readLine()) != null) {
                stopWords.add(s.trim());
            }
        }
        // Compare our words with the set of stop words
        for (String x : words) {
            if (!stopWords.contains(x)) {
                list.add(x);
            }
        }
        return list;
    }

    // Lowercase each word
    public static ArrayList<String> toLowerCase(ArrayList<String> words) {
        ArrayList<String> list = new ArrayList<>();
        for (String x : words) {
            list.add(x.toLowerCase());
        }
        return list;
    }
}
```