Lab Assignment - Sentiment Analysis and SVM [8 marks]

We will analyze positive and negative review comments of a movie. You will use the movie dataset provided for this lab assignment, accessible from your Blackboard (IMDB dataset sample.csv). This assignment consists of 2 parts: (1) looking at your sentiment analysis output; (2) analyzing your data with the SVM operator. There are 8 questions you need to answer - questions (A) to (H) below.

Part 1. Text Analysis

1. Read your data using a suitable Read operator. When you run this operator, it will show you a table with every sentence in a separate row. Observe the data in the table and answer the following question.

A. List the words you think frequently appear in the dataset for positive sentiment and for negative sentiment. List FIVE words for positive sentiment and FIVE words for negative sentiment. [This is your own observation of the data. Do not run the occurrences model YET.] Example for positive sentiment - good. Example for negative sentiment - poor. [1 mark]

2. Next, you need to tell RapidMiner to treat the messages data as text (use the Nominal to Text operator).

3. Use the Process Documents from Data operator.
- Uncheck 'create word vector' and 'add meta information'.
- Set the prune method to 'absolute'. Fill in 2 for 'prune below absolute' and 99999 for 'prune above absolute'. This way we ignore words that only appear in one document.

4. Process Documents from Data is a nested process. Go inside the process and do the following:
- Tokenize the sentences
- Change all words to lowercase
- Remove the stopwords from the sentences
- Generate n-grams for the sentences with max length set to 2

5. Now run the process and look at the output to find the most frequent words.

B. Analyze the word frequency and check whether the words you mentioned in your answer to question (A) appear in this result. For each word, write down its total number of occurrences and the number of documents it appears in (how many people are mentioning this word). Example - 'Good' was mentioned 100 times in total in 6 reviews. Do this for all the words you mentioned in (A). [1 mark]
Note - this task is time consuming. You have to scroll down the word list to find out whether each word appears or not.

6. Create a word cloud of the top 20 n-grams.
- Add a Wordlist to Data operator to your design. Place it after the Process Documents from Data operator. Connect the 'wor' port of the Process Documents from Data operator to the 'wor' port of the Wordlist to Data operator.
- Add a Sort operator after the Wordlist to Data operator. Connect the 'exa' port of Wordlist to Data to the Sort operator. Set the Sort operator to 'descending', and set the attribute name to 'total' (you have to type this).
- Add a Filter Example Range operator after the Sort operator. Set Filter Example Range to first = 1 and last = 20. This is done to include the top 20 most frequently occurring words in our word cloud.
- Your process design should look like the following diagram.
- How to view your result:
- Go to the Data tab for the ExampleSet, and click Visualizations
- Select 'word cloud' for the plot type
- Select 'word' for the value column
- Select 'total' for the weight

C. Take a screenshot of your word cloud. Does the word cloud meet your expectation of the word list? Why, or why not? [1 mark]
- How to solve the issue in (C)? Let's adjust something. Go back to the design, go into your Process Documents from Data operator, and add a Filter Tokens (by Length) operator before the Generate n-grams operator. Set the min chars (the minimum number of characters or letters in a word) to 3, and the maximum to 999.

D. Run your model again, and make a new word cloud. Take a screenshot of your new word cloud. [1 mark]

E. Modify your model, and make it look like the following. Explain what connecting the port 'ori' to 'res' does. [1 mark]
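Optional illustration. If it helps to see what steps 3 to 6 compute, the following is a minimal Python sketch of the same idea outside RapidMiner: tokenize, lowercase, remove stopwords, optionally filter tokens by length, generate 2-grams, count total occurrences and document occurrences, prune words that appear in fewer than 2 documents, then sort by total and keep the top n. It is not RapidMiner's implementation. The three reviews, the short stopword list, and the function names (tokens_for, word_list, top_n) are assumptions made up for this example, and the regular expression only approximates splitting on non-letters.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real English lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "was", "this", "of", "to", "i", "but"}


def tokens_for(text, min_chars=1):
    """Tokenize on non-letters, lowercase, drop stopwords and short tokens, add 2-grams."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    words = [w for w in words if w not in STOPWORDS and len(w) >= min_chars]
    # 2-grams are built from the already filtered tokens and joined with "_".
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def word_list(docs, prune_below=2, min_chars=1):
    """Return (word, total occurrences, document occurrences) rows, sorted by total descending."""
    total, in_docs = Counter(), Counter()
    for doc in docs:
        toks = tokens_for(doc, min_chars)
        total.update(toks)          # every occurrence counts
        in_docs.update(set(toks))   # each document counts at most once per word
    rows = [(w, total[w], in_docs[w]) for w in total if in_docs[w] >= prune_below]
    return sorted(rows, key=lambda r: (-r[1], r[0]))


def top_n(rows, n=20):
    """Keep the first n rows, like a range filter with first = 1 and last = n."""
    return rows[:n]


# Hypothetical reviews, not taken from the lab dataset.
reviews = [
    "Good movie, good acting.",
    "The plot was poor but the acting is ok.",
    "Poor script. Ok acting.",
]

for word, total, docs in top_n(word_list(reviews)):
    print(f"{word}: {total} occurrences in {docs} documents")
```

In this toy data, 'good' occurs twice but only in one review, so pruning below 2 documents removes it; this is the difference between total occurrences and document occurrences asked about in (B). Calling word_list(reviews, min_chars=3) additionally drops the two-letter token 'ok', which mirrors the effect of adding Filter Tokens (by Length) before generating n-grams.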