Question

Posted on Aug 03, 2024

Lab Assignment - Sentiment Analysis and SVM [8 marks]

We will analyze positive and negative review comments about a movie. You will use the movie dataset provided for this lab assignment, accessible from your Blackboard (IMDB dataset sample.csv). This assignment consists of 2 parts: (1) looking at your sentiment analysis output; (2) analyzing your data with the SVM operator. There are 8 questions you need to answer, questions (A) to (H) below.

Part 1. Text Analysis

1. Read your data using a suitable Read operator. When you run this operator, it will show you a table with every sentence in a separate row. Observe the data in the table and answer the following question.

A. List the words you think appear frequently in the dataset for positive sentiment and for negative sentiment. List FIVE words for positive sentiment and FIVE words for negative sentiment. [This is your own observation of the data. Do not run the occurrences model YET.] Example for positive sentiment - good. Example for negative sentiment - poor. [1 mark]

2. Next, you need to tell RapidMiner to treat the messages data as text (use the Nominal to Text operator).

3. Use the Process Documents from Data operator.
- Uncheck 'create word vector' and 'add meta information'.
- Set the prune method to 'absolute'. Fill in 2 for 'prune below absolute' and 99999 for 'prune above absolute'. This way we ignore words that appear in only one document.

4. Process Documents from Data is a nested process. Go inside the process and do the following:
- Tokenize the sentences
- Change all words to lowercase
- Remove the stopwords from the sentences
- Generate n-grams for the sentences with max length set to 2

5. Now process the output to look at the most frequent words.

B. Analyze the word frequency and check whether the words you mentioned in your answer to question (A) appear in this result. For each word, write down the total number of occurrences and the number of reviews it appears in (how many people mention the word). Example - Good was mentioned 100 times in total, in 6 reviews. Do this for all the words you mentioned in (A). [1 mark]
Note - this task is time consuming. You have to scroll down the word list to find out whether each word appears or not.

6. Create a word cloud of the top 20 n-grams.
- Add a Wordlist to Data operator to your design. Place it after the Process Documents from Data operator. Connect the 'wor' port of Process Documents from Data to the 'wor' port of Wordlist to Data.
- Add a Sort operator after Wordlist to Data. Connect the 'exa' port of Wordlist to Data to the Sort operator. Set the Sort operator to 'descending' and set the attribute name to 'total' (you have to type this).
- Add a Filter Example Range operator after the Sort operator. Set Filter Example Range to first = 1 and last = 20. This includes only the top 20 most frequent words in the word cloud.
- Your process design should look like the following diagram.
- How to view your result:
  - Go to the Data tab for the ExampleSet and click Visualization
  - Select 'word cloud' for the plot type
  - Select 'word' for the value column
  - Select 'total' for the weight

C. Screenshot your word cloud. Does the word cloud meet your expectation of the word list? Why, or why not? [1 mark]

- How to solve the issue in (C)? Let's adjust something. Go back to the design and go into your Process Documents from Data operator. Add a Filter Tokens (by Length) operator before the Generate n-grams operator. Set min chars (the minimum number of characters in a word) to 3 and max chars to 999.

D. Run your model again and make a new word cloud. Screenshot your new word cloud. [1 mark]

E. Modify your model and make it look like the following. Explain what connecting the port 'ori' to 'res' does. [1 mark]
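For anyone who wants to sanity-check the RapidMiner word list outside the tool, the text steps of Part 1 (tokenize, lowercase, remove stopwords, filter tokens by length, generate n-grams up to length 2, prune by document count, sort by 'total', keep the top rows) can be approximated in plain Python. This is only a rough sketch and not RapidMiner's implementation: the function names, the tiny stopword list, the '_' separator for bigrams, and the three sample reviews are all assumptions made for illustration, so the counts will not match the real dataset.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; RapidMiner uses its own English list).
STOPWORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "of", "to", "in", "but"}


def tokenize(text, min_chars=1, max_chars=999):
    """Lowercase, split on non-letters, drop stopwords, then filter tokens by length."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens
            if t not in STOPWORDS and min_chars <= len(t) <= max_chars]


def with_bigrams(tokens):
    """Return the tokens plus adjacent pairs joined with '_' (n-grams, max length 2)."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def word_list(reviews, min_chars=1, prune_below=2, prune_above=99999):
    """Map each term to (total occurrences, number of reviews containing it).

    Terms whose review count falls outside [prune_below, prune_above] are dropped,
    mirroring the 'absolute' prune setting in the assignment.
    """
    total, docs = Counter(), Counter()
    for review in reviews:
        terms = with_bigrams(tokenize(review, min_chars))
        total.update(terms)       # every occurrence counts
        docs.update(set(terms))   # each review counts once per term
    return {t: (total[t], docs[t])
            for t in total if prune_below <= docs[t] <= prune_above}


def top_terms(words, n=20):
    """Sort by total occurrences (descending, ties alphabetical) and keep the first n."""
    return sorted(words.items(), key=lambda kv: (-kv[1][0], kv[0]))[:n]


# Hypothetical sample reviews, not taken from the lab dataset.
reviews = [
    "The movie was good, really good acting.",
    "Good plot but poor ending.",
    "Poor acting and a poor script.",
]

words = word_list(reviews)
for term, (total_count, review_count) in top_terms(words):
    print(f"{term} was mentioned {total_count} times in total, in {review_count} reviews")
```

Passing `min_chars=3` to `word_list` plays the role of the Filter Tokens (by Length) step added for question (D): short tokens are removed before the bigrams are built, so they cannot appear inside an n-gram either.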
