
Question


1. A collection of reviews about comedy movies (data D) contains the following keywords and binary labels for whether each movie was funny (+) or not funny (-). The data are shown below: for example, the cell at the intersection of "Review 1" and "laugh" indicates that the text of Review 1 contains 2 tokens of the word "laugh."

Review laugh hilarious awesome dull yawn bland | Y
1 1 1 0 + 2 0 0 0 + 0 0 0 1 + 0 2 1 0 - 1 2 0

You may find it easier to complete this problem if you copy the data into a spreadsheet and use formulas for calculations, rather than doing calculations by hand. Please report all scores as log-probabilities, with 3 significant figures. (10 pts)

(a) Assume that you have trained a Naive Bayes model on data D to detect funny vs. not funny movie reviews. Compute the model's predicted scores for funny and not funny for the following sentence S (i.e., P(+|S) and P(-|S)), and determine which label the model will apply to S. (4 pts)

S: "This film was hilarious! I didn't yawn once. Not a single bland moment. Every minute was a laugh."

(b) The counts in the original data are sparse and may lead to overfitting, e.g., a strong prior on assigning the "not funny" label to reviews that contain "yawn." What would happen if you applied smoothing? Apply add-1 smoothing and recompute the Naive Bayes model's predicted scores for S. Did the label change? (4 pts)

(c) What is an additional feature that you could extract from text to improve the classification of sentences like S, and how would it help improve the classification? (2 pts)
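As a rough aid for parts (a) and (b), here is a minimal Python sketch of how multinomial Naive Bayes log-scores could be computed with and without add-1 smoothing. The function name `nb_log_scores`, the tokenizer, and especially the counts in `PLACEHOLDER_COUNTS` and `PLACEHOLDER_DOCS` are assumptions made for illustration; they are not the values from the table above and must be replaced with the real data. The scores are unnormalized, log P(label) + sum of log P(token | label), which is enough to pick the label.

```python
import math
import re

VOCAB = ["laugh", "hilarious", "awesome", "dull", "yawn", "bland"]

# PLACEHOLDER numbers for illustration only, NOT the counts from the problem.
PLACEHOLDER_COUNTS = {
    "+": {"laugh": 3, "hilarious": 1, "awesome": 1},
    "-": {"dull": 2, "yawn": 1, "bland": 2},
}
PLACEHOLDER_DOCS = {"+": 3, "-": 2}  # number of reviews per label (assumed)


def tokenize(text):
    """Lowercase and keep only tokens that are in the keyword vocabulary."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t in VOCAB]


def nb_log_scores(counts, docs, tokens, k=0.0):
    """Return {label: log P(label) + sum_t log P(t | label)} with add-k smoothing."""
    total_docs = sum(docs.values())
    scores = {}
    for label, word_counts in counts.items():
        class_total = sum(word_counts.get(w, 0) for w in VOCAB)
        score = math.log(docs[label] / total_docs)  # log prior
        for tok in tokens:
            num = word_counts.get(tok, 0) + k
            den = class_total + k * len(VOCAB)
            # A zero count without smoothing drives the whole score to -inf.
            score += math.log(num / den) if num > 0 else float("-inf")
        scores[label] = score
    return scores


S = ("This film was hilarious! I didn't yawn once. "
     "Not a single bland moment. Every minute was a laugh.")
tokens = tokenize(S)

unsmoothed = nb_log_scores(PLACEHOLDER_COUNTS, PLACEHOLDER_DOCS, tokens, k=0.0)
smoothed = nb_log_scores(PLACEHOLDER_COUNTS, PLACEHOLDER_DOCS, tokens, k=1.0)
predicted = max(smoothed, key=smoothed.get)
```

With these placeholder counts, every class has at least one keyword from S with a zero count, so both unsmoothed scores collapse to negative infinity; add-1 smoothing makes them finite and comparable. Whether the same thing happens with the real table depends on its actual counts.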

