Question

List the code and output.

1. Import the Python nltk and random packages. Load the movie_reviews corpus (1000 positive files and 1000 negative files) from nltk. How many words are there in this corpus? What are the two movie review categories? For more details about this corpus, run movie_reviews.readme().

2. Create a Python list named documents. Each list element contains the words used in a movie review and the review's category. Randomly shuffle the list.

3. Create a list named word_features that contains the 2000 most frequent words in the overall corpus. These 2000 words should not include stop words or punctuation marks.

4. Define the document_features function, which records for each of the 2000 most frequent words whether a review file contains it. Apply the function to each element of the documents list, and then create a list named featuresets that pairs each review file's document features with its category.

5. Split featuresets into test_set (the first 100 review files) and train_set (the other 1900 review files). Apply nltk's NaiveBayesClassifier to the train_set. What is the trained model's out-of-sample prediction accuracy for the test_set? Show the top 15 most informative words for the Naive Bayes classifier.

6. Use twenty-fold cross-validation to show how the Naive Bayes classifier performs over different subsets of featuresets. Display the twenty out-of-sample prediction accuracy rates and the overall prediction accuracy (i.e., the average of the twenty accuracy rates).
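
One way to approach steps 1 through 6 is sketched below. This is a minimal sketch, not the approved answer: the helper names build_word_features, fold_slices, and run_pipeline are hypothetical, the "contains(word)" feature keys are just a naming choice, and punctuation is approximated as any token with no letter or digit. It assumes nltk is installed and that the movie_reviews and stopwords data have been fetched with nltk.download("movie_reviews") and nltk.download("stopwords"). No output is listed here because the accuracies depend on the random shuffle.

```python
import random
from collections import Counter


def build_word_features(all_words, stop_words, n=2000):
    """Step 3: the n most frequent words, skipping stop words and punctuation."""
    counts = Counter(
        w.lower()
        for w in all_words
        # Keep a token only if it is not a stop word and has a letter or digit.
        if w.lower() not in stop_words and any(ch.isalnum() for ch in w)
    )
    return [w for w, _ in counts.most_common(n)]


def document_features(document_words, word_features):
    """Step 4: map each feature word to True/False for one review."""
    present = set(document_words)
    return {f"contains({w})": (w in present) for w in word_features}


def fold_slices(n_items, k):
    """Step 6: yield (start, stop) index pairs for k contiguous folds."""
    fold_size = n_items // k
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder.
        stop = n_items if i == k - 1 else start + fold_size
        yield start, stop


def run_pipeline(seed=None, n_features=2000, n_test=100, k=20):
    """Run steps 1-6 and return the list of k cross-validation accuracies."""
    # nltk is imported here so the helpers above can be used without it.
    import nltk
    from nltk.corpus import movie_reviews, stopwords

    # Step 1: corpus size and categories.
    print("Words in corpus:", len(movie_reviews.words()))
    print("Categories:", movie_reviews.categories())

    # Step 2: (word list, category) pairs, randomly shuffled.
    documents = [
        (list(movie_reviews.words(fileid)), category)
        for category in movie_reviews.categories()
        for fileid in movie_reviews.fileids(category)
    ]
    random.Random(seed).shuffle(documents)

    # Step 3: most frequent words without stop words or punctuation.
    stop_words = set(stopwords.words("english"))
    word_features = build_word_features(movie_reviews.words(), stop_words, n_features)

    # Step 4: pair each review's features with its category.
    featuresets = [
        (document_features(words, word_features), category)
        for words, category in documents
    ]

    # Step 5: first n_test reviews for testing, the rest for training.
    test_set, train_set = featuresets[:n_test], featuresets[n_test:]
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    print("Test accuracy:", nltk.classify.accuracy(classifier, test_set))
    classifier.show_most_informative_features(15)

    # Step 6: k-fold cross-validation over featuresets.
    accuracies = []
    for start, stop in fold_slices(len(featuresets), k):
        held_out = featuresets[start:stop]
        training = featuresets[:start] + featuresets[stop:]
        model = nltk.NaiveBayesClassifier.train(training)
        accuracies.append(nltk.classify.accuracy(model, held_out))
    for i, acc in enumerate(accuracies, 1):
        print(f"Fold {i}: {acc:.3f}")
    print("Mean accuracy:", sum(accuracies) / len(accuracies))
    return accuracies
```

Calling run_pipeline() prints the corpus word count, the two categories, the test-set accuracy, the 15 most informative features, and the twenty fold accuracies with their mean. Passing a fixed seed, for example run_pipeline(seed=0), makes the shuffle repeatable so the listed output can be reproduced.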
