
Question



You will use NLTK to explore Herman Melville's novel Moby Dick. Write Python code to answer the following questions:

1. Import the required libraries and set up the data. Use the link https://www.gutenberg.org/files/2701/old/moby10b.txt to access the ebook.

2. Find how many tokens (words and punctuation symbols) are in the text. A token is a linguistic unit such as a word, a punctuation mark, or an alphanumeric string.

3. Find how many of the tokens found in question 2 are unique.

4. Find how many tokens are unique after removing stopwords.

5. What is the lexical diversity of the given text input? (i.e. ratio of unique tokens to the total number of tokens)

6. What percentage of tokens is 'whale' or 'Whale'?

7. What are the 20 most frequently occurring (unique) tokens in the text? What is their frequency?

8. What tokens have a length greater than 5 and a frequency of more than 150?

9. Find the longest word in the text and that word's length.

10. What unique words (not punctuation) have a frequency of more than 2000? What is their frequency?

11. What is the average number of tokens per sentence?
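
One possible way to approach questions 1 to 11 is sketched below. It is a minimal sketch, not a verified solution. The function names (`load_text`, `tokenize`, `analyze`, `main`) and the threshold parameters are hypothetical and chosen only for illustration. It assumes that stopwords are matched case-insensitively, that a "word" means a purely alphabetic token, and that the Project Gutenberg header is left in the text, so the counts include it. Frequencies are computed with the standard library's `collections.Counter`; `nltk.FreqDist` could be swapped in.

```python
from collections import Counter
import urllib.request

URL = "https://www.gutenberg.org/files/2701/old/moby10b.txt"


def load_text(url=URL):
    """Question 1: download the raw ebook text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: undecodable bytes are replaced instead of raising.
        return response.read().decode("utf-8", errors="replace")


def tokenize(raw):
    """Return (tokens, sentences, stop_words) using NLTK."""
    import nltk  # imported here so analyze() can also be used without NLTK
    from nltk.corpus import stopwords

    # Assumption: which tokenizer resource is needed depends on the
    # installed NLTK version, so both names are requested.
    for resource in ("punkt", "punkt_tab", "stopwords"):
        nltk.download(resource, quiet=True)
    return (nltk.word_tokenize(raw), nltk.sent_tokenize(raw),
            set(stopwords.words("english")))


def analyze(tokens, sentences, stop_words,
            min_len=5, min_freq=150, word_freq=2000):
    """Answer questions 2 to 11 for already tokenized text."""
    freq = Counter(tokens)
    total = len(tokens)
    unique = set(tokens)
    # Assumption: a token is a stopword if its lowercase form is in the list.
    non_stop = {t for t in unique if t.lower() not in stop_words}
    # Assumption: "word" means a purely alphabetic token.
    longest = max((t for t in unique if t.isalpha()), key=len)
    return {
        "total_tokens": total,                                     # Q2
        "unique_tokens": len(unique),                              # Q3
        "unique_without_stopwords": len(non_stop),                 # Q4
        "lexical_diversity": len(unique) / total,                  # Q5
        "whale_percent": 100 * (freq["whale"] + freq["Whale"]) / total,  # Q6
        "top_20": freq.most_common(20),                            # Q7
        "long_frequent": sorted(t for t, c in freq.items()
                                if len(t) > min_len and c > min_freq),   # Q8
        "longest_word": (longest, len(longest)),                   # Q9
        "frequent_words": {t: c for t, c in freq.items()
                           if t.isalpha() and c > word_freq},      # Q10
        "tokens_per_sentence": total / len(sentences),             # Q11
    }


def main():
    """Download the book, tokenize it, and print one line per answer."""
    raw = load_text()
    for name, value in analyze(*tokenize(raw)).items():
        print(f"{name}: {value}")
```

Calling `main()` downloads the text and prints each result. `analyze` takes plain lists, so it can be tried on a small hand-made sample before running it on the whole book.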
