Question
You will use NLTK to explore Herman Melville's novel Moby Dick. Write Python code to answer the following questions:
1. Import the required libraries and set up the data. Use the link https://www.gutenberg.org/files/2701/old/moby10b.txt to access the ebook.
2. Find how many tokens (words and punctuation symbols) are in the text. A token is a linguistic unit such as a word, a punctuation mark, or an alphanumeric string.
3. Find how many of the tokens found in question 2 are unique.
4. Find how many tokens are unique after removing stopwords.
5. What is the lexical diversity of the given text input? (i.e. ratio of unique tokens to the total number of tokens)
6. What percentage of tokens is 'whale' or 'Whale'?
7. What are the 20 most frequently occurring (unique) tokens in the text? What is their frequency?
8. What tokens have a length of greater than 5 and frequency of more than 150?
9. Find the longest word in the text and that word's length.
10. What unique words (not punctuation) have a frequency of more than 2000? What is their frequency?
11. What is the average number of tokens per sentence?
Step by Step Solution
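The questions above can be approached with a sketch along the following lines. It is one possible reading of the task, not a reference answer. The function names `load_tokens` and `analyze` are made up for illustration, the file is assumed to decode as UTF-8 (undecodable bytes are replaced), and `collections.Counter` stands in for `nltk.FreqDist` so that the counting part runs without NLTK data. Stopwords are matched case-insensitively against NLTK's English list, and "words" in question 10 are taken to be purely alphabetic tokens; both are assumptions about what the question intends.

```python
from collections import Counter

URL = "https://www.gutenberg.org/files/2701/old/moby10b.txt"


def load_tokens(url=URL):
    """Q1: download the ebook; return (tokens, sentences, stopword set)."""
    # Imported lazily so analyze() below can be used without NLTK installed.
    import urllib.request
    import nltk
    from nltk.corpus import stopwords

    nltk.download("punkt", quiet=True)      # tokenizer models
    nltk.download("punkt_tab", quiet=True)  # requested by newer NLTK releases
    nltk.download("stopwords", quiet=True)

    # Assumption: UTF-8 with replacement is good enough for this old text file.
    raw = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    tokens = nltk.word_tokenize(raw)
    sentences = nltk.sent_tokenize(raw)
    return tokens, sentences, set(stopwords.words("english"))


def analyze(tokens, sentences, stop_words):
    """Answer Q2 to Q11 from a token list, a sentence list and a stopword set."""
    freq = Counter(tokens)  # token -> number of occurrences
    total = len(tokens)
    unique = set(tokens)

    # Q4: drop unique tokens whose lowercase form is a stopword.
    no_stop = {t for t in unique if t.lower() not in stop_words}

    # Q6: combined count of the two spellings asked for.
    whale = freq["whale"] + freq["Whale"]

    # Q9: longest token; sorting first makes ties resolve deterministically.
    longest = max(sorted(unique), key=len)

    return {
        "total_tokens": total,                                  # Q2
        "unique_tokens": len(unique),                           # Q3
        "unique_without_stopwords": len(no_stop),               # Q4
        "lexical_diversity": len(unique) / total,               # Q5
        "whale_percentage": 100 * whale / total,                # Q6
        "top_20": freq.most_common(20),                         # Q7
        "long_and_frequent": sorted(                            # Q8
            t for t in unique if len(t) > 5 and freq[t] > 150
        ),
        "longest_word": (longest, len(longest)),                # Q9
        "words_over_2000": {                                    # Q10
            t: c for t, c in freq.items() if c > 2000 and t.isalpha()
        },
        "tokens_per_sentence": total / len(sentences),          # Q11
    }
```

With network access and NLTK installed, `results = analyze(*load_tokens())` followed by a loop printing `results.items()` shows one entry per question. Keeping the download separate from the counting makes it easy to try `analyze` on a small hand-made token list first.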