Question
In the next few problems, we will work with a text file that contains the complete works of William Shakespeare.
The data file used for this problem is located at: FileStoretablesshakespearecomplete.txt
We will begin by loading and processing the file and tokenizing the lines into individual words.
Complete the following steps in a single code cell:
Read the contents of the file shakespearecomplete.txt into an RDD named wslines.
Create an RDD named wswords by applying the transformations described below. This will require
several uses of map and flatMap and a single call to filter. Try to chain the
transformations together so that all of these steps are completed with a single statement, which will likely span
multiple lines.
Tokenize the strings in wslines by splitting them on the characters in the following list:
:t
The resulting RDD should consist of strings rather than lists of strings. This will require
multiple separate uses of flatMap and split.
Use the Python string method strip with the punctuation string to remove common
punctuation symbols from the start and end of the tokens. Then use strip again with a
string of digit characters to remove numbers from the start and end of the tokens.
Use the Python string method replace to replace instances of the single
quote (apostrophe) with the empty string.
Convert all strings to lower case using the lower string method.
The steps above will create some empty strings (of the form '') within the RDD. Filter out
these empty strings.
Create a second RDD named distwords that contains only one copy of each word found in
wswords.
Print the number of words in wswords and the number of distinct words using the format shown
below. Add spacing so that the numbers are left-aligned.
Total Number of Words: xxxx
Number of Distinct Words: xxxx
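The steps above can be sketched in plain Python, with ordinary lists standing in for the RDDs so the logic can be checked without a Spark session. This is only an illustration under stated assumptions: the delimiter list in the question did not survive, so DELIMITERS below is a hypothetical placeholder, and the helper names tokenize and clean and the two sample lines are invented for the example.

```python
import string

# Hypothetical delimiter list: the list in the question is unreadable,
# so these three characters are placeholders only.
DELIMITERS = [" ", ":", "\t"]


def tokenize(line, delimiters=DELIMITERS):
    """Split a line on each delimiter in turn, keeping a flat list of strings."""
    tokens = [line]
    for d in delimiters:
        # Mirrors one flatMap(lambda s: s.split(d)) per delimiter on an RDD.
        tokens = [piece for tok in tokens for piece in tok.split(d)]
    return tokens


def clean(token):
    """Apply the per-token steps in the order the question lists them."""
    token = token.strip(string.punctuation)  # punctuation at either end
    token = token.strip(string.digits)       # digits at either end
    token = token.replace("'", "")           # drop apostrophes
    return token.lower()


# Two made-up lines stand in for wslines.
sample_lines = ["ACT 1: The King's\tcourt.", "12 Enter the KING, alone"]

# flatMap + map steps, then the single filter that drops empty strings.
words = [clean(t) for line in sample_lines for t in tokenize(line)]
words = [w for w in words if w != ""]

# A set plays the role of distinct() here.
dist_words = set(words)

# A fixed label width makes both numbers start in the same column.
print(f"{'Total Number of Words:':<27}{len(words)}")
print(f"{'Number of Distinct Words:':<27}{len(dist_words)}")
```

In a PySpark chain, the same lambdas would presumably be passed to flatMap, map, and filter on wslines, with distinct() and count() replacing the set and len calls.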
We will now use sample to get a sense of the types of words found in wswords.
Draw a sample from wswords using the arguments withReplacement=False and fraction.
Collect and print the results.
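To make the behaviour of sampling without replacement concrete, here is a minimal plain-Python sketch of the idea: each element is kept independently with probability equal to the fraction, so the sample size is only approximately fraction times the number of elements. The function name bernoulli_sample, the word list, the fraction value 0.5, and the seed are all assumptions for illustration; the fraction used in the question is not given.

```python
import random


def bernoulli_sample(items, fraction, seed=None):
    """Keep each item independently with probability `fraction`."""
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]


# Made-up tokens standing in for the contents of wswords.
words = ["act", "the", "kings", "court", "enter", "the", "king", "alone"]

# 0.5 is a hypothetical fraction chosen so the tiny list yields a visible sample.
sample_words = bernoulli_sample(words, fraction=0.5, seed=1)
print(sample_words)
```

With an actual RDD, the corresponding steps would be a call to sample with withReplacement=False and the chosen fraction, followed by collect() and print.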