Question

In the next few problems, we will work with a text file that contains the complete works of William Shakespeare. The data file used for this problem is located at /FileStore/tables/shakespeare_complete.txt. We will begin by loading and processing the file and tokenizing the lines into individual words.

Complete the following steps in a single code cell.

1. Read the contents of the file shakespeare_complete.txt into an RDD named ws_lines.

2. Create an RDD named ws_words by applying the transformations described below. This will require several uses of map() and flatMap() and a single call to filter(). Try to chain the transformations together so that all of these steps are completed in a single statement (which will likely span multiple lines).

   - Tokenize the strings in ws_lines by splitting them on the 8 characters in the following list (entries marked [?] are illegible in the source): ' ', [?], [?], [?], ',', [?], [?], '\t'. The resulting RDD should consist of strings rather than lists of strings. This will require multiple separate uses of flatMap() and split().
   - Use the Python string method strip() with the punctuation string to remove common punctuation symbols from the start and end of the tokens.
   - Use strip() again with the string '0123456789' to remove numbers from the start and end of the tokens.
   - Use the Python string method replace() to replace instances of the single-quote (apostrophe) character ' with the empty string ''.
   - Convert all strings to lower case using the lower() string method.
   - The steps above will create some empty strings of the form '' within the RDD. Filter out these empty strings.

3. Create a second RDD named dist_words that contains only one copy of each word found in ws_words.

4. Print the number of words in ws_words and the number of distinct words using the format shown below. Add spacing so that the numbers are left-aligned.

   Total Number of Words:    xxxx
   Number of Distinct Words: xxxx

We will now use sample() to get a sense of the types of words found in ws_words. Draw a sample from ws_words using the arguments withReplacement=False and fraction=0.0001. Collect and print the results.
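
The steps in the question can be sketched roughly as follows. This is a minimal illustration, not the accepted answer. It makes several assumptions: only the three delimiters that are legible in the question (space, comma, tab) are used, so the remaining five would need one extra flatMap() each; "the punctuation string" is taken to mean Python's string.punctuation; and the helper names build_ws_words, format_counts, and run are hypothetical, introduced only so the chain can be exercised without a live Spark session.

```python
import string

DIGITS = '0123456789'


def build_ws_words(ws_lines):
    """Chain the tokenizing and cleaning steps in a single statement.

    ws_lines is expected to be an RDD of strings (one per line of text).
    ASSUMPTION: only 3 of the 8 delimiters are legible in the question;
    add one more flatMap() per missing delimiter.
    """
    return (
        ws_lines
        .flatMap(lambda s: s.split(' '))     # split on spaces
        .flatMap(lambda s: s.split(','))     # split on commas
        .flatMap(lambda s: s.split('\t'))    # split on tabs
        .map(lambda s: s.strip(string.punctuation))  # trim punctuation at both ends
        .map(lambda s: s.strip(DIGITS))      # trim digits at both ends
        .map(lambda s: s.replace("'", ''))   # drop apostrophes inside tokens
        .map(lambda s: s.lower())            # normalize case
        .filter(lambda s: s != '')           # remove empty strings
    )


def format_counts(total, distinct):
    """Pad both labels to the same width so the numbers line up on the left."""
    return (f"{'Total Number of Words:':<26}{total}\n"
            f"{'Number of Distinct Words:':<26}{distinct}")


def run(sc, path):
    """Hypothetical driver: sc is a SparkContext, path is the data file."""
    ws_lines = sc.textFile(path)
    ws_words = build_ws_words(ws_lines)
    dist_words = ws_words.distinct()
    print(format_counts(ws_words.count(), dist_words.count()))
    # Draw a small sample without replacement and show it.
    print(ws_words.sample(withReplacement=False, fraction=0.0001).collect())
```

In a Databricks notebook, where sc is already defined, this could be invoked as run(sc, path) with path set to the file location given in the question. Wrapping the chain in functions is a design choice made for this sketch; the assignment itself asks for ws_lines, ws_words, and dist_words to be created directly in one cell.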
