Question
1. words = ['dog', 'dog', 'dolphin', 'tiger', 'tiger', 'eagle', 'tiger', 'eagle', 'eagle']
For the given list of words, create an RDD.
wordsRDD=
2. Write a piece of code that appends 's' to each word in the RDD of animals (makes them plural).
def make_plural(name):
    # fill in
    return
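One possible way to fill in the stub, as a minimal sketch. It assumes the intent is simply to append 's' to every word, with no handling of irregular plurals. The helper name pluralize_rdd is hypothetical and not part of the assignment. It expects an RDD such as one built with sc.parallelize(words), where sc is the notebook's SparkContext.

```python
def make_plural(name):
    """Return the word with a trailing 's' (naive pluralization)."""
    return name + 's'


def pluralize_rdd(words_rdd):
    """Hypothetical helper: apply make_plural to every element of an RDD.

    map() is a lazy transformation, so nothing is computed until an
    action such as collect() is called on the returned RDD.
    """
    return words_rdd.map(make_plural)
```

In the notebook this could be used as pluralize_rdd(wordsRDD).collect(), which would return each word with 's' appended, in the original order.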
3. For the given RDD of words, get the length of each word using a lambda function.
wordLength =
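A short sketch of the lambda-based approach, assuming wordsRDD holds plain strings. The helper name word_lengths is hypothetical. In the notebook, the body of the function could be assigned directly to wordLength.

```python
def word_lengths(words_rdd):
    """Hypothetical helper: map each word to its length using a lambda."""
    return words_rdd.map(lambda word: len(word))
```

For the sample list in question 1, word_lengths(wordsRDD).collect() gives [3, 3, 7, 5, 5, 5, 5, 5, 5].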
4. Create a list of tuples. The tuples should have the word itself and 1. For example, [('dog', 1), ('dog', 1), ('dolphin', 1), ('tiger', 1), ('tiger', 1), ('eagle', 1), ('tiger', 1), ('eagle', 1), ('eagle', 1)]
def count_tuples(word):
    # fill in
    return
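A minimal sketch of how the stub might be completed. The pairing itself follows the example given in the question. The helper to_pair_rdd is a hypothetical name added only to show how the function would be applied to an RDD.

```python
def count_tuples(word):
    """Pair a word with the count 1, e.g. 'dog' becomes ('dog', 1)."""
    return (word, 1)


def to_pair_rdd(words_rdd):
    """Hypothetical helper: turn an RDD of words into an RDD of (word, 1)."""
    return words_rdd.map(count_tuples)
```

An RDD of (key, value) tuples like this is what the by-key operations in the following questions operate on.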
5. This time you will write a piece of code to calculate the word counts in a list using the groupByKey() function
For example, animals = ['dog', 'dog', 'dolphin', 'dolphin', 'monkey', 'monkey', 'lion'] should return
[('dog', 2), ('dolphin', 2), ('monkey', 2), ('lion', 1)]
Hint: You will need to use map and lambda
groupedRDD =
wordOccurances =
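The groupByKey() approach can be sketched as follows, assuming the input is an RDD of plain words. The function name count_with_group_by_key is hypothetical. Its two intermediate steps correspond to groupedRDD and wordOccurances in the notebook.

```python
def count_with_group_by_key(words_rdd):
    """Hypothetical helper: word counts via groupByKey()."""
    # Turn every word into a (word, 1) pair.
    pairs = words_rdd.map(lambda word: (word, 1))
    # groupByKey() collects all the 1s for a word into a single iterable.
    grouped = pairs.groupByKey()
    # Summing that iterable gives the number of occurrences of the word.
    return grouped.mapValues(lambda ones: sum(ones))
```

The order of the pairs returned by collect() is not guaranteed after a shuffle, so it is safer to compare results as a dict or a sorted list than to rely on the order shown in the example.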
6. This time you will write a piece of code to calculate the word counts in a list using the reduceByKey() function
For example, animals = ['dog', 'dog', 'dolphin', 'dolphin', 'monkey', 'monkey', 'lion'] should return
[('dog', 2), ('dolphin', 2), ('monkey', 2), ('lion', 1)]
Hint: You will need to use map and lambda
wordOccurances =
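A sketch of the reduceByKey() variant, again assuming an RDD of plain words. The function name count_with_reduce_by_key is hypothetical.

```python
def count_with_reduce_by_key(words_rdd):
    """Hypothetical helper: word counts via reduceByKey()."""
    return (words_rdd
            .map(lambda word: (word, 1))          # (word, 1) pairs
            .reduceByKey(lambda a, b: a + b))     # add up the 1s per word
```

Unlike groupByKey(), reduceByKey() combines values within each partition before shuffling, so it generally moves less data for this kind of aggregation. The chained expression inside the function is also one way to write the single-expression form asked for in question 7.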
7. This time put everything together. In one line, calculate the word count for a given RDD.
wordOccurances=
8. Find the unique words in the list of words. You can use the word-count RDD (wordOccurances) that you created in question 6.
uniquewordsRDD =
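Two possible sketches, assuming "unique" means the set of distinct words rather than words that occur exactly once. Both function names are hypothetical. The first works on the (word, count) pair RDD from question 6, the second directly on the RDD of words.

```python
def unique_words_from_counts(word_counts_rdd):
    """Hypothetical helper: the keys of a (word, count) pair RDD.

    After a by-key aggregation each word appears once as a key,
    so the keys are the distinct words.
    """
    return word_counts_rdd.keys()


def unique_words_from_words(words_rdd):
    """Hypothetical helper: distinct() drops duplicate elements."""
    return words_rdd.distinct()
```

Calling count() on either result would give the number of distinct words.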
9. Remove Punctuation
# Run this cell before you move forward
# Do not do any changes to this cell
import os.path
baseDir = os.path.join('databricks-datasets')
inputPath = os.path.join('cs100', 'lab1', 'data-001', 'shakespeare.txt')
fileName = os.path.join(baseDir, inputPath)
shakespeareRDD = (sc.textFile(fileName, 8))
shakespeareRDD.take(20)
import string
import re
def removePunctuation(text):
    # write the code here
    return
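One way the function might be written, as a sketch only. Removing the characters in string.punctuation follows the step title. Lowercasing and stripping leading and trailing whitespace are assumptions added here so that words such as 'The' and 'the' are counted together later. The assignment text does not state them.

```python
import re
import string

# Character class matching any single ASCII punctuation character.
_PUNCTUATION = re.compile('[' + re.escape(string.punctuation) + ']')


def removePunctuation(text):
    """Strip ASCII punctuation from a line of text.

    Assumption (not stated in the assignment): the result is also
    lowercased and trimmed of leading/trailing whitespace.
    """
    without_punct = _PUNCTUATION.sub('', text)
    return without_punct.lower().strip()
```

For example, removePunctuation(" The Elephant's 4 cats. ") returns 'the elephants 4 cats'.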
10. Format RDD
# Run this cell before you move forward
# Do not do any changes to this cell
shakespeareRDD = shakespeareRDD.map(removePunctuation)
shakespeareRDD.take(25)
shakespeareWordsRDD =
11. Remove Empty Elements
The next step is to filter out the empty elements. Remove all entries where the word is ''.
shakespeareWordsRDD =
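Steps 10 and 11 can be sketched together. The assignment does not spell out step 10, so this assumes it is meant to split each cleaned line into words on single spaces. Both function names are hypothetical.

```python
def split_into_words(lines_rdd):
    """Hypothetical helper for step 10: one RDD element per word.

    flatMap() flattens the per-line lists returned by split() into a
    single RDD of words. Splitting on ' ' yields empty strings for
    blank lines and for runs of consecutive spaces.
    """
    return lines_rdd.flatMap(lambda line: line.split(' '))


def drop_empty(words_rdd):
    """Hypothetical helper for step 11: keep only non-empty words."""
    return words_rdd.filter(lambda word: word != '')
```

Splitting on a single space is what produces the '' entries that step 11 removes. Calling split() with no argument would avoid them, but would also make the filter step unnecessary.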
12. Word Count
Define a function for word counting. You should reuse the techniques that have been covered in earlier parts of this assignment. This function should take in an RDD that is a list of words like wordsRDD and return a pair RDD that has all of the words and their associated counts.
def wordCount(wordListRDD):
    # code goes here
    return