Question

1. words = ['dog', 'dog', 'dolphin', 'tiger', 'tiger', 'eagle', 'tiger', 'eagle', 'eagle']

For the given list of words, create an RDD.

wordsRDD=

2. Write a piece of code that appends 's' to each word in the list (RDD) of animals, making them plural.

def make_plural(name):
    # fill in
    return
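A minimal sketch of one way to fill in make_plural, assuming that appending a plain 's' is all that is wanted (irregular plurals are not handled). The function is ordinary Python, so it can be checked on a list first. The commented line assumes wordsRDD from question 1 already exists, and pluralRDD is a hypothetical name.

```python
def make_plural(name):
    """Return the word with an 's' appended (naive pluralization)."""
    return name + 's'

words = ['dog', 'dog', 'dolphin', 'tiger', 'tiger', 'eagle', 'tiger', 'eagle', 'eagle']

# Local check with the built-in map; no Spark is needed for this part.
plural_words = list(map(make_plural, words))

# On the RDD (assumes wordsRDD from question 1; pluralRDD is a hypothetical name):
# pluralRDD = wordsRDD.map(make_plural)
```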

3. For given RDD of words, get the length of each word using the lambda keyword

wordLength =
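As a rough illustration of the lambda form, the sketch below applies the same lambda to a plain Python list. On Spark the lambda would be passed to the RDD's map, as in the commented line, which assumes wordsRDD from question 1 exists.

```python
words = ['dog', 'dog', 'dolphin', 'tiger', 'tiger', 'eagle', 'tiger', 'eagle', 'eagle']

# The same lambda that would be handed to the RDD's map, applied locally.
lengths = list(map(lambda word: len(word), words))

# On the RDD (assumes wordsRDD from question 1):
# wordLength = wordsRDD.map(lambda word: len(word))
```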

4. Create a list of tuples. The tuples should have the word itself and 1. For example, [('dog', 1), ('dog', 1), ('dolphin', 1), ('tiger', 1), ('tiger', 1), ('eagle', 1), ('tiger', 1), ('eagle', 1), ('eagle', 1)]

def count_tuples(word):
    # fill in
    return
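One possible way to fill in count_tuples is sketched below. It is checked on a short plain list here. The commented line assumes wordsRDD from question 1, and wordPairsRDD is a hypothetical name.

```python
def count_tuples(word):
    """Pair a word with the count 1, giving a (key, value) tuple."""
    return (word, 1)

sample = ['dog', 'dog', 'dolphin']
pairs = [count_tuples(word) for word in sample]

# On the RDD (assumes wordsRDD from question 1; wordPairsRDD is a hypothetical name):
# wordPairsRDD = wordsRDD.map(count_tuples)
```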

5. This time you will write a piece of code to calculate the word counts in a list using the groupByKey() function

For example, animals = ['dog', 'dog', 'dolphin', 'dolphin', 'monkey', 'monkey', 'lion'] should return

[('dog', 2), ('dolphin', 2), ('monkey', 2), ('lion', 1)]

Hint: You will need to use map and lambda

groupedRDD =
wordOccurances =

6. This time you will write a piece of code to calculate the word counts in a list using the reduceByKey() function

For example, animals = ['dog', 'dog', 'dolphin', 'dolphin', 'monkey', 'monkey', 'lion'] should return

[('dog', 2), ('dolphin', 2), ('monkey', 2), ('lion', 1)]

Hint: You will need to use map and lambda

wordOccurances =
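To make the difference between questions 5 and 6 concrete, here is a plain-Python imitation of the two approaches. It is not Spark code: it only mimics what groupByKey() followed by a sum, and reduceByKey() with an addition lambda, would compute. The commented PySpark lines assume an existing SparkContext named sc, and wordPairsRDD is a hypothetical name.

```python
from collections import defaultdict

animals = ['dog', 'dog', 'dolphin', 'dolphin', 'monkey', 'monkey', 'lion']
pairs = [(word, 1) for word in animals]

# groupByKey-style: collect every value for a key first, then sum each group.
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)
grouped_counts = {key: sum(values) for key, values in groups.items()}

# reduceByKey-style: fold each value into a running total per key.
add = lambda a, b: a + b
reduced_counts = {}
for key, value in pairs:
    reduced_counts[key] = add(reduced_counts[key], value) if key in reduced_counts else value

# Rough PySpark equivalents (assume `sc` exists; wordPairsRDD is a hypothetical name):
# wordPairsRDD = sc.parallelize(animals).map(lambda word: (word, 1))
# wordPairsRDD.groupByKey().map(lambda kv: (kv[0], sum(kv[1])))   # question 5 style
# wordPairsRDD.reduceByKey(lambda a, b: a + b)                    # question 6 style
```

Both routes give the same counts. On a real cluster reduceByKey is usually preferred because it combines values within each partition before shuffling data.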

7. This time put everything together. In one line, calculate the word count for a given RDD.

wordOccurances=

8. Find the unique words in the list of words. You can use the word-count RDD that you created in question 6 (referred to here as wordCountsRDD).

uniquewordsRDD =

9. Remove Punctuation

# Run this cell before you move forward
# Do not do any changes to this cell
import os.path
baseDir = os.path.join('databricks-datasets')
inputPath = os.path.join('cs100', 'lab1', 'data-001', 'shakespeare.txt')
fileName = os.path.join(baseDir, inputPath)
shakespeareRDD = (sc.textFile(fileName, 8))
shakespeareRDD.take(20)

import string
import re
def removePunctuation(text):
    # write the code here
    return
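A hedged sketch of removePunctuation follows. The task only says to remove punctuation. Lowercasing the text and stripping leading and trailing whitespace are extra assumptions made here so that later word counts treat 'The' and 'the' alike. Note that string.punctuation includes the underscore, so underscores are removed too.

```python
import re
import string

# Character class matching every ASCII punctuation mark, built once.
_PUNCT_PATTERN = re.compile('[' + re.escape(string.punctuation) + ']')

def removePunctuation(text):
    """Remove ASCII punctuation; also lowercases and strips (assumptions)."""
    no_punct = _PUNCT_PATTERN.sub('', text)
    return no_punct.lower().strip()

cleaned = removePunctuation('Hi, you!')
```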

10. Format RDD

# Run this cell before you move forward
# Do not do any changes to this cell
shakespeareRDD = shakespeareRDD.map(removePunctuation)
shakespeareRDD.take(25)

shakespeareWordsRDD =

11. Remove Empty Elements

The next step is to filter out the empty elements. Remove all entries where the word is ''.

shakespeareWordsRDD =
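The filtering step can be tried on a plain list first. The sketch below assumes the previous step split each line on single spaces, which is what produces empty strings in the first place; the task text does not spell out that format. The sample lines are made up for illustration.

```python
# Hypothetical sample lines standing in for the cleaned Shakespeare text.
lines = ['to be  or not', '', 'that is the question ']

# flatMap-style step: split each line on a single space and flatten the result.
words = [word for line in lines for word in line.split(' ')]

# filter-style step: keep only the entries that are not the empty string.
non_empty = list(filter(lambda word: word != '', words))

# On the RDD (assumes shakespeareWordsRDD holds one word per element):
# shakespeareWordsRDD = shakespeareWordsRDD.filter(lambda word: word != '')
```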

12. Word Count

Define a function for word counting. You should reuse the techniques that have been covered in earlier parts of this assignment. This function should take in an RDD that is a list of words like wordsRDD and return a pair RDD that has all of the words and their associated counts.

def wordCount(wordListRDD):
    # codes go here
    return
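One way wordCount might be written, reusing the map and reduceByKey pattern from question 6. This is a sketch rather than the expected answer. The function only calls methods on whatever RDD it is given, so defining it needs no Spark import. The commented usage assumes a SparkContext named sc, as in the Databricks cells above.

```python
def wordCount(wordListRDD):
    """Return a pair RDD of (word, count) built from an RDD of words."""
    return (wordListRDD
            .map(lambda word: (word, 1))          # pair each word with 1
            .reduceByKey(lambda a, b: a + b))     # add up the 1s per word

# Usage where `sc` already exists (for example in a Databricks notebook):
# wordCount(sc.parallelize(['dog', 'dog', 'lion'])).collect()
```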
