
Question

Write a MapReduce program that counts the number of unique words in a given text file.
Example Input:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Donec condimentum elit vel mauris varius, id laoreet tortor placerat.
Nulla scelerisque felis ac risus varius, sit amet luctus elit mattis.
Example Output:
"adipiscing" 1
"condimentum" 1
"consectetur" 1
"Donec" 1
"dolor" 1
"elit" 1
"felis" 1
"id" 1
"ipsum" 1
"laoreet" 1
"luctus" 1
"mattis" 1
"mauris" 1
"Nulla" 1
"placerat" 1
"risus" 1
"scelerisque" 1
"sit" 1
"tortor" 1
"vel" 1
"varius" 1
Problem#2: WordCount With Stopwords
Write a MapReduce program that counts only non-stop words. The stopwords are: the, and,
of, a, to, in, is, it.
Example Input:
This is a sample input text. It contains some common words such as the, and, of, a, and to.
These stopwords should be removed in the output.
Example Output:
"common" 1
"contains" 1
"input" 1
"output" 1
"removed" 1
"sample" 1
"should" 1
"some" 1
"stopwords" 1
"text" 1
Problem#3:
Let's consider a scenario where we are interested in counting the occurrences of word bigrams
instead of individual words. A word bigram refers to a pair of words that are adjacent to each
other in the text (excluding bigrams that span across line breaks). For example, given the line of
text "cat dog sheep horse", the corresponding bigrams would be ("cat", "dog"), ("dog", "sheep"),
and ("sheep", "horse"). To achieve this goal, we need to construct a map function and a reduce
function. The map function will emit each word bigram as a key-value pair, where the key
represents the bigram separated by a comma (e.g., "cat,dog"), and the value is set to 1.
Please note that we will only consider bigrams that occur on the same line, and there is no need
to handle bigrams that cross line breaks.
Here's an example illustrating the input and output format:
Example Input:
a man a plan a canal panama there was a plan to build a canal in panama in panama a canal
was built
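A minimal sketch of the bigram map and reduce functions described in the problem, assuming a small in-memory driver instead of a real MapReduce framework. The names map_bigrams, reduce_sum and run_mapreduce are hypothetical. The sample uses the "cat dog sheep horse" line from the problem text plus a second, made-up line "cat dog" to show that counts accumulate and that no bigram is formed across the line break. Splitting on whitespace assumes the input carries no punctuation.

```python
from collections import defaultdict


def map_bigrams(line):
    """Map step: emit ("w1,w2", 1) for each adjacent word pair within one line."""
    words = line.split()
    for first, second in zip(words, words[1:]):
        yield f"{first},{second}", 1


def reduce_sum(bigram, values):
    """Reduce step: add up the 1s emitted for a bigram."""
    return bigram, sum(values)


def run_mapreduce(lines, mapper, reducer):
    """Hypothetical local driver: map, group by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# The second line is an illustrative assumption, not part of the problem.
lines = ["cat dog sheep horse", "cat dog"]
bigram_counts = run_mapreduce(lines, map_bigrams, reduce_sum)
for bigram, count in bigram_counts:
    print(f'"{bigram}" {count}')
```

Because the mapper only ever sees one line at a time, the pair "horse,cat" is never emitted, which matches the rule that bigrams must not cross line breaks.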
