Question
Please help with the second job; I need help writing mapper.py and reducer.py in Python.
The purpose of this project is to build upon your first wordcount project. We want to find the 11th most frequent word in the Cranfield collection, which is a set of text documents. The collection is in the following file:
/assignment2/data.txt
Copy the text file to your HDFS directory before running any of the programs. Make sure the output of your programs is located in your own HDFS directory as well.
In order to complete the assignment you will need to run two MapReduce jobs.
1 First Job: Word Count.
The first job is the wordcount program which simply counts the number of times a word occurred in the Cranfield collection. You can use the instructions in the first assignment to generate the word counts for the Cranfield collection. The output of the first MapReduce program will be given as input to the second MapReduce job.
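For reference, the first job's logic can be sketched as below (a minimal sketch, assuming whitespace tokenization; the exact tokenization should follow the first assignment's instructions, and the function names are my own):

```python
#!/usr/bin/env python
# Job 1 sketch: word count over Hadoop Streaming.
# In the real job, each function lives in its own script (mapper.py,
# reducer.py) driven by a loop such as:
#     import sys
#     for out in map_words(sys.stdin): print(out)

def map_words(lines):
    """Mapper: emit '<word>\t1' for every whitespace-separated token."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t1" % word

def reduce_counts(lines):
    """Reducer: sum the 1s for each word; input arrives sorted by word."""
    current, total = None, 0
    for line in lines:
        word, count = line.strip().rsplit("\t", 1)
        if word == current:
            total += int(count)
        else:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = word, int(count)
    # Flush the last word after the input ends.
    if current is not None:
        yield "%s\t%d" % (current, total)
```

Hadoop performs the sort between the two phases; `sorted()` stands in for it when testing the pipeline locally.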
2 Second Job: Sort Counts.
The mapping phase of the second job has to take the output of the previous job as input, which is in the form of <word, count> pairs, and then output <count, word> pairs. During the sort and shuffle phases after mapping, the <count, word> pairs need to be sorted by their keys (the counts).
The reduce phase will simply take the output of the mapper phase and write the result to the output directory.
You can pass the output directory of the previous job as the input argument for the second job:
-input /user//output1
And the output option for the second job as:
-output /user//output2
Hint: You can take advantage of the options provided for the mapper to compare the keys during sort and shuffle: https://hadoop.apache.org/docs/r1.2.1/streaming.html
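For example, the second job's mapper can simply swap each line's fields so the count becomes the key (a minimal sketch; the function name is my own):

```python
#!/usr/bin/env python
# mapper.py (job 2) sketch: turn "<word>\t<count>" lines into
# "<count>\t<word>" so the shuffle phase sorts by count.
# In the real script:
#     import sys
#     for out in swap_fields(sys.stdin): print(out)

def swap_fields(lines):
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, count = line.rsplit("\t", 1)  # split on the last tab
        yield "%s\t%s" % (count, word)
```

By default, streaming sorts keys as strings; per the linked documentation, you would pass options along the lines of `-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator -D mapred.text.key.comparator.options=-nr` (numeric, reversed) so the counts sort in descending numeric order — check the exact property names against the Hadoop version you are running.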
3 Submission.
After successfully running both programs, the final output of the second MapReduce job should contain the following results: the first 11 words with the highest frequency in the Cranfield collection.
Output:
20179 the
13964 of
11046 .
7053 and
6413 a
4972 in
4669 to
4075 is
3699 for
2430 with
2420 are
The assignment will be graded based on the second MapReduce job, which you have to implement, and on successfully generating the final result in your HDFS directory.
------------------------------
Need help for this part:
The input data (the output of the first job) will look like this (sorted alphabetically from a to z, but with more than 11 words):
input:
. 11046
a 6413
and 7053
are 2420
boy 200
child 150
for 3699
in 4972
is 4075
jim 28
of 13964
the 20179
to 4669
with 2430
-----------
The expected output is the first 11 words with the highest frequency (ordered by count, descending):
20179 the
13964 of
11046 .
7053 and
6413 a
4972 in
4669 to
4075 is
3699 for
2430 with
2420 are
---------
I need to submit mapper.py and reducer.py.
My plan is to use a dict: put the words and counts into a dictionary, then sort it.
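That dictionary idea can be sketched like this for reducer.py (a minimal sketch, assuming the mapper already emits `<count>\t<word>` lines; the function name and the top-11 cutoff parameter are illustrative):

```python
#!/usr/bin/env python
# reducer.py (job 2) sketch: collect "<count>\t<word>" pairs into a dict,
# sort by count in descending order, and print the top 11.
# In the real script:
#     import sys
#     for out in top_words(sys.stdin): print(out)

def top_words(lines, n=11):
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        count, word = line.split("\t", 1)
        counts[word] = int(count)
    # Rank by count, highest first, and keep only the first n entries.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for word, count in ranked[:n]:
        yield "%d\t%s" % (count, word)
```

Note that sorting inside the reducer like this only sees the whole collection if the job runs with a single reduce task (e.g. `-D mapred.reduce.tasks=1` — check the property name for your Hadoop version); with multiple reducers, each one would only rank its own partition. Alternatively, if the comparator options from the hint already sort the counts in descending order, the reducer can be a simple pass-through.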