Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

2) Suppose you are tasked with analysis of the companys web server logs. The log dump contains a large amount of information with up to

2) Suppose you are tasked with analysis of the companys web server logs. The log dump contains a large amount of information with up to 8 different attributes (columns). You regularly run a Hadoop job to perform analysis pertaining to 3 specific attributes TimeOfAccess, OriginOfAccess and FileName.

a) How would you attempt to speed up the repeated execution of the query? (this is an intentionally open-ended question, there are several acceptable answers)

b) If a Mapper task fails while processing a block of data what is the location (which node) where MapReduce framework will prefer to restart it?

c) If the job is executed with 4 Reducers i) How many files does the output generate?

ii) Suggest one possible hash function that may be used to assign keys to reducers.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Excel As Your Database

Authors: Paul Cornell

1st Edition

1590597516, 978-1590597514

More Books

Students also viewed these Databases questions