Question
2) Suppose you are tasked with the analysis of the company's web server logs. The log dump contains a large amount of information with up to 8 different attributes (columns). You regularly run a Hadoop job to perform analysis pertaining to 3 specific attributes: TimeOfAccess, OriginOfAccess, and FileName.
a) How would you attempt to speed up the repeated execution of the query? (This is an intentionally open-ended question; there are several acceptable answers.)
b) If a Mapper task fails while processing a block of data, on which node will the MapReduce framework prefer to restart it?
c) If the job is executed with 4 Reducers:
i) How many output files does the job generate?
ii) Suggest one possible hash function that may be used to assign keys to reducers.
Step by Step Solution
There are 3 steps involved, one for each part of the question.
Step: 1
a) This part is intentionally open-ended, but the acceptable answers share one theme: avoid rescanning all 8 attributes on every run when only 3 are needed.
- Preprocess the log dump once with a map-only job that projects out just TimeOfAccess, OriginOfAccess, and FileName, then point the recurring job at this much smaller dataset (see the sketch below).
- Equivalently, load the logs into Hive and store them in a columnar format such as ORC or Parquet, so each run reads only the 3 columns it touches.
- Partition the stored data (for example by access date) and use splittable compression, so each run scans only the relevant slice of the data.
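A minimal sketch of the projection idea is given below; the tab delimiter and the column positions (0, 3, and 7) are assumptions about the log layout, not given in the question, and the class name is hypothetical.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only projection job: emits only the 3 attributes the recurring
// analysis needs. Run with job.setNumReduceTasks(0) so map output is
// written straight to HDFS.
public class ProjectColumnsMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    // Assumed positions of the attributes within a log record.
    private static final int TIME_OF_ACCESS = 0;
    private static final int ORIGIN_OF_ACCESS = 3;
    private static final int FILE_NAME = 7;

    private final Text projected = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] cols = line.toString().split("\t");
        if (cols.length < 8) {
            return; // skip records that are missing columns
        }
        projected.set(cols[TIME_OF_ACCESS] + "\t"
                + cols[ORIGIN_OF_ACCESS] + "\t"
                + cols[FILE_NAME]);
        context.write(projected, NullWritable.get());
    }
}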
Step: 2
b) The framework prefers data locality. The failed task is rescheduled on a node that holds a local replica of the input block in HDFS: the same node if only the task (and not the node) failed, otherwise one of the other nodes storing a replica of that block. If no replica-holding node has a free slot, the scheduler falls back to a node on the same rack, and only then to an arbitrary node in the cluster.
Step: 3
c) i) Each Reducer writes one output file, so the job generates 4 output files: part-r-00000 through part-r-00003 (plus an empty _SUCCESS marker that carries no data).
ii) One possible hash function: partition = hash(key) mod 4, i.e., hash the key, force the result to be non-negative, and take the remainder modulo the number of Reducers. This is what Hadoop's default HashPartitioner computes: (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
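A minimal sketch of such a partitioner, assuming Text keys and IntWritable values (the class name is hypothetical):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Assigns each key to one of numPartitions Reducers via hash(key) mod n.
// With 4 Reducers this reproduces Hadoop's default HashPartitioner.
public class ModHashPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Mask the sign bit so the index stays non-negative even when
        // hashCode() returns a negative value.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

It would be wired in with job.setPartitionerClass(ModHashPartitioner.class) and job.setNumReduceTasks(4).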