Question
I need your help with this assignment, especially the first step: creating the datasets with Python code.
Assignment Objective and Description:
By the completion of this assignment you should be able to:
- Write map-reduce jobs in the Java language and run them on a Hadoop system.
- Before writing your code, you should fully understand the "WordCount" example from Lab2 (it is the "Hello World..." example of MapReduce jobs). You do not need any programming skills other than the ones you already have under your belt.
- You only need to familiarize yourself with how Hadoop reads and writes integers, floats, and text fields. Check the IntWritable, FloatWritable, and Text classes to learn which one to use and when.

Assignment Task:
In this assignment, you will mainly do the following:
(1) Create two datasets
(2) Upload the datasets into HDFS (refer to Lab 3)
(3) Write two map-reduce jobs.

Step 1: Create two Datasets! [2 marks]
Write a Java program that creates two datasets (each stored in an independent file): Buyers and Purchases. Each line in the Buyers file represents a single buyer, while each line in the Purchases file represents a single purchase. The attribute values within each line are comma separated.

The Buyers dataset should have the following attributes for each buyer:
- BuyerID: unique sequential number from 1 to 10,000 (meaning the file has 10,000 buyers).
- BuyerName: random sequence of characters of length between 10 and 15 (make sure that you exclude the comma from the possible generated characters!)
- BuyerAge: random integer number between 12 and 75
- BuyerGender: randomly generated string that is either "male" or "female"
- BuyerSalary: random float number between 3500 and 11000

The Purchases dataset should have the following attributes for each purchase:
- purchID: unique sequential integer number from 1 to 1,000,000 (the file has 1M purchases).
- BuyerID: references one of the buyer IDs, i.e., from 1 to 10,000 (on average a buyer has 100 purchases).
- purchPrice: random float number between 10 and 100
- purchNumItems: random integer number between 1 and 10

Note: This task is writing a regular Java program, not a MapReduce job!

Step 2: Upload the created datasets into HDFS! [2 marks]
Refer to Lab 2 and use one of the Hadoop commands (either put or copyFromLocal) to upload the datasets into HDFS. Then open Hadoop's Web UI (inside the Docker container) and make sure that they were uploaded successfully. (Take screenshots for the report!)

Step 3: Writing MapReduce Jobs!
A few pieces of advice before you start coding:
1- Use the WordCount example as your starting point. If you fully understand it, you won't face any problem in coding the following required jobs!
2- Before coding, try to solve the problems in a way similar to what we did in Lab 5. That is, answer these questions: (a) Is it a map-only job or a map-reduce job? (b) What would be the key-value pairs of the mappers' output? (c) If there is a reduce function, what task will we perform in reduce? Once you have the full plan ahead of you, start coding!

First Job [4 marks]: Write a job that reports the buyers whose age is between 20 and 50.

Second Job [4.5 marks]: Write a job that reports, for every buyer, the number of purchases that he made (i.e., count his purchases) and the sum of the prices of these purchases. The output should have one line for each buyer containing: BuyerID, numPurchases, sumPrices [4 points]. You are required to use a Combiner in this job [0.5 point].

What to Submit, When and How
This assignment should be done in teams of 4 students (with a single group of 5 students). Each team (via a single student) should submit a single zip file (using Blackboard) containing:
- The Java application for creating the datasets,
- The Java code for the MapReduce jobs,
- A document that contains screenshots of the results of task 1, task 2, and task 3.

You have two weeks to submit this assignment; you should submit it no later than Thursday, Feb 6th, 11:59 pm.

Step by Step Solution
There are 3 Steps involved in it
Step: 1
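For the dataset step, the assignment itself asks for a regular Java program, so the following is a minimal Java sketch of one way to generate the two files. It is not the official solution. The class name, the method names, the output file names (Buyers.txt and Purchases.txt), the letters-only name alphabet, and the two-decimal formatting of float fields are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Random;

// Hypothetical class name; generates the Buyers and Purchases files.
class DatasetGenerator {
    static final int NUM_BUYERS = 10_000;
    static final int NUM_PURCHASES = 1_000_000;
    // Assumption: letters only, so a comma can never appear in a name.
    static final String NAME_CHARS =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Random integer between lo and hi, both ends included.
    static int randInt(Random rnd, int lo, int hi) {
        return lo + rnd.nextInt(hi - lo + 1);
    }

    // Random float between lo and hi.
    static float randFloat(Random rnd, float lo, float hi) {
        return lo + rnd.nextFloat() * (hi - lo);
    }

    // Random name of 10 to 15 characters drawn from NAME_CHARS.
    static String randomName(Random rnd) {
        int len = randInt(rnd, 10, 15);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append(NAME_CHARS.charAt(rnd.nextInt(NAME_CHARS.length())));
        }
        return sb.toString();
    }

    // One Buyers line: BuyerID,BuyerName,BuyerAge,BuyerGender,BuyerSalary
    static String buyerLine(int id, Random rnd) {
        String gender = rnd.nextBoolean() ? "male" : "female";
        return String.format(Locale.ROOT, "%d,%s,%d,%s,%.2f",
                id, randomName(rnd), randInt(rnd, 12, 75), gender,
                randFloat(rnd, 3500f, 11000f));
    }

    // One Purchases line: purchID,BuyerID,purchPrice,purchNumItems
    static String purchaseLine(int id, Random rnd) {
        return String.format(Locale.ROOT, "%d,%d,%.2f,%d",
                id, randInt(rnd, 1, NUM_BUYERS),
                randFloat(rnd, 10f, 100f), randInt(rnd, 1, 10));
    }

    public static void main(String[] args) throws IOException {
        Random rnd = new Random();
        // Assumed file names; change them to whatever the lab expects.
        try (BufferedWriter out = Files.newBufferedWriter(
                Paths.get("Buyers.txt"), StandardCharsets.UTF_8)) {
            for (int id = 1; id <= NUM_BUYERS; id++) {
                out.write(buyerLine(id, rnd));
                out.newLine();
            }
        }
        try (BufferedWriter out = Files.newBufferedWriter(
                Paths.get("Purchases.txt"), StandardCharsets.UTF_8)) {
            for (int id = 1; id <= NUM_PURCHASES; id++) {
                out.write(purchaseLine(id, rnd));
                out.newLine();
            }
        }
    }
}
```

Picking the BuyerID of each purchase uniformly from 1 to 10,000 gives each buyer about 100 purchases on average, which matches the note in the assignment. Locale.ROOT is used so that floats are always written with a dot as the decimal separator, which keeps the comma free to act as the field separator.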
Step: 2
Step: 3
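The planning questions the assignment poses for the two jobs (map-only or map-reduce, which key-value pairs, what the reducer does) can be worked out in plain Java before touching the Hadoop API. The sketch below shows one possible plan and is not the official solution. It assumes the comma-separated field order given in the assignment, treats "between 20 and 50" as inclusive, and uses hypothetical class and method names. In a real job this logic would sit inside Mapper and Reducer subclasses, with the strings wrapped in Hadoop's Text class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class holding only the per-record logic of both jobs.
class JobLogicSketch {

    // First job (map-only): keep a Buyers line when 20 <= BuyerAge <= 50.
    // Returns null for rejected lines; a real mapper would emit nothing.
    static String filterBuyer(String buyerLine) {
        String[] f = buyerLine.split(",");
        int age = Integer.parseInt(f[2].trim());
        return (age >= 20 && age <= 50) ? buyerLine : null;
    }

    // Second job, map side: a Purchases line becomes
    // key = BuyerID, value = "1,purchPrice" (a partial count and a partial sum).
    static String[] mapPurchase(String purchaseLine) {
        String[] f = purchaseLine.split(",");
        return new String[] { f[1].trim(), "1," + f[2].trim() };
    }

    // Second job, combiner and reducer: add up the partial "count,sum"
    // values of one BuyerID. Input and output share the same format,
    // so the same logic can serve as both combiner and reducer.
    static String merge(List<String> values) {
        long count = 0;
        double sum = 0.0;
        for (String v : values) {
            String[] p = v.split(",");
            count += Long.parseLong(p[0]);
            sum += Double.parseDouble(p[1]);
        }
        return String.format(Locale.ROOT, "%d,%.2f", count, sum);
    }

    public static void main(String[] args) {
        // Tiny in-memory run of the second job for one buyer.
        String[] kv1 = mapPurchase("1,42,10.50,3");
        String[] kv2 = mapPurchase("2,42,20.25,1");
        String partial = merge(Arrays.asList(kv1[1], kv2[1]));
        System.out.println(kv1[0] + "," + partial);
    }
}
```

With Hadoop, the first job can be made map-only by calling job.setNumReduceTasks(0), and the second job can register the merge logic through both job.setCombinerClass and job.setReducerClass. The mapper emits a count of 1 alongside each price precisely so that the combiner's partial results can be merged again by the reducer without double counting.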