Question

Posted on Sep 24, 2024

Assignment 1

1- The DISTINCT(X) operator returns only the distinct (unique) values of attribute (or column) X across the entire dataset.

As an example, for the following table A:

A.ID	A.ZIPCODE	A.AGE
1	12345	30
2	12345	40
3	78910	10
4	78910	10
5	78910	20

DISTINCT(A.ID) = (1, 2, 3, 4, 5)

DISTINCT(A.ZIPCODE) = (12345, 78910)

DISTINCT(A.AGE) = (30, 40, 10, 20)

Implement the DISTINCT(X) operator using Map-Reduce. Provide the algorithm pseudocode. You should use only one Map-Reduce stage, i.e. the algorithm should make only one pass over the data.
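One possible single-stage approach, sketched here in Python rather than pseudocode: the mapper emits the value of column X as the key, so all duplicates of a value reach the same reduce call, and the reducer emits each key once. The helper `run_mapreduce` is a hypothetical in-memory stand-in for the framework's grouping phase, not part of any real Map-Reduce API.

```python
from collections import defaultdict

def map_distinct(row, column):
    # Emit the column value as the key; the value carries no information.
    yield row[column], None

def reduce_distinct(key, values):
    # All duplicates of a key arrive in one call, so emit the key exactly once.
    yield key

def run_mapreduce(rows, mapper, reducer):
    # Hypothetical in-memory stand-in for the grouping (shuffle) phase.
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def distinct(rows, column):
    return run_mapreduce(rows, lambda r: map_distinct(r, column), reduce_distinct)

# Table A from the example above.
table_a = [
    {"ID": 1, "ZIPCODE": 12345, "AGE": 30},
    {"ID": 2, "ZIPCODE": 12345, "AGE": 40},
    {"ID": 3, "ZIPCODE": 78910, "AGE": 10},
    {"ID": 4, "ZIPCODE": 78910, "AGE": 10},
    {"ID": 5, "ZIPCODE": 78910, "AGE": 20},
]

print(sorted(distinct(table_a, "ZIPCODE")))
```

A combiner identical to the reducer could drop duplicates on each mapper before anything crosses the network; that is an optimization, not a second stage.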

2- The SHUFFLE operator takes a dataset as input and randomly re-orders it.

Hint: Assume that we have a function rand(m) that outputs a random integer in the range [1, m].

Implement the SHUFFLE operator using Map-Reduce. Provide the algorithm pseudocode.
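A minimal sketch of one way to use the hint, in Python: the mapper tags each record with a random key from rand(m), the framework delivers keys in sorted order, and the reducer emits the records and drops the keys. The choice of m (here the square of the record count), the `seed` parameter, and the in-memory driver are assumptions made for illustration. Records that draw the same key keep their arrival order, so a small m makes the result less random.

```python
import random
from collections import defaultdict

def rand(m, rng):
    # Stand-in for the rand(m) function from the hint: an integer in [1, m].
    return rng.randint(1, m)

def map_shuffle(record, m, rng):
    # Tag every record with a random key.
    yield rand(m, rng), record

def reduce_shuffle(key, records):
    # Drop the random key and emit the records that share it.
    for record in records:
        yield record

def shuffle(records, m=None, seed=None):
    rng = random.Random(seed)
    records = list(records)
    if m is None:
        # Assumption: a large key space keeps key collisions rare.
        m = max(1, len(records) ** 2)
    groups = defaultdict(list)
    for record in records:
        for key, value in map_shuffle(record, m, rng):
            groups[key].append(value)
    output = []
    # Visiting keys in sorted order mimics the framework's sort-by-key step.
    for key in sorted(groups):
        output.extend(reduce_shuffle(key, groups[key]))
    return output

shuffled = shuffle(range(10), seed=42)
print(shuffled)
```

With several reducers, the same idea needs a partitioner that assigns contiguous key ranges to reducers so that concatenating their outputs preserves the key order.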

3- What is the communication cost (in terms of total data flow on the network between mappers and reducers) of the following query using Map-Reduce:

Get DISTINCT(A.ID) FROM A WHERE A.AGE > 30

The dataset A has 1000M rows, and 400M of these rows have A.AGE <= 30. DISTINCT(A.ID) has 1M elements. A tuple emitted from any mapper is 1 KB in size.
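The arithmetic can be sketched as follows, assuming "M" means million, the AGE filter runs in the mappers, no combiner is used, and 1 GB is taken as 10^6 KB. Under those assumptions only the rows with A.AGE > 30 are emitted, and each emitted tuple costs 1 KB. The function name is hypothetical.

```python
def mapper_to_reducer_traffic(total_rows, filtered_out_rows, tuple_size_kb):
    # Rows failing the WHERE clause are dropped in the mapper and never sent.
    emitted_tuples = total_rows - filtered_out_rows
    total_kb = emitted_tuples * tuple_size_kb
    # Assumption: decimal units, 1 GB = 1,000,000 KB.
    return emitted_tuples, total_kb / 1_000_000

emitted, cost_gb = mapper_to_reducer_traffic(
    total_rows=1_000_000_000,       # 1000M rows in A
    filtered_out_rows=400_000_000,  # 400M rows with A.AGE <= 30
    tuple_size_kb=1,
)
print(emitted, cost_gb)  # 600000000 600.0
```

The 1M distinct IDs determine how many tuples the reducers output, not how much data the mappers send. A combiner that removes duplicate IDs on each mapper would lower the traffic, by an amount that depends on the number of mappers and how the IDs are distributed among them.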

4- Consider the checkout counter at a large supermarket chain. For each item sold, it generates a record of the form [ProductId, Supplier, Price]. Here, ProductId is the unique identifier of a product, Supplier is the supplier name of the product and Price is the sales price for the item. Assume that the supermarket chain has accumulated many terabytes of data over a period of several months.

The CEO wants a list of suppliers, listing for each supplier the average sales price of items provided by the supplier. How would you organize the computation using the Map-Reduce computation model?
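One way to organize this, sketched in Python: the mapper emits (Supplier, (Price, 1)), an optional combiner adds up prices and counts on each mapper, and the reducer divides the total price by the total count. Carrying (sum, count) pairs matters because averaging per-mapper averages gives the wrong result when the mappers see different numbers of records. The driver function and the sample records are hypothetical.

```python
from collections import defaultdict

def map_sale(record):
    # Record layout from the question: [ProductId, Supplier, Price].
    product_id, supplier, price = record
    yield supplier, (price, 1)

def combine(supplier, pairs):
    # Local pre-aggregation: one (sum, count) pair per supplier per mapper.
    yield supplier, (sum(p for p, _ in pairs), sum(c for _, c in pairs))

def reduce_average(supplier, pairs):
    total = sum(p for p, _ in pairs)
    count = sum(c for _, c in pairs)
    yield supplier, total / count

def average_price_by_supplier(splits):
    # Each element of `splits` simulates the input of one mapper.
    shuffled = defaultdict(list)
    for split in splits:
        local = defaultdict(list)
        for record in split:
            for key, value in map_sale(record):
                local[key].append(value)
        for key, values in local.items():
            for ckey, cvalue in combine(key, values):
                shuffled[ckey].append(cvalue)
    return dict(
        pair
        for key, values in shuffled.items()
        for pair in reduce_average(key, values)
    )

# Hypothetical sample data split across two mappers.
splits = [
    [(1, "Acme", 10.0), (2, "Acme", 20.0)],
    [(3, "Acme", 30.0), (4, "Bolt", 5.0)],
]
print(average_price_by_supplier(splits))  # {'Acme': 20.0, 'Bolt': 5.0}
```

In the sample, averaging the two per-mapper averages for Acme (15.0 and 30.0) would give 22.5 instead of the correct 20.0.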

For the following questions give short explanations of your answers.

5- True or False: Each mapper/reducer must generate the same number of output key/value pairs as it receives on the input.

6- True or False: The output type of keys/values of mappers/reducers must be of the same type as their input.

7- True or False: The input to reducers is grouped by key.

8- True or False: It is possible to start reducers while some mappers are still running.
