Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 22, 2024

The data contains the following columns: index, region, country, item_type, sales_channel, order_priority, order_date, order_id, ship_date, units_sold, unit_price, unit_cost, total_revenue, total_cost, total_profit Here are 2 questions

The data contains the following columns:

index, region, country, item_type, sales_channel, order_priority, order_date, order_id, ship_date, units_sold, unit_price, unit_cost, total_revenue, total_cost, total_profit

Here are 2 questions you need to answer from the data :

(a) What is the total units_sold in each region across the entire data set ?

(b) What is the average total_profit per transaction in the entire data set ? Remember that average is not an associative and commutative operation.

Think about how will you partition the data in each of the cases, what computation will happen on each partition of data, how will you "join" all the partial results together and present it on one node ?

You can adopt a master-slave architecture where a specific process is the master that takes the file and performs data partitioning, sends data to slaves, slaves perform parallel computation and send results to the master, master does some last step computation if needed and presents the answer to the user. A master process can also function as a slave. You do not need to adopt any parallel processing technique (e.g. using multi-threading) within the master or the slave to speed up local tasks. Your focus should be on parallel processing at a process level.

You need to use OpenMPI to implement the parallel program. Use the standard functions for communication within the cluster of processes. You can run the processes on the same multi-core machine. Given a large data set, you can use the file system to store data partitions, intermediate results at slaves and the final result. Just name the files prefixed with a process identifier so that processes know which data should be read or written by which process. MPI rank can be used as a process identifier.

Some of the dataset is given below image text in transcribed

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Support For Data Mining Applications Discovering Knowledge With Inductive Queries Lnai 2682

Database Support For Data Mining Applications Discovering Knowledge With Inductive Queries Lnai 2682

Authors: Rosa Meo ,Pier L. Lanzi ,Mika Klemettinen

2004th Edition

3540224793, 978-3540224792

More Books

Students also viewed these Databases questions

Question

★★★★★

You are developing next years financial plan for Ajax Inc., a medium sized manufacturing company thats currently operating at 80% of factorys capacity. The firm is launching a sales promotion thats...

Answered: 1 week ago

Question

★★★★★

In Exercises 3344, add or subtract terms whenever possible. EA9 + ENL

Answered: 1 week ago

Question

★★★★★

Discuss how a psychological skills program might be evaluated.

Answered: 1 week ago

Question

★★★★★

1. Given that Blades expects to use the cash flows generated by the Thai subsidiary to pay the interest and principal of the notes, would the effective financing cost of the baht-denominated notes be...

Answered: 1 week ago

Question

★★★★★

Which of the following is a non - volatile memory type that can be electrically erased and reprogrammed? a ) DRAM b ) SRAM c ) Flash memory d ) Cache memory

Answered: 1 week ago

Question

★★★★★

Use the table to the right to find the given derivative. X NO 4 f(x) 4 W N d [xf(x)] f'(x) 5 2 dx | x =3 d dx [xf (x)] = (Simplify your answer.) X = 3

Answered: 1 week ago

Question

★★★★★

7. A series of transactions for Ace Repair is given below. Use the chart to show the effect of each of the transactions on assets, liabilities, and owner's equity by placing an x in the appropriate...

Answered: 1 week ago

Question

★★★★★

(CLO3) Sam and Mike are discussing using a new project management framework. Mike mentions that no matter what the team should always be willing to try new things. Sam asks Mike to focus on the pros,...

Answered: 1 week ago

Question

★★★★★

What would be your opinion about these commenta? 1. The problem with prison gangs and violence is that there is no control inside those prisons. Corrections and jail wardens have a hard time...

Answered: 1 week ago

Question

★★★★★

3 Parametric Equations Parametric Equations 1. Consider the parametric equations 3 x= t+1 = 2t 1 (a) For what values of t are the parametric equations defined? (b) Sketch a graph of the curve on the...

Answered: 1 week ago

Question

★★★★★

Complete the MPS record below for a single item. (Enter your responses as integers. A response of "0" is equivalent to being not applicable.) Item: A Order Policy: 100 units Lead Time: 1 week January...

Answered: 1 week ago

Question

★★★★★

6-11 This project develops skills in searching Web-enabled databases with information about products and services in faraway locations. Your company, Caledonian Furniture, is located in Cumbernauld,...

Answered: 1 week ago

Question

★★★★★

6-6 To what extent should end users be involved in the selection of a database management system and database design?

Answered: 1 week ago

Question

★★★★★

6-13 Explain the role of the database in SAPs threetier system. The Lego Group, which is headquartered in Billund, Denmark, is one of the largest toy manufacturers in the world. Legos main products...

Answered: 1 week ago

Previous Question Next Question