Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

The data contains the following columns: index, region, country, item_type, sales_channel, order_priority, order_date, order_id, ship_date, units_sold, unit_price, unit_cost, total_revenue, total_cost, total_profit Here are 2 questions

The data contains the following columns:

index, region, country, item_type, sales_channel, order_priority, order_date, order_id, ship_date, units_sold, unit_price, unit_cost, total_revenue, total_cost, total_profit

Here are 2 questions you need to answer from the data :

(a) What is the total units_sold in each region across the entire data set ?

(b) What is the average total_profit per transaction in the entire data set ? Remember that average is not an associative and commutative operation.

Think about how will you partition the data in each of the cases, what computation will happen on each partition of data, how will you "join" all the partial results together and present it on one node ?

You can adopt a master-slave architecture where a specific process is the master that takes the file and performs data partitioning, sends data to slaves, slaves perform parallel computation and send results to the master, master does some last step computation if needed and presents the answer to the user. A master process can also function as a slave. You do not need to adopt any parallel processing technique (e.g. using multi-threading) within the master or the slave to speed up local tasks. Your focus should be on parallel processing at a process level.

You need to use OpenMPI to implement the parallel program. Use the standard functions for communication within the cluster of processes. You can run the processes on the same multi-core machine. Given a large data set, you can use the file system to store data partitions, intermediate results at slaves and the final result. Just name the files prefixed with a process identifier so that processes know which data should be read or written by which process. MPI rank can be used as a process identifier.

Some of the dataset is given belowimage text in transcribed

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Support For Data Mining Applications Discovering Knowledge With Inductive Queries Lnai 2682

Authors: Rosa Meo ,Pier L. Lanzi ,Mika Klemettinen

2004th Edition

3540224793, 978-3540224792

More Books

Students also viewed these Databases questions

Question

Discuss how a psychological skills program might be evaluated.

Answered: 1 week ago