Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 10, 2024

Mini Project: Fundamentals in Parallel Computing 1. Suppose that we need to compute n values and add them together. The serial code that we are

Mini Project: Fundamentals in Parallel Computing

1. Suppose that we need to compute n values and add them together. The serial code that we are familiar is as follows:

sum = 0;

for (i = 0; i

x = compute_next_value();

sum += x;

}

The function compute_next_value() could be executed independently.

Now suppose we also have p cores and p is much small than n. Then each core can form a partial sum of approximately n/p values:

my_sum = 0;

block_size = n % p ==0 ? n / p : floor(n/p) + 1;

my_first_i = block_size * i;

my_last_i = block_size * (i+1) - 1 > n? n : block_size * (i+1);

for (i = my_first_i; i

my_x = compute_next_value();

my_sum += my_x;

}

Here the prefix my_ indicates that each core is using its own, private variables, and each core can execute this block of code independently of the other cores. After each core completes execution of this code, its variable my_sum will store the sum of the values computed by its calls to the function compute_next_value().

When the cores are done computing their values of my_sum, they can form a global sum by sending their results to a designated "master" core in a distributed-memory setting, which can add their results:

if (I'm the master core){

sm = my_x;

for each core other than myself{

wait and receive value from core;

sum += value;

}

} else {

send my_x to the master;

}

Or carry out the following procedures in a shared-memory setting:

if (I'm the master core){

sm = my_local_sum[0];

for each core j other than myself{

enter critical section

sum += my_local_sum[j];

exit from critical section

}

A better way to do this is as follows:

Especially if te number of cores are large. Instead of making the master core do all the work of computing the final sum, we can pair the cores so that while core 0 adds in the result of core 1, core 2 can add in the result of core 3, core 4 can add in the result of core 5 and so on. Then we can repeat the process with only the even-ranked cores: 0 adds in the result of 2, 4 adds in the result of 6, and so on. Now cores divisible by 4 repeat the process, and so on.

See the following tree structure, where circles contain the example value of each core's sum, and the lines with arrows indicate that one core is sending its sum to another core. The plus signs indicate that a core is receiving a sum from another core and adding the received sum into its own sum.

image text in transcribed

For both "global" sums, the master core (core 0) does more work than any other cores, and the length of time it takes the program to complete the final sum should be the length of time it takes for the master to complete. However, with eight cores, the master will carry out seven serialized steps using the first method , while with the second method it will only carry out three.

Task:

Try to write two pseudo-codes for the tree-structural global sum, one for a shared-memory setting and the other for a distributed-memory setting.

First consider how this might be done in a shared-memory setting. Then consider how this might be done in a distributed-memory setting. In the shared-memory setting, which variables are shared and which are private?

2. Suppose we have a program that generates large quantities of random floating point data that it stores in an array. In order to get some feel for the distribution of the data, we can make a frequency histogram of the data. A frequency histogram is a graph with vertical columns that represent the frequency of a data point or range of data points occurring in a set of data.

Recall that to make a histogram, we simply divide the range of the data up into equal sized subintervals, or bins, determine the number of measurements in each bin, and plot a bar graph showing the relative sizes of the bins or display how many floating point values are cast into each bin.

As a very small example, suppose our data are

1.3, 2.9, 0.4, 0.3, 1.3, 4.4, 1.7, 0.4, 3.2, 0.3, 4.9, 2.4, 3.1, 4.4, 3.9, 0.4, 4.2, 4.5, 4.9, 0.9

Then the data lie in the range 0-5, and if we choose to have five bins, for example, 0-1, 1-2, , and 4-5, the histogram might look something like

image text in transcribed

In the exercise, you do not need to draw the histogram plot but you are required to display the frequency of range of floating point values.

Task:

Write a serial program that solves the frequency histogram problem.

Given the following inputs, output an array containing the frequency count of floating point values being cast in each bin:

- the array of data_count floats, data;

- the count of floating-point values, data_count

- the minimum floating-point value, min_meas;

- the maximum floating-point value, max_meas;

- the number of bins, bin_count.

Note: You just need to display the frequency in the text format. No plot is needed. Please use pointers instead of regular array operations.

Grading rubrics:

Question	Assessment Items	Points
Global sum in a shared-memory setting	pseudo-code	10
	Function goal is reached	5
	Coordination among threads to obtain a global sum	5
	Tree structure implementation on the thread coordination	5
Global sum in a distributed-memory setting	pseudo-code for distributed-memory setting	10
	Function goal is reached	5
	Coordination among threads to obtain a global sum	5
	Tree structure implementation on the thread coordination	5
Histogram	Coding	30
Histogram	Function goal is reached (a mistake about three points)	20