Question
Task:
Try to write two pseudo-codes for the tree-structural global sum, one for a shared-memory setting and the other for a distributed-memory setting.
First consider how this might be done in a shared-memory setting. Then consider how this might be done in a distributed-memory setting. In the shared-memory setting, which variables are shared and which are private?
Hints:
An example pseudocode for the shared-memory setting is as follows:
Algorithm_Global_Sum
Goal: given a sequence of N integers, calculate their sum
Input: data: an integer array
N: input size
p: number of cores
Output: the sum of N integers
Algorithm CalculateGlobalSum
// Step 1. calculate partial sum in a core
sum[rank] = 0
block_size = N % p == 0 ? N / p : floor(N / p) + 1
my_first_i = block_size * rank
my_last_i = block_size * (rank + 1) > N ? N - 1 : block_size * (rank + 1) - 1
// the loop is empty when my_first_i > my_last_i (possible for the highest ranks)
for i <- my_first_i to my_last_i do
    sum[rank] += data[i]
synchronize
// Step 2. coordinate cores and calculate global sum by using tree structure
for stage <- 1 to ceil(log2(p)) do
    if rank % 2^(stage-1) == 0
        // rank participates in this stage; its partner is rank XOR 2^(stage-1)
        if rank % 2^stage == 0 and rank + 2^(stage-1) < p
            sum[rank] += sum[rank + 2^(stage-1)]
    synchronize
// after the last stage, sum[0] holds the global sum
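The shared-memory pseudocode above can be sketched in Python with threads. This is only an illustration under stated assumptions: the function name `tree_global_sum_shared`, the list `partial` (standing in for `sum[]`), and the use of `threading.Barrier` for "synchronize" are choices made here, not part of the original hint. In this sketch `data`, `partial`, and the barrier are shared, while `rank`, the block bounds, and the loop variables are private to each thread. The aim is to show the access pattern, not to obtain a speedup.

```python
import threading


def tree_global_sum_shared(data, p):
    """Tree-structured global sum using p threads and one shared array."""
    n = len(data)
    partial = [0] * p                # shared: one partial sum per rank
    barrier = threading.Barrier(p)   # shared: plays the role of "synchronize"
    stages = (p - 1).bit_length()    # equals ceil(log2(p)) for p >= 1
    block_size = -(-n // p)          # ceil(n / p)

    def worker(rank):
        # Step 1: each rank sums its own block (all names here are private).
        first = min(block_size * rank, n)
        last = min(block_size * (rank + 1), n)   # exclusive upper bound
        partial[rank] = sum(data[first:last])
        barrier.wait()

        # Step 2: combine partial sums pairwise, halving the active ranks.
        for stage in range(1, stages + 1):
            half = 1 << (stage - 1)              # 2^(stage-1)
            if rank % (2 * half) == 0 and rank + half < p:
                partial[rank] += partial[rank + half]
            barrier.wait()                       # every rank waits each stage

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return partial[0]


if __name__ == "__main__":
    print(tree_global_sum_shared(list(range(1, 101)), 4))
```

The barrier after every stage matters: a rank must not read its partner's slot until the partner has finished the previous stage. Within one stage no slot is both read and written, because only multiples of 2^stage write, and they read slots that are not multiples of 2^stage.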
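In the distributed-memory setting the overall structure is the same. The main difference is that the partial sums are spread across the processes, so at each stage a process must explicitly send its partial sum to its partner, and the partner must receive it, instead of reading it from a shared array.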
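A minimal sketch of the distributed-memory variant follows. It is a simulation, not real distributed code: each rank is a thread that keeps its data and its running sum in local variables only, and a per-rank `queue.Queue` stands in for the network. The helper names `send`, `recv`, `inbox`, and `tree_global_sum_distributed` are hypothetical; on a real cluster their role would be played by a message-passing library such as MPI.

```python
import queue
import threading


def tree_global_sum_distributed(chunks):
    """chunks[rank] is the data owned by that rank; rank 0 ends with the total."""
    p = len(chunks)
    if p == 0:
        return 0
    inbox = [queue.Queue() for _ in range(p)]   # simulated network mailboxes
    result = queue.Queue()

    def send(dest, value):
        inbox[dest].put(value)

    def recv(rank):
        return inbox[rank].get()                # blocks until a message arrives

    def worker(rank, local_data):
        my_sum = sum(local_data)                # private: no shared sum array
        half = 1                                # 2^(stage-1)
        while half < p:
            if rank % (2 * half) == 0:
                # Receiver: add the partner's partial sum, if a partner exists.
                if rank + half < p:
                    my_sum += recv(rank)
            else:
                # Sender: hand the partial sum to the partner and drop out.
                send(rank - half, my_sum)
                break
            half *= 2
        if rank == 0:
            result.put(my_sum)

    threads = [threading.Thread(target=worker, args=(r, chunks[r]))
               for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result.get()


if __name__ == "__main__":
    print(tree_global_sum_distributed([[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]))
```

No barrier is needed here because a blocking receive already synchronizes each pair. The sketch does not match messages to a particular sender; that is safe only because addition is commutative and each rank performs exactly as many receives as it has senders.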