Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

For this assignment youre going to use OpenMP to implement a parallel prefix sum that works with an arbitrary sized array and the default number

For this assignment youre going to use OpenMP to implement a parallel prefix sum that works with an arbitrary sized array and the default number of OpenMP threads.

Use the flat version of the algorithm presented in class that involves the following steps:

1. local prefix sum on each partition

2. a serial prefix sum of the maximum values of each partition

3. a second iteration over each partition, adding the maximum from the partition to the left.

Remember, this code evenly distributes size indices across nt threads:

int nt = omp_get_num_threads ();

int rank = omp_get_thread_num ();

int count = size / nt;

int start = rank * count ;

int mod = size % nt;

if ( rank < mod )

{ count ++; start += rank ; }

else { start += mod; }

Use omp_get_wtime() to calculate the wall clock times for various runs. How close to linear speedup do you get? You will be given accounts on the CSE Qbert cluster where you can test on 1-16 cores.

Your program should be compiled like this:

g++ -fopenmp -O3 -Wall source.cpp -o executable

Your program can be run with different numbers of threads like this:

OMP_NUM_THREADS=16 ./executable

This algorithm requires a large array and relatively many threads to see a performance improvement.

Heres a place to start:

# include

# include

# include

# include

// replace the contents of data with the prefix sum of the contents

void prefix_sum (std :: vector & data )

{

# pragma omp parallel default ( none ) shared ( data )

{

int rank = omp_get_thread_num ();

int nt = omp_get_num_threads ();

// Your code goes here .

}

}

int main (int argc , char * argv [])

{

// test with a vector of 1s

std :: vector data (10000 , 1);

double start = omp_get_wtime ();

prefix_sum ( data );

double end = omp_get_wtime ();

printf (" took %lf to prefix sum %d elements on %d ", (end - start ), (int ) data . size (), omp_get_max_threads ());

}

Hints

Use #pragma omp barrier to synchronize threads within a parallel section.

Iterate over the prefix sum results to see if it works.

Comment out extraneous print statements in the final code.

Turn in ..

The source code file.

A screenshot of your code being compiled and run.

A plot of the wall clock time as a function of number of threads, for a large enough array that the values are meaningful.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Processing

Authors: David Kroenke

11th Edition

0132302675, 9780132302678

More Books

Students also viewed these Databases questions