Complete the function compute using pthreads v1() to parallelize the SAXPY loop using the chunking method. That is, given vectors x and y of some
Complete the function compute using pthreads v1() to parallelize the SAXPY loop using the "chunking" method. That is, given vectors x and y of some arbitrary length n, and when k threads are used, individual threads calculate SAXPY in parallel on smaller chunks of these vectors
This assignment, worth twenty points, is due February 1, 2024, by 11:59 pm. You may work on it in a group of up to two people. Please submit original code. You are asked to compare two different ways of parallelizing the SAXPY loop using the pthread library. SAXPY is a function within the standard Basic Linear Algebra Subroutines (BLAS) library and stands for "Single-Precision AX Plus Y." The implementation is very simple, involving a combination of scalar multiplication and vector addition. The routine takes as inputs two vectors of 32-bit floating-point values x and y with n elements each, and a scalar value a. It multiplies each element x[i] by a and adds the result to y[i]. The serial implementation looks like this: void saxpy (float *x, float *y, float a, int n) { } int i; for (i = 0; i < n; i++) y[i] = a *x[i] +y[i]; Using the provided sequential program saxpy.c as a starting point, develop two versions that parallelize SAXPY. Complete the function compute_using-pthreads_vl() to parallelize the SAXPY loop using the "chunking" method. That is, given vectors x and y of some arbitrary length n, and when k threads are used, individual threads calculate SAXPY in parallel on smaller chunks of these vectors. Complete the function compute_using_pthreads_v2() to parallelize the SAXPY loop using the "striding" method. That is, each thread strides over elements of the vectors with some stride length, calculating SAXPY along the way. For example, given k = 4 threads, thread 0 calculates SAXPY for elements y[0], y[4], y[8], . . ., thread 1 calculates SAXPY for elements y[1], y[5], y[9], . . ., and so on. The pseudo-code for this method looks like this for each thread: /* tid is the thread ID and k is the number of threads created */ int stride = k; while (tid < n) { y[tid] a * x[tid] + y[tid];
Step by Step Solution
There are 3 Steps involved in it
Step: 1
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started