Question: 1. - 30 % Answer these questions: a) On a single-CPU system, under what circumstances does a multithreaded program using kernel threads provide better performance
1. - 30 % Answer these questions: a) On a single-CPU system, under what circumstances does a multithreaded program using kernel threads provide better performance (such as faster execution time) compared to a singlethreaded solution (that does not use asynchronous or event-based programming) ? b) Consider the Dining Philosophers implementation with a monitor. Explain why does the putdown() operation call the test() operation twice ? c) Contrast synchronization with spinlocks in single-CPU vs. multiple-CPU computers. 2. Summing up to PI () 35 % This programming problem involves multithreading (Chapter 4) and uses synchronization mechanisms from Chapter 5. Write a multithreaded application called pie in C or C++ for Linux with the Pthreads library that computes an approximation of the number using an 'infinite series' with N+1 terms. The series sum is partitioned in T non-overlapping partial sums, each computed by a separate child thread. This program exemplifies data parallelism. Numbers T and N are passed to the pie program as command line parameters. The main(argc, argv[]) call gets these parameters in argv[1] and argv[2] as strings. Use the Nilakantha approximation formula for (http://en.wikipedia.org/wiki/Pi): = 3 + 4/(2*3*4) 4/(4*5*6) + 4/(6*7*8) - 4/(8*9*10) + . + k*4/((2*i)* (2*i+1)*( 2*i+2))+... where k = 1 if i is odd and k = -1 if i is even and i goes from 1 to N. Name your source file pie.c. The program takes two command line parameters. The first parameter, N is the upper limit of the number sequence to sum. The second parameter T is the number of child threads that compute the partial sums. N should be always greater than T, N>T. Otherwise the program should display an error message and do exit(1). Run it like this from a shell: ./pie N T For instance, if you run command: ./pie 100 4 the parent thread in main() will create 4 child threads, each getting an index from 0 to 3. Thread 0 computes the partial sum for i going from 0 to 24 Thread 1 computes the partial sum for i going from 25 to 49 Thread 2 computes the partial sum for i going from 50 to 74 Thread 3 computes the partial sum for i going from 75 to 99 Important: In general, a child thread with index j (0 to T-1) will compute a partial sum for i going from N*j/T+1 to (N/T)*(j+1)-1 and will store it to a local variable. Note that * and / are used on ints. The thread with index T-1 (with the highest index) will compute the partial sum for i going from N/T*(T-1) to N-1. A global variable gpie MUST be declared as: double gpie = 0; to store the current approximation of , After completing the computation of its partial sum, each thread must update the shared global variable gpie in a critical section: gpie = gpie + myPartialSum; myPartialSum is a local variable in the current thread. The child thread that updates gpie must protect the critical section with a pthread mutex. (Think about why is it better to update the shared global variable only once per child thread with the partial sum instead of adding directly each series term to the shared global variable gpie.) After the completion of the critical section, a child thread will end its execution. The main thread will wait for all child threads to finish (using pthread join), after which it will print the result, as follows: printf(pi computed with %d terms in %d threads is %d , N, T, gpie);); where gpie is the sum of all partial sums. Hints: - Do not forget to destroy the pthread_mutex and the mutex attribute at the end. After that, the main thread will exit. write first a program that computes the approximation with one for loop (i.e. no multithreading) and make sure it works. don't forget that the approximation series starts with 3 + . the parent thread runs a loop in main() for j=0 to T-1 the parent thread should share N, T with the child threads and should pass the current loop index j to each child thread as a thread parameter the last parameter in the pthread_create() call. after the completion of the loop, the parent thread should wait for all threads to complete using pthread_join(). 3. Barrier 35% a) Implement a barrier for pthreads in C++ using pthread condition variables and mutexes. Consider a process with N running threads. During their execution, all threads call repeatedly the barriers wait() method. For the 1st, 2nd, ..., (N-1)th call, the wait() function blocks (suspends) the calling thread. The Nth thread that calls this function will unblock all suspended threads and then will return. All threads that were previously blocked in wait() will now return from wait() and will resume their execution. They will continue until their next call to the barriers wait() method. In effect, the slowest thread will determine the overall progress of the application. Note that the identity and type of thread are irrelevant in deciding to suspend it or raise the barrier in wait(): only the Nth call to wait() will resume all processes, while all other call to wait() will suspend the caller thread. Here is an illustration of a barrier from the Intro to Parallel Computing guide from LLNL: (Analogy: imagine a real-world barrier on a highway. Cars that arrive at the barrier have to stop and wait for the barrier to be raised. For each Nth car that arrives, the barrier is raised and all N cars waiting can now proceed. Right after the last car passes, the barrier is lowered again.) A barrier is useful to control a group of threads so they all proceed in sync at specific points in the program. Here is how a barrier object may be used: // the main thread: // barrier object declared as a global variable, to be accessible from all threads: Barrier barrier(N); // assume N was defined as an int somewhere // child threads run this thread function or similar: void* thread_fun(void* param) { while (not_exit) { // thread runs in a loop // do some work barrier.wait(); // suspend all calling threads until the Nth thread makes the call, // after which all threads will resume and return from wait // do some more work } } The Barrier class must have at least a public constructor, a destructor, and a method wait(): class Barrier { public: Barrier(int n); // n is the number of threads (N) ~Barrier(); // destructor: cleanup, destroy mutex, condvar, etc. void wait(); // may need a condition variable, a mutex, and some other attributes }; A Barrier object must encapsulate all necessary synchronization objects (mutex and condition variable, plus something else) to do its job. Use the principle of encapsulation and keep private the instance variables. b. Write a multithreaded application in C++ that demonstrates how your barrier works.
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
