Question
In this assignment, you will implement matrix-vector multiplication in CUDA. Your program will be similar to the matrix-matrix multiplication code we covered in class, but I expect you to use shared memory in this assignment. Submissions that do not use shared memory will be graded out of 60 points (not out of 100).
Your kernel will take an MxN float matrix and an Nx1 float vector as inputs. You will multiply the matrix by the vector and return a new Mx1 float vector as the output. For example:
[ Output Mx1 ] = [ Input MxN ] x [ Input Nx1 ]
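Concretely, each output element is the dot product of one row of the matrix with the vector. For instance, with M = 2 and N = 3 (values chosen arbitrarily for illustration): multiplying the matrix [[1, 2, 3], [4, 5, 6]] by the vector [1, 0, 2] gives the output vector [1*1 + 2*0 + 3*2, 4*1 + 5*0 + 6*2] = [7, 16].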
I recommend starting the program without shared memory: implement it using only global memory accesses. Then look at your solution and find which part of the data is accessed multiple times. Once you find it, move that part to shared memory so there are fewer global memory accesses. You can use the matrix-matrix multiplication program as a reference; your goal is to implement a simpler version of that program.
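As a starting point, here is a minimal sketch of the shared-memory version, not a complete or required solution: the vector is the data that every row reuses, so each block stages it through shared memory in chunks. The kernel name matVecShared, the tile size TILE, and the assumption that the block size equals TILE are illustrative choices, not requirements of the assignment.

#include <cuda_runtime.h>

#define TILE 256   // threads per block; also the chunk of x kept in shared memory

// Sketch: y = A * x, where A is M x N (row-major) and x has N elements.
// Each thread computes one element of y. Every row of A needs all of x,
// so x is the part accessed multiple times; it is staged through shared
// memory TILE elements at a time. Assumes blockDim.x == TILE.
__global__ void matVecShared(const float *A, const float *x, float *y,
                             int M, int N)
{
    __shared__ float xTile[TILE];

    int row = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < N; t += TILE) {
        // All threads in the block cooperate to load one chunk of x.
        int col = t + threadIdx.x;
        xTile[threadIdx.x] = (col < N) ? x[col] : 0.0f;
        __syncthreads();

        // Partial dot product of this thread's row against the chunk.
        if (row < M) {
            int limit = min(TILE, N - t);
            for (int j = 0; j < limit; ++j)
                sum += A[row * N + t + j] * xTile[j];
        }
        __syncthreads();   // keep xTile intact until everyone is done reading
    }

    if (row < M)
        y[row] = sum;
}

A launch such as matVecShared<<<(M + TILE - 1) / TILE, TILE>>>(d_A, d_x, d_y, M, N); pairs one thread with one output element. The second __syncthreads() matters: it stops a fast thread from overwriting xTile on the next iteration while slower threads are still reading the current chunk.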
You will submit your program and a document with your sample results. In the document, I would like to see a short description of how you used shared memory, a test case for your input matrix/vector, and the output vector. I would also like to see the CGMA ratio of your program; you can calculate it as we did in class. If you like, you can compare your CGMA ratio against the version that does not use shared memory. That is an interesting comparison to see.
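As a rough illustration of that calculation, applied to the sketch above rather than to any particular submission: without shared memory, each inner-loop step performs 2 floating-point operations (one multiply, one add) against 2 global loads (one element of A, one of x), so CGMA = 2/2 = 1.0. With x staged in shared memory, a thread performs 2*TILE floating-point operations per tile while issuing TILE global loads of A plus 1 global load of x, so CGMA ≈ 2*TILE / (TILE + 1), which approaches 2.0 for large TILE.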