Question
This is a High Performance Computing question: 1. Puff is a student working on a GPU research project. In this project, he is implementing an
This is a High Performance Computing question:
1. Puff is a student working on a GPU research project. In this project, he is implementing an algorithm for Finite Element modeling using CUDA. His program is slow so he uses a profiler to analyze performance. Help Puff figure out how to improve his code by answering the questions below. Assume that the size of the matrices is in the millions of elements and is not multiples of powers of 2, and the device used has compute capability 3.7.
The first thing Puff has noticed is low occupancy. Without knowing details of his implementation, the kernel, indicate whether each of the following factors could affect this metric(answer yes or not) and briefly explain why
a. Having a few blocks with many threads (over 16K threads) each
b. Having many blocks with exactly 32 threads per block
c.Having many blocks with a few threads (more than 32 but less than 1024)
d. Using only the default stream for execution
.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started