Question
PLEASE USE CUDA C LANGUAGE FOR THIS TASK
A matrix addition takes two input matrices A and B and produces one output matrix C. Each element of the output matrix C is the sum of the corresponding elements of the input matrices A and B, i.e., C[i][j] = A[i][j] + B[i][j]. For simplicity, we will only handle square matrices whose elements are single-precision floating-point numbers. Write a matrix addition kernel and the host stub function that can be called with four parameters: pointer to the output matrix, pointer to the first input matrix, pointer to the second input matrix, and the number of elements in each dimension. Follow the instructions below:
A. Write the host stub function: allocate device memory for the input and output matrices, transfer the input data to the device, launch the kernel, transfer the output data back to the host, and free the device memory for the input and output data. Leave the execution configuration parameters open for this step.
B. Write a kernel that has each thread produce one output matrix element. Fill in the execution configuration parameters for this design.
C. Write a kernel that has each thread produce one output matrix row. Fill in the execution configuration parameters for this design.
D. Write a kernel that has each thread produce one output matrix column. Fill in the execution configuration parameters for this design.
E. Analyze the pros and cons of each kernel design above.
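One possible sketch for parts A through D, not a definitive answer. It assumes the matrices are stored in row-major order as flat arrays of n*n floats. The names matAddElement, matAddRow, matAddCol, and the extra `design` argument on the host stub are hypothetical, added only so one program can exercise all three kernels; the question itself asks for a four-parameter stub. The block sizes (16x16 and 256) are arbitrary choices, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Part B: each thread produces one output element.
__global__ void matAddElement(float *C, const float *A, const float *B, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        int i = row * n + col;  // row-major index (assumed layout)
        C[i] = A[i] + B[i];
    }
}

// Part C: each thread produces one output row.
__global__ void matAddRow(float *C, const float *A, const float *B, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        for (int col = 0; col < n; ++col) {
            int i = row * n + col;
            C[i] = A[i] + B[i];
        }
    }
}

// Part D: each thread produces one output column.
__global__ void matAddCol(float *C, const float *A, const float *B, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < n) {
        for (int row = 0; row < n; ++row) {
            int i = row * n + col;
            C[i] = A[i] + B[i];
        }
    }
}

// Part A: host stub. `design` is a hypothetical fifth parameter:
// 0 = per element, 1 = per row, 2 = per column.
void matAdd(float *C, const float *A, const float *B, int n, int design) {
    size_t bytes = (size_t)n * n * sizeof(float);
    float *dA, *dB, *dC;

    // Allocate device memory and copy the inputs over.
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);

    if (design == 0) {
        // 2D configuration: enough 16x16 blocks to cover an n x n matrix.
        dim3 block(16, 16);
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        matAddElement<<<grid, block>>>(dC, dA, dB, n);
    } else {
        // 1D configuration: one thread per row (or per column).
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        if (design == 1) matAddRow<<<blocks, threads>>>(dC, dA, dB, n);
        else             matAddCol<<<blocks, threads>>>(dC, dA, dB, n);
    }

    // Copy the result back (this also waits for the kernel) and free.
    cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}

int main(void) {
    int n = 100;  // deliberately not a multiple of 16 to exercise the bounds checks
    size_t count = (size_t)n * n;
    float *A = (float *)malloc(count * sizeof(float));
    float *B = (float *)malloc(count * sizeof(float));
    float *C = (float *)malloc(count * sizeof(float));
    for (size_t i = 0; i < count; ++i) { A[i] = (float)i; B[i] = 2.0f * (float)i; }

    for (int design = 0; design < 3; ++design) {
        matAdd(C, A, B, n, design);
        int ok = 1;
        for (size_t i = 0; i < count; ++i)
            if (C[i] != A[i] + B[i]) ok = 0;
        printf("design %d: %s\n", design, ok ? "match" : "MISMATCH");
    }
    free(A); free(B); free(C);
    return 0;
}
```

For part E, under the row-major assumption: the per-element kernel launches n*n threads, and neighbouring threads in a warp touch neighbouring addresses, so its accesses coalesce. The per-row kernel launches only n threads, and at each loop step neighbouring threads access addresses n elements apart, so its accesses are not coalesced. The per-column kernel also launches only n threads, but neighbouring threads read neighbouring addresses at each step, so it coalesces better than the per-row version. Both 1D designs expose much less parallelism than the per-element design, which matters most when n is small.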