Question
If your input image is named "santa-grayscale.jpg", the execution command would be ./convolution santa-grayscale.jpg
No command-line argument for the filter radius is necessary in this assignment, because the average filter has a constant size and radius.
Within convolution.cu, implement one host function and two CUDA kernels that each perform the convolution. Specifically:
(a) A host function that performs the convolution operation using the CPU only.
(b) A CUDA kernel that performs the convolution operation on the GPU without tiling. You may refer to the example provided in the module slides, but please modify it to incorporate the average filter matrix. It is acceptable to load the average filter from either GPU global memory or constant memory.
(c) An optimized CUDA kernel that implements a "tiled" version of the convolution operation using GPU shared memory. In this optimized kernel, please load the average filter from GPU constant memory.
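As an illustration of item (c), here is a minimal sketch of a tiled kernel that keeps the filter in constant memory. It is not the course's reference solution. The names F_c, blurImageTiledSketch and launchTiledBlur, the tile size of 32, and the radius value of 1 are all assumptions made for this example, as is the choice to treat pixels outside the image as 0.

```cuda
#include <cuda_runtime.h>

#define FILTER_RADIUS 1   // hypothetical placeholder; use the radius the assignment specifies
#define FILTER_DIM (2 * FILTER_RADIUS + 1)
#define IN_TILE_DIM 32    // assumed input tile edge, equal to the block edge
#define OUT_TILE_DIM (IN_TILE_DIM - 2 * FILTER_RADIUS)

// Average filter stored in constant memory (the name F_c is hypothetical).
__constant__ float F_c[FILTER_DIM][FILTER_DIM];

__global__ void blurImageTiledSketch(unsigned char* Pout, const unsigned char* Pin,
                                     unsigned int width, unsigned int height) {
    const int w = (int)width, h = (int)height;
    // Each thread loads one element of the input tile, halo included.
    const int col = (int)(blockIdx.x * OUT_TILE_DIM + threadIdx.x) - FILTER_RADIUS;
    const int row = (int)(blockIdx.y * OUT_TILE_DIM + threadIdx.y) - FILTER_RADIUS;
    const bool inImage = row >= 0 && row < h && col >= 0 && col < w;

    __shared__ float tile[IN_TILE_DIM][IN_TILE_DIM];
    // Assumption: elements outside the image are loaded as 0.
    tile[threadIdx.y][threadIdx.x] = inImage ? (float)Pin[row * w + col] : 0.0f;
    __syncthreads();

    // Only the inner OUT_TILE_DIM x OUT_TILE_DIM threads produce an output pixel.
    const int tileCol = (int)threadIdx.x - FILTER_RADIUS;
    const int tileRow = (int)threadIdx.y - FILTER_RADIUS;
    if (inImage && tileCol >= 0 && tileCol < OUT_TILE_DIM &&
        tileRow >= 0 && tileRow < OUT_TILE_DIM) {
        float sum = 0.0f;
        for (int fRow = 0; fRow < FILTER_DIM; ++fRow)
            for (int fCol = 0; fCol < FILTER_DIM; ++fCol)
                sum += F_c[fRow][fCol] * tile[tileRow + fRow][tileCol + fCol];
        Pout[row * w + col] = (unsigned char)(sum + 0.5f);  // round to nearest
    }
}

// Hypothetical helper: uploads the filter and launches the kernel on device buffers.
cudaError_t launchTiledBlur(unsigned char* dOut, const unsigned char* dIn,
                            unsigned int width, unsigned int height) {
    float hF[FILTER_DIM][FILTER_DIM];
    for (int i = 0; i < FILTER_DIM; ++i)
        for (int j = 0; j < FILTER_DIM; ++j)
            hF[i][j] = 1.0f / (FILTER_DIM * FILTER_DIM);  // every element is equal
    cudaError_t err = cudaMemcpyToSymbol(F_c, hF, sizeof(hF));
    if (err != cudaSuccess) return err;

    dim3 block(IN_TILE_DIM, IN_TILE_DIM);
    dim3 grid((width + OUT_TILE_DIM - 1) / OUT_TILE_DIM,
              (height + OUT_TILE_DIM - 1) / OUT_TILE_DIM);
    blurImageTiledSketch<<<grid, block>>>(dOut, dIn, width, height);
    return cudaGetLastError();
}
```

In this design the block is as large as the input tile, so the threads on the halo ring only load data and then sit idle during the computation. The grid is sized by the output tile, which is why the launch divides by OUT_TILE_DIM rather than by the block edge.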
In this assignment, you are tasked with developing a complete CUDA C/C++ program for an image blur application, also known as image smoothing, which we learned about in the module "Multidimensional Grids and Data".
Convolution serves as the fundamental operation for implementing the image blur process. Specifically, one of the convolution kernels requested in this assignment should be optimized using the tiled convolution technique that we learned in the module "Convolution". This optimization leverages GPU shared memory and constant memory to enhance performance.
Below are the specific requirements:
Develop a single CUDA program file named convolution.cu containing all the code necessary to blur an input image and generate its blurred image. For simplicity, we will use an average filter of fixed size in this assignment, in which every filter element holds the same floating-point value.
It is highly recommended to use the NCSA Delta GPUs for this assignment. However, if you are experienced in building and installing OpenCV from source, you may use your own computer.
Below are the command lines for compiling and executing your program on the NCSA Delta GPUs:
Log in to the NCSA supercomputer using your own NCSA account.
Enter an interactive session with GPUs, for example: srun --account=bchndeltagpu --partition=gpuAxinteractive --nodes=… --gpus-per-node=… --tasks=… --tasks-per-node=… --cpus-per-task=… --mem=… --pty bash
Load the OpenCV module: module load opencvx
To compile: nvcc -o convolution convolution.cu -I $OPENCV_HOME/include/opencv -L $OPENCV_HOME/lib -lopencv_core -lopencv_imgcodecs -lopencv_imgproc
To execute: ./convolution inputImg.jpg
When running your program, make sure to replace "inputImg.jpg" with the name of your input image file, which should be located in the same directory as your program.
Ensure that your code can handle images with varying image sizes. Please also consider boundary conditions to ensure proper handling in such cases.
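One way to handle arbitrary image sizes and the image border is sketched below as a CPU-only reference. The function name blurCpu, the use of a raw row-major buffer instead of cv::Mat, and the radius parameter are assumptions made for this example. It also assumes that neighbors outside the image contribute 0, which darkens the border slightly. Dividing by the number of in-range neighbors instead is another common choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the assignment's suggested structure:
// blurs a row-major 8-bit grayscale buffer with a
// (2*radius+1) x (2*radius+1) average filter.
// Assumption: neighbors outside the image contribute 0 (zero padding).
std::vector<unsigned char> blurCpu(const std::vector<unsigned char>& in,
                                   int width, int height, int radius) {
    assert(width > 0 && height > 0 && radius >= 0);
    assert(in.size() == static_cast<std::size_t>(width) * height);

    const int side = 2 * radius + 1;
    const float weight = 1.0f / (side * side);  // every filter element is equal
    std::vector<unsigned char> out(in.size());

    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            float sum = 0.0f;
            for (int dr = -radius; dr <= radius; ++dr) {
                for (int dc = -radius; dc <= radius; ++dc) {
                    const int r = row + dr;
                    const int c = col + dc;
                    // Boundary check: skip neighbors that fall outside the image.
                    if (r >= 0 && r < height && c >= 0 && c < width)
                        sum += weight * in[r * width + c];
                }
            }
            out[row * width + col] = static_cast<unsigned char>(sum + 0.5f);  // round to nearest
        }
    }
    return out;
}
```

Because the loops run over the actual width and height and every neighbor access is guarded, the same function works for images whose dimensions are not multiples of any tile or block size. Its output can also serve as a baseline for checking the GPU kernels.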
For testing purposes, two input images of different sizes are provided in the zipped folder accompanying this assignment.
"santa-grayscale.jpg": a grayscale image
"tree-grayscale.jpg": a grayscale image
You may also choose to test your program with additional grayscale images if desired.
Use timing techniques, such as CPU timers or CUDA events, to measure the performance of your implementations of the host function and the two CUDA kernels specified above.
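Both timing approaches can be sketched as follows. The body of myCPUTimer is only one possible implementation, based on gettimeofday. scaleKernel and timeKernelMs are hypothetical stand-ins introduced only so that there is something to time.

```cuda
#include <sys/time.h>
#include <cuda_runtime.h>

// CPU wall-clock timer in seconds (one possible body for myCPUTimer).
double myCPUTimer() {
    struct timeval tp;
    gettimeofday(&tp, nullptr);
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
}

// Hypothetical stand-in kernel so that there is something to time.
__global__ void scaleKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns the GPU time in milliseconds for one launch, measured with CUDA events.
float timeKernelMs(float* dData, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    scaleKernel<<<(n + 255) / 256, 256>>>(dData, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the stop event has been reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Kernel launches are asynchronous with respect to the host. If the CPU timer is used around GPU work, call cudaDeviceSynchronize() before reading the end time, otherwise the measurement covers little more than the launch itself.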
We also suggest structuring convolution.cu by implementing the following macros, host functions, and CUDA kernels. At the end of this assignment, we will provide screenshots of an example of the program's structure.
#define CHECK(call)
A macro for error checking.
double myCPUTimer()
A timer for measuring execution time.
void blurImage_h(cv::Mat Pout_Mat_h, cv::Mat Pin_Mat_h, unsigned int nRows, unsigned int nCols)
A host function for CPU-only convolution.
__global__ void blurImage_Kernel(unsigned char* Pout, unsigned char* Pin, unsigned int width, unsigned int height)
A CUDA kernel that performs a simple convolution without tiling.
void blurImage_d(cv::Mat Pout_Mat_h, cv::Mat Pin_Mat_h, unsigned int nRows, unsigned int nCols)
A host function that handles device memory allocation and deallocation, data copies, and the launch of the CUDA kernel blurImage_Kernel.
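The error-checking macro, the untiled kernel, and the device-side host function could fit together roughly as sketched below. This is an illustration under several assumptions, not the expected solution. The names blurKernelSketch and blurOnDeviceSketch are hypothetical, and the radius value of 1 is a placeholder. The wrapper takes raw buffers instead of cv::Mat, and the kernel receives the filter through an extra global-memory pointer that the suggested signature does not have.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Error-checking macro: prints the CUDA error and exits on failure.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define FILTER_RADIUS 1   // hypothetical placeholder; use the radius the assignment specifies
#define FILTER_DIM (2 * FILTER_RADIUS + 1)

// Untiled convolution; the filter F is read from global memory (assumed extra parameter).
__global__ void blurKernelSketch(unsigned char* Pout, const unsigned char* Pin,
                                 const float* F, unsigned int width, unsigned int height) {
    const int w = (int)width, h = (int)height;
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= w || row >= h) return;  // threads past the image edge do nothing

    float sum = 0.0f;
    for (int fRow = 0; fRow < FILTER_DIM; ++fRow) {
        for (int fCol = 0; fCol < FILTER_DIM; ++fCol) {
            const int r = row - FILTER_RADIUS + fRow;
            const int c = col - FILTER_RADIUS + fCol;
            // Assumption: neighbors outside the image contribute 0.
            if (r >= 0 && r < h && c >= 0 && c < w)
                sum += F[fRow * FILTER_DIM + fCol] * Pin[r * w + c];
        }
    }
    Pout[row * w + col] = (unsigned char)(sum + 0.5f);  // round to nearest
}

// Hypothetical wrapper on raw host buffers: allocate, copy in, launch, copy out, free.
void blurOnDeviceSketch(unsigned char* hOut, const unsigned char* hIn,
                        unsigned int width, unsigned int height) {
    const size_t bytes = (size_t)width * height;
    float hF[FILTER_DIM * FILTER_DIM];
    for (int i = 0; i < FILTER_DIM * FILTER_DIM; ++i)
        hF[i] = 1.0f / (FILTER_DIM * FILTER_DIM);  // every element is equal

    unsigned char *dIn = nullptr, *dOut = nullptr;
    float* dF = nullptr;
    CHECK(cudaMalloc((void**)&dIn, bytes));
    CHECK(cudaMalloc((void**)&dOut, bytes));
    CHECK(cudaMalloc((void**)&dF, sizeof(hF)));
    CHECK(cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dF, hF, sizeof(hF), cudaMemcpyHostToDevice));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    blurKernelSketch<<<grid, block>>>(dOut, dIn, dF, width, height);
    CHECK(cudaGetLastError());       // catches launch configuration errors
    CHECK(cudaDeviceSynchronize());  // catches errors raised during execution

    CHECK(cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    CHECK(cudaFree(dF));
}
```

With OpenCV, the raw pointers would typically come from the data member of a continuous single-channel cv::Mat. Checking isContinuous() before treating the pixels as one flat buffer is a sensible precaution.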