Question
Please help manipulate this code to answer the question below #include #include #include #define FILE DOUBLEPRECISION_VMA to demonstrate evaluating Cycles per Ops ( CpOPs )
Please help manipulate this code to answer the question below
#include
#include
#include
#define FILE "DOUBLEPRECISION_VMA to demonstrate evaluating Cycles per Ops ( CpOPs )"
#define SIZE 1000 //1000 base. If vector = 1 and LOOPs=1E10, and /O3 cycle per flop represent approx latency to L1
#define TOTAL_ITERATIONS 1E10 // total iterantions is the produst of SIZE and # of outer accuracy improvement loops (LOOP)
#define CPU_CLK 2.394455e9 //3.43E9 for Dell i7-6600, 2.8E9 for i5 Surface Pro, 3.48E9 for HP i7-7700; 2.394455e9 for Xeon for Sun 1.593E9
int main()
{
//declare vectors and variables
int i;
long long j, LOOP;
double * z;
double * x;
double * y;
double a;
double NumOfOps;
double FLOPS;
double ElapsedTime;
double ElapsedTimePerVector;
double ElapsedTimePerVectorElement;
//declare time.h counter variables
time_t start_time;
time_t end_time;
a = 0.001; // declare the axpy coefficient and variables. Allocate memory/stack space for them
z = (double*)malloc(SIZE * sizeof(double));
x = (double*)malloc(SIZE * sizeof(double));
y = (double*)malloc(SIZE * sizeof(double));
LOOP = TOTAL_ITERATIONS / SIZE;
//fill vectors with random values
for (i = 0; i
{
x[i] = (double)1 + rand() % 100;
y[i] = (double)1 + rand() % 100;
z[i] = 0.0;
}
printf("File being run : %s ", FILE);
printf("Program written by ***** ");
printf("Arrays filled with data... computation begins now... ");
//printf("Compiler flags: %s ", FLAGS); //print or suppress printing of compiler flags as needed
printf("Number of elements per vector is: %d ", SIZE);
printf("Number of resolution loops is: %e ", (double)LOOP);
printf("Total number of inner & outer loop iterations is: %e ", (double)TOTAL_ITERATIONS);
printf("Processor clock frequency is: %0.2e cycles per second ", CPU_CLK);
NumOfOps = (double)(2 * SIZE*(double)LOOP); //evaluate total number of multiply adds
printf("# of floating point adds is: %0.3e ", NumOfOps);
printf("Vectors populated.. computation begins now ... ");
start_time = (double)time(0);
for (j = 0; j
{
for (i = 0; i
{
//z[i] = a*x[i];
//z[i] = x[i] + y[i];
z[i] = (a*x[i]) + y[i]; //single line of code to implement daxpy
}
}
end_time = (double)time(0);
printf("Computation Done ! ");
ElapsedTime = ((end_time - start_time)); //elapsed time in double precision format
printf("Measured elasped time was: %0.4e seconds ", ElapsedTime);
/* printf("Start number of seconds since 0 is: %d ", start_time);
printf("End number of seconds since 0 is: %d ", end_time);
printf("Start time in number of system clock ticks since 0 is: %0.12e ", (double)CPU_CLK*start_time);
printf("End time in number of system clock ticks since 0 is: %0.12e ", (double)CPU_CLK*end_time);
printf("Number of system clock ticks in measurement is : %0.12e ", CPU_CLK*(end_time - start_time)); */
//time() is returning an integer value representing the number of seconds since time 0 .. so for a valid measurement,
//the entire computation must take at lease 1 second, otherwise the timer won't be able to capture the event.
//print execution time of multiply add operation
// printf("Execution time per multiply-add operation = %e : ", ElapsedTime/NumOfMultAdds);
ElapsedTimePerVector = (double)ElapsedTime / (double)LOOP;
printf("Execution time per vector is: %0.4e seconds ", ElapsedTimePerVector);
ElapsedTimePerVectorElement = ElapsedTimePerVector / ((double)SIZE);
printf("Execution time per vector element is: %0.4e seconds ", ElapsedTimePerVectorElement);
printf("Execution time per arithmetic Op is: %0.4e seconds ", ElapsedTimePerVectorElement / 2); /eed to divide by 2 for complete multiply add functionality
FLOPS = (2 / ElapsedTimePerVectorElement); /eed 2 in numerator for case of multiply add
printf("Estimated number of clock cycles per OP is %0.2f CPU Clock Cycles per OP ", (CPU_CLK) / (FLOPS));
printf("Estimated number of Ops per clock cycle is %0.2f OPs per CPU Clock Cycles ", (FLOPS) / (CPU_CLK));
printf("Estimated FLOPs per second is: %0.3e OPs per second ", FLOPS);
//getchar();
free(x);
free(y);
free(z);
return 0;
}
Your console program should print out ( using formatted printf commands )
3) Processor Clock frequency
4) the square matrix dimension, N
5) Compiler optimization setting
6) # of accuracy loops,
7) Algorithmically evaluated number of arithmetic operations required to compute the NxN
multiplication
8) total measured execution time,
9) evaluated execution time for the NxN matrix,
10) evaluated execution time per matrix element ,
11) evaluated execution time per arithmetic operation,
12) evaluated Cycles per arithmetic Op, CpOps
13) evaluated Ops per Cycle, and
14) evaluated double precision operations per second
Write a C console program of the matrix multiplication algorithm ( AjxB] [C] ) for double precision data types. Instrument and monitor and measure execution time for the computation a) Your C program should be single threaded, and sequential. All matrices [A], [B], and [C] are to be square, i.e. same number of rows and columns Execution should be scalable and be able to handle matrix dimension N x N, from 4 x 4, 16x16, 32x32, 64x64, 128x128, 256x256, 512x512, 1024x1024 and 2048x2048. Set the matrix dimension, N, number of accuracy improvement loops, and system CPU clock speed using DEFINE statements. Use a random number generator to fill the random data into the matrices.| Compiler optimizations should be configured for full optimization (-03 in gcc or /Ox in MS VS) Comment your code to explain what it is doing. As in programming exercise 1 and 2, make your code portable. Using malloc is the preferred, and recommended method b) and either the "time() or clock() timing functions You should use the time.h header file library, to capture the start and end execution times for your benchmark. You will need two nested loops to perform the matrix multiplication, and in addition, as in PE2, you will need to use a 3rd outer accuracy improvement loop. Adjust the total number of iterations of your inner and outer accuracy improvement loops so that total execution time is approximately constant as you vary the dimensions of your A, B and C matrices change from 4x4 to 2048x2048Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started