
Question


Please help me modify this code to answer the question below.

#include <stdio.h>

#include <stdlib.h>

#include <time.h>

#define FILE "DOUBLEPRECISION_VMA to demonstrate evaluating Cycles per Ops ( CpOPs )"

#define SIZE 1000 //base value is 1000. With SIZE = 1, LOOP = 1E10, and /O3, cycles per flop approximate the latency to L1

#define TOTAL_ITERATIONS 1E10 // total iterations is the product of SIZE and the # of outer accuracy improvement loops (LOOP)

#define CPU_CLK 2.394455e9 //3.43E9 for Dell i7-6600, 2.8E9 for i5 Surface Pro, 3.48E9 for HP i7-7700; 2.394455e9 for Xeon for Sun 1.593E9

int main()

{

//declare vectors and variables

int i;

long long j, LOOP;

double * z;

double * x;

double * y;

double a;

double NumOfOps;

double FLOPS;

double ElapsedTime;

double ElapsedTimePerVector;

double ElapsedTimePerVectorElement;

//declare time.h counter variables

time_t start_time;

time_t end_time;

a = 0.001; // declare the axpy coefficient and variables. Allocate memory/stack space for them

z = (double*)malloc(SIZE * sizeof(double));

x = (double*)malloc(SIZE * sizeof(double));

y = (double*)malloc(SIZE * sizeof(double));

LOOP = TOTAL_ITERATIONS / SIZE;

//fill vectors with random values

for (i = 0; i < SIZE; i++)

{

x[i] = (double)1 + rand() % 100;

y[i] = (double)1 + rand() % 100;

z[i] = 0.0;

}

printf("File being run : %s \n", FILE);

printf("Program written by ***** \n");

printf("Arrays filled with data... computation begins now... \n");

//printf("Compiler flags: %s \n", FLAGS); //print or suppress printing of compiler flags as needed

printf("Number of elements per vector is: %d \n", SIZE);

printf("Number of resolution loops is: %e \n", (double)LOOP);

printf("Total number of inner & outer loop iterations is: %e \n", (double)TOTAL_ITERATIONS);

printf("Processor clock frequency is: %0.2e cycles per second \n", CPU_CLK);

NumOfOps = (double)(2 * SIZE*(double)LOOP); //evaluate total number of multiplies and adds

printf("# of floating point ops (multiplies + adds) is: %0.3e \n", NumOfOps);

printf("Vectors populated.. computation begins now ... \n");

start_time = time(0);

for (j = 0; j < LOOP; j++)

{

for (i = 0; i < SIZE; i++)

{

//z[i] = a*x[i];

//z[i] = x[i] + y[i];

z[i] = (a*x[i]) + y[i]; //single line of code to implement daxpy

}

}

end_time = time(0);

printf("Computation Done ! \n");

ElapsedTime = difftime(end_time, start_time); //elapsed time in seconds, in double precision format

printf("Measured elapsed time was: %0.4e seconds \n", ElapsedTime);

/* printf("Start number of seconds since 0 is: %d ", start_time);

printf("End number of seconds since 0 is: %d ", end_time);

printf("Start time in number of system clock ticks since 0 is: %0.12e ", (double)CPU_CLK*start_time);

printf("End time in number of system clock ticks since 0 is: %0.12e ", (double)CPU_CLK*end_time);

printf("Number of system clock ticks in measurement is : %0.12e ", CPU_CLK*(end_time - start_time)); */

//time() is returning an integer value representing the number of seconds since time 0 .. so for a valid measurement,

//the entire computation must take at least 1 second, otherwise the timer won't be able to capture the event.

//print execution time of multiply add operation

// printf("Execution time per multiply-add operation = %e : ", ElapsedTime/NumOfMultAdds);

ElapsedTimePerVector = (double)ElapsedTime / (double)LOOP;

printf("Execution time per vector is: %0.4e seconds \n", ElapsedTimePerVector);

ElapsedTimePerVectorElement = ElapsedTimePerVector / ((double)SIZE);

printf("Execution time per vector element is: %0.4e seconds \n", ElapsedTimePerVectorElement);

printf("Execution time per arithmetic Op is: %0.4e seconds \n", ElapsedTimePerVectorElement / 2); //need to divide by 2 for complete multiply add functionality

FLOPS = (2 / ElapsedTimePerVectorElement); //need 2 in numerator for case of multiply add

printf("Estimated number of clock cycles per OP is %0.2f CPU Clock Cycles per OP \n", (CPU_CLK) / (FLOPS));

printf("Estimated number of Ops per clock cycle is %0.2f OPs per CPU Clock Cycle \n", (FLOPS) / (CPU_CLK));

printf("Estimated FLOPs per second is: %0.3e OPs per second \n", FLOPS);

//getchar();

free(x);

free(y);

free(z);

return 0;

}
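The comments in the listing note that time() only resolves whole seconds. As a hedged sketch of the alternative the assignment allows, the same daxpy loop can be timed with clock() and CLOCKS_PER_SEC from time.h, which gives sub-second resolution (clock() reports processor time, which for a single-threaded, CPU-bound loop is normally close to elapsed time). The helper names elapsed_seconds, daxpy, and time_daxpy are hypothetical and are not part of the original program.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Convert two clock() readings into elapsed seconds. */
double elapsed_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* One daxpy pass: z = a*x + y over n elements (same operation as the listing). */
void daxpy(double a, const double *x, const double *y, double *z, int n)
{
    int i;
    for (i = 0; i < n; i++)
        z[i] = a * x[i] + y[i];
}

/* Hypothetical helper: run `loops` daxpy passes and return the measured seconds. */
double time_daxpy(double a, const double *x, const double *y,
                  double *z, int n, long long loops)
{
    long long j;
    clock_t start;

    assert(n > 0 && loops > 0);   /* sanity check, outside the timed region */
    start = clock();
    for (j = 0; j < loops; j++)
        daxpy(a, x, y, z, n);
    return elapsed_seconds(start, clock());
}
```

Calling time_daxpy(a, x, y, z, SIZE, LOOP) in place of the two nested loops would return ElapsedTime directly. At -O3 a compiler may simplify repeated identical passes, so it is worth printing some element of z afterwards so the result is actually used.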


Your console program should print out (using formatted printf commands):

3) Processor Clock frequency

4) the square matrix dimension, N

5) Compiler optimization setting

6) # of accuracy loops,

7) Algorithmically evaluated number of arithmetic operations required to compute the NxN multiplication

8) total measured execution time,

9) evaluated execution time for the NxN matrix,

10) evaluated execution time per matrix element ,

11) evaluated execution time per arithmetic operation,

12) evaluated Cycles per arithmetic Op, CpOps

13) evaluated Ops per Cycle, and

14) evaluated double precision operations per second
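Items 7 to 14 follow from one another once the operation count is fixed. The relations below are a sketch, assuming the naive algorithm in which each of the N^2 output elements takes N multiplies and N adds (counting the first add into a zero accumulator). The symbols T (total measured time), L (number of accuracy loops), and f_clk (CPU clock frequency) are notation introduced here, not taken from the assignment.

```latex
\begin{align*}
\text{Ops}_{N \times N} &= 2N^{3} \\
t_{\text{matrix}} &= \frac{T}{L} \\
t_{\text{element}} &= \frac{t_{\text{matrix}}}{N^{2}} = \frac{T}{L\,N^{2}} \\
t_{\text{op}} &= \frac{t_{\text{matrix}}}{2N^{3}} = \frac{T}{2N^{3}L} \\
\text{CpOps} &= f_{\text{clk}} \cdot t_{\text{op}} \\
\text{Ops per cycle} &= \frac{1}{\text{CpOps}} \\
\text{Ops per second} &= \frac{1}{t_{\text{op}}}
\end{align*}
```

For example, with N = 4 one product costs 2 x 64 = 128 operations, so a run of L = 10 loops performs 1280 operations in total.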

Write a C console program of the matrix multiplication algorithm ([A] x [B] = [C]) for double precision data types. Instrument, monitor, and measure execution time for the computation.

a) Your C program should be single threaded and sequential. All matrices [A], [B], and [C] are to be square, i.e. the same number of rows and columns. Execution should be scalable and able to handle matrix dimensions N x N of 4x4, 16x16, 32x32, 64x64, 128x128, 256x256, 512x512, 1024x1024, and 2048x2048. Set the matrix dimension N, the number of accuracy improvement loops, and the system CPU clock speed using #define statements. Use a random number generator to fill the matrices with random data. Compiler optimizations should be configured for full optimization (-O3 in gcc or /Ox in MS VS). Comment your code to explain what it is doing. As in programming exercises 1 and 2, make your code portable. Using malloc is the preferred and recommended method.

b) You should use the time.h header file library, and either the time() or clock() timing functions, to capture the start and end execution times for your benchmark. You will need two nested loops to perform the matrix multiplication and, in addition, as in PE2, a 3rd outer accuracy improvement loop. Adjust the total number of iterations of your inner and outer accuracy improvement loops so that total execution time stays approximately constant as the dimensions of your A, B, and C matrices change from 4x4 to 2048x2048.
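One possible starting point for adapting the daxpy program is sketched below. It is not a verified solution. It assumes row-major matrices stored in flat malloc'd arrays, the textbook formulation with loops over rows, columns, and the dot-product index, and clock() for timing. The function names, the example value N = 256, and the TOTAL_OPS budget of 2.0e9 are assumptions chosen for illustration. CPU_CLK is copied from the daxpy listing. Printing the compiler optimization setting is left out because standard C has no portable way to query it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 256               /* matrix dimension (assumed example value) */
#define TOTAL_OPS 2.0e9     /* assumed op budget, keeps run time roughly constant */
#define CPU_CLK 2.394455e9  /* clock frequency in Hz, from the daxpy listing */

/* Ops for one naive n x n product: n*n elements, n multiplies + n adds each. */
double ops_per_matmul(int n)
{
    return 2.0 * (double)n * (double)n * (double)n;
}

/* Accuracy loops so that loops * ops_per_matmul(n) is about total_ops (min 1). */
long long accuracy_loops(int n, double total_ops)
{
    long long loops = (long long)(total_ops / ops_per_matmul(n));
    return loops < 1 ? 1 : loops;
}

/* C = A * B for row-major n x n matrices. */
void matmul(const double *A, const double *B, double *C, int n)
{
    int i, j, k;
    assert(n > 0);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            double sum = 0.0;
            for (k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Time `loops` products and print the metrics; returns 0 on success. */
int run_benchmark(int n, long long loops, double cpu_clk)
{
    size_t count = (size_t)n * (size_t)n, idx;
    long long l;
    double *A = (double*)malloc(count * sizeof(double));
    double *B = (double*)malloc(count * sizeof(double));
    double *C = (double*)malloc(count * sizeof(double));
    double elapsed, per_matrix, per_op;
    clock_t start;

    if (!A || !B || !C) { free(A); free(B); free(C); return 1; }
    for (idx = 0; idx < count; idx++) {     /* fill inputs with random values */
        A[idx] = 1.0 + rand() % 100;
        B[idx] = 1.0 + rand() % 100;
    }
    start = clock();
    for (l = 0; l < loops; l++)             /* outer accuracy improvement loop */
        matmul(A, B, C, n);
    elapsed = (double)(clock() - start) / (double)CLOCKS_PER_SEC;

    per_matrix = elapsed / (double)loops;
    per_op = per_matrix / ops_per_matmul(n);
    printf("Processor clock frequency: %0.3e Hz\n", cpu_clk);
    printf("Matrix dimension N: %d\n", n);
    printf("Accuracy loops: %lld\n", loops);
    printf("Arithmetic ops per NxN product: %0.3e\n", ops_per_matmul(n));
    printf("Total measured time: %0.4e s\n", elapsed);
    printf("Time per NxN product: %0.4e s\n", per_matrix);
    printf("Time per matrix element: %0.4e s\n", per_matrix / (double)count);
    printf("Time per arithmetic op: %0.4e s\n", per_op);
    if (elapsed > 0.0) {                    /* avoid dividing by zero on tiny runs */
        printf("Cycles per op (CpOps): %0.2f\n", cpu_clk * per_op);
        printf("Ops per cycle: %0.2f\n", 1.0 / (cpu_clk * per_op));
        printf("Ops per second: %0.3e\n", 1.0 / per_op);
    }
    printf("Checksum C[0]: %0.3e\n", C[0]); /* use the result so it is kept */
    free(A); free(B); free(C);
    return 0;
}
```

A main that simply returns run_benchmark(N, accuracy_loops(N, TOTAL_OPS), CPU_CLK) would tie the pieces to the #define settings. Deriving the loop count from a fixed operation budget is one way to keep the total run time roughly constant from 4x4 up to 2048x2048. The budget itself would need tuning per machine, and cache effects at larger N will still change the time per operation.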
