Question


Posted on Sep 22, 2024



Please make sure the code is tailored to the specific 13-bit representation, where the sign is 1 bit, the exp is 4 bits, and the frac is 8 bits.

In class, we talked about the IEEE standard for floating-point representation and did examples using different sizes for the exponent and fraction fields so that you could learn how to do the conversions. For this assignment, you are going to write code to do this, allowing you to convert floating-point numbers to a bit-level representation. You will also write code to perform addition and multiplication of floating-point numbers, using the techniques discussed in class.

INPUT: You will read in a "program" and call your functions to implement it. The language is very simple, with only 4 kinds of statements: assignment, print, add, and multiply. An example of a program is given below (the lines that define a and z are missing from this copy):

zeus-1:P1$ cat sampleprogram
x 0.26
print x
y 15.25
print y
...
print a
print z

OUTPUT: The output will be the current values of the given variables at the print statements. For the above program, the output would be:

zeus-1:P1$ ./fp
x 0.2597656250
y 15.2500000000
a 15.5000000000
z 3.9609375000
Exiting

Some of this task is already done for you. We will provide a program that reads in the given programs, saves the values (as integers that encode the corresponding bit-level representation in our floating-point format), and calls the functions (described next) that you will be implementing.

Our smaller floating-point format is encoded within a 32-bit int in this layout:

Unused bits (31-13) | s (bit 12) | exp (bits 11-8) | frac (bits 7-0)

You are going to implement this 13-bit floating-point representation, where 4 bits are for the exponent (exp) and 8 are for the fraction (frac). Using bit-level operators, you will write the code for the functions shown below to help implement the program statements.

Assignment statement (variable value): this operation calls your function computeFP(), which converts from a C float value to our 13-bit mini-float representation (which only uses the 13 lowest of the given 32 bits in an integer).
The return value of the function will be the 32-bit integer that encodes the corresponding bit representation. For example, if a floating-point number is represented by the "exp" field expressed in bits as 0100 and the "frac" field expressed in bits as 0000 0001, then the integer that must be returned is the one that corresponds to the 32-bit pattern 0000 0000 0000 0000 0000 0100 0000 0001, specifically 0x00000401. Observe how the "exp" and "frac" bits are preceded by a sequence of leading 0s to make the representation 32 bits that fit within an int.

int computeFP(float val)
// input: float value to be represented
// output: 32-bit integer that encodes the input float value in our IEEE-like format

Given the number of bits, the rounding you will have to do for this representation may be substantial. In this assignment, we will simply truncate the fraction (i.e., round down). For example, the closest representable value for 0.26 (rounding down) is 0.2597656250, as can be seen in the program output. This means that when 0.26 is converted to the binary floating-point representation in our format, some precision is lost, and the resulting bit pattern corresponds to 0.2597656250 when printed by the getFP() function below.

Print statement (print variable): this statement uses your getFP() function to convert from our mini-float representation to a regular C float value, and formats/prints it out nicely. Return the converted C float. (For Infinity, you can simply return -1.)

float getFP(int val)
// Using the defined representation, compute and
// return the floating-point value

Add statement: for this statement, you are going to take two values in our representation and use the same technique as described in class/comments to add these values and return the result converted back into our representation (i.e., if E1 > E2: align M2, then M = M1 + M2, E = E1, and adjust M and E as needed). When implementing this statement, DO NOT convert the numbers back to floats, add them directly as C floats, and then convert to the new representation (doing so will not bring any credit).

int addVals(int source1, int source2)

Multiply statement: for this statement, you are going to take two values in our representation and use the same technique as described in class/comments to multiply these values and return the result in our representation (i.e., M = M1 * M2, E = E1 + E2, and adjust M and E as needed). When implementing this statement, DO NOT convert the numbers back to floats, multiply them directly as C floats, and then convert to the new representation (doing so will not bring any credit).

int multVals(int source1, int source2)

Assumptions

To make your life a little easier, we are going to make the following assumptions:

- No negative numbers. The sign bit can be ignored (it always has a 0 value).
- Only one special number (positive Infinity). For your getFP() function, you will be returning -1 for Infinity.

Note that your program should be able to process/manipulate both normalized and denormalized numbers, as long as they are non-negative.