Question

For this assignment, you are going to write code to do this, allowing you to convert the floating point numbers to a bit-level representation. You will also write code to perform addition and multiplication of floating point numbers.

INPUT: You will read in a program and call your functions to implement these programs. The language is very simple with only 4 different kinds of statements: assignment, print, add and multiply. An example of a program is given below:

cat sampleprogram
x = 0.26
print x
y = 15.25
print y
a=x+y
print a
z=x*y
print z

OUTPUT: The output will be the current values of the given variables at the print statements. For the above program, the output would be:

./fp < sampleprogram
> > x = 0.2597656250
> > y = 15.2500000000
> > a = 15.5000000000
> > z = 3.9609375000
> Exiting

Some of this task is already done for you. We will provide a program that reads in the given programs, saves the values (as integers that encode the corresponding bit-level representation in our floating-point format) and calls the functions (described next) that you will be implementing.

Encoding of our smaller Floating Point within a 32-bit int.

You are going to implement this 13-bit floating-point representation, in which 1 bit is the sign, 4 bits are the exponent (exp), and 8 bits are the fraction (frac).

Using bit-level operators, you will write the code for the functions (shown below) to help implement the program statements:

  • Assignment statement (variable = value): this operation calls your function computeFP(), which converts a C float value to our 13-bit mini-float representation (which uses only the 13 lowest of the 32 bits in an integer). The return value of the function is the 32-bit integer that encodes the corresponding bit representation.

For example, if a floating-point number is represented by the exp field 0100 and the frac field 0000 0001, then the integer that must be returned is the one that corresponds to the 32-bit pattern 0000 0000 0000 0000 0000 0100 0000 0001, that is, 0x00000401. Observe how the exp and frac bits are preceded by leading 0s to fill out the 32 bits of an int.
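The bit layout in this example can be sketched with a few shift-and-mask helpers. This is only an illustration: the helper names (pack_fields, exp_field, frac_field) are hypothetical and not part of the provided fp.h, and the layout (frac in bits 0-7, exp in bits 8-11) is inferred from the 0x00000401 example.

```c
/* Assumed layout, inferred from the handout's example:
 * bits 0-7 hold frac, bits 8-11 hold exp, bit 12 is the sign (always 0). */
enum { FRAC_BITS = 8, EXP_MASK = 0xF, FRAC_MASK = 0xFF };

/* Hypothetical helper: combine a 4-bit exp and an 8-bit frac into one int. */
static int pack_fields(int exp, int frac) {
    return ((exp & EXP_MASK) << FRAC_BITS) | (frac & FRAC_MASK);
}

/* Hypothetical helper: extract the 4-bit exp field. */
static int exp_field(int val) {
    return (val >> FRAC_BITS) & EXP_MASK;
}

/* Hypothetical helper: extract the 8-bit frac field. */
static int frac_field(int val) {
    return val & FRAC_MASK;
}
```

With these helpers, pack_fields(0x4, 0x01) yields 0x00000401, matching the example, and exp_field/frac_field recover the two fields from that integer.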

int computeFP(float val) { }
// input: float value to be represented
// output: 32-bit integer that encodes the input float value in our IEEE-like format

Given the number of bits, the rounding you will have to do for this representation may be substantial. In this assignment, we will simply truncate the fraction (i.e., round down).

For example, the closest representable value for 0.26 (rounding down) is 0.2597656250, as can be seen in the program output. This means that when 0.26 is converted to the binary floating-point representation in our format, some precision is lost, and the resulting bit pattern corresponds to 0.2597656250 when printed by the getFP() function below.
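One possible way to carry out this conversion is sketched here; it is not the course's reference solution. It assumes an IEEE-style bias of 7 (2^(4-1) - 1), an all-ones exp field (15) with frac 0 as the encoding of Infinity, and an exp field of 0 for denormalized values with E = 1 - Bias. The macro names are illustrative.

```c
#define FRAC_BITS 8
#define BIAS      7                      /* assumed: 2^(4-1) - 1 */
#define EXP_INF   15                     /* assumed: all-ones exp field */
#define INF_BITS  (EXP_INF << FRAC_BITS) /* exp = 1111, frac = 0 */

int computeFP(float val) {
    /* Only non-negative inputs are expected; NaN and negatives map to 0 here. */
    if (!(val > 0.0f)) return 0;
    /* The largest normalized value is below 2^8, so anything >= 256 overflows. */
    if (val >= 256.0f) return INF_BITS;

    /* Scale val into [1, 2), counting the steps in E. */
    int E = 0;
    while (val >= 2.0f) { val /= 2.0f; E++; }
    while (val < 1.0f)  { val *= 2.0f; E--; }

    int exp = E + BIAS;
    if (exp >= EXP_INF) return INF_BITS;

    if (exp >= 1) {
        /* Normalized: keep the top 8 fraction bits, truncating the rest. */
        int frac = (int)((val - 1.0f) * 256.0f);
        return (exp << FRAC_BITS) | frac;
    }

    /* Denormalized: E is pinned at 1 - BIAS, so shift the mantissa right. */
    int shift = (1 - BIAS) - E;
    if (shift > FRAC_BITS) return 0;     /* too small even for denormalized */
    while (shift-- > 0) val /= 2.0f;
    return (int)(val * 256.0f);          /* exp field stays 0 */
}
```

Under these assumptions, 0.26 normalizes to 1.04 x 2^-2, so exp = 5 and frac = floor(0.04 x 256) = 10, giving 0x50A, which decodes to 0.2597656250 as in the sample output.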

  • Print statement (print variable): this statement uses your getFP() function to convert from our mini-float representation to a regular C float value, which is then formatted and printed. Return the converted C float. (For Infinity, simply return -1.)

float getFP(int val) { }
// Using the defined representation, compute and
// return the floating point value
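A minimal sketch of the decoding direction follows, under the same assumptions as before (not stated explicitly in the handout): a bias of 7, an exp field of 15 meaning Infinity, and an exp field of 0 meaning a denormalized value with E = 1 - Bias and no implicit leading 1.

```c
#define FRAC_BITS 8
#define BIAS      7    /* assumed: 2^(4-1) - 1 */
#define EXP_INF   15   /* assumed: all-ones exp field means Infinity */

float getFP(int val) {
    int exp  = (val >> FRAC_BITS) & 0xF;
    int frac = val & 0xFF;

    if (exp == EXP_INF) return -1.0f;    /* handout: return -1 for Infinity */

    float m;
    int E;
    if (exp == 0) {
        /* Denormalized (this also covers 0): M = 0.frac, E = 1 - BIAS. */
        m = frac / 256.0f;
        E = 1 - BIAS;
    } else {
        /* Normalized: M = 1.frac, E = exp - BIAS. */
        m = 1.0f + frac / 256.0f;
        E = exp - BIAS;
    }

    /* Apply 2^E by repeated doubling or halving. */
    while (E > 0) { m *= 2.0f; E--; }
    while (E < 0) { m /= 2.0f; E++; }
    return m;
}
```

For instance, 0x50A has exp = 5 and frac = 10, so the result is (1 + 10/256) x 2^-2 = 0.259765625.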

  • Add statement: for this statement, you are going to take two values in our representation and use the same technique as described in class/comments to add these values and return the result converted back into our representation (i.e., if E1 > E2: align M2, then M = M1 + M2, E = E1, and adjust M and E as needed).

When implementing this statement, DO NOT convert the numbers back to float, add them directly as C floats, and then convert to the new representation (doing so will earn no credit).

int addVals(int source1, int source2) {}
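The align-then-add technique can be sketched entirely in integer arithmetic by keeping each mantissa M scaled by 256 (so a normalized M lies in [256, 512)). This is one possible reading of the steps, not the graded solution. The helper decode() is hypothetical, and the sketch assumes a bias of 7, exp = 15 for Infinity, exp = 0 for denormalized values, and that Infinity plus anything is Infinity.

```c
#define FRAC_BITS 8
#define BIAS      7                      /* assumed: 2^(4-1) - 1 */
#define EXP_INF   15                     /* assumed Infinity encoding */
#define INF_BITS  (EXP_INF << FRAC_BITS)

/* Hypothetical helper: split val into mantissa M (scaled by 256) and E. */
static void decode(int val, int *M, int *E) {
    int exp  = (val >> FRAC_BITS) & 0xF;
    int frac = val & 0xFF;
    if (exp == 0) { *M = frac;       *E = 1 - BIAS;   }  /* denormalized */
    else          { *M = 256 | frac; *E = exp - BIAS; }  /* implicit 1 */
}

int addVals(int source1, int source2) {
    if (((source1 >> FRAC_BITS) & 0xF) == EXP_INF ||
        ((source2 >> FRAC_BITS) & 0xF) == EXP_INF)
        return INF_BITS;

    int M1, E1, M2, E2;
    decode(source1, &M1, &E1);
    decode(source2, &M2, &E2);

    /* Make (M1, E1) the operand with the larger exponent. */
    if (E1 < E2) {
        int tM = M1, tE = E1;
        M1 = M2; E1 = E2;
        M2 = tM; E2 = tE;
    }

    /* Align M2 to E1; bits shifted out are truncated. */
    int shift = E1 - E2;
    M2 = (shift < 31) ? (M2 >> shift) : 0;

    int M = M1 + M2;
    int E = E1;
    if (M == 0) return 0;

    /* The sum is below 1024, so at most one renormalizing shift is needed. */
    if (M >= 512) { M >>= 1; E++; }

    if (M < 256) return M;               /* still denormalized: exp field 0 */

    int exp = E + BIAS;
    if (exp >= EXP_INF) return INF_BITS; /* overflow */
    return (exp << FRAC_BITS) | (M & 0xFF);
}
```

With the sample program's values, x encodes as 0x50A (M = 266, E = -2) and y as 0xAE8 (M = 488, E = 3). Aligning x gives 266 >> 5 = 8, the sum is 496, and the result 0xAF0 decodes to 15.5, which agrees with the sample output.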

  • Multiply statement: for this statement, you are going to take two values in our representation and use the same technique as described in class/comments to multiply these values and return the result in our representation (i.e., M = M1 * M2, E = E1 + E2, and adjust M and E as needed).

When implementing this statement, DO NOT convert the numbers back to floats, multiply them directly as C floats, and then convert to the new representation (doing so will earn no credit).

int multVals(int source1, int source2) {}
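The multiply steps can likewise be sketched with integer mantissas scaled by 256, so that the product M1 * M2 is scaled by 2^16. Again, this is a sketch under assumptions rather than the expected solution: decode() is a hypothetical helper, the bias is taken to be 7, exp = 15 is taken to mean Infinity, and Infinity times anything is treated as Infinity.

```c
#define FRAC_BITS 8
#define BIAS      7                      /* assumed: 2^(4-1) - 1 */
#define EXP_INF   15                     /* assumed Infinity encoding */
#define INF_BITS  (EXP_INF << FRAC_BITS)

/* Hypothetical helper: split val into mantissa M (scaled by 256) and E. */
static void decode(int val, int *M, int *E) {
    int exp  = (val >> FRAC_BITS) & 0xF;
    int frac = val & 0xFF;
    if (exp == 0) { *M = frac;       *E = 1 - BIAS;   }  /* denormalized */
    else          { *M = 256 | frac; *E = exp - BIAS; }  /* implicit 1 */
}

int multVals(int source1, int source2) {
    if (((source1 >> FRAC_BITS) & 0xF) == EXP_INF ||
        ((source2 >> FRAC_BITS) & 0xF) == EXP_INF)
        return INF_BITS;

    int M1, E1, M2, E2;
    decode(source1, &M1, &E1);
    decode(source2, &M2, &E2);
    if (M1 == 0 || M2 == 0) return 0;

    int P = M1 * M2;                     /* scaled by 2^16; at most 511 * 511 */
    int E = E1 + E2;

    /* Bring P into [2^16, 2^17), i.e. a mantissa in [1, 2). */
    while (P >= (1 << 17)) { P >>= 1; E++; }
    while (P <  (1 << 16)) { P <<= 1; E--; }  /* only for denormalized inputs */

    int exp = E + BIAS;
    if (exp >= EXP_INF) return INF_BITS; /* overflow */

    if (exp >= 1)                        /* normalized: keep top 8 frac bits */
        return (exp << FRAC_BITS) | ((P >> 8) & 0xFF);

    /* Denormalized: E is pinned at 1 - BIAS, so shift right by the gap too. */
    int shift = 8 + (1 - exp);
    if (shift >= 17) return 0;           /* too small even for denormalized */
    return P >> shift;
}
```

For the sample program, 266 * 488 = 129808 already lies in [2^16, 2^17), so E = -2 + 3 = 1, exp = 8, and frac = (129808 >> 8) & 0xFF = 251. The result 0x8FB decodes to 3.9609375, matching the sample output.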

Assumptions

To make your life a little easier, we are going to make the following assumptions:

  • No negative numbers. The sign bit can be ignored (always has a 0 value).
  • Only one Special Number (Positive Infinity). For your getFP() function, you will be returning a -1 for Infinity.

Note that your program should be able to process/manipulate both normalized and denormalized numbers, as long as they are non-negative.

#include
#include
#include
#include "fp.h"

int
computeFP(float val) {
    // input: float value to be represented
    // output: integer version in our representation
    //
    // Perform this the same way we did in class -
    // either dividing or multiplying the value by 2
    // until it is in the correct range (between 1 and 2).
    // Your exponent is the number of times this operation
    // was performed.
    // Deal with rounding by simply truncating the number.
    // You will only have Positive Values. Sign will always be 0.
    // Check for overflow -
    // with 4 exponent bits, we have overflow if the number to be
    // stored is:
    // for overflow (exp > 14), store the value as Positive Infinity
    // Check for Denormalized values as well.
    // If M is in the form of 1.X and E is < 1-Bias, it will be Denormalized
    // If the number is too small to encode as denormalized, return 0.
    return 2;
}

float getFP(int val) {
    // Using the defined representation, compute the floating point
    // value
    // For 0, simply return 0.
    // For Infinity, return -1;
    return 2.0;
}

int
multVals(int source1, int source2) {
    // You must implement this by using the algorithm
    // described in class:
    // Add the exponents: E = E1+E2
    // multiply the fractional values: M = M1*M2
    // if M too large, divide it by 2 and increment E
    // save the result
    // You will only deal with positive source values
    // Be sure to check for overflow - store value as Infinity
    // If the value is too small for denormalized, return 0
    return 2;
}

int
addVals(int source1, int source2) {
    // Do this function last - it is the most difficult!
    // You must implement this as described in class:
    // If needed, adjust one of the two numbers so that
    // they have the same exponent E
    // Add the two fractional parts: F1' + F2 = F
    // (assumes F1' is the adjusted F1)
    // Adjust the sum F and E so that F is in the correct range
    //
    // As described in the handout, you only need to implement this for
    // positive source values
    // If the sum results in overflow, return Infinity
    // If the sum is 0, return 0
    return 2;
}
