# Assignment 1: Floating-Point Numbers


## Description

1. [4 marks] Complete the Python function randfp in the YOU_a1q1 notebook so that it randomly generates normalized binary floating-point numbers from the number system F(β = 2, t, L, U). Your function should work for values of t up to 52, and −1022 ≤ L < U ≤ 1023. You can read the function's documentation for more information (type "? randfp").

   Hint: To append strings in Python, simply 'add' strings. For example, b = 'hi' + ' there ' + str(15) will construct the string 'hi there 15'.

2. [4 marks] Complete the function fp2dec (in the YOU_a1q2 notebook) so that it converts binary floating-point numbers in F to their decimal equivalents. An incomplete version of the function is supplied as starter code. Its input is a string representing a binary floating-point number (as described in What you need to know above). It is sufficient to output an IEEE double-precision number as the decimal value.

   Hints: For this question, you might find the Python functions find and int useful. Also, you can extract substrings using indexing. For example, if b='+0.1001b3', then b[2] will return the string '.', and b[6:] will return '1b3'. Furthermore, the Boolean expression b[3]=='1' would return a value of True. You cannot, however, use any other function that does the conversion for you. You must implement it yourself based on first principles.

3. [4 marks] Consider the normalized floating-point number system F(β = 7, t = 4, L = −8, U = 8), with elements of the form $\pm 0.d_1 d_2 d_3 d_4 \times 7^p$ where −8 ≤ p ≤ 8 and $d_i \in \{0, \ldots, 6\}$. The number system is normalized, so $d_1 \neq 0$. The only exception is the zero element, in which all the mantissa digits are zero and the exponent is zero. For the following questions, state your answers in base-7.

   (a) What is the largest value in F?

   (b) What is the value of $26530_7 \times 10000_7$ using this number system?

   (c) Derive machine epsilon for F from first principles (not using the general formula). In other words, what is the smallest value E ∈ F such that fl(1 + E) > 1?

   (d) What fraction of the values in F are smaller in magnitude than 1?
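The string format in question 2 and the base-7 system in question 3 share the same positional rule, ±0.d1d2...dt × β^p. The sketch below illustrates that rule with exact rational arithmetic. It is not the starter code: the helper names fp_value and parse_binary_fp are hypothetical, and it assumes that the string always starts with a sign and that the exponent after 'b' is a decimal integer, as in '+0.1001b3'.

```python
from fractions import Fraction


def fp_value(sign, digits, exponent, beta):
    """Exact value of sign * 0.d1d2...dt * beta**exponent as a Fraction.

    Hypothetical helper: digit i (counting from 1) contributes d_i / beta**i.
    """
    mantissa = Fraction(0)
    for i, d in enumerate(digits, start=1):
        mantissa += Fraction(int(d), beta ** i)
    return sign * mantissa * Fraction(beta) ** exponent


def parse_binary_fp(b):
    """Split a string like '+0.1001b3' into (sign, mantissa digits, exponent).

    Assumes a leading '+' or '-' and a decimal exponent after the 'b' marker.
    """
    sign = -1 if b[0] == '-' else 1
    dot = b.find('.')        # position of the radix point
    marker = b.find('b')     # position of the exponent marker
    return sign, b[dot + 1:marker], int(b[marker + 1:])


sign, digits, exponent = parse_binary_fp('+0.1001b3')
print(float(fp_value(sign, digits, exponent, beta=2)))  # 4.5
print(fp_value(1, '1000', 1, beta=7))                   # 1
```

The second call shows how the number 1 is stored in the base-7 system of question 3: mantissa digits 1000 with exponent 1. Fraction keeps every intermediate value exact, so rounding only happens in the final float() conversion; a submitted fp2dec would more likely accumulate powers of 2 directly in double precision.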
4. [4 marks] Let F be a floating-point number system with machine epsilon E, and suppose that a and b are numbers that may or may not be elements of F. Show that the relative error for the expression fl(a) ⊕ fl(b) has the upper bound

   $$\frac{|(\mathrm{fl}(a) \oplus \mathrm{fl}(b)) - (a + b)|}{|a + b|} \le \frac{|a| + |b|}{|a + b|}\, E(2 + E).$$

   Justify each inequality that you introduce.
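One possible route to the bound in question 4 is sketched below. It rests on two assumptions that the question does not state explicitly: rounding obeys fl(x) = x(1 + δ) with |δ| ≤ E, and ⊕ returns the rounded exact sum of its operands. The symbols δ1, δ2, δ3 are introduced here for illustration only, and a + b ≠ 0 is assumed so that the relative error is defined.

```latex
\begin{align*}
\mathrm{fl}(a) \oplus \mathrm{fl}(b)
  &= \bigl(a(1+\delta_1) + b(1+\delta_2)\bigr)(1+\delta_3),
  \qquad |\delta_i| \le E, \\
(\mathrm{fl}(a) \oplus \mathrm{fl}(b)) - (a+b)
  &= a(\delta_1 + \delta_3 + \delta_1\delta_3)
   + b(\delta_2 + \delta_3 + \delta_2\delta_3), \\
\bigl|(\mathrm{fl}(a) \oplus \mathrm{fl}(b)) - (a+b)\bigr|
  &\le |a|\,(2E + E^2) + |b|\,(2E + E^2)
  && \text{(triangle inequality, } |\delta_i| \le E\text{)} \\
  &= (|a| + |b|)\,E(2+E).
\end{align*}
```

Dividing both sides by |a + b| then gives the stated bound.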
5. [5 marks] Consider the function

   $$F(x) = \frac{1}{1 - x} - \frac{1}{1 + x}$$

© Jeff Orchard, Reinhold Burger 2019 v1.0
CS 370 Numerical Computation Assignment 1