In this assignment you will construct a Float8 type, containing one field data, an 8-bit UInt8:
In [6]:
import Base: show, bits, exponent, significand, +, -, *, /, Float64
type Float8
    data::UInt8
end
For this type, we will interpret the 8 bits in data as 1 sign bit, 3 exponent bits and 4 significand bits. That is, numbers will be represented by
$$x = \pm 2^{q-S} \times (1.b_1b_2b_3b_4)_2$$
where $S = 3$, $q$ is an unsigned 3-bit integer stored in the 2nd through 4th bits, and $b_1b_2b_3b_4$ are the last 4 bits. We will not implement Inf, NaN or subnormal numbers. We will, however, implement both ±0.
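As a worked illustration of this layout, take the byte 123, which also appears in the checks in the exercises. Its bit pattern 01111011 splits into the sign bit 0, the exponent field 111 and the significand bits 1011, so that:

```latex
\begin{aligned}
q &= (111)_2 = 7, \qquad q - S = 7 - 3 = 4, \\
(1.b_1b_2b_3b_4)_2 &= (1.1011)_2 = 1 + \tfrac{1}{2} + \tfrac{1}{8} + \tfrac{1}{16} = 1.6875, \\
x &= +2^{4} \times 1.6875 = 27.
\end{aligned}
```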
We will use a globally defined constant S to represent the shift:
In [8]:
const S=UInt8(3) # The shift
Out[8]:
0x03
We will use the bits function to access the individual bits as a string. The following function defines bits for a Float8:
In [11]:
function bits(x::Float8)
    bits(x.data)
end
Out[11]:
bits (generic function with 6 methods)
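As a complement to the string returned by bits, the three fields can also be read directly from the byte with shifts and masks. The following is only a minimal sketch and not part of the assignment; the names byte, sign_bit, q and frac_bits are hypothetical and chosen for illustration. The exercises can equally be solved by indexing into the string from bits.

```julia
# Illustrative only: variable names are hypothetical, not part of the assignment.
byte = UInt8(123)                  # bit pattern 01111011

sign_bit  = byte >> 7              # 1st bit: 0 for +, 1 for -
q         = (byte >> 4) & 0x07     # 2nd through 4th bits: the unsigned exponent field
frac_bits = byte & 0x0f            # last 4 bits: b1 b2 b3 b4

println((sign_bit, q, frac_bits))  # all three are UInt8 values
```

Shifts and masks avoid building a string, at the cost of being a little less readable than slicing the output of bits.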
Exercise 1:
(a) Complete the following exponent function, which returns $q - S$ (as an integer).
(b) Check that
exponent(Float8(UInt8(123)))
returns 4.
In [18]:
function exponent(x::Float8)
    ## TODO: interpret the bits and return an integer
end
Out[18]:
exponent (generic function with 6 methods)
Exercise 2:
(a) Complete the following significand function, that returns the significand of a Float8. Don't forget to incorporate the sign bit.
(b) Add comments explaining the purpose of the first 4 lines.
(c) Check that
significand(Float8(UInt8(123)))
returns 1.6875.
In [28]:
function significand(x::Float8)
    if x.data==0
        0.0
    elseif x.data==128
        -0.0
    else
        bts=bits(x)
        #TODO: convert the 8 bits in bts to a floating-point number
    end
end
Out[28]:
significand (generic function with 6 methods)
Exercise 3:
(a) Use exponent and significand to complete the definition of Float64(x::Float8), which converts a Float8 to a Float64.
(b) Check that
Float8(UInt8(123))
now displays as 27.0f8.
In [35]:
function Float64(x::Float8)
    #TODO: return a Float64 that equals the input
end

function show(io::IO,x::Float8)
    print(io,Float64(x))
    print(io,"f8")
end
Out[35]:
show (generic function with 106 methods)
Exercise 4
(a) Complete the following chop_to_8_bits function, which returns, for normal numbers, a string containing the 8 bits of the Float8 representation. For this question, you can simply chop the significand bits of a Float64. (Recall that a Float64 has 1 sign bit, 11 exponent bits and 52 significand bits.)
(b) Add comments explaining the definition of
Float8(::Float64).
(c) Check that
Float8(1.25)
returns 1.25f8.
(d) Explain why
Float8(1.3)
returns the same number.
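The Float64 layout recalled in part (a) can be inspected directly. The following is a minimal sketch and not the intended solution; the variable names are hypothetical. It assumes only the standard Float64 format, in which the 11-bit exponent field is stored with a bias of 1023.

```julia
# Illustrative only: variable names are hypothetical, not part of the assignment.
x = 1.25
u = reinterpret(UInt64, x)             # the raw 64 bits of x

sign_bit   = u >> 63                   # 1 sign bit
exp_field  = (u >> 52) & 0x7ff         # 11 exponent bits, stored with a bias of 1023
frac_field = u & 0x000fffffffffffff    # 52 significand (fraction) bits
top4       = frac_field >> 48          # leading 4 of the 52 fraction bits

println(Int(exp_field) - 1023)         # unbiased exponent of 1.25, which is 0
println(top4)                          # 0x4, i.e. the bits 0100
```

Keeping only the leading 4 fraction bits and discarding the remaining 48 is what "chopping" means here. The Float64 exponent still has to be re-expressed with the Float8 shift S before the 8-bit string can be assembled.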
In [47]:
function chop_to_8_bits(x::Float64)
    #TODO: return a string containing the 8 bits for the Float8 approximation to x
end

function Float8(x::Float64)
    if x===0.0
        Float8(UInt8(0))
    elseif x===-0.0
        Float8(UInt8(128))
    else
        Float8(parse(UInt8,chop_to_8_bits(x),2))
    end
end
Out[47]:
Float8
Exercise 5
Complete the following function that negates a Float8:
In [51]:
function -(x::Float8)
    #TODO: return the Float8 whose bits correspond to -x
end
Exercise 6
(a) Complete the following arithmetic operations, ensuring that each one returns a Float8. You can use Float64(x::Float8) and Float8(x::Float64) to make use of the built-in Float64 arithmetic.
(b) Check that
Float8(1.25)+Float8(2.25)
returns 3.5f8.
In [51]:
function +(x::Float8,y::Float8)
    #TODO: return x+y as Float8
end

function *(x::Float8,y::Float8)
    #TODO: return x*y as Float8
end

function /(x::Float8,y::Float8)
    #TODO: return x/y as Float8
end

function -(x::Float8,y::Float8)
    #TODO: return x-y as Float8
end
Out[51]:
- (generic function with 205 methods)
Exercise 7
(a) Implement the following routine round_to_8_bits, which rounds to the nearest Float8 rather than chopping.
(b) Check that
Float8(parse(UInt8,round_to_8_bits(1.3),2))
returns 1.3125f8.
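To see why 1.3125 is the expected value in this check: with 4 significand bits and exponent 0, neighbouring Float8 values are 1/16 apart, and 1.3 lies between two of them.

```latex
(1.0100)_2 = 1.25 \;<\; 1.3 \;<\; 1.3125 = (1.0101)_2, \qquad
|1.3 - 1.25| = 0.05, \quad |1.3125 - 1.3| = 0.0125 .
```

Rounding to nearest therefore selects 1.3125.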
In [61]:
function round_to_8_bits(x::Float64)
    # TODO: return a string containing the 8 bits of the Float8 nearest to x
end
Out[61]:
round_to_8_bits (generic function with 1 method)