IEEE Floating Point Standard
- What is the IEEE floating point standard?
- Floating point number representation
- Special values in the IEEE floating point standard
The IEEE floating point standard is a floating point arithmetic
system developed by the Institute of Electrical and Electronics
Engineers (IEEE) in the early 1980s and published as IEEE 754 in 1985.
Requirements for machines adopting the IEEE floating point standard:
- Arithmetic should be correctly rounded.
- Floating point numbers should be represented consistently across
  machines.
- Exception handling should be sensible and consistent.
Single precision numbers in a 32-bit machine
The bit pattern
b_{1}b_{2}b_{3}...b_{9}b_{10}b_{11}...b_{32}
of a word in a 32-bit machine represents the real number
(-1)^{s} x 2^{e-127} x (1.f)_{2}
where s = b_{1}, e = (b_{2}...b_{9})_{2}, and f = b_{10}b_{11}...b_{32}.
sign bit | biased exponent | fraction from normalized mantissa
1 bit    | 8 bits          | 23 bits
s        | e               | f
Note that only the fraction from the normalized mantissa is stored.
The leading 1 is a hidden bit, so the mantissa is actually represented
by 24 binary digits.
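The field layout above can be checked with a short Python sketch. It assumes the standard `struct` module packs a Python float into an IEEE single on the host platform; the helper name `decode_single` is made up for this illustration, and the value formula applies to normalized numbers only (0 < e < 255).

```python
import struct

def decode_single(x):
    """Split the single precision number nearest to x into (s, e, f) and rebuild its value."""
    # Pack x as a big-endian IEEE single, then read the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31              # b_1: sign bit
    e = (bits >> 23) & 0xFF     # b_2...b_9: biased exponent
    f = bits & 0x7FFFFF         # b_10...b_32: fraction
    # Normalized numbers only: (-1)^s x 2^(e-127) x (1.f)_2
    value = (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)
    return s, e, f, value

s, e, f, value = decode_single(1.75)
print(s, format(e, "08b"), format(f, "023b"), value)
```

For 7/4 the stored exponent is 127 (so the scale factor is 2^0) and the fraction starts with the bits 11, since 7/4 = (1.11)_{2}.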
Double precision numbers in a 32-bit machine
The bit pattern
b_{1}b_{2}b_{3}...b_{12}b_{13}b_{14}...b_{64}
of two words in a 32-bit machine represents the real number
(-1)^{s} x 2^{e-1023} x (1.f)_{2}
where s = b_{1}, e = (b_{2}...b_{12})_{2}, and f = b_{13}b_{14}...b_{64}.
sign bit | biased exponent | fraction from normalized mantissa
1 bit    | 11 bits         | 52 bits
s        | e               | f
Note that only the fraction from the normalized mantissa is stored.
The leading 1 is a hidden bit, so the mantissa is actually represented
by 53 binary digits.
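Python floats are IEEE doubles on common platforms, so the double precision formula can be tested as a round trip. The following is a minimal sketch under that assumption; `decode_double` and `rebuild_double` are hypothetical helper names, and the rebuild step covers normalized numbers only.

```python
import struct

def decode_double(x):
    """Return (s, e, f) for the 64-bit pattern that stores the float x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                  # b_1: sign bit
    e = (bits >> 52) & 0x7FF        # b_2...b_12: biased exponent
    f = bits & ((1 << 52) - 1)      # b_13...b_64: fraction
    return s, e, f

def rebuild_double(s, e, f):
    """Evaluate (-1)^s x 2^(e-1023) x (1.f)_2; valid for normalized numbers only."""
    return (-1) ** s * 2.0 ** (e - 1023) * (1 + f / 2 ** 52)

for x in (1.75, -34.432175, 0.1):
    s, e, f = decode_double(x)
    print(x, s, e, format(f, "052b"), rebuild_double(s, e, f) == x)

# With the hidden bit the mantissa has 53 digits, so 2^53 + 1 (54 digits) is not representable.
print(float(2 ** 53 + 1) == float(2 ** 53))
```

The last line illustrates the 53-digit limit: the integer 2^53 + 1 is rounded to 2^53 when converted to a double.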
Decimal values of some normalized floating point numbers on a 32-bit
machine:

                   | Single Precision                         | Double Precision
Machine epsilon    | 2^{-23} or 1.192 x 10^{-7}               | 2^{-52} or 2.220 x 10^{-16}
Smallest positive  | 2^{-126} or 1.175 x 10^{-38}             | 2^{-1022} or 2.225 x 10^{-308}
Largest positive   | (2 - 2^{-23}) 2^{127} or 3.403 x 10^{38} | (2 - 2^{-52}) 2^{1023} or 1.798 x 10^{308}
Smallest subnormal | 2^{-149} or 1.401 x 10^{-45}             | 2^{-1074} or 4.941 x 10^{-324}
Decimal precision  | 6 significant digits                     | 15 significant digits
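The machine epsilon, smallest positive normalized, and largest positive values follow from the number of fraction bits and the exponent range. The sketch below recomputes them in Python and compares the double precision results with `sys.float_info`; the function `table_values` and its parameter names are assumptions made for this example, not part of any library.

```python
import sys

def table_values(p, emin, emax):
    """Return (machine epsilon, smallest normalized, largest) for a format with
    p fraction bits and unbiased exponents ranging from emin to emax."""
    eps = 2.0 ** -p
    smallest = 2.0 ** emin
    largest = (2.0 - 2.0 ** -p) * 2.0 ** emax
    return eps, smallest, largest

single = table_values(23, -126, 127)
double = table_values(52, -1022, 1023)

for name, row in (("single", single), ("double", double)):
    print(name, " ".join(f"{v:.3e}" for v in row))

# The double precision values should agree with what the interpreter reports.
info = sys.float_info
print(double == (info.epsilon, info.min, info.max), info.dig)
```

All single precision values are exactly representable as doubles, so computing them with ordinary Python floats introduces no extra rounding.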
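Rounding in the IEEE standard
Round to nearest is the default and most common rounding mode. Given
a real number x, its correctly rounded value is the floating point
number fl(x) that is closest to x. When x lies exactly halfway between
two floating point numbers, the one whose last mantissa bit is 0 is
chosen.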
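Round to nearest can be observed by converting a double to single precision and back. This is a sketch, assuming CPython's `struct` module performs the conversion with the hardware's default round-to-nearest mode (true on common IEEE platforms); `fl_single` is a hypothetical helper name, and the input x is itself a double rather than an arbitrary real number.

```python
import struct
from fractions import Fraction

def fl_single(x):
    """Return the single precision number nearest to x, widened back to a Python float."""
    # Packing to 4 bytes rounds x to single precision; unpacking is exact.
    return struct.unpack(">f", struct.pack(">f", x))[0]

x = 0.1
fx = fl_single(x)
# Exact rational arithmetic avoids adding rounding error to the measurement.
rel_err = abs(Fraction(fx) - Fraction(x)) / Fraction(x)
print(fx, float(rel_err))

# Round to nearest keeps the relative error within 2^-24, half of machine epsilon.
print(rel_err <= Fraction(1, 2 ** 24))

# 1 + 2^-24 lies exactly halfway between 1 and 1 + 2^-23; the tie goes to 1.
print(fl_single(1 + 2 ** -24))
```

The bound of 2^-24 holds for values in the normalized single precision range; it does not apply to results that underflow to subnormal numbers.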
Single Precision representation

           | sign bit | biased exponent | fraction from normalized mantissa
           | 1 bit    | 8 bits          | 23 bits
7/4        | 0        | 01111111        | 11000000 000000000000000
-34.432175 | 1        | 10000100        | 00010011 011101010001100
-959818    | 1        | 10010010        | 11010100 101010010100000
+0         | 0        | 00000000        | 00000000 000000000000000
-0         | 1        | 00000000        | 00000000 000000000000000
macheps    | 0        | 01101000        | 00000000 000000000000000
"smallest" | 0        | 00000001        | 00000000 000000000000000
"largest"  | 0        | 11111110        | 11111111 111111111111111
infinity   | 0        | 11111111        | 00000000 000000000000000
NaN        | 0        | 11111111        | not all 0s
2^{-128}** | 0        | 00000000        | 01000000 000000000000000
**This is a subnormal number. It is machine representable, but it
carries fewer significant bits and is therefore less accurate in
computation than a normalized value.
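Bit patterns like these can be generated and decoded in Python for checking. The sketch below assumes the `struct` module's IEEE single precision packing; `single_fields` and `single_from_bits` are hypothetical helper names introduced for this example.

```python
import math
import struct

def single_fields(x):
    """Return the sign, exponent, and fraction bit strings of x stored as an IEEE single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = format(bits, "032b")
    return b[0], b[1:9], b[9:]

def single_from_bits(pattern):
    """Interpret a 32-character string of 0s and 1s as an IEEE single."""
    return struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]

rows = [("7/4", 1.75), ("-34.432175", -34.432175), ("-0", -0.0),
        ("macheps", 2.0 ** -23), ("infinity", math.inf), ("2^-128", 2.0 ** -128)]
for label, x in rows:
    print(f"{label:>10}", *single_fields(x))

# Exponent all 1s with a nonzero fraction decodes to NaN.
print(math.isnan(single_from_bits("0" + "1" * 8 + "1" + "0" * 22)))

# Exponent all 0s with only the last fraction bit set is the smallest subnormal, 2^-149.
print(single_from_bits("0" * 31 + "1") == 2.0 ** -149)
```

A subnormal such as 2^-128 has a zero exponent field and no hidden bit, so its leading zeros in the fraction reduce the number of significant bits available.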
