IEEE Floating Point Standard
- What is the IEEE floating point standard?
- Floating point number representation
- Special values in IEEE floating point standard
The IEEE floating point standard (IEEE 754) is a floating point arithmetic
system developed in the early 1980s and adopted by the Institute of
Electrical and Electronics Engineers (IEEE) in 1985.
Requirements for machines adopting the IEEE floating point standard:
- Arithmetic should be correctly rounded
- Floating point numbers should be consistently represented across machines
- Exception handling should be sensible and consistent
Single precision numbers in a 32-bit machine
The bit pattern
$b_1 b_2 b_3 \ldots b_9 b_{10} b_{11} \ldots b_{32}$
of a word in a 32-bit machine represents the real number
$(-1)^s \times 2^{e-127} \times (1.f)_2$
where $s = b_1$, $e = (b_2 \ldots b_9)_2$, and $f = b_{10} b_{11} \ldots b_{32}$.
| sign bit | biased exponent | fraction from normalized mantissa |
|---|---|---|
| 1 bit | 8 bits | 23 bits |
| s | e | f |
Note that only the fraction of the normalized mantissa is stored. The
leading 1 is a hidden (implicit) bit, so the mantissa is actually
represented by 24 binary digits.
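The field layout above can be checked with a short Python sketch. It uses the standard `struct` module to view a value as a 32-bit word; the helper name `decode_single` is made up for illustration, and the reconstruction formula applies only to normalized numbers (0 < e < 255).

```python
import struct

def decode_single(x):
    """Split x (rounded to single precision) into s, e, f and rebuild its value."""
    # Pack as a big-endian IEEE single, then reinterpret the 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31             # b1: sign bit
    e = (bits >> 23) & 0xFF    # b2..b9: biased exponent
    f = bits & 0x7FFFFF        # b10..b32: stored fraction (23 bits)
    # Normalized numbers only: add back the hidden leading 1 of the mantissa.
    value = (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)
    return s, e, f, value

print(decode_single(1.75))   # (0, 127, 6291456, 1.75)
```

Here 6291456 is the fraction `11000...0` read as a 23-bit integer, so the mantissa is (1.11)_2 = 1.75 once the hidden bit is restored.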
Double precision numbers in a 32-bit machine
The bit pattern
$b_1 b_2 b_3 \ldots b_{12} b_{13} b_{14} \ldots b_{64}$
of two words in a 32-bit machine represents the real number
$(-1)^s \times 2^{e-1023} \times (1.f)_2$
where $s = b_1$, $e = (b_2 \ldots b_{12})_2$, and $f = b_{13} b_{14} \ldots b_{64}$.
| sign bit | biased exponent | fraction from normalized mantissa |
|---|---|---|
| 1 bit | 11 bits | 52 bits |
| s | e | f |
Note that only the fraction of the normalized mantissa is stored. The
leading 1 is a hidden (implicit) bit, so the mantissa is actually
represented by 53 binary digits.
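As a rough illustration of the 64-bit layout, the sketch below splits a Python float into its three fields and rebuilds the exact rational value from the 53-bit mantissa. It assumes Python floats are IEEE doubles, which holds on common platforms; `decode_double` and `exact_value` are illustrative names, and `exact_value` covers normalized numbers only.

```python
import struct
from fractions import Fraction

def decode_double(x):
    """Return the fields (s, e, f) of the 64-bit pattern of x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                # b1: sign bit
    e = (bits >> 52) & 0x7FF      # b2..b12: biased exponent
    f = bits & ((1 << 52) - 1)    # b13..b64: stored fraction (52 bits)
    return s, e, f

def exact_value(s, e, f):
    """Exact value of a normalized double (0 < e < 2047) as a Fraction."""
    mantissa = (1 << 52) | f      # hidden bit restored: a 53-bit integer
    return (-1) ** s * Fraction(mantissa, 1 << 52) * Fraction(2) ** (e - 1023)

s, e, f = decode_double(0.1)
print(s, e, hex(f))                               # 0 1019 0x999999999999a
print(exact_value(s, e, f) == Fraction(0.1))      # True
```

The decimal number 0.1 has no finite binary expansion, so the stored fraction is a rounded, repeating pattern rather than an exact representation of 1/10.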
Decimal values of some notable floating point quantities on a 32-bit
machine:

| | Single Precision | Double Precision |
|---|---|---|
| Machine epsilon | $2^{-23}$ or $1.192 \times 10^{-7}$ | $2^{-52}$ or $2.220 \times 10^{-16}$ |
| Smallest positive (normalized) | $2^{-126}$ or $1.175 \times 10^{-38}$ | $2^{-1022}$ or $2.225 \times 10^{-308}$ |
| Largest positive | $(2 - 2^{-23}) \times 2^{127}$ or $3.403 \times 10^{38}$ | $(2 - 2^{-52}) \times 2^{1023}$ or $1.798 \times 10^{308}$ |
| Smallest subnormal | $2^{-149}$ or $1.401 \times 10^{-45}$ | $2^{-1074}$ or $4.941 \times 10^{-324}$ |
| Decimal precision | 6 significant digits | 15 significant digits |
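The entries in this table can be reproduced numerically. The sketch below is one way to do so in Python, assuming Python floats are IEEE doubles; `to_single` is a hypothetical helper that rounds a value through a 32-bit word using the standard `struct` module.

```python
import math
import struct
import sys

def to_single(x):
    """Round a Python float to single precision and return it as a float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# math.ldexp(m, k) computes m * 2**k exactly when the result is representable.
single = {
    "machine epsilon": math.ldexp(1.0, -23),
    "smallest positive": math.ldexp(1.0, -126),
    "largest positive": math.ldexp(2 - math.ldexp(1.0, -23), 127),
    "smallest subnormal": math.ldexp(1.0, -149),
}
double = {
    "machine epsilon": math.ldexp(1.0, -52),
    "smallest positive": math.ldexp(1.0, -1022),
    "largest positive": math.ldexp(2 - math.ldexp(1.0, -52), 1023),
    "smallest subnormal": math.ldexp(1.0, -1074),
}

for name in single:
    # Each single precision entry survives a round trip through 32 bits.
    assert to_single(single[name]) == single[name]
    print(f"{name:20s} {single[name]:.3e}   {double[name]:.3e}")

# The double precision entries match what the interpreter itself reports.
assert double["machine epsilon"] == sys.float_info.epsilon
assert double["smallest positive"] == sys.float_info.min
assert double["largest positive"] == sys.float_info.max
```

Using `math.ldexp` rather than repeated multiplication keeps every power of two exact, including the subnormal ones.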
Rounding in IEEE standard
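Round to nearest is the default and most common rounding mode. Given a
real number x, its correctly rounded value is the floating point number
fl(x) that is closest to x. When x lies exactly halfway between two
floating point numbers, the one whose last fraction bit is 0 is chosen.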
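A small Python sketch can make the definition of fl(x) concrete. It holds x = 1/10 exactly as a `Fraction`, converts it to a double, and checks that neither neighbouring double is closer. The helper names are illustrative, and `neighbours` is only valid for positive normalized doubles.

```python
import struct
from fractions import Fraction

def neighbours(y):
    """Doubles just below and just above a positive normalized double y."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", y))
    # For positive doubles, adjacent bit patterns are adjacent numbers.
    below = struct.unpack(">d", struct.pack(">Q", bits - 1))[0]
    above = struct.unpack(">d", struct.pack(">Q", bits + 1))[0]
    return below, above

def is_correctly_rounded(x, y):
    """True if double y is at least as close to the exact value x as its neighbours."""
    below, above = neighbours(y)
    err = abs(Fraction(y) - x)    # Fraction(y) is the exact value of the double
    return err <= abs(Fraction(below) - x) and err <= abs(Fraction(above) - x)

x = Fraction(1, 10)     # the real number 1/10, held exactly
fl_x = float(x)         # conversion rounds to the nearest double

print(Fraction(fl_x) > x)                 # True: fl(1/10) is slightly above 1/10
print(is_correctly_rounded(x, fl_x))      # True
```

The same check fails for either neighbour of `fl_x`, which is what it means for fl(x) to be the closest floating point number.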
Single Precision representation

| Value | sign bit (1 bit) | biased exponent (8 bits) | fraction from normalized mantissa (23 bits) |
|---|---|---|---|
| 7/4 | 0 | 01111111 | 11000000000000000000000 |
| -34.432175 | 1 | 10000100 | 00010011011101010001100 |
| -959818 | 1 | 10010010 | 11010100101010010100000 |
| +0 | 0 | 00000000 | 00000000000000000000000 |
| -0 | 1 | 00000000 | 00000000000000000000000 |
| macheps | 0 | 01101000 | 00000000000000000000000 |
| "smallest" | 0 | 00000001 | 00000000000000000000000 |
| "largest" | 0 | 11111110 | 11111111111111111111111 |
| infinity | 0 | 11111111 | 00000000000000000000000 |
| NaN | 0 | 11111111 | any pattern that is not all 0s |
| $2^{-128}$ (subnormal) | 0 | 00000000 | 01000000000000000000000 |

Note: $2^{-128}$ is a subnormal number. It is machine representable, but it
carries fewer significant bits than a normalized value and is therefore
less accurate in computation.
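The rows of this table can be regenerated with a few lines of Python. The sketch below prints the three fields for several of the values; `single_fields` is an illustrative helper built on the standard `struct` module. The exact fraction bits of a NaN vary by platform, so only the exponent and the "fraction not all 0s" property are checked for it.

```python
import math
import struct

def single_fields(x):
    """Return (sign, exponent, fraction) of x as bit strings of length 1, 8, 23."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    pattern = f"{bits:032b}"
    return pattern[0], pattern[1:9], pattern[9:]

rows = [
    ("7/4", 1.75),
    ("+0", 0.0),
    ("-0", -0.0),
    ("macheps", math.ldexp(1.0, -23)),
    ("infinity", math.inf),
    ("2^-128", math.ldexp(1.0, -128)),
]
for label, value in rows:
    print(f"{label:10s}", *single_fields(value))

# NaN: exponent all 1s and a fraction that is not all 0s.
sign, exponent, fraction = single_fields(math.nan)
print(exponent == "1" * 8 and "1" in fraction)   # True
```

Infinity and NaN share the all-1s exponent and differ only in the fraction, while zero and the subnormal numbers share the all-0s exponent.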