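The IEEE 754 floating point standard is implemented on most computers as a way of representing non-integer numbers. The original 1985 version defined the single and double formats, and the 2008 revision added half and quadruple, giving the 4 types covered here:
    Half: 16 bits (1 sign, 5 exponent, 10 significand)
    Single: 32 bits (1 sign, 8 exponent, 23 significand)
    Double: 64 bits (1 sign, 11 exponent, 52 significand)
    Quadruple: 128 bits (1 sign, 15 exponent, 112 significand)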
All of these work basically the same way, just with different sizes. Each is made of the same three pieces:
    Sign bit: if this is set, the number is negative. Always one bit.
    Exponent: the size varies, but this stores the power of 2 to multiply the significand by. This uses a binary offset format, so the actual value the significand is multiplied by is equal to 2^(exponent-offset). The offset is equal to (2^(numberOfBitsInExponent-1))-1, for example 127 for the 8-bit exponent of single.
    Significand: This is stored with an implicit 1 bit before the start, so the value should be treated as 1.significandBits.
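
As a rough illustration of the three pieces, the sketch below splits a single-precision number into its fields and rebuilds the value from them. It assumes Python and its standard struct module. The function names decode_single and value_of_normal are made up for this example, and the rebuild step only covers ordinary (normal) numbers.

```python
import struct

def decode_single(x):
    """Split a number, rounded to single precision, into (sign, exponent, significand)."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, stored with the offset added
    significand = bits & 0x7FFFFF      # 23 bits, without the implicit leading 1
    return sign, exponent, significand

def value_of_normal(sign, exponent, significand):
    """Rebuild the value of a normal single-precision number from its fields."""
    offset = 2 ** (8 - 1) - 1                 # 127 for an 8-bit exponent
    fraction = 1 + significand / 2 ** 23      # the "1.significandBits" form
    return (-1) ** sign * fraction * 2.0 ** (exponent - offset)

fields = decode_single(-6.5)
print(fields)                     # (1, 129, 5242880)
print(value_of_normal(*fields))   # -6.5
```

Here -6.5 is -1.625 * 2^2, so the stored exponent is 2 + 127 = 129 and the significand bits hold the .625 part.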


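The problem with what has been outlined above is that there is no way to store 0, infinity or NaN. These are handled as special cases of the exponent field:
    exponent=0:
      if the significand is 0, then 0 or -0, depending on the sign bit
      if the significand is not 0, then a subnormal (denormalized) number: there is no implicit 1 bit, and the value is 0.significandBits * 2^(1-offset)
    exponent=all bits set (0x1f for half, 0xff for single, 0x7ff for double or 0x7fff for quadruple):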
      if the significand is 0, then infinity
      if the significand is not 0, then NaN
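
The checks above can be sketched for the single format as follows. This is a minimal illustration, assuming Python and its standard struct module. The name classify_single is hypothetical, and the "subnormal" label covers the remaining combination of a zero exponent field with a nonzero significand.

```python
import struct

def classify_single(x):
    """Classify a number, rounded to single precision, by its exponent and significand fields."""
    # Reinterpret the 4 bytes of the single-precision encoding as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent == 0:
        # All exponent bits clear: zero (either sign) or a subnormal number.
        return "zero" if significand == 0 else "subnormal"
    if exponent == 0xFF:
        # All exponent bits set: infinity or NaN.
        return "infinity" if significand == 0 else "NaN"
    return "normal"

for sample in (0.0, -0.0, 1.0, 1e-40, float("inf"), float("nan")):
    print(sample, classify_single(sample))
```

The sign bit is deliberately ignored here, since it only selects between 0 and -0 or between positive and negative infinity.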

More information can be found at:
