The IEEE 754 standard deals with the representation of floating point numbers in computers. It also defines how results are rounded; the four ways of rounding described here differ in accuracy and speed. Rounding is generally done after a floating point operation such as addition or multiplication. The intermediate result of such an operation carries two additional least significant bits (LSBs): the guard bit and the round bit. They help to keep intermediate results accurate.

The easiest and quickest way to round is called rounding down: the last bits are simply truncated. This is fairly inaccurate (011 would become 000 instead of the closer 100).
The second way to round is called rounding up. It is nearly the opposite of rounding down: the last bits are truncated and 1 is added to the last remaining position. This operation has one special case: if the truncated bits are all 0, nothing is added. Because of the special case and the extra addition, this operation is slower, and it is not really more accurate: the worst-case error is the same as for rounding down, only in the opposite direction.
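These two rules can be sketched in a few lines of Python. This is only an illustration on non-negative integers, not how a floating point unit is built; the function names and the `extra` parameter (the number of low bits to drop) are assumptions made for the example. Results are shown already shifted right, so a result of 1 corresponds to 100 in the bit patterns above.

```python
def round_down(bits: int, extra: int) -> int:
    """Truncate: drop the `extra` low bits and keep the rest."""
    return bits >> extra


def round_up(bits: int, extra: int) -> int:
    """Drop the `extra` low bits; add 1 unless the dropped bits were all zero."""
    kept = bits >> extra
    dropped = bits & ((1 << extra) - 1)
    return kept + 1 if dropped != 0 else kept


# 0b011 with two extra bits: the kept part is 0, the dropped part is 0b11.
print(round_down(0b011, 2))  # 0
print(round_up(0b011, 2))    # 1
print(round_up(0b100, 2))    # 1 (dropped bits are 0b00, so nothing is added)
```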
The third way is called rounding off. It is fairly accurate: the last bits are truncated, and 1 is added only if the highest truncated bit was equal to 1.
The most accurate way to round in IEEE 754 is called round to nearest even. The last bits are truncated, and 1 is added if either the truncated bits were greater than 10...0, or the truncated bits were exactly 10...0 and the last retained bit is equal to 1. Breaking exact ties toward the even neighbour avoids the slight upward bias that rounding off introduces.
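The difference between rounding off and round to nearest even only shows up on an exact tie. The following Python sketch compares the two on small bit patterns; the function names and the `extra` parameter (number of low bits to drop, assumed to be at least 1) are illustrative assumptions, and the code works on non-negative integers rather than real floating point registers.

```python
def round_off(bits: int, extra: int) -> int:
    """Drop `extra` low bits; add 1 if the highest dropped bit is 1."""
    kept = bits >> extra
    highest_dropped = (bits >> (extra - 1)) & 1
    return kept + highest_dropped


def round_nearest_even(bits: int, extra: int) -> int:
    """Drop `extra` low bits; on an exact tie, make the last kept bit 0."""
    kept = bits >> extra
    dropped = bits & ((1 << extra) - 1)
    half = 1 << (extra - 1)  # the pattern 10...0
    if dropped > half or (dropped == half and (kept & 1) == 1):
        kept += 1
    return kept


for value in (0b0010, 0b0110, 0b0111):
    print(format(value, "04b"), round_off(value, 2), round_nearest_even(value, 2))
# 0010 1 0   (tie, last kept bit is 0: nearest even leaves it alone)
# 0110 2 2   (tie, last kept bit is 1: both add 1)
# 0111 2 2   (dropped bits above the halfway point: both add 1)
```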
The last mode is the default, as it is the most accurate. Accuracy is more important than speed, as floating point is often used in science and nothing is worse in science than a wrong result.
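The default mode can be observed from a high-level language. The sketch below assumes that Python's `float` is an IEEE 754 double running in the default rounding mode, which is typical for CPython on current hardware but is not guaranteed by the language. Above 2^53 adjacent doubles are 2 apart, so adding an odd number produces an exact tie. The helper name `offset_after_add` is made up for this example.

```python
BIG = 2.0 ** 53  # from here up to 2**54, adjacent doubles are 2 apart


def offset_after_add(n: float) -> int:
    """Return how far BIG + n lands from BIG once the sum has been rounded."""
    return int(BIG + n) - 2 ** 53


print(offset_after_add(1.0))  # 0: tie between +0 and +2, the even neighbour is +0
print(offset_after_add(3.0))  # 4: tie between +2 and +4, the even neighbour is +4
```

With rounding off instead of round to nearest even, both ties would go upward and the results would be 2 and 4.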


The IEEE 754 standard was developed in the early 1980s in response to what had become a rather ugly situation. Each hardware vendor seemed to provide their own proprietary floating point implementation. Some implementations were based on base 16 arithmetic (i.e. their exponents indicated what power of 16 their fractional part should be multiplied by to get the actual value being represented) while others used base 2. How the different implementations handled programming errors and other exceptional situations varied wildly and round off behaviour ranged from reasonably well thought out to really quite brutal.

The end result was that porting scientific codes from one platform to another was potentially a gruesome task and even implementing algorithms with reasonably well understood error bounds was a massive undertaking on most platforms.

Into this chaos came William Kahan who, through force of will and intellect, managed to lead the development of an exceptionally well thought out approach to floating point arithmetic. Today's IEEE 754 standard is almost entirely the result of his efforts.

Considerable progress has occurred on the hardware front

The IEEE 754 standard has become, for all practical purposes, the way that floating point arithmetic is implemented on modern computers. Such is the power and influence of the standard that trying to launch a new incompatible floating point implementation is a fool’s quest which hasn’t been attempted in about fifteen years. The author of this writeup is not aware of any currently shipping floating point implementation which can’t be configured to be IEEE 754 compliant. Many implementations do have minor deviations in their “default” modes, which allow for somewhat better performance than one gets when the implementation is configured to be completely IEEE 754 compliant.

Progress on the programming language front is another story

Unfortunately, progress on the programming languages front has been considerably slower. Very few programming languages have been defined which require IEEE 754 compliance for their floating point arithmetic. As a consequence, scientific software developers are unable to safely assume that the floating point behaviour defined by the IEEE 754 standard will be provided by all platforms that their code might encounter. Attempts to require full compliance with the IEEE 754 standard have even sometimes been met with quite hostile reactions from powerful players (see Sun Microsystems’ experience when they tried to mandate full IEEE 754 compliance in a variant of Java aimed at the scientific computing audience).
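Because the language usually gives no guarantee, code that depends on IEEE 754 behaviour sometimes checks at run time what it has actually been given. A minimal Python sketch follows; `sys.float_info` is part of the standard library, while the function name `looks_like_ieee_double` is an assumption made for this example. The check only confirms that `float` has the shape of an IEEE 754 double (radix, precision, exponent range); it says nothing about exception handling or the other behaviour the standard defines.

```python
import sys


def looks_like_ieee_double() -> bool:
    """Heuristic: does `float` have the radix, precision and exponent
    range of an IEEE 754 double?"""
    info = sys.float_info
    return (
        info.radix == 2
        and info.mant_dig == 53
        and info.max_exp == 1024
        and info.min_exp == -1021
    )


print(looks_like_ieee_double())
```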

It is possible to write software today in many programming languages which takes advantage of most if not all aspects of the IEEE 754 standard. Sadly, writing truly portable software in any major programming language which tries to take advantage of more than the most basic aspects of the standard is still, over twenty years after the standard was adopted, nearly impossible.

What’s in a name?

The formal name of the standard is “IEEE Standard for Binary Floating-Point Arithmetic” and the standard’s full designation is ANSI/IEEE Std 754-1985. The standard has also been adopted by the International Electrotechnical Commission which designates the standard as “IEC 60559: Binary floating-point arithmetic for microprocessor systems”.


  • Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic by Professor W. Kahan, 1996 (available on the 'net at (last accessed 2003/03/02))

    This paper is definitely worth reading if you're interested in IEEE 754 floating point.

  • personal experience
