Other

A Discussion of and Fix for the Pentium FDIV Bug

Wolfram Research, Inc.

Author

Wolfram Research, Inc.

Title

A Discussion of and Fix for the Pentium FDIV Bug

Description

This item contains a patch to the Pentium FDIV bug. This patch was created by Wolfram Research, Inc. This item also contains a Mathematica notebook that presents an in-depth analysis of the Pentium bug, and discusses the patch.

The Pentium FDIV Bug: Some Notes for the Mathematica Community and Others

(PRELIMINARY VERSION)

Illustration of the Bug

An illustration of the Pentium bug, created by running Mathematica on a Pentium.

The graphs show the function 1/x plotted as a function of x. The top graph is on a large scale, and has a familiar smooth shape. Each gray dot, however, represents a region in which bugs can occur, and an incorrect result can be generated.

The middle graph shows part of one of these dots, magnified by a factor of five billion. The correct result here is a smooth curve. On the Pentium, however, the sharp dip shown in the picture occurs.

The bottom graph shows what happens if one magnifies by another factor of 50,000. Each individual dot in this graph corresponds to a machine-precision number with a distinct digit sequence. At this level, the fact that each digit sequence is of limited length makes it inevitable that discrete jumps will occur. These jumps are what ultimately lead to roundoff error in numerical computation, but they are small compared to the glitches associated with the Pentium bug.

The various dips caused by the Pentium bug vary greatly in magnitude; there are dips both much larger and much smaller than the one shown here.

Introduction

Mathematica is widely used as a standard for mathematical computation, and we take great pride in ensuring that it does its computations as reliably and accurately as possible. However, Mathematica must run on top of existing hardware and operating systems, over which we have almost no control.

And in fact, in porting Mathematica to nearly a hundred computing environments over the past several years it has been our experience that particularly when they are new almost every kind of processor or operating system has bugs which affect the running of Mathematica.

The bug in the Pentium processor therefore comes as no surprise, but it has received so much attention that we thought people might find it useful to know our thoughts on it.

In the huge battery of tests that we routinely run on every version of Mathematica, we have not found a single "natural occurrence" of the Pentium FDIV bug. Indeed, as we describe below, our analysis suggests that the Pentium's bug will occur particularly rarely in the kinds of technical computation for which Mathematica is primarily used.

Not a single Mathematica user has reported encountering the Pentium bug in the course of their actual work. Indeed the only relevant technical support calls that we have received have been from users asking why Mathematica does not fail on the particular examples of floating point division described on the net and in newspapers. (The answer is that Mathematica often does symbolic rearrangement of expressions before evaluating them numerically so that it does slightly different hardware operations.)

Another important point is that the Pentium bug leads to very small fractional errors in results. It is certainly possible to construct circumstances where such errors are amplified, but these circumstances will also amplify the roundoff errors that are inevitable in computer arithmetic. Well-designed numerical algorithms therefore try to avoid amplifying any errors, and as a result will generally not show large effects from the Pentium bug.

Despite all this, we have now developed a software workaround which avoids the bugs, as described below.

Mathematica Versions Affected by the Bug

Executing the following command will determine whether your version of Mathematica is affected by the Pentium bug:

In[1]:=

3221224323. / 3221224323. == 1

If this returns True, then your system is operating correctly. If it returns False, then the system is exhibiting the bug.

The following versions of Mathematica can potentially be affected by a buggy Pentium chip:

- Microsoft Windows (Enhanced professional version) (also for Windows NT)

- MS-DOS (Enhanced professional version)

- OS/2

- NEXTSTEP for Intel

Student versions of Mathematica, as well as professional Standard versions, do not take advantage of hardware floating point instructions, and will therefore never be affected.

What the Bug Could Conceivably Do

Since none of our tests have ever found a naturally occurring manifestation of the Pentium bug, it is difficult for us to be sure about its possible consequences. However, the executable for the Mathematica kernel under Microsoft Windows involves a total of 959524 machine instructions, of which 568 -- or 1 in about 1700 -- are FDIV instructions that might exhibit the bug when run on a Pentium. (The source code of the Mathematica kernel involves about a third of a million lines of C and Mathematica language code.)

These FDIV instructions occur not only in strictly numerical code, but also in graphical and symbolic code. In principle, therefore, a wide range of Mathematica operations could be affected. However, since errors produced by the Pentium bug are fairly small, their effect can probably only be noticed in numerical code. In graphics, for example, the proportion of a worst case FDIV error is less than one device pixel per pagewidth on a 300 dpi printer, so even if the FDIV bug does happen to occur, it will probably have no effect. In exact symbolic code, FDIV is typically used only in internal heuristic estimates, and so slight errors in its output should not cause any exact symbolic results returned by Mathematica to be incorrect.

Our Workaround

The basic idea in working around the Pentium's bug is to insert software tests around every use of FDIV and other buggy Pentium instructions. The tests for example quickly check whether the result of a division has the potential of being wrong. If such a test fails, then a modified division computation must be done: for example one obtained by multiplying both numerator and denominator by 15/16, which guarantees that the the new division computation will be done correctly.

The primary problem in implementing a workaround is successfully to modify every single FDIV instruction.

The way the software engineering of Mathematica is done, the underlying source code for the system is written in an extension of C (as well as in the Mathematica language itself). This extension of C supports operator overloading, so it was quite straightforward for us to compile Mathematica with every (double)/(double) computation and so on replaced by an appropriate macro.

However, this approach does not actually fix all occurrences of FDIV. The problem is that out of the 568 occurrences of FDIV in the Windows Mathematica kernel executable, only 558 come from our C source -- the remainder are imported from standard runtime libraries.

To get a complete fix for the Pentium bug, therefore, one must operate not at the source code level, but instead at the level of machine code.

A machine code fix is also a more general one: it can potentially apply to any program, or at least any program with a certain format of executable. As a result, the work that we are doing to make a fix for Mathematica may turn out to be applicable to a wide range of other programs.

The following is the strategy that we are adopting. First, we scan the Mathematica executable to find where each FDIV instruction is. The main difficulty encountered here is that Pentium instructions are not of fixed length, so one needs an explicit table of possible opcodes and modifiers to be able to do disassembly.

Having found each FDIV instruction, the ultimate goal is to replace it with a sequence of other instructions. However, this cannot be done directly because the sequence of instructions would be a different length than the original FDIV instruction.

So what we do instead is to replace the byte corresponding to the FDIV opcode by a byte corresponding to an illegal instruction. The point of doing this slightly strange thing is that when the processor encounters the illegal instruction, it generates an exception.

We can then install an exception handler which executes the code we want whenever the exception is generated. The exception handler must emulate the FDIV instruction, but with extra tests. After the handler has done its job, it modifies the program counter so as to resume execution on the byte that follows the FDIV instruction.

There are many tricky details in implementing this approach. Among them is that there are really 8 different kinds of FDIV instructions, most with several register and data access modes, each of which must be handled in a slightly different way.

Another issue is where the exception handlers should be installed. If one does not know the structure of the program one is working with, then one needs to use code akin to a virus to install such handlers. But in working with a specific program such as Mathematica, definite places are known where the handlers can be installed.

The result of all this is that we have managed to produce a patching program that will produce a copy of Mathematica that works around the Pentium bug. The patching program and instructions on its use are given in MathSource item 0207-335.

Phenomenology of the Bug

Double precision floating point numbers on the Pentium are 64 binary bits long, storing 52 bits after the leading one of the mantissa.

Particularly with its arbitrary-precision arithmetic capabilities, Mathematica is set up to shield the user from internal details of the representation of numbers. However, it is possible to get this representation, and work with it. Indeed, a large fraction of the R&D groups around the world who develop algorithms for floating-point hardware use Mathematica in this way.

The following Mathematica functions convert to and from bit sequences of the mantissas of machine-precision numbers:

In[1]:=

ToBits[x_] := Part[$NumberBits[N[x]], 3]

FromBits[list_] :=
N[2 Fold[(2 #1 + #2)&, 0, list]/2^Length[list], 16]

To see a manifestation of the Pentium bug, try the following:

In[2]:=

u = FromBits[{1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,0,1,1,1,0,0,0,0,0,1,1}]

Evaluate[1/u] u - 1

The Pentium bug does not just affect this specific number; as shown in Picture #1 above, it also affects all other numbers in the range {1.499999966503765, 1.499999966561496}, that is, numbers with bit sequences between {1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0}, and {1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,1,1,1,0,0,1,1,1,1,1}.

In each case, the value of the reciprocal obtained is wrong starting at the 27th bit.

In general, the Pentium bug affects only numbers whose bit sequences begin with {1,0,0,0,1}, {1,0,1,0,0}, {1,0,1,1,1}, {1,1,0,1,0} and {1,1,1,0,1}, and are then followed by many consecutive ones.

There is no simple characterization of what bit patterns yield errors. However, it appears that errors always occur in contiguous blocks, and not, for example on fractal-like sets.

Typically the blocks seem to come in clusters. Within each block the error is at a fixed bit position; the bit position varies, however, from one block to another.

Mathematica Analysis of the Bug

The following Mathematica program simulates exactly the algorithm used by the Pentium, bit by bit. The lookup table for the algorithm is stored in the array q, generated by:

In[1]:=

q = Table[Which[ 64 > n >= 2/3 (d + 1), 2, 64 > n >= 1/6 (d + 1), 1, 64 <= n <= 126 - 2/3 (d + 1), -2, 64 <= n <= 126 - 1/6 (d + 1), -1, True, 0], {n, 0, 127}, {d, 16, 31}] ;

The buggy values in the table are given by:

In[2]:=

q[[24,2]] = q[[28,5]] = q[[32,8]] = q[[36,11]] = q[[40, 14]] = 0 ;

The reason that the Pentium bug is so rare is that these particular entries in the lookup table are only very rarely accessed.

Next we define some simple operations to handle numbers represented as a list of bits:

In[3]:=

LeftShift[{_, bits__}] := {bits, 0}

AddOne[{bits__, n_}] := {bits, n+1}

CarrySave[bits_] := Mod[bits, 2] + LeftShift[Quotient[bits, 2]]

BitTimes[bits_List, 0] := 0 bits
BitTimes[bits_List, 1] := bits
BitTimes[bits_List, 2] := LeftShift[bits]
BitTimes[bits_List, -1] := AddOne[1 - bits]
BitTimes[bits_List, -2] := AddOne[1 - LeftShift[bits]]

The idea of so-called carry-save representation, which is used by this algorithm, is that by allowing digits of the binary number to be 0, 1, or 2 instead of just 0 or 1, one can avoid having to do arbitrary-length carries.

The CarrySave function essentially performs just one step of carrying.

The following function actually performs the long division, using the table to look up the correct digits for the quotient. (The possible digits -2, -1, 0, 1, and 2 are used to generate a base 4 quotient.)

In[4]:=

BitFDIV[num_List, den_List, steps_Integer:34] := Module[{n, d, a}, n = Join[{0,0,0}, num]; d = Join[{0,0,0}, den]; FixedPoint[CarrySave, Flatten[ Table[a = q[[1 + Mod[Take[n, 7] . (2^Range[6,0,-1]), 128], 1 + Take[d, {5, 8}] . {8,4,2,1}]]; n = LeftShift[LeftShift[CarrySave[n + BitTimes[d, -a]]]]; {a, 0}, {steps}]]]] /; Length[num] == Length[den]

If a bad table entry is fetched, it is never possible to recover. Starting at whatever bit corresponds to the bad table entry, the answer will be wrong.

Finally, this function lets you divide two numbers as a buggy Pentium would:

In[5]:=

PentiumDivide[n_, d_] := FromBits[BitFDIV[ToBits[n], ToBits[d]]] * 2 ^ (Last[RealDigits[N[n], 2]] - Last[RealDigits[N[d], 2]])

So for example you can try

In[6]:=

PentiumDivide[1, 3221224323]

to see what a buggy Pentium would give for that division. Since this is a simulation, you can try this with any version of Mathematica; you do not need an actual buggy Pentium chip.

For example, try

In[7]:=

PentiumDivide[5244795, 3932159] * 3932159

and you will see that the answer differs from 5244795 by quite a bit.

However, if you try several numbers at random, you will see that the division is almost always correct.

For those who prefer a mathematical expression of the algorithm, the following is a Mathematica program which mathematically implements the kind of division algorithm used in the Pentium processor. This algorithm is not too different from traditional long division, but each successive digit in the result is "guessed" by checking the lookup table based on the first few bits of each number.

In[8]:=

FDIV[x_Real, y_Real] :=
Module[{xm, xe, ym, ye, zm, strip, q},
{xm, xe} = MantissaExponent[x, 2];
{ym, ye} = MantissaExponent[y, 2];
xm *= 2; xe--;
ym *= 2; ye--;
If[Abs[xm] < ym, xm *= 2; xe--];
strip = q[[Floor[(ym-1)*16]+1]];
zm = Table[q = Sign[xm] strip[[Floor[8 Abs[xm]]+1]];
xm = 4(xm - q ym);
q, {27}];
N[Fold[(4 #1 + #2)&, 0, zm] 2^(xe-ye-52)]
]

A correct lookup table for this algorithm is given by:

In[9]:=

q = Table[Min[2, Round[(n/8)/(1+d/16)]], {d, 0, 15},
{n, 0, Ceiling[64/3 (1+(d+1)/16)]}]

The buggy values are as given for the bitwise algorithm above.

In a particular division computation, the table entries accessed all come from the same strip of qtab, since the denominator remains fixed during the division. Only those denominators that correspond to the five strips with table errors can therefore lead to incorrect results.

How Common Is the Bug?

There are a total of 2^52 or nearly five million billion distinct machine numbers between 1 and 2 available with the Pentium architecture.

If all possible machine numbers between 1 and 2 occurred in practice with equal probability, then the frequency of the Pentium bug would be around once in nine billion floating point divisions -- low enough to be of little real concern. This is the argument initially made by Intel for why the Pentium bug should be of little concern.

But whether this simple frequency estimate is correct depends on whether all machine numbers really do occur with equal probability.

And if it so happened that machine numbers which produced the Pentium bug were more common in practical calculations, then the Pentium bug could potentially be much more significant.

Traditional mathematics is heavily invested in the idea of a continuum of real numbers, and tends to ignore issues about the representation of numbers in terms of digits. Indeed, for the purposes of traditional mathematics, little distinction is made between a number like 0.5 with a very simple digit sequence, and a number like 0.547173189971261908462471 with a seemingly random digit sequence.

From this point of view, it seems almost inconceivable that certain patterns of digit sequences would occur more often than others, and as a result, it seems reasonable to conclude that the simple assumption of equal probabilities should be correct, and therefore that the Pentium bug should be of little significance.

There is however a possible flaw in this argument, and the flaw has to do with how numbers that appear in calculations are generated.

The point is that it is fairly unusual for a user to type in, say, a 15-digit number with a seemingly random digit sequence. Much more common is to start off with "simpler" numbers, and then to generate other numbers through calculations. The issue then is what kinds of digit sequences typically get generated by calculations.

A crucial observation is that rational numbers obtained by division always have either terminating or periodic digit sequences.

Thus, for example 1/3 has digit sequence 0.33333333... in base 10 and 0.0101010101... in base 2.

Numbers that have terminating digit sequences in one base can have repetitive digit sequences in another base. Thus, for example 0.8 in base 10 is 0.110011001100... in base 2.

If one starts with "simple" numbers in base 10, and then does elementary arithmetic operations, the numbers that one ends up with will thus tend to have periodic digit sequences -- which are certainly not completely random.

As a result, it might be that numbers obtained in this way would be more likely to produce the Pentium bug, as we shall discuss below.

The situation is rather different, however, as soon as one goes beyond elementary arithmetic operations. The key point is that in evaluating Sqrt[2], for example, one immediately gets a number whose digit sequence seems for practical purposes random.

Indeed, other than rational numbers, no numbers that arise from short mathematical descriptions (such as square roots, logarithms, trig functions, etc.) seem to yield digit sequences which deviate in any significant way from complete randomness.

In a typical scientific computation, therefore, one may start with "simple" numbers that have simple digit sequences, but one will quickly generate numbers that have seemingly random digit sequences. And once one has such numbers, the chance of running into the Pentium bug can be estimated by the probability arguments used by Intel, and is very small.

As a result, it is probably fair to conclude that the Pentium bug is extremely unlikely to occur in scientific computations.

But what about computations that involve only elementary arithmetic operations?

Evaluating 1/3221224323.0 will exhibit the Pentium bug. But in specifying this computation, we have to give explicitly all 10 digits in the number that appears in the denominator.

Is there however some computation that involves simpler numbers and fewer keystrokes that produces the Pentium bug? We have searched for expressions with smaller numbers of keystrokes that yield the bug, but so far we have failed to find any.

Relations to Other Studies and Statements

The original Intel study estimated probabilities on the assumption that all digit sequences would occur equally often. We believe that for scientific computations, the digit sequences that occur will tend to be quite random, and therefore are well modelled by an assumption of equal probabilities.

The IBM study pointed out that numbers which are generated in simple decimal computations yield leading digits which are more likely to correspond to the leading digits of numbers which produce bugs. However, the study failed to show how the rather intricate patterns of digits beyond the first 5 digits that yield the bug could be produced by simple computations. We have not seen any evidence that it is possible to generate them with simple computations.

Vaughan Pratt's postings on this topic point (through empirical testing) to "bruised" integers as being particularly likely culprits for exposing the FDIV bug, where a bruised integer is one that has had a very small amount (for example a random value of less than say one millionth) subtracted from it. He points out that these numbers may occur with abnormally high frequencies for two reasons: The first is that you can get a bruised number by subtracting two numbers such as 10.2 - 3.2, and the second is that your calculations may naturally tend to involve such numbers. For the former, he points out that even if the Pentium bug does occur for numerators and divisors such as these, the relative error will be negligibly small in such cases. For the latter, we are conducting further research is to see whether results like his apply to Mathematica calculations. (Preliminary tests have failed to find any examples, but we have not done enough testing to come to a conclusive result.) The main point here is that Mathematica generally performs division by multiplying by the reciprocal of the number. This avoids all instances mentioned by Pratt as being unlucky for dividing, since his numerators always have an odd factor of at least 3, whereas a reciprocal always has numerator 1. But if results similar to his turn out to apply to taking reciprocals, then this could present a problem for Mathematica users doing calculations involving sequences that slowly diverge from or approach exact integer or rational values, if having a slight error at some point would throw the calculation off. (When deciding whether an error will affect your algorithm, it is useful to know that Pratt's results seem to indicate that the Pentium's relative error will not be more than about two orders of magnitude greater than the amount of difference from the nearby integer, should the Pentium bug manifest itself.) If any users are doing calculations like this on a Pentium, they should keep an eye out for manifestations of this bug, and if they see any instances of it they should send simple code demonstrating the problem to research@wri.com.

As far as fixes for the bug are concerned, the most widely quoted seems to be a fragment of C code (and its derivatives) posted by MathWorks. The main problem is that just patching C source code does not fix occurrences of FDIV in runtime libraries. To fix these requires the kind of binary patch discussed above.

About This Document

This document was produced by part of the numerical analysis group at Wolfram Research. To give feedback, or to get more information about its contents, contact research@wri.com. For more information on Mathematica and on Wolfram Research products and services, contact info@wri.com.

This document, together with illustrations, is accessible on the Web at http://www.wri.com/.

Cite this as: Wolfram Research, Inc., "A Discussion of and Fix for the Pentium FDIV Bug" from the Notebook Archive (2002), https://notebookarchive.org/2018-10-10r833w