by Jerry D. Cavin, Senior Software Engineer
How many times have you coded an equation and the results produced are not what you expected? Then you rearrange the equation and get completely different results! If the computer is calculating correctly, why does it produce incorrect answers? Could it be the ‘Ghost in the Machine’?
Gilbert Ryle’s notion of the ‘Ghost in the Machine’ was introduced in his book, The Concept of Mind, a critique of René Descartes’ discussion of the relationship between the mind and the body (the mind-body dualism). The expression has been widely used in the context of a computer’s tendency to make unexplainable numerical errors. Are the errors the fault of the human, or the fault of the ‘Ghost in the Machine’? To gain insight into these mathematical errors made by computers, let us examine how the ‘mind’ of the computer views our concept of numbers.
The ‘mind’ of the computer perceives finite number sets. The size of the number set depends on the size of the space where the number is stored. If the number is stored in 8 bits, the number set starts with 0 and ends at 255. One of the bits can be reserved to represent the sign of the value (i.e., + or -). In the two’s complement representation used by virtually all modern processors, the numbers that exist are then from -128 to 127. Numbers can also be stored in a larger space, but that does not change the fact that the ‘mind’ of the computer still perceives finite number sets. So what happens when 1 is added to the maximum value in a finite integer set? It wraps around to the smallest value in the set, which for signed values is the most negative one. In our previous example for signed 8-bit values, when the computer adds 1 to 127 it gives the answer -128. This overflow, or wraparound error, can occur when you are adding, subtracting, multiplying or dividing. Integer overflow errors are dangerous because they cannot be reliably detected after they have happened. The ISO C99 standard states that signed integer overflow causes ‘Undefined Behavior’ (unsigned arithmetic, by contrast, is defined to wrap around). This allows compiler vendors conforming to the standard to do anything from completely ignoring the overflow to aborting the program. Most compilers simply ignore the overflow, and the program silently calculates an erroneous result. Depending upon how the integer variable is used, the overflow can cause a multitude of serious problems. If an overflow occurs with a loop index variable, it may cause an infinite loop. If an overflow occurs with a table index variable, the resulting buffer overflow may cause data corruption.
Another strange condition that we do not see in the human world can arise with integers. Under some circumstances an equation may give an answer of +0, or it may give an answer of -0. These two values of 0 are mathematically the same, but they may cause some confusion. This was a problem on many older machines that stored integers in ones’ complement or sign-magnitude form; some compilers for those machines detect this condition and simply change the answer to +0. Modern processors use two’s complement arithmetic, which has only one representation of zero, so they never encounter this issue.
What does your computer do?
FLOATING POINT NUMBERS
Another number representation used by the computer is floating point. The ‘mind’ of the computer views the floating point number 1.234567 as two numbers: a number for the mantissa (1234567) and a number for the exponent (-6). In a format we mere humans understand, the number looks like 1234567 x 10^-6. Like integers, floating point numbers are limited by the storage space in the computer, so the ‘mind’ of the computer perceives only a distinct and finite set of them. Between any two floating point numbers there are infinitely many real values, but the computer cannot represent any of the values that fall between two adjacent floating point numbers.
ACCURACY AND PRECISION
Before we can understand the many dangers of floating point numbers, we must discuss how the ‘mind’ of the computer perceives accuracy and precision. Humans think of accuracy as how close a measurement is to its actual real-world value. When we make repeated measurements, precision is a measure of how consistently we can make that measurement. The ‘mind’ of the computer has the same definition for accuracy: how close a floating point number is to its real-world equivalent. But precision is different. To the ‘mind’ of the computer, precision refers to the number of bits used to store a number or to make a calculation. To place a number with more significant digits than the format can hold into storage, we must give up precision. For example, consider our national debt. Our current national debt (at the time of this writing) is 16.37116855964779 trillion dollars. The ‘mind’ of the computer does not have enough bits to store the entire number as a 32-bit floating point variable. Having low precision (too few bits) to store the numbers causes small errors in the accuracy of the numbers. As we perform more and more operations with floating point numbers that contain a precision error, the errors accumulate, and the computer can end up with very wrong results. (Note: the examples throughout this paper were implemented in ANSI C using the Tiny C Compiler available at http://bellard.org/tcc/.)
Another way we humans view numbers in our world is by the use of fractions. To enter fractions in a format the computer can understand, we convert them into floating point numbers. Since grade school, we have all used fractions such as 1/2 and 1/3. Very few fractions can be converted without error: 1/2 becomes 0.5 and 1/4 becomes 0.25. Many other fractions require an infinite number of repeating digits to represent the complete value, such as 1/3 = 0.33333333… or 1/9 = 0.11111111… These fractions cannot be represented exactly, and for them the ‘mind’ of the computer only provides an approximation. Because the computer works in binary, the problem is wider than it looks: only fractions whose denominator is a power of two (such as 1/2, 1/4 and 3/8) can be stored exactly, and even 1/10, which is simply 0.1 in decimal, becomes an infinitely repeating binary fraction. For example, storing the fraction 1/3 in a 32-bit floating point number provides a value of 0.333333343267440800. If this value is used repeatedly in multiplications, the error will be compounded significantly. This implies that if a calculation is made with a fraction that is only an approximation, the answers provided by the ‘mind’ of the computer will, at best, only be approximately correct.
ADDING NUMBERS WITH DIFFERENT MAGNITUDE
A floating point number is made up of a mantissa and an exponent, each of finite range. Just as humans must line up the decimal points, the ‘mind’ of the computer can add or subtract two floating point numbers only when they have the same exponent. Before the addition or subtraction can occur, the computer must force the two numbers to have the same exponent. It does this by shifting the mantissa of the smaller number to the right and increasing its exponent, and each shift pushes low-order bits out of the finite mantissa. When the difference in magnitude is small, only a few bits are lost and it hardly matters. But when the difference in magnitude is large, most or all of the smaller number’s bits are shifted out, and the loss of precision introduces a large error into the result. An example is when the floating point number 81230000.0 is added to the floating point number 0.525. Using 32-bit floating point numbers, the result provided by the computer is 81230000.0. The smaller addend has completely disappeared.
SUBTRACTING NUMBERS WITH THE SAME MAGNITUDE
When the computer subtracts two nearly equal floating point numbers (which therefore have the same exponent), the subtraction itself is exact. However, if the two numbers were rounded when they were stored, each already carries a precision error of up to half of the least significant bit. The difference can then have an error as large as the sum of those two errors, one full unit of the least significant bit, which may be large compared with the small difference being computed. As the magnitude of the operands increases, the value of the least significant bit, and with it this error, increases dramatically. Once the operands have been rounded, nothing done during the subtraction can recover the lost digits. The human mind and the ‘mind’ of the computer are equally susceptible to this inherent problem of the arithmetic subtraction process. An example is when the floating point number 81232040.0 is subtracted from the floating point number 81232045.0. Most grade school children would answer 5.0, but using 32-bit floating point numbers the computer’s answer is 8.0: 81232045.0 cannot be stored exactly and is rounded to 81232048.0, while 81232040.0 is stored exactly.
REPEATING OPERATIONS MANY TIMES
Scientists often create programs that carry out computations over millions or billions of iterations. Astronomers may attempt to determine the second-by-second path of asteroids years into the future. Cosmologists attempt to simulate the universe over its 13.75 billion-year lifespan to understand the distribution of matter across the cosmos. What these projects have in common is that they carry out massive computations that iterate over and over again. If one of these calculations has a result with even the slightest precision error, that error can accumulate over millions or billions of iterations until the answer is worthless.
WHAT CAN WE DO?
Precision errors caused by the ‘mind’ of the computer are nearly impossible to detect with software. There are, however, a few steps that can help prevent these errors:
– REWRITE THE OPERATIONS that cause OVERFLOW and UNDERFLOW. If your compiler cannot detect these error conditions, rewrite addition, subtraction, multiplication, negation and division so that they check the processor’s hardware status bits and signal when an error occurs. This will only protect you from some errors.
– REWRITE THE ALGORITHM. By rewriting the numerical algorithms where the error occurs, you can often avoid inaccurate results. (Writing stable numerical algorithms is a very difficult task.)
– STOP USING FLOATING POINT NUMBERS. The U.S. Government has strict regulations that prohibit financial institutions from using floating point numbers for monetary values because of these problems. It may be appropriate to avoid floating point numbers in cases where life is at risk or severe financial loss is possible. Make an effort to understand the dangerous nature of using floating point numbers. But if you must…
- Use the highest floating point precision available. Calculations with wider floating point types (more bits of precision) are less likely to be affected by errors.
- Do not code the problem yourself. Writing stable numerical algorithms is not a trivial task. Use a well-tested mathematics library to code your equations. There are many well-known packages available, such as BLAS and LAPACK for linear algebra and GMP, LibTomMath and TTMath for high precision arithmetic, to name a few.
- Test the calculations with problems to which you know the EXACT answer. Compare your EXACT answer with the answer your program’s calculations provide.
- Be very cautious about calculating the difference of two very similar floating point numbers and using its result in a subsequent calculation.
- Be very cautious about adding two floating point numbers of very different magnitudes.
- Be very cautious about repeating a slightly inaccurate computation many times.
I hope this article has convinced you that the ‘mind’ of the computer is very different from the human mind and the world of numbers the computer perceives is alien to our own. But we must be aware of the inner workings of the computer’s ‘mind’ if we are to successfully communicate. Recognizing this difference is a large step for many programmers. But it is one that must be made if we are to attain accurate numerical answers from a computer.