Standards
Sun engineer's response to 'How Sun can pull out of its slump'
Consistency is key
Feb. 24, 2003 12:00 AM
Dear Mr. Murphy,
We appreciate the high estimation in which you hold Sun Microsystems, as expressed in your LinuxWorld article of 1/29/03. We would like to address a small part of the article in which you call attention to what you see as a problem with SPARC and scientific computing, quoted below; in fact, although there is something to be learned from your square-root summation example, it is not what you conclude. More follows the quotation:
- The actual problem is that SPARC, like other RISC CPUs, defaults to the IEEE 754 64-bit floating-point specification. Therefore, it doesn't get the right answers in cases requiring extensive iterative computation. If, for example, you get it to sum the square roots of the first billion positive integers using the GCC compiler and default libraries, the result is off by 0.18.
Although completely meaningless in business processing, that error is devastating if you're trying to simulate the three-body problem over a long period. You can get around this by using extended precision libraries, but that cuts performance significantly.
- The self-inflicted version of this problem is that SPARC has an answer to floating-point performance problems in its SIMD instruction set, but Sun doesn't go out of its way to tell anyone about it and hasn't given the research community the support needed to motivate its use. The main problem is that researchers tend to develop for the box on their desk, and these days that's more likely to be an Intel box running FreeBSD or Linux than a SPARC. As a result, when they do run that code on a SPARC, there's nothing there to take advantage of the short-array capabilities provided in the hardware, and they get neither the performance nor the accuracy that the machine is capable of."
The main points we want to make are:
- The error as measured by the more suitable unit of ulps is not really large and it is further controllable by improving the summation algorithm.
- The SPARC results are uniform across systems while the IA32 results vary from somewhat better to much worse; uniformity is a virtue. Note that the results on other RISC systems match Sun's and also note that high performance scientific computing is still much more practiced on the RISC systems exemplified in the table.
- The variability of IA32 results stems from improper management of extended precision registers vis-a-vis memory precision, a subtlety that vitiates significantly the potential real advantage of extended precision intermediate computation. Note that IA32 does indeed comply with IEEE754; its differences with SPARC or other RISCs is within IEEE754 latitude.
- The example is typical of kernels in the solution of differential equations, like the three-body problem to which you refer, which will overwhelm eventually any finite-bounded precision representation, so better numerical approaches must be used anyway.
- The SIMD instruction set, VIS, available in successively improved versions in UltraSPARC processors, does not as yet implement floating-point arithmetic. Until recently this disadvantage with respect to the IA32 SIMD sets, SSE and SSE2, was not material since the latter was addressing a more egregious floating-point performance issue. VIS is used to accelerate some FP codes on SPARC by doing logical and bit-wise operations on FP registers. Note that SSE and SSE2 implement SPARC-style pure double and float precision FP arithmetic; there's no use of extended precision.
Next is a more detailed discussion. For a good foundation in floating-point computing we recommend the ACM article by David Goldberg "What Every Computer Scientist Should Know about Floating-Point Computing" especially with the addendum by Douglas Priest that addresses IA32. Also of interest might be Joe Darcy's JavaOne talk or even better, but requiring jdc registration, is What Everybody Using the Java Programming Language Should Know About Floating-Point Arithmetic
I) The Code and Compilation
The source code at the heart of the discussion:
------- sqr.c -------
#include
#include
main()
{
register i;
double f=0.0;
double sqrt(); for(i=0;i<=1000000000;i++)
{
f+=sqrt( (double) i);
}
printf("Finished %20.14f\n",f);
}
------- sqr.c -------
Investigators were asked to compile this program with the following command:
cc -o sqr -O sqr.c -lm
and to run the executable as follows:
time ./sqr
II) The Results: Correct versus IA32 versus SPARC
| RUN-Name | Result | Double Prec Rep |
|---|
| 1) Correct | : 21081851083600.37596259382529338 | =0x42b32c803ebb5060 |
| 2) SPARC-quad | : 21081851083600.37500000000000 | =0x42b32c803ebb5060 |
| 3) Correct+1ULP | : 21081851083600.3789 | =0x42b32c803ebb5061 |
| 4) IA32 | : 21081851083600.38281250000000 | =0x42b32c803ebb5062 |
| 5) SPARC double | : 21081851083600.55859375000000 | =0x42b32c803ebb508f |
The table below presents results from several sources/runs in order of accuracy.
Entry number 1 is the result from the Euler-Maclaurin formula as presented by Murphy. The third column of the table gives the IEEE-754 floating-point double-precision hexadecimal representation of the result.
Entry number 3 presents not the result of a run, but rather serves to illustrate the error in the decimal representation when the hexadecimal number varies by 1 bit, or unit in the last place, denoted ULP.
The best computed results are in error by 2 ULPs and are represented by entry number 4.
The SPARC results which are in error by 47 ULPs, are shown as entry number 5.
Entry number 2, shows a result that in decimal representation, is in error by -.00096259382529338. However, observe that the hex representation shows an error of 0 ULPs, an exact answer. The reason for this apparent contradiction is that IEEE-754 double precision represents only 16- or 17-decimal digits. Additional digits are used in rounding when converting from decimal to the double precision format. Entry number 2 presents the double-precision results when the sum is kept in IEEE-754 128-bit format.
I) Magnitude of the Errors
| RUN-Name | Decimal Error | ULP Error | err/correct |
|---|
| SPARC-quad | -0.00096259382529338 | 0 | 0 |
| Correct+1ULP | 0.00293740617470662 | 1 | 1.39333E-16 |
| BEST | 0.00684990617470662 | 2 | 3.24920E-16 |
| SPARC-double | 0.18263115617470662 | 47 | 86.6296E-16 |
Consider now the errors encountered in the survey, based on the table in section II. Consider #1 as perfect. Errors can be considered in terms of decimal representation, and ULPs.
When considering the absolute error alone, SPARC results appear bad. However, considering the ratio error/correct, the variance between results, in fact, is discovered to be very small.
About Gregory V. TarsyGregory V. Tarsy works in the Floating Point and Numerical Computing Group at Sun Microsystems.