
The article seems a bit backward to me. The reason they use 3.141592653589793 is because that's the value of pi in double precision floating point. And that was set by the Intel 8087 coprocessor. So to me, the real answer is that Intel (and Kahan) decided on the number of decimals in 1980, and NASA uses that because it's the standard.

On the other hand, in the 1960s NASA figured out exactly how many bits of precision they needed to get to the Moon and came up with the 15-bit Apollo Guidance Computer; 14 bits wasn't enough and 16 bits was more than they needed. (The computer supports double precision and triple precision for tasks that needed more.)

The point is that in the 1960s, aerospace systems would carefully consider exactly how many bits they needed and build systems with bizarre numbers of bits like 15 or 27. Now, they use standard systems, which generally have way more accuracy than needed.



I think you are misreading the article, or at least taking an oddly limited "developer-only" view instead of considering the audience. NASA is not trying to answer your question about how many digits of pi they use. They are trying to answer a non-technical person on Facebook (likely a kid) who is wondering, vaguely, if NASA needs to use a highly precise representation of pi to fly spaceships, or if something coarse will work.

The answer the kid is looking for: NASA's most precise representation of pi is more precise than 3.14, but nowhere close to 500 digits like the question suggested. 15 is more than enough for most engineers at NASA, and anything that an astronomer might conceptually want to do would take at most 40 digits of pi to do with almost arbitrary precision. The fact that the current representation is architecturally convenient for modern FPUs is basically immaterial to the person's question, even if that's interesting for people with detailed knowledge about such things.
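If you want to sanity-check that numerically, here's a rough Python sketch using mpmath; the ~12.5-billion-mile radius is an assumption borrowed from the JPL article's Voyager 1 example, and the figures in the comments are approximate:

    # How far off is the circumference of a Voyager-1-sized circle if you use
    # only 15 decimals of pi? (radius is the ~12.5 billion miles JPL cites)
    from mpmath import mp, mpf

    mp.dps = 50                                  # work in 50-digit arithmetic
    radius_miles = mpf("12.5e9")                 # assumed Voyager 1 distance

    pi_15 = mpf("3.141592653589793")             # pi cut off at 15 decimals
    circumference_exact = 2 * mp.pi * radius_miles
    circumference_15 = 2 * pi_15 * radius_miles

    error_inches = abs(circumference_exact - circumference_15) * 63360
    print(error_inches)                          # well under an inch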


They're still only answering half the question. Okay, 15 digits is more than enough. But how much more? What would the minimum be, and why are we using more than that?

Ideally the article would talk about what "more than enough" looks like (which it does), but also what minimums look like (which it doesn't), and then mention that they chose that specific size because most computers handle two standard sizes really fast, and double precision is the more accurate of the two.


I wish there were more details about the historical context in this.

I recently went down a rabbit hole of trying to implement a cosine function from scratch and found that for most applications where I use cosine (low-resolution graphs or 2d games), I need a shockingly low level of precision. Even four decimals was overkill!

For those that are interested, you can read about my adventures with precision and cosine: https://web.eecs.utk.edu/~azh/blog/cosine.html


If you are really interested in approximating a cosine on the cheap with high precision, you should look into approximating polynomials.

The Taylor expansion produces approximating polynomials that aren't that good.

For instance, if you were to ask which degree-4 polynomial best approximates cos(x), you wouldn't end up with 1 - x^2/2 + x^4/24.

In fact, the best degree-4 polynomial is 0.99958 - 0.496393 x^2 + 0.0372093 x^4; it pretty much coincides[2] with cos(x) on the interval (-π/2, π/2), and the error is an order of magnitude smaller than with the Taylor polynomial (see [3] vs [4]).

How to do this? Linear algebra[1].

See, the polynomials form a Hilbert space (a vector space with an inner product), where a decent choice of one is

    <f(x), g(x)> := \int_{-π/2}^{π/2} f(x) g(x) dx
Now do Gram-Schmidt on 1, x, x^2, ... to obtain an orthonormal basis, then use the inner product to project onto the span of the first d+1 basis elements; that projection is the best degree-d polynomial approximation. Voila!

Linear Algebra saves the day yet again.
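If anyone wants to try this numerically, here's a rough Python sketch (assuming numpy and scipy are available) that does the Gram-Schmidt-then-project recipe for degree 4 on (-π/2, π/2); the helper names are just for illustration:

    # Least-squares best degree-4 polynomial for cos on (-pi/2, pi/2):
    # Gram-Schmidt on the monomials 1, x, ..., x^4 under the L2 inner
    # product, then project cos onto the resulting orthonormal basis.
    import numpy as np
    from numpy.polynomial import polynomial as P
    from scipy.integrate import quad

    A, B = -np.pi / 2, np.pi / 2
    DEGREE = 4

    def inner(f, g):
        # <f, g> = integral of f(x) g(x) over (A, B)
        return quad(lambda x: f(x) * g(x), A, B)[0]

    def poly(c):
        # wrap a coefficient vector (lowest degree first) as a callable
        return lambda x: P.polyval(x, c)

    # Gram-Schmidt: orthonormalize the monomial coefficient vectors.
    basis = []
    for k in range(DEGREE + 1):
        v = np.zeros(DEGREE + 1)
        v[k] = 1.0                                   # the monomial x^k
        for u in basis:
            v = v - inner(poly(v), poly(u)) * u
        v = v / np.sqrt(inner(poly(v), poly(v)))
        basis.append(v)

    # Best degree-4 approximation of cos: sum of <cos, e_k> e_k.
    coeffs = sum(inner(np.cos, poly(e)) * e for e in basis)
    print(coeffs)   # roughly [0.99958, 0, -0.49639, 0, 0.03721]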

[1] https://www.math.tamu.edu/~yvorobet/MATH304-503/Lect4-04web....

[2] https://www.wolframalpha.com/input/?i=plot+0.0372093+x%5E4+-...

[3] https://www.wolframalpha.com/input/?i=plot+abs%281-x%5E2%2F2...

[4] https://www.wolframalpha.com/input/?i=plot+abs%281-x%5E2%2F2...


Finding the least-squares best polynomial doesn’t get you the minimum possible worst-case error though, or the minimum worst-case relative error. For those you can use the Remez exchange algorithm, https://en.wikipedia.org/wiki/Remez_algorithm

And if you look at the NASA low-precision number routines, you’ll see that’s exactly what they did.


It is important to note two things, though, for those following along:

1. Approximations using orthogonal polynomial bases (the least squares method, and more generally the Chebyshev methods) are, for transcendental functions, typically as accurate as Remez exchange (including with range reduction) to 16 or so digits of precision. Remez exchange is more difficult to implement correctly than the simple linear algebra methods, the latter of which rely "only" on doing a comparatively simple change of basis. Practically speaking you gain nothing from using Remez exchange instead of Chebyshev to approximate exp(x), for example.

2. The Remez exchange (and more generally, the MiniMax methods) does not guarantee anything beyond minimizing the worst case error. For many practical applications you don't care about worst case error, you care about average case relative error. It's not uncommon for the Remez exchange to produce polynomials which actually have worse average case relative error.

This is also covered in considerable depth in Trefethen's Approximation Theory and Approximation Practice.
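For a rough feel of point 1, here's a small Python sketch (assuming numpy's Chebyshev.interpolate helper) comparing a degree-10 Chebyshev interpolant of exp(x) against the degree-10 Taylor polynomial on [-1, 1]; exact figures will vary, but the Chebyshev error comes out orders of magnitude smaller, and a true minimax (Remez) fit would only shave off a small constant factor on top of that:

    # Degree-10 Chebyshev interpolant of exp on [-1, 1] vs. the degree-10
    # Taylor polynomial about 0, compared on a dense grid.
    import math
    import numpy as np
    from numpy.polynomial import chebyshev as C

    deg = 10
    cheb = C.Chebyshev.interpolate(np.exp, deg, domain=[-1, 1])

    x = np.linspace(-1, 1, 10001)
    taylor = sum(x**k / math.factorial(k) for k in range(deg + 1))

    print("max |exp - chebyshev|:", np.max(np.abs(np.exp(x) - cheb(x))))
    print("max |exp - taylor|   :", np.max(np.abs(np.exp(x) - taylor)))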


Here’s a relevant example: http://www.chebfun.org/examples/approx/EightShades.html

Here are the first 6 chapters of ATAP http://www.chebfun.org/ATAP/atap-first6chapters.pdf

Also cf. https://people.maths.ox.ac.uk/trefethen/mythspaper.pdf

* * *

I’d say the bigger point is not about whether the error is marginally better or worse with a Chebyshev series vs. CF vs. Remez, etc., but rather that any of these will usually beat the others if you add just one more coefficient.

Usually convergence speed matters more than picking a number of coefficients and then exactly optimizing to the last bit.

And as you say, running the Remez algorithm is probably a waste of time if you are trying to calculate a near-optimal approximation on the fly at runtime.


Sure, and thanks for the link!

The least-squares solution is just a step-up from Taylor series that doesn't require anything deeper than the notion of an inner product (the parent comment I was responding to didn't go beyond Taylor).


Here was Apollo’s sine/cosine code, a 5th degree polynomial which clearly had its coefficients optimized for minimax relative error (presumably by the Remez algorithm):

https://fermatslibrary.com/s/apollo-11-implementation-of-tri...

Here’s a comparison between its relative error vs. a 5th degree Taylor series (if you use high precision arithmetic; I’m sure on the machine itself rounding errors made things a bit choppier). In the worst case for the Taylor series the relative error is about 45 times lower in the optimized polynomial:

https://www.desmos.com/calculator/aelh9kf9ms


If anyone wants to do the thing romwell describes without an avalanche of math, try this one-liner:

    bash$ gnuplot -e 'set format x "%.15f"; set format y "%.15f"; set samples 100000; set table "cos.dat"; set xrange [-3*pi/8:3*pi/8]; plot cos(x); f(x) = a + b*x**2 + c*x**4; fit f(x) "cos.dat" using 1:2 via a,b,c'
It then gives you "optimized" values for a/b/c from your template formula, thus resulting in

    cos(x) ~= 0.999923 + -.498826*x*x + .0391071*x*x*x*x
Which is even more accurate but only to about four decimal places.


Here's a Java snippet that may be of interest to you [0] by a user called Riven [1]. It should be considerably faster than the lookup table with LERP (not that it really matters at this point, since we're just counting nanoseconds on one hand). I recall going down this rabbit hole somewhere around high school as well, five or so years ago, and ended up brainstorming potential faster implementations on an old Java forum with several users. I believe you have an Intel Core i7-8559U, which, if userbenchmark is to be trusted, leads me to believe the snippet I linked should be in the 3ns range assuming a warm cache for the LUT. Accuracy is configurable based on the sin bits.

[0]: https://gist.github.com/mooman219/9698a531c932c08ccefd86ba7c...

[1]: https://jvm-gaming.org/u/riven/summary


In games it is very common to see lookup tables for trig functions. Not big ones either, maybe a few hundred entries.
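Something like this toy Python sketch of the usual table-plus-lerp pattern (the 256-entry table and wrapping scheme are just illustrative; real game code would do this in C with fixed-point indices):

    # 256-entry sine table over one period, with linear interpolation.
    import math

    N = 256
    # one extra entry so interpolation near the end never has to wrap
    TABLE = [math.sin(2 * math.pi * i / N) for i in range(N + 1)]

    def lut_sin(x):
        t = (x / (2 * math.pi)) % 1.0 * N       # table index with fractional part
        i = min(int(t), N - 1)                  # guard against rare rounding to N
        frac = t - i
        return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])   # lerp

    print(lut_sin(1.0), math.sin(1.0))          # agree to roughly 4 decimal places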


Audio plugin programming makes extensive use of lookup tables for expensive operations. Pre-calculating is a useful technique for such time-critical real-time work (like games that have to render a new frame every 16ms @ 60Hz, or audio plugins that may need to return a buffer every 0.72ms @ 32 samples/44.1kHz).


Calculators actually use lookup tables themselves. I don't know if FPUs do too, but I wouldn't be surprised, as it's faster than series expansion.


Handheld calculators generally use the CORDIC algorithm, https://en.wikipedia.org/wiki/CORDIC
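For the curious, here's a bare-bones Python sketch of CORDIC in rotation mode (the iteration count and the use of floating point are just for illustration; real calculator hardware does this in fixed point with a tiny table of atan(2^-i) constants and only shifts and adds):

    import math

    ITERS = 30
    ANGLES = [math.atan(2.0 ** -i) for i in range(ITERS)]
    # Accumulated gain of the rotations; start from 1/gain so the result is normalized.
    K = 1.0
    for i in range(ITERS):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))

    def cordic_sin_cos(theta):
        """Return (sin, cos) of theta (|theta| < pi/2) via CORDIC micro-rotations."""
        x, y, z = K, 0.0, theta
        for i in range(ITERS):
            d = 1.0 if z >= 0 else -1.0          # rotate toward zero residual angle
            x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
            z -= d * ANGLES[i]
        return y, x

    print(cordic_sin_cos(0.5))                   # compare with math.sin(0.5), math.cos(0.5)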


Using the rough approximation sin(x) ~= x also works shockingly well for smallish x.
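Quick Python check of how far that gets you (the sample angles are arbitrary):

    import math

    for x in (0.01, 0.1, 0.3, 0.5):
        rel = abs(math.sin(x) - x) / math.sin(x)
        # relative error grows roughly like x**2 / 6:
        # about 0.0017% at 0.01 rad, about 1.5% at 0.3 rad
        print(x, rel)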


If you have access to modern processors (CPUs, GPUs) then a lookup table makes no sense. The polynomial is faster, more accurate, and needs less space to store precomputed values.


When you peek at the code for computing trig functions in most standard libraries (e.g. C Standard math library in the GNU C compiler), you'll see they typically use a lookup table somewhere in the calculation.

Typically the LUT will get you into the ballpark of the answer, and then you evaluate a polynomial to compute the delta and add the two together.

You can always find a polynomial that is extremely accurate, but it likely will be higher order, etc. A LUT + polynomial is faster. A pure LUT is the fastest, but takes too much memory.
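Roughly like this Python sketch (my own illustration of the idea, not any particular libm's actual code; the table size and polynomial degrees are arbitrary):

    # Table-plus-correction: look up sin/cos at the nearest grid point, then
    # use the angle-addition formula with very short polynomials for the
    # small remainder d = x - grid point.
    import math

    N = 512
    STEP = 2 * math.pi / N
    SIN_T = [math.sin(i * STEP) for i in range(N)]
    COS_T = [math.cos(i * STEP) for i in range(N)]

    def fast_sin(x):
        i = round(x / STEP)
        d = x - i * STEP                       # |d| <= STEP/2 ~ 0.006, so tiny polynomials suffice
        sd = d - d**3 / 6                      # sin(d), good to ~1e-13 for this range
        cd = 1 - d**2 / 2 + d**4 / 24          # cos(d), good to ~1e-15 for this range
        i %= N                                 # wrap the table index
        return SIN_T[i] * cd + COS_T[i] * sd   # angle addition: sin(i*STEP + d)

    print(fast_sin(1.2345), math.sin(1.2345))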


I wouldn't be shocked if lookup tables win on massively out-of-order CPUs. Of course, I also wouldn't be surprised if it is the out-of-order nature that makes the polynomial faster.

Would be interesting to see benchmarks. On to my list of things I have a low chance of completing...


Once you take SIMD into account, lookup tables frequently lose. Vector gathers are not cheap at all.


Is there a well-optimised library out there for extremely fast low-precision trigonometric functions? Could it make practical sense to do this?


Yeah, the classic 3D game Elite (1984) uses the small angle approximation throughout, no look-up tables at all (except for drawing the planets).

https://www.bbcelite.com/explore/articles/deep_dive_pitching...


This seems unlikely - if a specific number of bits in a standardized floating-point representation implied relevant calculation errors, NASA would certainly not use it.

Sure, if 24 bits were already sufficient, they would not go out of their way to avoid the extra 8 bits. So in that sense you're right of course. But it's not just "Hey, single precision floats are 32 bits, so why don't we just use that!"


The article doesn't try to justify the exact number of decimal places - and, by eye, the arguments it uses are likely to work to 14 decimal places as well, since the error would be similarly small.

Instead, it tries to answer the thrust of the prompt question: given the massive numbers used in spaceflight, is pi calculated to the greatest possible practical accuracy? Going over the history of the 15-digit version would divert from the interesting part of the article (the effect of precision on calculation) and dilute a nice teachable moment.

Though that fact about the Apollo Computer would make an interesting part of a follow-up.


The Apollo Guidance Computer wasn't 15 decimal places; it was 15 bits. The point is that they didn't use power-of-two word sizes back then; they used whatever word size fit the problem, even though that now seems bizarre.


But the number in the article is to 15 decimal places. Pointing out that that precision comes from the size of the modern double-precision floating point representation doesn't really answer why that representation is enough.


It's a fair point. They would use more decimal places than that if it were necessary, regardless of whatever a double-precision floating point does. Since it's not necessary, the double is adequate (and already widely available by default across programming languages and system architectures).


Double precision in IEEE 754 is 64 bits, of which 53 bits are the significand.

x87 uses an 80-bit format with functionally a 63-bit significand (it actually stores a 64-bit mantissa, but the explicit leading bit doesn't gain you anything, and it adds many terrible edge cases).

They use double-precision values because that presumably provides more precision than they need, but a 32-bit float would be woefully inadequate.

They could use more (x87 or software floats, for instance), but there's not really any advantage to the increased precision, and there are many downsides.
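For a concrete feel of what those 53 bits store, here's a quick Python check (Decimal shows the exact binary value stored for pi as a double):

    from decimal import Decimal
    import math

    # The exact value of the double nearest to pi:
    print(Decimal(math.pi))
    # 3.141592653589793115997963468544185161590576171875
    # i.e. correct to ~15-16 significant digits; true pi continues ...79323846...

    print(math.pi.hex())   # 0x1.921fb54442d18p+1 -- the 53-bit significand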


The coprocessor uses 80-bit floats; that's more than double precision. It even has a special instruction, fldpi, to get π at full precision: https://www.felixcloutier.com/x86/fld1:fldl2t:fldl2e:fldpi:f...


It does however completely bollocks up all the subsequent trigonometry functions. Even intel says not to use them.

Also, alas, 80-bit is a bit of a misnomer - when comparing to the more regular IEEE 754 representations, it's actually fp79 :D


Intel's Manual says to use FLDPI. The x87 FPU represents pi internally at extended-extended-precision, so it's more like fp82 :D Were you thinking about the argument reduction gotcha? http://galaxy.agh.edu.pl/~amrozek/AK/x87.pdf


Fldpi only loads the register, rounding according to the current rounding mode, so you still only have 63 bits of precision when you load the value.

For intermediate calculations it incorporates the internal extended pi via the GRS bits.

And yeah, the train wreck of argument reduction was my reference.



