
The article seems a bit backward to me. The reason they use 3.141592653589793 is because that's the value of pi in double precision floating point. And that was set by the Intel 8087 coprocessor. So to me, the real answer is that Intel (and Kahan) decided on the number of decimals in 1980, and NASA uses that because it's the standard.

On the other hand, in the 1960s NASA figured out exactly how many bits of precision they needed to get to the Moon and came up with the 15-bit Apollo Guidance Computer; 14 bits wasn't enough and 16 bits was more than they needed. (The computer supports double precision and triple precision for tasks that needed more.)

The point is that in the 1960s, aerospace systems would carefully consider exactly how many bits they needed and build systems with bizarre numbers of bits like 15 or 27. Now, they use standard systems, which generally have way more accuracy than needed.



I think you are misreading the article, or at least taking an oddly limited "developer-only" view instead of considering the audience. NASA is not trying to answer your question about how many digits of pi they use. They are trying to answer a non-technical person on Facebook (likely a kid) who is wondering, vaguely, if NASA needs to use a highly precise representation of pi to fly spaceships, or if something coarse will work.

The answer the kid is looking for: NASA's most precise representation of pi is more precise than 3.14, but nowhere close to 500 digits like the question suggested. 15 is more than enough for most engineers at NASA, and anything that an astronomer might conceptually want to do would take at most 40 digits of pi to do with almost arbitrary precision. The fact that the current representation is architecturally convenient for modern FPUs is basically immaterial to the person's question, even if that's interesting for people with detailed knowledge about such things.
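If you want to sanity-check that numerically, here's a rough Python sketch using mpmath; the ~12.5-billion-mile radius is an assumption borrowed from the JPL article's Voyager 1 example, and the figures in the comments are approximate:

    # How far off is the circumference of a Voyager-1-sized circle if you use
    # only 15 decimals of pi? (radius is the ~12.5 billion miles JPL cites)
    from mpmath import mp, mpf

    mp.dps = 50                                  # work in 50-digit arithmetic
    radius_miles = mpf("12.5e9")                 # assumed Voyager 1 distance

    pi_15 = mpf("3.141592653589793")             # pi cut off at 15 decimals
    circumference_exact = 2 * mp.pi * radius_miles
    circumference_15 = 2 * pi_15 * radius_miles

    error_inches = abs(circumference_exact - circumference_15) * 63360
    print(error_inches)                          # well under an inch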


They're still only answering half the question. Okay, 15 digits is more than enough. But how much more? What would the minimum be, and why are we using more than that?

Ideally the article would talk about what "more than enough" looks like (which it does), but also what minimums look like (which it doesn't), and then mention that they chose that specific size because most computers handle two standard sizes really fast, and double precision is the more accurate of the two.


I wish there were more details about the historical context in this.

I recently went down a rabbit hole of trying to implement a cosine function from scratch and found that for most applications where I use cosine (low-resolution graphs or 2d games), I need a shockingly low level of precision. Even four decimals was overkill!

For those that are interested, you can read about my adventures with precision and cosine: https://web.eecs.utk.edu/~azh/blog/cosine.html


If you are really interested in approximating a cosine on the cheap with high precision, you should look into approximating polynomials.

The Taylor expansion produces approximating polynomials that aren't that good.

For instance, if you were to ask which degree-4 polynomial best approximates cos(x), you wouldn't end up with 1 - x^2/2 + x^4/24.

In fact, the best degree-4 polynomial is 0.99958 - 0.496393 x^2 + 0.0372093 x^4; it pretty much coincides[2] with cos(x) on the interval (-π/2, π/2), and the error is an order of magnitude smaller than with the Taylor polynomial (see [3] vs [4]).

How to do this? Linear algebra[1].

See, the polynomials form a Hilbert space (a vector space with an inner product), where a decent choice of one is

    <f(x), g(x)> := \int_{-π/2}^{π/2} f(x) g(x) dx
Now do Gram-Schmidt on 1, x, x^2, ... to obtain an orthonormal basis, then use the inner product to project onto the span of the first d+1 basis elements; that projection is the best degree-d polynomial approximation. Voila!

Linear Algebra saves the day yet again.
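If anyone wants to try this numerically, here's a rough Python sketch (assuming numpy and scipy are available) that does the Gram-Schmidt-then-project recipe for degree 4 on (-π/2, π/2); the helper names are just for illustration:

    # Least-squares best degree-4 polynomial for cos on (-pi/2, pi/2):
    # Gram-Schmidt on the monomials 1, x, ..., x^4 under the L2 inner
    # product, then project cos onto the resulting orthonormal basis.
    import numpy as np
    from numpy.polynomial import polynomial as P
    from scipy.integrate import quad

    A, B = -np.pi / 2, np.pi / 2
    DEGREE = 4

    def inner(f, g):
        # <f, g> = integral of f(x) g(x) over (A, B)
        return quad(lambda x: f(x) * g(x), A, B)[0]

    def poly(c):
        # wrap a coefficient vector (lowest degree first) as a callable
        return lambda x: P.polyval(x, c)

    # Gram-Schmidt: orthonormalize the monomial coefficient vectors.
    basis = []
    for k in range(DEGREE + 1):
        v = np.zeros(DEGREE + 1)
        v[k] = 1.0                                   # the monomial x^k
        for u in basis:
            v = v - inner(poly(v), poly(u)) * u
        v = v / np.sqrt(inner(poly(v), poly(v)))
        basis.append(v)

    # Best degree-4 approximation of cos: sum of <cos, e_k> e_k.
    coeffs = sum(inner(np.cos, poly(e)) * e for e in basis)
    print(coeffs)   # roughly [0.99958, 0, -0.49639, 0, 0.03721]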

[1] https://www.math.tamu.edu/~yvorobet/MATH304-503/Lect4-04web....

[2] https://www.wolframalpha.com/input/?i=plot+0.0372093+x%5E4+-...

[3] https://www.wolframalpha.com/input/?i=plot+abs%281-x%5E2%2F2...

[4] https://www.wolframalpha.com/input/?i=plot+abs%281-x%5E2%2F2...


Finding the least-squares best polynomial doesn’t get you the minimum possible worst-case error though, or the minimum worst-case relative error. For those you can use the Remez exchange algorithm, https://en.wikipedia.org/wiki/Remez_algorithm

And if you look at the NASA low-precision number routines, you’ll see that’s exactly what they did.


It is important to note two things, though, for those following along:

1. Approximations using orthogonal polynomial bases (the least squares method, and more generally the Chebyshev methods) are, for transcendental functions, typically as accurate as Remez exchange (including with range reduction) to 16 or so digits of precision. Remez exchange is more difficult to implement correctly than the simple linear algebra methods, the latter of which rely "only" on doing a comparatively simple change of basis. Practically speaking you gain nothing from using Remez exchange instead of Chebyshev to approximate exp(x), for example.

2. The Remez exchange (and more generally, the MiniMax methods) does not guarantee anything beyond minimizing the worst case error. For many practical applications you don't care about worst case error, you care about average case relative error. It's not uncommon for the Remez exchange to produce polynomials which actually have worse average case relative error.

This is also covered in considerable depth in Trefethen's Approximation Theory and Approximation Practice.
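For a rough feel of point 1, here's a small Python sketch (assuming numpy's Chebyshev.interpolate helper) comparing a degree-10 Chebyshev interpolant of exp(x) against the degree-10 Taylor polynomial on [-1, 1]; exact figures will vary, but the Chebyshev error comes out orders of magnitude smaller, and a true minimax (Remez) fit would only shave off a small constant factor on top of that:

    # Degree-10 Chebyshev interpolant of exp on [-1, 1] vs. the degree-10
    # Taylor polynomial about 0, compared on a dense grid.
    import math
    import numpy as np
    from numpy.polynomial import chebyshev as C

    deg = 10
    cheb = C.Chebyshev.interpolate(np.exp, deg, domain=[-1, 1])

    x = np.linspace(-1, 1, 10001)
    taylor = sum(x**k / math.factorial(k) for k in range(deg + 1))

    print("max |exp - chebyshev|:", np.max(np.abs(np.exp(x) - cheb(x))))
    print("max |exp - taylor|   :", np.max(np.abs(np.exp(x) - taylor)))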


Here’s a relevant example: http://www.chebfun.org/examples/approx/EightShades.html

Here are the first 6 chapters of ATAP http://www.chebfun.org/ATAP/atap-first6chapters.pdf

Also cf. https://people.maths.ox.ac.uk/trefethen/mythspaper.pdf

* * *

I’d say the bigger point is not about whether the error is marginally better or worse with a Chebyshev series vs. CF vs. Remez, etc., but rather that any of these will usually beat the others if you add just one more coefficient.

Usually convergence speed matters more than picking a number of coefficients and then exactly optimizing to the last bit.

And as you say, running the Remez algorithm is probably a waste of time if you are trying to calculate a near-optimal approximation on the fly at runtime.


Sure, and thanks for the link!

The least-squares solution is just a step-up from Taylor series that doesn't require anything deeper than the notion of an inner product (the parent comment I was responding to didn't go beyond Taylor).


Here was Apollo’s sine/cosine code, a 5th degree polynomial which clearly had its coefficients optimized for minimax relative error (presumably by the Remez algorithm):

https://fermatslibrary.com/s/apollo-11-implementation-of-tri...

Here’s a comparison between its relative error vs. a 5th degree Taylor series (if you use high precision arithmetic; I’m sure on the machine itself rounding errors made things a bit choppier). In the worst case for the Taylor series the relative error is about 45 times lower in the optimized polynomial:

https://www.desmos.com/calculator/aelh9kf9ms


If anyone wants to do the thing romwell describes without an avalanche of math, try this one-liner:

    bash$ gnuplot -e 'set format x "%.15f"; set format y "%.15f"; set samples 100000; set table "cos.dat"; set xrange [-3*pi/8:3*pi/8]; plot cos(x); f(x) = a + b*x**2 + c*x**4; fit f(x) "cos.dat" using 1:2 via a,b,c'
It then gives you "optimized" values for a/b/c from your template formula, thus resulting in

    cos(x) ~= 0.999923 + -.498826*x*x + .0391071*x*x*x*x
Which is even more accurate but only to about four decimal places.


Here's a Java snippet that may be of interest to you [0] by a user called Riven [1]. It should be considerably faster than the lookup table with LERP (not that it really matters at this point, since we're just counting nanoseconds on one hand). I recall going down this rabbit hole somewhere around high school as well, five or so years ago, and ended up brainstorming potential faster implementations on an old Java forum with several users. I believe you have an Intel Core i7-8559U, which, if userbenchmark is to be trusted, leads me to believe the snippet I linked should be in the 3ns range assuming a warm cache for the LUT. Accuracy is configurable based on the sin bits.

[0]: https://gist.github.com/mooman219/9698a531c932c08ccefd86ba7c...

[1]: https://jvm-gaming.org/u/riven/summary


In games it is very common to see lookup tables for trig functions. Not big ones either, maybe a few hundred entries.
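Something like this toy Python sketch of the usual table-plus-lerp pattern (the 256-entry table and wrapping scheme are just illustrative; real game code would do this in C with fixed-point indices):

    # 256-entry sine table over one period, with linear interpolation.
    import math

    N = 256
    # one extra entry so interpolation near the end never has to wrap
    TABLE = [math.sin(2 * math.pi * i / N) for i in range(N + 1)]

    def lut_sin(x):
        t = (x / (2 * math.pi)) % 1.0 * N       # table index with fractional part
        i = min(int(t), N - 1)                  # guard against rare rounding to N
        frac = t - i
        return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])   # lerp

    print(lut_sin(1.0), math.sin(1.0))          # agree to roughly 4 decimal places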


Audio plugin programming makes extensive use of lookup tables for expensive operations. Pre-calculating is a useful technique for such time-critical real-time work (like games that have to render a new frame every 16ms @ 60Hz, or audio plugins that may need to return a buffer every 0.72ms @ 32 samples/44.1kHz).


Calculators actually use lookup tables themselves. I don't know if FPUs do too, but I wouldn't be surprised, as it's faster than series expansion.


Handheld calculators generally use the CORDIC algorithm, https://en.wikipedia.org/wiki/CORDIC
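For the curious, here's a bare-bones Python sketch of CORDIC in rotation mode (the iteration count and the use of floating point are just for illustration; real calculator hardware does this in fixed point with a tiny table of atan(2^-i) constants and only shifts and adds):

    import math

    ITERS = 30
    ANGLES = [math.atan(2.0 ** -i) for i in range(ITERS)]
    # Accumulated gain of the rotations; start from 1/gain so the result is normalized.
    K = 1.0
    for i in range(ITERS):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))

    def cordic_sin_cos(theta):
        """Return (sin, cos) of theta (|theta| < pi/2) via CORDIC micro-rotations."""
        x, y, z = K, 0.0, theta
        for i in range(ITERS):
            d = 1.0 if z >= 0 else -1.0          # rotate toward zero residual angle
            x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
            z -= d * ANGLES[i]
        return y, x

    print(cordic_sin_cos(0.5))                   # compare with math.sin(0.5), math.cos(0.5)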


Using the rough approximation sin(x) ~= x also works shockingly well for smallish x.
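Quick Python check of how far that gets you (the sample angles are arbitrary):

    import math

    for x in (0.01, 0.1, 0.3, 0.5):
        rel = abs(math.sin(x) - x) / math.sin(x)
        # relative error grows roughly like x**2 / 6:
        # about 0.0017% at 0.01 rad, about 1.5% at 0.3 rad
        print(x, rel)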


If you have access to modern processors (CPUs, GPUs) then a lookup table makes no sense. The polynomial is faster, more accurate, and needs less space to store precomputed values.


When you peek at the code for computing trig functions in most standard libraries (e.g. C Standard math library in the GNU C compiler), you'll see they typically use a lookup table somewhere in the calculation.

Typically the LUT will get you into the ballpark of the answer, and then you evaluate a polynomial to compute the delta and add the two together.

You can always find a polynomial that is extremely accurate, but it likely will be higher order, etc. A LUT + polynomial is faster. A pure LUT is the fastest, but takes too much memory.
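Roughly like this Python sketch (my own illustration of the idea, not any particular libm's actual code; the table size and polynomial degrees are arbitrary):

    # Table-plus-correction: look up sin/cos at the nearest grid point, then
    # use the angle-addition formula with very short polynomials for the
    # small remainder d = x - grid point.
    import math

    N = 512
    STEP = 2 * math.pi / N
    SIN_T = [math.sin(i * STEP) for i in range(N)]
    COS_T = [math.cos(i * STEP) for i in range(N)]

    def fast_sin(x):
        i = round(x / STEP)
        d = x - i * STEP                       # |d| <= STEP/2 ~ 0.006, so tiny polynomials suffice
        sd = d - d**3 / 6                      # sin(d), good to ~1e-13 for this range
        cd = 1 - d**2 / 2 + d**4 / 24          # cos(d), good to ~1e-15 for this range
        i %= N                                 # wrap the table index
        return SIN_T[i] * cd + COS_T[i] * sd   # angle addition: sin(i*STEP + d)

    print(fast_sin(1.2345), math.sin(1.2345))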


I wouldn't be shocked if lookup tables win on massively out-of-order CPUs. Of course, I also wouldn't be surprised if it is the out-of-order nature that makes the polynomial faster.

Would be interesting to see benchmarks. On to my list of things I have a low chance of completing...


Once you take SIMD into account, lookup tables frequently lose. Vector gathers are not cheap at all.


Is there a well-optimised library out there for extremely fast low-precision trigonometric functions? Could it make practical sense to do this?


Yeah, the classic 3D game Elite (1984) uses the small angle approximation throughout, no look-up tables at all (except for drawing the planets).

https://www.bbcelite.com/explore/articles/deep_dive_pitching...


This seems unlikely - if a specific number of bits in a standardized floating-point representation implied relevant calculation errors, NASA would certainly not use it.

Sure, if 24 bits were already sufficient, they would not go out of their way to avoid the extra 8 bits. So in that sense you're right of course. But it's not just "Hey, single precision floats are 32 bits, so why don't we just use that!"


The article doesn't try to justify the exact number of decimal places - and, by eye, the arguments it uses are likely to work to 14 decimal places as well, since the error would be similarly small.

Instead, it tries to answer the thrust of the prompt question: given the massive numbers used in spaceflight, is pi calculated to the greatest possible practical accuracy? Going over the history of the 15-digit version would divert from the interesting part of the article (the effect of precision on calculation) and dilute a nice teachable moment.

Though that fact about the Apollo Computer would make an interesting part of a follow-up.


The Apollo Guidance Computer wasn't 15 decimal places; it was 15 bits. The point is that they didn't use power-of-two word sizes back then; they used whatever word size fit the problem, even though that now seems bizarre.


But the number in the article is to 15 decimal places. Pointing out that that precision comes from the size of the modern double-precision floating point representation doesn't really answer why that representation is enough.


It's a fair point. They would use more decimal places than that if it were necessary, regardless of whatever a double-precision floating point does. Since it's not necessary, the double is adequate (and already widely available by default across programming languages and system architectures).


Double precision in IEEE 754 is 64 bits, of which 53 bits are the significand.

x87 uses an 80-bit format with functionally a 63-bit significand (it actually stores a 64-bit mantissa, but the explicit leading bit doesn't gain you anything, and it adds many terrible edge cases).

They use double-precision values because that presumably provides more precision than they need, but a 32-bit float would be woefully inadequate.

They could use more (x87 or software floats, for instance), but there's not really any advantage to the increased precision, and there are many downsides.
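For a concrete feel of what those 53 bits store, here's a quick Python check (Decimal shows the exact binary value stored for pi as a double):

    from decimal import Decimal
    import math

    # The exact value of the double nearest to pi:
    print(Decimal(math.pi))
    # 3.141592653589793115997963468544185161590576171875
    # i.e. correct to ~15-16 significant digits; true pi continues ...79323846...

    print(math.pi.hex())   # 0x1.921fb54442d18p+1 -- the 53-bit significand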


The coprocessor uses 80-bit floats; that's more than double precision. It even has a special instruction, fldpi, to get π at full precision: https://www.felixcloutier.com/x86/fld1:fldl2t:fldl2e:fldpi:f...


It does however completely bollocks up all the subsequent trigonometry functions. Even intel says not to use them.

Also, alas, 80-bit is a bit of a misnomer - when comparing to the more regular IEEE 754 representations, it's actually fp79 :D


Intel's Manual says to use FLDPI. The x87 FPU represents pi internally at extended-extended-precision, so it's more like fp82 :D Were you thinking about the argument reduction gotcha? http://galaxy.agh.edu.pl/~amrozek/AK/x87.pdf


Fldpi only loads the register, rounding according to the current rounding mode, so you still only have 63 bits of precision when you load the value.

For intermediate calculations it incorporates the internal extended pi via the GRS bits.

And yeah, the train wreck of argument reduction was my reference.



