20M digits of pi in 1 minute using Julia (gist.github.com)
109 points by juliusgeo on March 3, 2023 | 77 comments


I've found Julia to be an excellent language for Project Euler [1]. Besides the speed, you can use Unicode identifiers, so the solution can more closely follow the math.

1. https://projecteuler.net/


That doesn't seem to be a particularly distinguishing feature. I know that C#, Python, and JavaScript can all use non-ASCII identifiers. I believe most widely used languages can too, but I'm less certain of the details.


I think the distinction is that Julia's parser does a relatively good job assigning code-points to unary operators/binary operators/identifiers as appropriate, combined with the fact that multiple dispatch inherently looks more like math.


The difference is that Julia makes it easy to insert these characters (\symbol) so programming with them becomes more natural.



Julia will read 2π (fixed: thx kps) as 2 times pi, which follows mathematical notation. In all the other languages you have mentioned, I believe even if you can represent pi in Unicode (π) you will have to write 2*π. This can become very irritating when you get to, e.g., Python/NumPy, where you will litter your code with * and ** instead of the conventional mathematical notation. It gets a bit tiring if you are looking at more than a page of code. Aesthetics matter.
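For example, in the REPL:

    julia> r = 3.0;

    julia> 2π * r   # 2π parses as 2 * π (implicit multiplication by juxtaposition)
    18.84955592153876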


If you really care about mathematical notations, you should be using Mathematica, not Julia.


Mathematica is both closed source, and very idiosyncratic. It is primarily useful as a way to check things, but I have never seen anyone in actual research use Mathematica.


Mathematica is definitely better than Julia for symbolic math, but for computational math or algorithm development, Matlab is basically unusable.


I find that really surprising, would you elaborate? I love Julia, but I haven't found a REPL- or Pluto-based workflow that I am anywhere near as productive in for prototyping algorithms as I am in Matlab. And it typically doesn't even run that much slower.


Typo, sorry: that "Matlab" was supposed to be "Mathematica". IMO Julia is also better than Matlab here (I use editor+repl, but Pluto is also pretty good), but Matlab is definitely a contender in this area.


Editor from within the repl or outside? If outside, what triggers the repl to reload files? If inside, which editor? I never found a good way to copy and paste a line I got right and liked into a file. I tried drdocstrings but it never seemed to feel smooth. It doesn't help that all the keys in the Julia REPL default to being Emacs-like and all my muscle memory is Vim-ish. So even after making my repl editor vim, there is stuff in the repl that feels foreign.


editor outside the repl. Revise means that you get auto-reload on save.
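(For anyone unfamiliar, the Revise.jl workflow looks roughly like this; the file and function names here are just placeholders:)

    julia> using Revise

    julia> includet("algo.jl")   # "tracked" include: definitions update when the file is saved

    julia> myfunc(10)            # edit algo.jl in your editor, save, and just call it again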


Thanks! I thought Revise only worked if I used the internal editor.


Why would you use Matlab to "prototype algorithms"? Maybe we have different notions what that means but Matlab is certainly not aimed at that. It's aimed at engineers who want to evaluate their data or design/control embedded devices, and who are experts at their field of engineering while having very little computer science, numerics or programming background.


I know no one who uses Matlab for the uses you mention. Basically everyone is using it for algorithm development, and that includes both students and companies. And the way I understand it, algorithm development/prototyping is also its core application.

Not that I like doing this in Matlab, but that's what everyone uses, though some are moving to python.


I work in embedded DSP and controls, and even with a very strong programming background, I find Matlab a lot faster than Julia or NumPy for verifying that my approach to something is going to work. Not because of all the toolboxes, which I mostly don't use, but because the workflow makes it easier to try stuff fast and visualize it in meaningful ways.


The Julia REPL allows you to type Unicode with \blah<tab>. Also, if you use `?` help mode, you can give it a Unicode sequence and it will tell you how to type it, which is a nice quality-of-life feature:

    help?> Ψ₂ˣ
    "Ψ₂ˣ" can be typed by \Psi<tab>\_2<tab>\^x<tab>


While true, one distinguishing feature is the sheer number of Unicode binary and unary operators that can be used and overloaded. Take for example ∪ for set union and ∩ for set intersection (e.g. `set1 ∪ set2`).

(You can also type these easily via latex-style completions, e.g. with \cup<TAB> for ∪ or \cap<TAB> for ∩)
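For instance:

    julia> [1, 2, 3] ∩ [2, 3, 4]
    2-element Vector{Int64}:
     2
     3

    julia> [1, 2] ∪ [2, 3]
    3-element Vector{Int64}:
     1
     2
     3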


Nitpick: is that really computing 20M digits correctly? Reading the code gives me the impression that this does all float computations in 20 million digits (or 20 million-ish? If BigFloat internally is binary, it might use a bit more), but I would expect that you need to compute using a few more digits to avoid rounding errors in the last digits.

Edit: also, reading https://stackoverflow.com/a/67919533, it seems you can do

  BigFloat(π, precision=20000000)
How does that perform?


I ran the following in my Julia REPL

    julia> @time BigFloat(π, precision=20000000);
     11.208211 seconds (60.53 k allocations: 2.932 GiB, 0.16% gc time)

Which is pretty good considering I ran this in WSL on my laptop and Mathematica in a different comment took 10 seconds. (Plain Windows took ~26 seconds for some reason?)


I'm too late to update my own comment, but I now suspect BigFloat(π, precision=20000000) may allocate a BigFloat with the requested 20-million-bit precision and then store the constant π, which has much lower precision, in it.

What value does that print?


It definitely does calculate it; otherwise it would diverge significantly at lower digits from the Gauss-Legendre and Chudnovsky algorithms that I compared it to. It just defers the calculation to MPFR, the C library Julia binds to for BigFloat calculations.


Also, one additional comment: you are setting precision in number of bits, not in decimal digits. It obviously runs much faster when it is only computing 20000000 bits (about 6 million decimal digits), not 20000000 digits of precision.
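(For reference, MPFR counts precision in bits, so N decimal digits needs roughly N·log2(10) ≈ 3.32·N bits. A sketch of requesting a true 20M decimal digits:)

    julia> bits = ceil(Int, 20_000_000 * log2(10))   # bits needed for 20M decimal digits
    66438562

    julia> @time BigFloat(π, precision=bits);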


> (Plain Windows took ~26 seconds for some reason?)

Maybe the allocations were the reason?


BigFloat internally is not a fixed-width binary format; it uses an array of limbs because it is made for arbitrary precision. There will absolutely be small errors at the end, but it's easier to say 20M than 1.999999M digits of pi.


In Wolfram Mathematica on my desktop PC with a 7 year old processor:

    Timing[N[\[Pi], 20000000];]
    {10.1094, Null}
On a fairly recent laptop CPU I got 7.6 seconds.

A fair ways to go still!


While this is impressive, it is neither the same algorithm, nor even the same type of language (I already know Julia is much slower than C). I don't think it is much of a relevant comparison for a benchmark when almost nothing is the same except for the expected answer. Perhaps if Gauss-Legendre were implemented in Mathematica it would only take 3 seconds! There's no way for us to know.


does Mathematica actually compute it or is it just loading the value that it already knows?


Definitely computing. For arbitrary precision evaluation like this, Mathematica has always had cutting-edge algorithms built in.

As a random example: large matrix operations have been multi-threaded for ages.


7.6 seconds seems too long to be just loading a value from disk


it's not using the same algorithm right?


This is pretty cool. One minor gripe is that there are a lot of useless type asserts cluttering things up; none of the usages of ::BigFloat here were needed.


This is absolutely true—I was just putting them everywhere possible in an attempt to avoid allocations.


I don’t know much about Julia, why would it potentially cause allocations?


If the code were type-unstable, that could cause allocations, but in cases like this it wouldn't. Type stability is the ability of the compiler to predict the types of variables from the types of the inputs. Type-unstable code can be found using `@code_warntype` and will cause allocations due to boxing and dynamic dispatch.
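(A tiny illustration; `unstable` is a made-up function whose return type depends on a runtime value rather than on the input type:)

    julia> unstable(x) = x > 0 ? 1 : 1.0   # returns Int or Float64 depending on the value

    julia> @code_warntype unstable(2)      # flags Body::Union{Float64, Int64}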


Thank you for the feedback. I will try to remove as many of the explicit type hints as possible. The main reason I added them everywhere is that I sometimes found instability at very low digit counts because of type promotion.


Yeah this is a minor gripe with Julia. It has type inference but it's risky to use.


True, but you can remove a lot of the risk by inspecting the results of type inference with `@code_warntype` in places where you suspect it might not be working.

I believe JET.jl can automate global program analysis, but I haven't used it.


Just wanted to mention that SuperPi, the benchmark that has been widely used for at least a decade for measuring single-threaded PC/CPU performance, uses the same algorithm, so it's not really obscure. That said, multi-threaded algorithms make more sense on modern multi-core architectures; something like this, for example:

http://www.numberworld.org/y-cruncher/


I still remember when the SuperPI 1M <10s record was beaten... with a CPU overclocked to 5GHz+: https://tweakers.net/ext/i/1160952453.gif

The world record remains at just under 4s, with 8GHz+ overclocks: https://hwbot.org/benchmark/superpi_-_1m/halloffame

Clock cycle efficiency doesn't seem to have improved much in the last decade.



The following implementation of the Chudnovsky algorithm takes about the same time in Gambit-Scheme on M1 Mac for 20M digits (and allocates 16GB):

  (define (pi digits)
    (let* ((A 13591409)
           (B 545140134)
           (C 640320)
           (C^3/24 (quotient (expt 640320 3) 24))
           (D 12))
      (define (-1^n*g n g) (if (odd? n) (- g) g))
      (define (split m n)
        (if (= 1 (- n m))
          (let* ((6n (* 6 n)) (g (* (- 6n 5) (- (+ n n) 1) (- 6n 1))))
            (list g (* C^3/24 (expt n 3)) (* (-1^n*g n g) (+ (* n B) A))))
          (let* ((mid (quotient (+ m n) 2))
                 (gpq1 (split m mid))
                 (gpq2 (split mid n))
                 (g1 (car gpq1)) (p1 (cadr gpq1)) (q1 (caddr gpq1))
                 (g2 (car gpq2)) (p2 (cadr gpq2)) (q2 (caddr gpq2)))
            (list (* g1 g2) (* p1 p2) (+ (* q1 p2) (* q2 g1))))))
      (let* ((num-terms (inexact->exact (floor (+ 2 (/ digits 14.181647462)))))
             (sqrt-C (integer-sqrt (* C (expt 100 digits))))
             (gpq (split 0 num-terms))
             (g (car gpq)) (p (cadr gpq)) (q (caddr gpq)))
        (quotient (* p C sqrt-C) (* D (+ q (* p A)))))))
I am surprised it is much slower in Julia (as per what is noted in the gist).


Interesting! What does Gauss Legendre look like in Gambit-Scheme? I also wonder if perhaps there is a way to get the Julia implementation up to par…
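(For readers following along, the core Gauss-Legendre iteration is short. A rough Julia sketch, not the gist's actual code, so the guard bits and iteration count are only approximate:)

    function gauss_legendre_pi(digits)
        # ~3.33 bits per decimal digit, plus some guard bits
        setprecision(BigFloat, ceil(Int, digits * log2(10)) + 64) do
            a, b = BigFloat(1), 1 / sqrt(BigFloat(2))
            t, p = BigFloat(1) / 4, BigFloat(1)
            # each iteration roughly doubles the number of correct digits
            for _ in 1:ceil(Int, log2(digits))
                # note: a - (a + b)/2 == (a - b)/2, so t updates in one step
                a, b, t, p = (a + b) / 2, sqrt(a * b), t - p * ((a - b) / 2)^2, 2p
            end
            (a + b)^2 / (4t)
        end
    end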


I will give it a try, time permitting. The Scheme code above is a variation of something I converted from Common Lisp (original code by Robert Smith) in 2013. Its speed depends on bignum multiplication speed. I imagine most serious bignum implementations use the Schönhage–Strassen algorithm for very large numbers.


Thanks for looking into this! That is interesting, I wonder if it varies a great deal based on the implementation of the BigFloat rather than the language that it is implemented in.


Here is a video [1] and associated java source code [2] from pi day in 2018 to generate 1M (or more) digits of pi using the same Gauss-Legendre algorithm.

[1] https://www.youtube.com/watch?v=8RONJPOgZjw

[2] https://github.com/R-e-t-u-r-n-N-u-l-l/Java-Algorithms/blob/...


My algorithm only takes about 1.5 seconds for 1M digits. These are very cool resources though, I will read them.


Thanks! I haven't run any timings on the link I posted. I suspect julia would be much faster than java, though! I just thought it was neat that your post in March reminded me of an earlier pi day post on the same algorithm. I like the explosion from 1M to 20M.


I would love to see the timings for Java runtimes. The various implementations of BigFloats seem to be very customized to the language, so maybe you could do it faster!


On my computer the Julia algorithm gives 0.98 seconds for 1M digits and 40.7 seconds for 20M (Zen 2 CPU).


Thanks for the data!


Isn't this more hardware dependent? A 2012 Intel notebook vs a 2022 AMD laptop would yield different results with the same code. Also, well-tuned assembly can do these way faster. Python wrapped with assembly functions, run on a 6GHz Intel machine with a ramdisk, would also outperform this. I always feel this kind of claim is not helpful in day-to-day usage.


That is why I compared it to the stdlib implementation.


Is 1-minute fast?


I've done 2.5 billion digits in 88 seconds with ycruncher. That's about 28.4 million digits a second?

http://ewams.net/?date=2020/04/17&view=sockets_vs_cores_for_...

Though that is with a broadwell CPU and about 3 years ago. Might see if I can get my hands on a 6348 ice lake CPU and see how she burns.


All variables held the same, Julia often beats C, NumPy, and most other languages.

Julia also supports transparent Cluster API and CUDA integration, i.e. the hot mess you hope never to have to maintain professionally.

In my humble opinion, the utility of this high-level language makes it fundamentally different. Thus, it is slowly becoming more mainstream as the ecosystem stabilizes.

Remember to have fun =)


I just ran the same thing in Mathematica and I got 20 million digits in just over 7 seconds.

You can have "very high level functional programming" and 8x the speed of Julia in a mathematics-oriented language!


Mathematica does several optimizations before execution, and as such can exploit a priori knowledge of many problems.

The danger of course is when Wolfram takes those assumptions into a room of mathematicians. It is controversial for all the wrong reasons. =)

It is a common question by the way =)

https://stackoverflow.com/questions/60121757/julia-vs-mathem...


Julia already is a very high-level mathematics-oriented programming language, though.

The reason Mathematica is so much faster here is it’s using a different algorithm. When you compare using the same algorithm, Julia is 10-100x faster than Mathematica. https://julialang.org/benchmarks/


But last I checked there's Julia for Python? So you could get the best of both worlds (nice syntax of Python and the speed of Julia).


I feel like you’re misunderstanding how Julia syntax works. Julia syntax is Python syntax, for the most part. They’re not identical, but the differences are very small, and typically favor Julia—for example, compare:

    x*(y.^2)

To the Python equivalent:

    np.matmul(x, list(map(lambda v: v**2, y)))

Note also that the first will be much faster because it can fuse the matrix multiplication with the exponentiation. That's because Julia is a single language, and the whole thing is compiled at the same time. Python can't fuse the above code because NumPy is written in C, not Python.

Julia is just Python but fast.
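(One nuance: what Julia fuses is chains of dotted/broadcast operations, which compile to a single loop with no temporary arrays. A quick illustration:)

    julia> x = rand(1000); y = rand(1000);

    julia> z = @. x + 2 * y^2;   # @. dots every operation: one fused pass, one allocation for z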


Yes, Julia has wrappers for many other languages, so there are options if you really need some specific chunk of C/C++, Fortran, Python, Octave or R etc.

Yet a polyglot project is bad design, and tends to become an abomination over time. This is one reason many Go programmers rewrote libraries rather than saturate their code with cgo/C and SWIG/C++ wrappers. Similarly, people are writing scalable versions of libraries in Julia to improve transparent parallel performance for problems too big for a single machine to feasibly handle.

Things are still undergoing change, but of course the commercial Nvidia binary blobs will haunt everyone for a while. =)


In the article I compare my implementation to the MPFR pi implementation, which is in pure C. So the fact that I'm able to get close to that performance in Julia is impressive (I think).


When I googled digits of pi I found http://www.numberworld.org/y-cruncher/ where they get 25M digits in about one second (rather than one minute) depending on what kind of computer you use.


y-cruncher is the state of the art, but it is also not written in a language like Julia, and doubtless occupies many more lines of code


Y-cruncher is closed source; there's no way to know the number of lines of code it uses compared to the Julia version.


That is true, but isn’t it then more useful to provide an open source method of computation at these ranges?


genuine question: would that amount of digits be useful for any application?



Thank you for posting this! This is exactly what I was looking for.


TLDR: "For JPL's highest accuracy calculations, which are for interplanetary navigation, we use 3.141592653589793"...Using pi to 15 decimal places, you'd get the distance of Voyager 1 wrong by less than 1cm!


No, it is basically only useful as a programming exercise. You only need like 40 digits to compute the circumference of the universe to the precision of a hydrogen atom.


What is the circumference of the universe?


2.8×10^39 picometers is roughly the circumference of the observable part of our universe. A hydrogen atom has a Bohr radius of about 53 picometers.

So 40 digits of pi would indeed be in the ballpark for computing anything distance-wise at the circumference of the observable universe down to the size of the hydrogen electron cloud.
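(Back-of-the-envelope, taking ~53 pm for the Bohr radius:)

    julia> ceil(log10(2.8e39 / 53))   # decimal digits needed to resolve one Bohr radius
    38.0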


One of the siblings posted a link from JPL about the number of digits they use for their highest precision calculations. In that article they also state that 37 decimals would be sufficient for calculating the circumference of the observable universe down to the accuracy of the diameter of a hydrogen atom.[0]

[0] https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimal... (Right before the last inline image)


https://github.com/philipl/pifs

but this is a joke hahaha



