I'll take this opportunity to point out that if you're doing anything NumPy-related that seems too slow, you should run Numba on it. In my case we were doing a lot of cosine distance calculations, and our inference time sped up 10x simply by running the NumPy cosine distance function through Numba. It's as easy as adding a decorator.
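For reference, a minimal sketch of that pattern (assuming 1-D float arrays; `cosine_distance` is an illustrative name, not our actual code):

    import numpy as np
    from numba import njit

    @njit  # the only change from the plain-NumPy version
    def cosine_distance(a, b):
        # note: np.dot in nopython mode needs SciPy's BLAS installed
        return 1.0 - np.dot(a, b) / (np.sqrt(np.dot(a, a)) * np.sqrt(np.dot(b, b)))

    # first call compiles; subsequent calls run native code
    print(cosine_distance(np.random.rand(512), np.random.rand(512)))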
Taichi vs. Numba: As its name indicates, Numba is tailored for Numpy. Numba is recommended if your functions involve vectorization of Numpy arrays. Compared with Numba, Taichi enjoys the following advantages:
Taichi supports multiple data types, including struct, dataclass, quant, and sparse, and allows you to adjust memory layout flexibly. This feature is extremely desirable when a program handles massive amounts of data. Numba, by contrast, performs best only when dealing with dense NumPy arrays.
Taichi can target different GPU backends for computation, making large-scale parallel programming (such as particle simulation or rendering) almost effortless. It would be hard even to imagine writing a renderer in Numba.
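For a flavour of that, a minimal Taichi kernel sketch (`arch=ti.gpu` picks whatever GPU backend is available and falls back to CPU):

    import taichi as ti

    ti.init(arch=ti.gpu)  # CUDA/Vulkan/Metal, or CPU if no GPU is found

    x = ti.field(dtype=ti.f32, shape=1_000_000)

    @ti.kernel
    def scale(factor: ti.f32):
        for i in x:  # the outermost loop in a kernel is parallelized automatically
            x[i] *= factor

    scale(2.0)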
Except some people don't read the article, and those who already assume NumPy is "very" optimized might gloss over that line without reading much into it. That line also doesn't say that you might get a 10x speed-up using Numba. I remember when I first came across Numba I searched HN for references and didn't find many stories or comments praising it, so I skipped it initially; having HN comments might be useful for future HN'ers.
There is Numba and then there is Nuitka if you want to compile to a binary. I'm not sure the two work together. But Taichi may work with Nuitka for runtime optimization as a binary.
The equivalent of Nuitka for numerical code is Pythran, which compiles to highly optimized C++ code. I have been getting the best speedup with the fewest changes using it, compared to Numba or Cython (haven't tested Taichi yet).
Do jax/tensorflow/pytorch work with numba? I.e. can you pass one of their arrays through a numba function and have it (a) not crash (b) support backprop?
1) "Diffussion" is species vs time equals species spatial laplacian.
2) The "reaction" equations are non-painfully derived from Baez stochastic Petri nets/chemical reaction networks in [1] (species vs time = multivariate polynomial in species, "space dependant rate equation")
So Reaction-Diffusion is just adding the two up. Species vs. time = species spatial Laplacian plus multivariate polynomial in species. One more for the toolbox!
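For concreteness, a minimal NumPy sketch of one explicit Euler step, assuming a Gray-Scott-style reaction polynomial (parameter values are just common demo choices):

    import numpy as np

    def laplacian(u):
        # 5-point stencil with periodic boundaries
        return (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
                np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u)

    def step(u, v, Du=0.16, Dv=0.08, f=0.035, k=0.065, dt=1.0):
        # species vs. time = diffusion (laplacian) + reaction (polynomial)
        uvv = u * v * v
        u = u + dt * (Du * laplacian(u) - uvv + f * (1.0 - u))
        v = v + dt * (Dv * laplacian(v) + uvv - (f + k) * v)
        return u, v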
It's just alternating between sharpen and smoothen. Sharpen hallucinates new information; smoothen diffuses and erases information. Thus there is this interplay, with new hallucinations constantly laid over previous ones.
Mathematicians and biologists have a hammer, so everything looks like a nail.
I'm interested in understanding this comment but I don't know where to start. I love the way Reaction Diffusion simulations look and I've coded it up a few times. But I don't understand what you mean by "species vs time". (Some of the other technical language seems more Google-able, but "species vs time" isn't turning up anything obvious.)
I think GP is referring to the heat equation. "Species" is the concentration of the "stuff" you're describing mathematically. Call that u(x, t), where x is a spatial coordinate and u is a real-valued function.
Then the diffusive part says
du(x, t)/dt = \nabla_x u(x, t).
The \nabla term is the laplacian: a multivariate form of the second derivative.
The equation says that a short time from now, u(x, t) will change in proportion to the average value of u, calculated over a small ball surrounding the point x, minus the value of u at the point x itself.
If there's less "stuff" in the points that neighbour x than at x itself, the function will decrease over time. Similarly if there's more stuff at the neighbours of x, u(x, t) will increase. This is the basis of diffusive behaviour.
(Edit: I think the equation in the article is wrong, unless I've misunderstood something: they have a delta (first derivative) when they should have a nabla (laplacian))
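To make the "neighbour average minus the value at the point" reading concrete, a toy 1-D sketch (explicit Euler, periodic boundaries):

    import numpy as np

    u = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # a spike of "stuff" in the middle
    dt = 0.2
    for _ in range(3):
        # discrete laplacian: neighbour sum minus twice the centre value
        u = u + dt * (np.roll(u, 1) + np.roll(u, -1) - 2.0 * u)
        print(u)  # the spike spreads out and flattens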
I think you might have gotten your symbols mixed up a little.
From what I understand, \nabla(f) by itself describes the gradient of a function f -- meaning all the first-order partial derivatives (at a given point) collected into a vector.
\nabla \cdot f describes the divergence of the function f -- meaning a scalar field measuring the density of the vector field's sources at each point.
\nabla \cdot \nabla(f), the divergence of the gradient, then describes the Laplace operator.
Also written as \nabla^2(f) or \Delta(f) -- note that it's an uppercase delta.
HN doesn't support LaTeX in comments sadly, so I am just writing it out as you would before rendering it with LaTeX.
Maybe there are add-ons that detect valid LaTeX math symbols and convert them whenever possible but as-is you can't get it to render within the comments.
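For reference, the three operators written out (in un-rendered LaTeX, per the above):

    \nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)  % gradient: vector of first partials
    \nabla \cdot F = \sum_{i=1}^{n} \frac{\partial F_i}{\partial x_i}  % divergence: scalar field
    \Delta f = \nabla^2 f = \nabla \cdot \nabla f = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}  % Laplacian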
An extremely interesting area. I keep wanting to use it for something but haven't had a good use case yet, nor frankly do I think I really understand it.
This code is recursive and generates set partitions for large N values (N larger than 12); it essentially works by skipping small partitions and small subsets to target desirable set partitions. Solutions that don't skip those suffer from combinatorial explosion.
I did not write this code. I want to test it later with Taichi; I'm curious whether Taichi can run it faster.
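For a feel of the problem, a toy sketch: the classic recursive generator plus a naive post-filter. The code described above presumably prunes small blocks inside the recursion instead, which is what avoids the explosion:

    def partitions(seq):
        # classic recursive set-partition generator (Bell-number many results)
        if not seq:
            yield []
            return
        first, rest = seq[0], seq[1:]
        for smaller in partitions(rest):
            # put `first` into each existing block in turn...
            for i, block in enumerate(smaller):
                yield smaller[:i] + [[first] + block] + smaller[i + 1:]
            # ...or into a block of its own
            yield smaller + [[first]]

    def big_block_partitions(seq, min_block=2):
        # naive post-filter: fine for small inputs, hopeless for N > ~12,
        # which is exactly why the real code prunes during the recursion
        for p in partitions(seq):
            if all(len(b) >= min_block for b in p):
                yield p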
Slightly off topic but the choice of name is interesting given that Tai chi is well-known for its slow movements and being practiced by the elderly at the park.
I practice tai chi and I'm not elderly (though not exactly young either :-) ). Tai chi is actually very hard to do well because it requires a lot of flexibility (I mean a lot).
You should see what Taichi lessons in Chinese colleges look like: full of students who don't like any kind of sport but must choose a PE class. And yes, I was one of them.
Unfortunately, Pythran is missing from the comparison. Pythran works in a lot of cases and it's easy to use, just by specifying Python types. I would like to see a comparison with Taichi, as Taichi also seems to be interesting.
What do you mean disappointing? I have consistently been getting the best results with Pythran. That said, it is strongly focused on numerical code, so your mileage may vary for other code. You also should add compiler optimisation flags to get the best performance.
Regarding Nuitka, AFAIK its goal is not speed-up, and performance gains are pretty modest in most cases.
Maybe I didn't have appropriate compiler flags. I didn't devote much time to it, as it didn't seem promising from my initial attempts and I found it easier to get faster performance with other methods (see reply to sibling comment). This was integer numerical code filling in a 2D array with lots of individual comparisons and backtracking (no whole-array operations). Terrible for plain Python, but Cython and C ate it up.
Pythran and Nuitka have very different philosophies and goals. Pythran aims for performance first, and to achieve that it is willing to sacrifice a lot of compatibility and supports only a small subset of python. Taking a random piece of python code and trying to run it under pythran will almost certainly fail. To get the most out of Pythran you really have to write 'pythran' code rather than 'python' code.
Nuitka aims for 100% compatibility first. If you have some random python code that works under CPython, but not Nuitka, then that is a bug that will be fixed. To achieve this compatibility there are a lot of optimisations that cannot be done. If you have code that works under both Nuitka and Pythran then it will almost certainly be faster in Pythran.
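To make that concrete, a minimal sketch of "pythran" code: plain loops, with Pythran's export comment doubling as the type annotation (`dot` is just an illustrative kernel):

    # dot.py
    #pythran export dot(float64[], float64[])
    def dot(a, b):
        # plain indexed loops over NumPy arrays: exactly the subset Pythran targets
        s = 0.0
        for i in range(a.shape[0]):
            s += a[i] * b[i]
        return s

Compiling with something like `pythran -O3 -march=native dot.py` yields a native extension module that imports in place of the pure-Python one; extra flags are forwarded to the C++ compiler.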
For the most recent application I tried Pythran on, significantly annotated Cython was the winner (compared to plain Python, Numba, Pythran, and more readable Cython). Plain Python was the slowest, and Pythran was much slower than the others. I didn't try Nuitka for it. I ended up rewriting the key code in C anyway, which was faster still. This was integer numerical code filling in a 2D array with lots of comparisons and backtracking.
I thought it was about a parsing issue in Python when doing "import taichi as ti" vs "import taichi". No, it's just presenting Taichi, a Python package for parallel computation.
EDIT: title of the thread was "Accelerate Python code 100x by import taichi as ti" like TFA
Me too - it wouldn't be unheard of in a language where referencing multiple.levels.of.variable in a loop is orders of magnitude slower than doing "a = multiple.levels.of.variable" outside the loop and referencing a inside of it.
*may have been fixed in recent versions of Python - I heard of this many years ago!
Isn’t that expected behaviour, as you’re only looking up “a” once when you do it outside the loop, while doing it every time when inside the loop?
Because any reference in the whole hierarchy could change during the looping (e.g. one could say “multiple.levels = {}” at some point), the interpreter really would need to check it every time unless it can somehow “prove” that these changes will never happen / haven’t happened.
Just keeping a reference to “a” is semantically very different, and I’d consider that a normal optimisation.
The issue is just how slow Python is: it takes a very long time to resolve those references. So it's expected that multiple.levels.of.dereference would be slower, but it's perhaps orders of magnitude slower, where in another language it might be much less of a performance hit.
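A quick way to measure it yourself (a toy sketch; the nesting is artificial, and the gap varies by Python version):

    import timeit

    class C:
        pass

    obj = C(); obj.a = C(); obj.a.b = C(); obj.a.b.c = 1

    def slow():
        total = 0
        for _ in range(100_000):
            total += obj.a.b.c  # three attribute lookups per iteration
        return total

    def fast():
        c = obj.a.b.c  # hoisted once, outside the loop
        total = 0
        for _ in range(100_000):
            total += c
        return total

    print(timeit.timeit(slow, number=100))
    print(timeit.timeit(fast, number=100))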
Which entity is behind this? Usually you have some details on the genesis of projects like this.
I will never get approval to use this on the HPC systems of my org if it's developed by an unknown entity in China (as the WeChat link might indicate), which is sad since it looks useful.
The only ones that I know of are in https://arxiv.org/abs/2012.06684 (with Julia DifferentialEquations.jl and DiffTaichi), but those are more algorithmic. There, Julia does extremely well, but the conclusion that I would draw from it is more: use a programming language with robust and differentiable differential equation solvers rather than writing simple Euler loops by hand (as this article does).
My ansible-pull runs are horribly slow (git pull is fast, running a few hundred tasks on the local system is not). I suspect the problem is in unnecessary copies of large dicts and lots of them. How would I confirm or refute that? If that is the problem, what techniques are available to address it?
In python, and many other languages, there are plenty of performance hacks you can apply that might make your code faster, but simultaneously less readable - often by stripping away levels of abstraction, or restructuring your code/data in unintuitive ways.
For example, the prime-counting example in the article could've been optimised slightly by inlining the is_prime function into the body of the count_primes function (thus removing the overhead of calling a function).
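Roughly like this, assuming the article's trial-division approach (names and bodies are illustrative, not the article's exact code):

    # readable version: a helper function called once per candidate
    def is_prime(n):
        if n < 2:
            return False
        k = 2
        while k * k <= n:
            if n % k == 0:
                return False
            k += 1
        return True

    def count_primes(limit):
        return sum(1 for n in range(limit) if is_prime(n))

    # "optimised" version: is_prime inlined, saving a function call per candidate
    def count_primes_inlined(limit):
        count = 0
        for n in range(2, limit):
            k = 2
            while k * k <= n:
                if n % k == 0:
                    break
                k += 1
            else:  # no divisor found: n is prime
                count += 1
        return count

Both count 78498 primes below 1,000,000; the inlined one just trades a readable helper for fewer function calls.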
That’s true, but it’s not really a characteristic of Python specifically. You could replace “Python” with basically any other language (yes, even languages with “zero cost” abstractions like Rust or C++) and the statement “readability comes at the cost of performance” would still hold true. Poor performance is simply not an intrinsic property of the syntax/grammar of Python.
In most part, but not only. There are biological constraints on how the eye can move, and there are cognitive barriers you cannot overcome (how many "things" you can hold in your working memory at once). The code can be extremely familiar, but still hard to read, if it's too dense, too sparse, too long, involves too many concepts, and so on. You should still optimize the code you write along these axes to make it readable. It's true, however, that most of what people call "readability" is actually just familiarity - whether you should optimize the code for those unfamiliar with the language/domain/idiom is a separate question (the answer being: it depends).
I strongly disagree. Truly readable code ought to be understandable to a newcomer to a particular codebase (or even, the language) - usually the sweet-spot is somewhere slightly terser than that, though.
What you're saying, in other words, is that your definition of "readable code" is "code which is understandable to a newcomer to a particular codebase" - which is a perfectly valid take, and each team has full rights to set their definition on a level that suits them the best.
My comment simply observes the reality, and does not recommend any given practice.
This probably just refers to the current reality of using Python in many cases. If you want its readability, you'll (most likely) need to deal with its relatively slow runtime.
That said, the readability itself shouldn't have much to do with the performance, as many reimplementations of the Python runtime already show.
To answer the question, JavaScript is 31x faster out of the box on size 1000000 (compared to the 6x claimed in the post). 71x on 10M.
    bwasti@bwasti-mbp code % time python3 prime.py
    78498
    python3 prime.py 3.86s user 0.02s system 98% cpu 3.938 total
    bwasti@bwasti-mbp code % time bun prime.js
    78498
    bun prime.js 0.07s user 0.02s system 74% cpu 0.125 total
I was curious about node/jitless node versus python/pypy:
    src time node primes.js
    78498
    node primes.js 1.75s user 0.05s system 102% cpu 1.761 total
    src time node --jitless primes.js
    78498
    node --jitless primes.js 5.06s user 0.04s system 100% cpu 5.088 total
    src time python primes.py
    78498
    python primes.py 4.27s user 0.03s system 100% cpu 4.276 total
    src asdf shell python pypy3.9-7.3.9
    src time python primes.py
    78498
    python primes.py 0.91s user 0.07s system 100% cpu 0.970 total
> I am a loyal C++/Fortran user but would like to try out Python as it is gaining increasing popularity. However, rewriting code in Python is a nightmare - I feel the performance must be more than 100x slower than before!
I think I found your problem. TBH you might like Julia more than Python and you won't have to invent a new DSL in the process.
If you switch to Julia you'd have to port over the entire PyTorch ecosystem as well if you want to use this together with that (Taichi has great compatibility via `to_torch`). Also, the kernels you write in Taichi are hardly a DSL; they're nearly indistinguishable from normal Python code (minus the rare occasional caveat).
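A minimal sketch of that interop (assuming the field's `to_torch`/`from_torch` methods):

    import taichi as ti
    import torch

    ti.init(arch=ti.cpu)

    x = ti.field(dtype=ti.f32, shape=4)

    @ti.kernel
    def fill():
        for i in x:
            x[i] = 0.5 * i

    fill()
    t = x.to_torch()       # copy the Taichi field out as a torch.Tensor
    x.from_torch(t * 2.0)  # and load a tensor back into the field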
> I feel the performance must be more than 100x slower than before!
Not too far off. This paper [0] compared 27 languages for speed and energy efficiency (which is very interesting).
Python was 72 times slower than C and consumed 76 times more energy.
I think Python is a very useful language at many levels. Great for prototyping stuff. No doubt about that. If performance and energy efficiency are important, it seems obvious one has to look elsewhere.
I tried Julia and I had to wait several seconds for my program to start due to JIT even for very trivial applications. Did I do anything wrong? Seemed like the worst of both worlds. Python runs slow but at least it starts fast.
Python is rising in popularity? I can't remember the last time I reached to Python for something. Not that it says anything about the language but it kinda disappeared from my bubble.