I'll take this opportunity to point out that if you're doing anything NumPy-related that seems too slow, you should run Numba on it. In my case we were doing a lot of cosine distance calculations, and our inference time sped up 10x simply by running the NumPy cosine distance function through Numba. It's as easy as adding a decorator.
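For reference, a minimal sketch of that pattern (assuming 1-D float arrays; `cosine_distance` is an illustrative name, not our actual code):

    import numpy as np
    from numba import njit

    @njit  # the only change from the plain-NumPy version
    def cosine_distance(a, b):
        # note: np.dot in nopython mode needs SciPy's BLAS installed
        return 1.0 - np.dot(a, b) / (np.sqrt(np.dot(a, a)) * np.sqrt(np.dot(b, b)))

    # first call compiles; subsequent calls run native code
    print(cosine_distance(np.random.rand(512), np.random.rand(512)))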
Taichi vs. Numba: As its name indicates, Numba is tailored for Numpy. Numba is recommended if your functions involve vectorization of Numpy arrays. Compared with Numba, Taichi enjoys the following advantages:
Taichi supports multiple data types, including struct, dataclass, quant, and sparse, and allows you to adjust memory layout flexibly. This feature is extremely desirable when a program handles massive amounts of data. Numba, by contrast, performs best only when dealing with dense NumPy arrays.
Taichi can target different GPU backends for computation, making large-scale parallel programming (such as particle simulation or rendering) almost effortless. It would be hard even to imagine writing a renderer in Numba.
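For a flavour of that, a minimal Taichi kernel sketch (`arch=ti.gpu` picks whatever GPU backend is available and falls back to CPU):

    import taichi as ti

    ti.init(arch=ti.gpu)  # CUDA/Vulkan/Metal, or CPU if no GPU is found

    x = ti.field(dtype=ti.f32, shape=1_000_000)

    @ti.kernel
    def scale(factor: ti.f32):
        for i in x:  # the outermost loop in a kernel is parallelized automatically
            x[i] *= factor

    scale(2.0)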
Except some people don't read the article, and those who already assume NumPy is "very" optimized might gloss over that line without reading much into it. That line also doesn't say that you might get a 10x speed-up using Numba. I remember when I first came across Numba I searched HN for references and didn't find many stories or comments praising it, so I skipped it initially; having HN comments might be useful for future HN'ers.
There is Numba and then there is Nuitka if you want to compile to a binary. I'm not sure the two work together. But Taichi may work with Nuitka for runtime optimization as a binary.
The equivalent of Nuitka for numerical code is Pythran, which compiles to highly optimized C++ code. I have been getting the best speedup with the fewest changes using it, compared to Numba or Cython (haven't tested Taichi yet).
Do jax/tensorflow/pytorch work with numba? I.e. can you pass one of their arrays through a numba function and have it (a) not crash (b) support backprop?
1) "Diffussion" is species vs time equals species spatial laplacian.
2) The "reaction" equations are non-painfully derived from Baez stochastic Petri nets/chemical reaction networks in [1] (species vs time = multivariate polynomial in species, "space dependant rate equation")
So Reaction-Diffusion is just adding the two up. Species vs. time = species spatial Laplacian plus multivariate polynomial in species. One more for the toolbox!
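For concreteness, a minimal NumPy sketch of one explicit Euler step, assuming a Gray-Scott-style reaction polynomial (parameter values are just common demo choices):

    import numpy as np

    def laplacian(u):
        # 5-point stencil with periodic boundaries
        return (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
                np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u)

    def step(u, v, Du=0.16, Dv=0.08, f=0.035, k=0.065, dt=1.0):
        # species vs. time = diffusion (laplacian) + reaction (polynomial)
        uvv = u * v * v
        u = u + dt * (Du * laplacian(u) - uvv + f * (1.0 - u))
        v = v + dt * (Dv * laplacian(v) + uvv - (f + k) * v)
        return u, v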
It's just alternating between sharpen and smoothen. Sharpen hallucinates new information; smoothen diffuses and erases information. Thus there is this interplay, with new hallucinations constantly laid over previous ones.
Mathematicians and biologists have a hammer, so everything looks like a nail.
I'm interested in understanding this comment but I don't know where to start. I love the way Reaction Diffusion simulations look and I've coded it up a few times. But I don't understand what you mean by "species vs time". (Some of the other technical language seems more Google-able, but "species vs time" isn't turning up anything obvious.)
I think GP is referring to the heat equation. "Species" is the concentration of the "stuff" you're describing mathematically. Call that u(x, t), where x is a spatial coordinate and u is a real-valued function.
Then the diffusive part says
du(x, t)/dt = \nabla_x u(x, t).
The \nabla term is the laplacian: a multivariate form of the second derivative.
The equation says that a short time from now, u(x, t) will change in proportion to the average value of u, calculated over a small ball surrounding the point x, minus the value of u at the point x itself.
If there's less "stuff" in the points that neighbour x than at x itself, the function will decrease over time. Similarly if there's more stuff at the neighbours of x, u(x, t) will increase. This is the basis of diffusive behaviour.
(Edit: I think the equation in the article is wrong, unless I've misunderstood something: they have a delta (first derivative) when they should have a nabla (laplacian))
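To make the "neighbour average minus the value at the point" reading concrete, a toy 1-D sketch (explicit Euler, periodic boundaries):

    import numpy as np

    u = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # a spike of "stuff" in the middle
    dt = 0.2
    for _ in range(3):
        # discrete laplacian: neighbour sum minus twice the centre value
        u = u + dt * (np.roll(u, 1) + np.roll(u, -1) - 2.0 * u)
        print(u)  # the spike spreads out and flattens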
I think you might have gotten your symbols mixed up a little.
From what I understand, \nabla(f) by itself describes the gradient of a function f -- meaning all the first-order partial derivatives (at a given point) collected into a vector.
\nabla \cdot f describes the divergence of the function f -- meaning a scalar field measuring the density of the vector field's sources at each point.
\nabla \cdot \nabla(f), the divergence of the gradient, then describes the Laplace operator.
Also written as \nabla^2(f) or \Delta(f) -- note that it's an uppercase delta.
HN doesn't support LaTeX in comments sadly, so I am just writing it out as you would before rendering it with LaTeX.
Maybe there are add-ons that detect valid LaTeX math symbols and convert them whenever possible but as-is you can't get it to render within the comments.
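For reference, the three operators written out (in un-rendered LaTeX, per the above):

    \nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)  % gradient: vector of first partials
    \nabla \cdot F = \sum_{i=1}^{n} \frac{\partial F_i}{\partial x_i}  % divergence: scalar field
    \Delta f = \nabla^2 f = \nabla \cdot \nabla f = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}  % Laplacian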
An extremely interesting area. I keep wanting to use it for something but haven't had a good use case yet, nor frankly do I think I really understand it.
This code is recursive and generates set partitions for large N values (N larger than 12); it essentially works by skipping small partitions and small subsets to target desirable set partitions. Solutions that don't skip those suffer from combinatorial explosion.
I did not write this code. I want to test it later with Taichi; I'm curious whether Taichi can run it faster.
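For a feel of the problem, a toy sketch: the classic recursive generator plus a naive post-filter. The code described above presumably prunes small blocks inside the recursion instead, which is what avoids the explosion:

    def partitions(seq):
        # classic recursive set-partition generator (Bell-number many results)
        if not seq:
            yield []
            return
        first, rest = seq[0], seq[1:]
        for smaller in partitions(rest):
            # put `first` into each existing block in turn...
            for i, block in enumerate(smaller):
                yield smaller[:i] + [[first] + block] + smaller[i + 1:]
            # ...or into a block of its own
            yield smaller + [[first]]

    def big_block_partitions(seq, min_block=2):
        # naive post-filter: fine for small inputs, hopeless for N > ~12,
        # which is exactly why the real code prunes during the recursion
        for p in partitions(seq):
            if all(len(b) >= min_block for b in p):
                yield p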
Slightly off topic but the choice of name is interesting given that Tai chi is well-known for its slow movements and being practiced by the elderly at the park.
I practice tai chi and I'm not elderly (though not exactly young either :-) ). Tai chi is actually very hard to do well because it requires a lot of flexibility (I mean a lot).
You should see what Taichi lessons in Chinese colleges look like: full of students who don't like any kind of sport but must choose a PE class. And yes, I was one of them.
Unfortunately, Pythran is missing from the comparison. Pythran works in a lot of cases and it's easy to use, just by specifying Python types. I would like to see a comparison with Taichi, as Taichi also seems to be interesting.
What do you mean disappointing? I have consistently been getting the best results with Pythran. That said, it is strongly focused on numerical code, so your mileage may vary for other code. You also should add compiler optimisation flags to get the best performance.
Regarding Nuitka, AFAIK its goal is not speed-up, and performance gains are pretty modest in most cases.
Maybe I didn't have appropriate compiler flags. I didn't devote much time to it, as it didn't seem promising from my initial attempts and I found it easier to get faster performance with other methods (see reply to sibling comment). This was integer numerical code filling in a 2D array with lots of individual comparisons and backtracking (no whole-array operations). Terrible for plain Python, but Cython and C ate it up.
Pythran and Nuitka have very different philosophies and goals. Pythran aims for performance first, and to achieve that it is willing to sacrifice a lot of compatibility and supports only a small subset of python. Taking a random piece of python code and trying to run it under pythran will almost certainly fail. To get the most out of Pythran you really have to write 'pythran' code rather than 'python' code.
Nuitka aims for 100% compatibility first. If you have some random python code that works under CPython, but not Nuitka, then that is a bug that will be fixed. To achieve this compatibility there are a lot of optimisations that cannot be done. If you have code that works under both Nuitka and Pythran then it will almost certainly be faster in Pythran.
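To make that concrete, a minimal sketch of "pythran" code: plain loops, with Pythran's export comment doubling as the type annotation (`dot` is just an illustrative kernel):

    # dot.py
    #pythran export dot(float64[], float64[])
    def dot(a, b):
        # plain indexed loops over NumPy arrays: exactly the subset Pythran targets
        s = 0.0
        for i in range(a.shape[0]):
            s += a[i] * b[i]
        return s

Compiling with something like `pythran -O3 -march=native dot.py` yields a native extension module that imports in place of the pure-Python one; extra flags are forwarded to the C++ compiler.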
For the most recent application I tried Pythran on, significantly annotated Cython was the winner (compared to plain Python, Numba, Pythran, and more readable Cython). Plain Python was the slowest, and Pythran was much slower than the others. I didn't try Nuitka for it. I ended up rewriting the key code in C anyway, which was faster still. This was integer numerical code filling in a 2D array with lots of comparisons and backtracking.
I thought it was about a parsing issue in Python when doing "import taichi as ti" vs "import taichi". No, it's just presenting Taichi, a Python package for parallel computation.
EDIT: title of the thread was "Accelerate Python code 100x by import taichi as ti" like TFA
Me too - it wouldn't be unheard of in a language where referencing multiple.levels.of.variable in a loop is orders of magnitude slower than doing "a = multiple.levels.of.variable" outside the loop and referencing a inside of it.
*may have been fixed in recent versions of Python - I heard of this many years ago!
Isn’t that expected behaviour, as you’re only looking up “a” once when you do it outside the loop, while doing it every time when inside the loop?
Because any reference in the whole hierarchy could change during the looping (e.g. one could say “multiple.levels = {}” at some point), the interpreter really would need to check it every time unless it can somehow “prove” that these changes will never happen / haven’t happened.
Just keeping a reference to “a” is semantically very different, and I’d consider that a normal optimisation.
The issue is just how slow Python is: it takes a very long time to resolve those references. So it's expected that multiple.levels.of.dereference would be slower, but it's perhaps orders of magnitude slower, where in another language it might be much less of a performance hit.
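A quick way to measure it yourself (a toy sketch; the nesting is artificial, and the gap varies by Python version):

    import timeit

    class C:
        pass

    obj = C(); obj.a = C(); obj.a.b = C(); obj.a.b.c = 1

    def slow():
        total = 0
        for _ in range(100_000):
            total += obj.a.b.c  # three attribute lookups per iteration
        return total

    def fast():
        c = obj.a.b.c  # hoisted once, outside the loop
        total = 0
        for _ in range(100_000):
            total += c
        return total

    print(timeit.timeit(slow, number=100))
    print(timeit.timeit(fast, number=100))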
Which entity is behind this? Usually you have some details on the genesis of projects like this.
I will never get approval to use this on the HPC systems of my org if it's developed by an unknown entity in China (as the WeChat link might indicate), which is sad since it looks useful.
The only ones that I know of are in https://arxiv.org/abs/2012.06684 (with Julia DifferentialEquations.jl and DiffTaichi), but those are more algorithmic. There, Julia does extremely well, but the conclusion that I would draw from it is more: use a programming language with robust and differentiable differential equation solvers rather than writing simple Euler loops by hand (as this article does).
My ansible-pull runs are horribly slow (git pull is fast, running a few hundred tasks on the local system is not). I suspect the problem is in unnecessary copies of large dicts and lots of them. How would I confirm or refute that? If that is the problem, what techniques are available to address it?
In python, and many other languages, there are plenty of performance hacks you can apply that might make your code faster, but simultaneously less readable - often by stripping away levels of abstraction, or restructuring your code/data in unintuitive ways.
For example, the prime-counting example in the article could've been optimised slightly by inlining the is_prime function into the body of the count_primes function (thus removing the overhead of calling a function).
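Roughly like this, assuming the article's trial-division approach (names and bodies are illustrative, not the article's exact code):

    # readable version: a helper function called once per candidate
    def is_prime(n):
        if n < 2:
            return False
        k = 2
        while k * k <= n:
            if n % k == 0:
                return False
            k += 1
        return True

    def count_primes(limit):
        return sum(1 for n in range(limit) if is_prime(n))

    # "optimised" version: is_prime inlined, saving a function call per candidate
    def count_primes_inlined(limit):
        count = 0
        for n in range(2, limit):
            k = 2
            while k * k <= n:
                if n % k == 0:
                    break
                k += 1
            else:  # no divisor found: n is prime
                count += 1
        return count

Both count 78498 primes below 1,000,000; the inlined one just trades a readable helper for fewer function calls.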
That’s true, but it’s not really a characteristic of Python specifically. You could replace “Python” with basically any other language (yes, even languages with “zero cost” abstractions like Rust or C++) and the statement “readability comes at the cost of performance” would still hold true. Poor performance is simply not an intrinsic property of the syntax/grammar of Python.
In most part, but not only. There are biological constraints on how the eye can move, and there are cognitive barriers you cannot overcome (how many "things" you can hold in your working memory at once). The code can be extremely familiar, but still hard to read, if it's too dense, too sparse, too long, involves too many concepts, and so on. You should still optimize the code you write along these axes to make it readable. It's true, however, that most of what people call "readability" is actually just familiarity - whether you should optimize the code for those unfamiliar with the language/domain/idiom is a separate question (the answer being: it depends).
I strongly disagree. Truly readable code ought to be understandable to a newcomer to a particular codebase (or even, the language) - usually the sweet-spot is somewhere slightly terser than that, though.
What you're saying, in other words, is that your definition of "readable code" is "code which is understandable to a newcomer to a particular codebase" - which is a perfectly valid take, and each team has full rights to set their definition on a level that suits them the best.
My comment simply observes the reality, and does not recommend any given practice.
This probably just refers to the current reality of using Python in many cases. If you want its readability, you'll (most likely) need to deal with its relatively slow runtime.
That said, the readability itself shouldn't have much to do with the performance, as many reimplementations of the Python runtime already show.
To answer the question, JavaScript is 31x faster out of the box on size 1000000 (compared to the 6x claimed in the post). 71x on 10M.
    bwasti@bwasti-mbp code % time python3 prime.py
    78498
    python3 prime.py 3.86s user 0.02s system 98% cpu 3.938 total
    bwasti@bwasti-mbp code % time bun prime.js
    78498
    bun prime.js 0.07s user 0.02s system 74% cpu 0.125 total
I was curious about node/jitless node versus python/pypy:
    src time node primes.js
    78498
    node primes.js 1.75s user 0.05s system 102% cpu 1.761 total
    src time node --jitless primes.js
    78498
    node --jitless primes.js 5.06s user 0.04s system 100% cpu 5.088 total
    src time python primes.py
    78498
    python primes.py 4.27s user 0.03s system 100% cpu 4.276 total
    src asdf shell python pypy3.9-7.3.9
    src time python primes.py
    78498
    python primes.py 0.91s user 0.07s system 100% cpu 0.970 total
> I am a loyal C++/Fortran user but would like to try out Python as it is gaining increasing popularity. However, rewriting code in Python is a nightmare - I feel the performance must be more than 100x slower than before!
I think I found your problem. TBH you might like Julia more than Python and you won't have to invent a new DSL in the process.
If you switch to Julia you'd have to port over the entire PyTorch ecosystem as well if you want to use this together with that (Taichi has great compatibility via `to_torch`). Also, the kernels you write in Taichi are hardly a DSL; they're nearly indistinguishable from normal Python code (minus the rare occasional caveat).
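A minimal sketch of that interop (assuming the field's `to_torch`/`from_torch` methods):

    import taichi as ti
    import torch

    ti.init(arch=ti.cpu)

    x = ti.field(dtype=ti.f32, shape=4)

    @ti.kernel
    def fill():
        for i in x:
            x[i] = 0.5 * i

    fill()
    t = x.to_torch()       # copy the Taichi field out as a torch.Tensor
    x.from_torch(t * 2.0)  # and load a tensor back into the field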
> I feel the performance must be more than 100x slower than before!
Not too far off. This paper [0] compared 27 languages for speed and energy efficiency (which is very interesting).
Python was 72 times slower than C and consumed 76 times more energy.
I think Python is a very useful language at many levels. Great for prototyping stuff. No doubt about that. If performance and energy efficiency are important, it seems obvious one has to look elsewhere.
I tried Julia and I had to wait several seconds for my program to start due to JIT even for very trivial applications. Did I do anything wrong? Seemed like the worst of both worlds. Python runs slow but at least it starts fast.
Python is rising in popularity? I can't remember the last time I reached to Python for something. Not that it says anything about the language but it kinda disappeared from my bubble.