
It really is remarkable that Python's numerical computing libraries have such poor performance here. On chains of elementwise operations over large arrays (such as the toy example `cos(sqrt(sin(pow(array, 2))))`, applied elementwise), Julia appears to outperform Python by a factor of 2! Numpy cannot avoid computing each intermediate array, which means it has to allocate a ton of wasteful memory. Julia, meanwhile, does the smart thing: it fuses all the operations into one and applies that single fused operation elementwise, allocating only a single new array.
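
To make the toy example concrete, here's roughly what the numpy side looks like (array name and size are my own):

  import numpy as np

  a = np.random.rand(10_000_000)

  # Each ufunc call materializes a full intermediate array before the next
  # one runs: four passes over memory, three throwaway intermediates.
  result = np.cos(np.sqrt(np.sin(np.power(a, 2))))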

Pandas likewise does not defer computations, which means that evaluating a Boolean expression that references the same data multiple times must make multiple passes over that data. Absurd.
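
As an illustration of the pandas point (column name and size invented):

  import pandas as pd

  df = pd.DataFrame({"x": range(1_000_000)})

  # Each comparison scans the column eagerly and materializes its own
  # Boolean array; nothing is fused, so df["x"] is traversed once per clause.
  mask = (df["x"] > 10) & (df["x"] < 100)
  filtered = df[mask]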



numpy appears to have optional arguments for the storage location of outputs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.c... - could you elaborate a little further? The syntax might not be as nice as in another language or framework, but it's not "unavoidable".
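
Something like this, if I'm reading the docs right (untested sketch; buffer name is mine):

  import numpy as np

  a = np.random.rand(1_000_000)
  buf = np.empty_like(a)

  # Every ufunc writes into the same preallocated buffer, so no intermediate
  # arrays are allocated (though it's still one pass over memory per op).
  np.power(a, 2, out=buf)
  np.sin(buf, out=buf)
  np.sqrt(buf, out=buf)
  np.cos(buf, out=buf)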

Disclaimer: I have never used numpy, but I have used Python fairly extensively.


You are right about the `out` argument; I'd forgotten about that. But even when it avoids the wasteful memory allocations, numpy is still about 60% slower than Julia, because it makes multiple passes over the input data. (If there's a way to get numpy to make just a single pass over the data and remain performant, I'd love to know.)
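
Edit: one route I've since seen suggested (assuming the third-party numexpr package is fair game) compiles the expression and evaluates it in cache-sized blocks, effectively a single pass with no intermediates:

  import numexpr as ne
  import numpy as np

  a = np.random.rand(10_000_000)

  # numexpr compiles the string once and streams the array through it
  # blockwise, rather than making one full pass per operation.
  result = ne.evaluate("cos(sqrt(sin(a ** 2)))")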


The first thing I'd reach for is a list comprehension, but it looks like this is the proper way to do it with numpy:

  import numpy as np

  for x in np.nditer(a, op_flags=['readwrite']):
    x[...] = np.cos(np.sqrt(np.sin(x ** 2)))
That's some gnarly syntax, though. I've never seen that ellipsis operator before.

Edit: just read about `Ellipsis`. I'm a fan, even if it's somewhat nonstandard across libraries. Those readwrite flags are a travesty, but at least you can paper over them with a helper function.

Something like:

  def np_apply(a, f):
    for x in np.nditer(a, op_flags=['readwrite']):
      x[...] = f(x)

  np_apply(a, lambda x: np.cos(np.sqrt(np.sin(x ** 2))))
or

  def np_apply(a, *fns):
    for x in np.nditer(a, op_flags=['readwrite']):
      y = x
      for f in fns:
        y = f(y)
      x[...] = y

  np_apply(a, lambda x: x ** 2, np.sin, np.sqrt, np.cos)
Edit 3: there's a way to turn this into "pythonic" list comprehension code, but it would probably only make it look prettier, not run faster.
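
For completeness, here's roughly what that might look like (untested sketch; it rebuilds the array rather than mutating it in place):

  b = np.fromiter(
      (np.cos(np.sqrt(np.sin(v ** 2))) for v in a.flat),
      dtype=a.dtype, count=a.size)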


Yes, you can use the `out` arguments, but doing so (1) complicates your code, and (2) isn't necessarily a win. You have to perform the same number of memory-bus write cycles whether you write to a specified output array or to some freshly allocated one.


I agree with point (1) wholeheartedly: by default the syntax is ugly unless you wrap it in a utility. But (2) isn't true as far as I can tell; you can keep reusing the same output buffer, or double-buffer if the output can't be the same as the input.

Either way, see https://news.ycombinator.com/item?id=22786958, where I figured out the syntax to do it in place; that shouldn't result in any allocations at all and should address both of your concerns.


Regarding #2: whether you use the same buffer or a new buffer, you still have to actually write to the memory. The bottleneck at numpy scale isn't the memory allocation (mmap or whatever) but the actual memory writes (saturating the memory bus), and you need to perform the same number of writes no matter which destination array you use.


It should still be a perf gain:

1. Memory allocation isn't free.

2. Doing multiple loops over the buffer (one per operation) is going to be slower than doing all the operations at once, both because of caching and because of the opportunity to keep the intermediate value in a register and write it out once at the end (though who knows whether the Python interpreter will actually do that).
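
A quick way to sanity-check that (sizes and repeat counts are arbitrary):

  import timeit
  import numpy as np

  a = np.random.rand(10_000_000)
  buf = np.empty_like(a)

  def chained():  # allocates a fresh intermediate per operation
      return np.cos(np.sqrt(np.sin(np.power(a, 2))))

  def reused():   # one preallocated buffer, no per-call allocations
      np.power(a, 2, out=buf)
      np.sin(buf, out=buf)
      np.sqrt(buf, out=buf)
      np.cos(buf, out=buf)

  print(timeit.timeit(chained, number=10))
  print(timeit.timeit(reused, number=10))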


The Python interpreter is too stupid to do operation coalescing. That's the whole point of my initial comment.


JAX supports this, as do other performance-oriented numerical libraries. There are hoops to jump through, of course.
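
For instance, a jit-compiled JAX version (sketch; array size is arbitrary) lets XLA fuse the whole elementwise chain into one kernel:

  import jax
  import jax.numpy as jnp

  @jax.jit  # XLA fuses the elementwise chain into a single pass
  def f(a):
      return jnp.cos(jnp.sqrt(jnp.sin(a ** 2)))

  y = f(jnp.arange(1_000_000, dtype=jnp.float32))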


> Julia appears to outperform Python by a factor 2

Now I have some more reading to do. Thanks.


Julia is very, very good. I say this as a person who has invested about a decade in the Python data science ecosystem. Numpy will probably always be copy-heavy, whereas Julia's JIT can optimize as aggressively as its impressive type system allows.



