
It really is remarkable that Python's numerical computing libraries have such poor performance here. On chains of elementwise operations over large arrays (such as the toy example `cos(sqrt(sin(pow(array, 2))))`, applied elementwise), Julia appears to outperform Python by a factor of 2! Numpy cannot avoid computing each intermediate array, which means it has to allocate a ton of wasteful memory. Julia, meanwhile, does the smart thing: it fuses all the operations into one and applies that single fused operation elementwise, allocating only a single new array.
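
To make the toy example concrete, here's roughly what the numpy side looks like (array name and size are my own):

  import numpy as np

  a = np.random.rand(10_000_000)

  # Each ufunc call materializes a full intermediate array before the next
  # one runs: four passes over memory, three throwaway intermediates.
  result = np.cos(np.sqrt(np.sin(np.power(a, 2))))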

Pandas likewise does not defer computations, which means that evaluating a Boolean expression that references the same data multiple times must make multiple passes over that data. Absurd.
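
As an illustration of the pandas point (column name and size invented):

  import pandas as pd

  df = pd.DataFrame({"x": range(1_000_000)})

  # Each comparison scans the column eagerly and materializes its own
  # Boolean array; nothing is fused, so df["x"] is traversed once per clause.
  mask = (df["x"] > 10) & (df["x"] < 100)
  filtered = df[mask]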



numpy appears to have optional arguments for the storage location of outputs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.c... - could you elaborate a little further? The syntax might not be as nice as in another language or framework, but it's not "unavoidable".
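
Something like this, if I'm reading the docs right (untested sketch; buffer name is mine):

  import numpy as np

  a = np.random.rand(1_000_000)
  buf = np.empty_like(a)

  # Every ufunc writes into the same preallocated buffer, so no intermediate
  # arrays are allocated (though it's still one pass over memory per op).
  np.power(a, 2, out=buf)
  np.sin(buf, out=buf)
  np.sqrt(buf, out=buf)
  np.cos(buf, out=buf)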

Disclaimer: I have never used numpy, but I have used Python fairly extensively.


You are right about the `out` argument; I'd forgotten about that. But even when it avoids the wasteful memory allocations, numpy is still about 60% slower than Julia, because it makes multiple passes over the input data. (If there's a way to get numpy to make just a single pass over the data and remain performant, I'd love to know.)
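
Edit: one route I've since seen suggested (assuming the third-party numexpr package is fair game) compiles the expression and evaluates it in cache-sized blocks, effectively a single pass with no intermediates:

  import numexpr as ne
  import numpy as np

  a = np.random.rand(10_000_000)

  # numexpr compiles the string once and streams the array through it
  # blockwise, rather than making one full pass per operation.
  result = ne.evaluate("cos(sqrt(sin(a ** 2)))")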


The first thing I'd reach for is a list comprehension, but it looks like this is the proper way to do it with numpy:

  import numpy as np

  for x in np.nditer(a, op_flags=['readwrite']):
    x[...] = np.cos(np.sqrt(np.sin(x ** 2)))
That's some gnarly syntax, though. I've never seen that ellipsis operator before.

Edit: just read about `Ellipsis`. I'm a fan, even if it's somewhat nonstandard across libraries. Those readwrite flags are a travesty, but at least you can paper over them with a helper function.

Something like:

  def np_apply(a, f):
    for x in np.nditer(a, op_flags=['readwrite']):
      x[...] = f(x)

  np_apply(a, lambda x: np.cos(np.sqrt(np.sin(x ** 2))))
or

  def np_apply(a, *fns):
    for x in np.nditer(a, op_flags=['readwrite']):
      y = x
      for f in fns:
        y = f(y)
      x[...] = y

  np_apply(a, lambda x: x ** 2, np.sin, np.sqrt, np.cos)
Edit 3: there's a way to turn this into "pythonic" list comprehension code, but it would probably only make it look prettier, not run faster.
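
For completeness, here's roughly what that might look like (untested sketch; it rebuilds the array rather than mutating it in place):

  b = np.fromiter(
      (np.cos(np.sqrt(np.sin(v ** 2))) for v in a.flat),
      dtype=a.dtype, count=a.size)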


Yes, you can use the `out` arguments, but doing so (1) complicates your code, and (2) isn't necessarily a win. You have to perform the same number of memory-bus write cycles whether you write to a specified output array or to some freshly allocated one.


I agree with point (1) wholeheartedly: by default the syntax is ugly unless you wrap it in a utility. But (2) isn't true as far as I can tell; you can keep reusing the same output buffer, or double-buffer if the output can't be the same as the input.

Either way, see https://news.ycombinator.com/item?id=22786958, where I figured out the syntax to do it in place; that shouldn't result in any allocations at all and should address both of your concerns.


Regarding #2: whether you use the same buffer or a new buffer, you still have to actually write to the memory. The bottleneck at numpy scale isn't the memory allocation (mmap or whatever) but the actual memory writes (saturating the memory bus), and you need to perform the same number of writes no matter which destination array you use.


It should still be a perf gain:

1. Memory allocation isn't free.

2. Doing multiple loops over the buffer (one per operation) is going to be slower than doing all the operations at once, both because of caching and because of the opportunity to keep the intermediate value in a register and write it out once at the end (though who knows whether the Python interpreter will actually do that).
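
A quick way to sanity-check that (sizes and repeat counts are arbitrary):

  import timeit
  import numpy as np

  a = np.random.rand(10_000_000)
  buf = np.empty_like(a)

  def chained():  # allocates a fresh intermediate per operation
      return np.cos(np.sqrt(np.sin(np.power(a, 2))))

  def reused():   # one preallocated buffer, no per-call allocations
      np.power(a, 2, out=buf)
      np.sin(buf, out=buf)
      np.sqrt(buf, out=buf)
      np.cos(buf, out=buf)

  print(timeit.timeit(chained, number=10))
  print(timeit.timeit(reused, number=10))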


The Python interpreter is too stupid to do operation coalescing. That's the whole point of my initial comment.


JAX supports this, as do other performance-oriented numerical libraries. There are hoops to jump through, of course.
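
For instance, a jit-compiled JAX version (sketch; array size is arbitrary) lets XLA fuse the whole elementwise chain into one kernel:

  import jax
  import jax.numpy as jnp

  @jax.jit  # XLA fuses the elementwise chain into a single pass
  def f(a):
      return jnp.cos(jnp.sqrt(jnp.sin(a ** 2)))

  y = f(jnp.arange(1_000_000, dtype=jnp.float32))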


> Julia appears to outperform Python by a factor 2

Now I have some more reading to do. Thanks.


Julia is very, very good. I say this as a person who has invested about a decade in the Python data science ecosystem. Numpy will probably always be copy-heavy, whereas Julia's JIT can optimize as aggressively as its impressive type system allows.



