
It's been discussed to death in the comments on that blog and others, but reading all of it might be very boring, so I'll repeat my stance here (I'm one of the guys implementing NumPy on PyPy):

A NumPy that's faster is already very interesting for many people, because you don't have to go to great lengths shifting code to C or Cython just to experiment. Besides, it integrates seamlessly with your current stack, which might be in Python.

Regarding the low-level API: calling C/Fortran from PyPy's NumPy should be dead easy via, say, ctypes. You should be able to call whatever C libraries you wish.
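
A minimal sketch of what that looks like (assuming a Unix-like system where ctypes can locate libm; the library and function are just illustrative):

  import ctypes, ctypes.util

  # Load the system C math library via ctypes. This path works the same
  # on CPython and PyPy, with no CPython C-API glue involved.
  libm = ctypes.CDLL(ctypes.util.find_library("m"))
  libm.cos.argtypes = [ctypes.c_double]
  libm.cos.restype = ctypes.c_double

  print(libm.cos(0.0))  # 1.0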

Matplotlib, SciPy, and the scikits should be relatively easy to get working to some extent using hacks like this: http://morepypy.blogspot.com/2011/12/plotting-using-matplotl...

As for other stuff - well, if it depends too much on the CPython C API, PORT IT. It's not that hard, and once you have a respectable Python runtime you can do it; it has been done before.

Just because we won't support all possible users from day one does not mean we should not try. There are very valid use cases where people shy away from Python because, as soon as you try to write a loop in Python, everything slows to such a crawl that you can't even run experiments. I personally believe Cython is not the answer here and that you actually need full Python for most of this, especially for inexperienced users, so we're primarily targeting the niche that can't possibly be addressed by any solution based on CPython.
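
To make the "crawl" concrete, here is the kind of plain-Python loop I mean (a made-up example, not a benchmark): on CPython every iteration pays interpreter and boxing overhead, while a tracing JIT can compile it down to a tight machine-code loop.

  # A plain double loop over Python floats: slow on CPython,
  # JIT-compiled to something near C speed on PyPy.
  def sum_squared_diffs(xs, ys):
      total = 0.0
      for x in xs:
          for y in ys:
              d = x - y
              total += d * d
      return total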

As for the rest: even if you vectorize your code, NumPy is nowhere near the speed of C. We're trying to attack that as well, and eventually even to surpass C.

This is pretty much it, feel free to ask more questions.



Your statement that NumPy is "nowhere near the speed of C" is false and misleading. For some people, "NumPy" is actually just a front-end to the vendor-optimized libraries that do the real work (and your C-coded loops are going to be much slower than those). For most operations (with large-enough vectors), NumPy is only about 2x slower than a specifically crafted C loop.
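
For example (an illustration, not a benchmark): a matrix multiply in NumPy is a single call into the vendor BLAS, and a naive C triple loop will generally lose to it badly.

  import numpy as np

  a = np.random.rand(1000, 1000)
  b = np.random.rand(1000, 1000)

  # np.dot on 2-D arrays dispatches to the BLAS gemm routine
  # (MKL, ATLAS, etc.); NumPy is just the front-end here.
  c = np.dot(a, b)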

Yes, there are generic operations in NumPy that you can speed up with specific code in C (or any other compiled language). In addition, there is much low-hanging fruit to optimize in NumPy as well (which we at Continuum are working on as I write this).

Fijal, I know you are enthusiastic about PyPy, and you should be --- it's a cool system. But please don't spread misinformation. There are a lot of people who don't understand enough about the details of what you are talking about, and you are just going to alienate them once they realize that you don't have all your facts about NumPy straight.

For people who only make occasional use of NumPy, PyPy and its version of NumPy will likely be fine. But those people should be well aware that they are intentionally remaining outside the larger Python/NumPy ecosystem (Matplotlib, SciPy, scikits, etc.), and it will be a long haul to build the features in PyPy that would enable that ecosystem to migrate (and that assumes the individual projects decide it's even worthwhile to do so).


I think I disagree with pretty much every single point you make. First, for something as simple as a Laplace equation solver, the vectorized NumPy loop takes 35ms per loop vs. 6.3ms for C. As your list of operations grows, your need for intermediates grows and your speed decreases, but let's not get into the details. Obviously, if you're just calling a vendor-optimized library, you can use whatever you feel like and it'll be equally good, be it PyPy, be it NumPy, be it MATLAB.
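
For reference, the vectorized update in question looks roughly like this (a sketch, not the exact benchmark code). Note that every shifted slice and every sum materializes a temporary array, which is where the gap to a hand-written C loop comes from:

  import numpy as np

  def jacobi_step(u):
      # One vectorized Jacobi step for the Laplace equation on the
      # interior of the grid; the RHS allocates several temporaries.
      new = u.copy()
      new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])
      return new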

You consistently spread the rumor that we intend to reimplement all of SciPy/matplotlib/scikits etc. in RPython, and this is plain false. I think those projects are completely reusable using one hack or another; see, for example, the blog post I linked, where within a day I was able to draw basic stuff using matplotlib on PyPy. We seriously want to reuse as much code as possible from the entire ecosystem, but a part of the project is also to provide people with a really fast Python that can perform numeric computations.

Also, which facts about NumPy didn't I get straight?


>First, for something as simple as a Laplace equation solver, the vectorized NumPy loop takes 35ms per loop vs. 6.3ms for C.

Isn't numexpr a good solution for that problem? numexpr is much, much simpler than PyPy, so if it can reduce the performance overhead of complex vector operations with loop fusion, that seems like a huge win.


Yes; numexpr and weave are pretty reasonable solutions, although a little "weird" because they take opaque strings.
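
Concretely, the "opaque string" style looks like this (a small sketch; numexpr's evaluate picks up a and b from the caller's scope and fuses the expression into one pass over the data):

  import numpy as np
  import numexpr as ne

  a = np.random.rand(1000000)
  b = np.random.rand(1000000)

  plain = 2 * a + 3 * b             # NumPy materializes intermediates for 2*a and 3*b
  fused = ne.evaluate("2*a + 3*b")  # one fused loop, no intermediate arrays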

One bit of context that is missing from this discussion (although Travis did allude to it earlier) is that we are actively working on building robust deferred-computation support into Numpy, and on making these computations run much faster than hand-tuned C, via a variety of mechanisms.

(Disclaimer: I also work with Travis at http://continuum.io, along with the author of Numexpr and PyTables. :-)


[citation needed] as for numexpr being simpler than PyPy. Using the dumbest measure of simplicity, total code size, it fails: PyPy's NumPy is 167KB of code, NumExpr is 181KB.


Um, that's not a dumb measure of simplicity, it's just a measure of code size. You might as well look at the average number of characters in the names of the authors of the two projects; it would be equally irrelevant.

Here is a real measure of simplicity: how long does it take to explain to a Numpy user how to wrap an array expression in a string, versus explaining how a JITting compiler works, how to interface its runtime to their existing Python installation, how to build it, and what the limitations of RPython are?

Heck, I'm an actual developer (not a scientific programmer) and it took me a little while to understand what PyPy does.


>I'm an actual developer (not a scientific programmer) and it took me a little while to understand what PyPy does.

I'm a scientist, not a scientific programmer or a developer, and this is all I really care about: PyPy is currently--in its partially implemented state--much, much faster than CPython on the vast majority of things it can do. If I am able to use PyPy's NumPy and it's faster than traditional NumPy, I will do so as long as the opportunity cost doesn't outweigh the speed increases (NumPy is pretty useless to me--maybe not to some--without SciPy and matplotlib).

I don't care that PyPy is written in RPython any more than I care that it has a JIT, or that CPython is written in C. I also don't care how that JIT works, or how CPython compiles to bytecode, or how Jython does magic to make my Python code run on the JVM. I do care that it "works," as a scientist. I do care that they are "correct" implementations, as a scientist. As an individual, I am interested in the inner workings of PyPy and CPython, CoffeeScript, Go, and Brainfuck, but when I'm working on research, the only thing that actually matters as far as the language implementation is concerned is that it just works. The interpreter is just a brand of the particular tool I'm using.

I would certainly prefer it if PyPy were 100% compatible with the CPython C API, even if it ran at 80% (maybe even 60%) of the CPython C API speed, because then I wouldn't even have to think. I'd be using PyPy because it's faster overall, and I could do the analyses I want faster.

Anyway, I think if you're explaining all of what you mentioned to a NumPy user or a PyPy NumPy user, you're doing it wrong. Or maybe the PyPy folks would be doing it wrong. Because this is how that conversation would go with my peers:

  Sad Panda: "Ugh, my code is running slowly. I think I have to jump into C."
  Me: "Have you tried PyPy's NumPy yet?"
  Sad Panda: "What's that?"
  Me: "It's faster Python and NumPy. Go here [link], download it, and see if it runs your code faster."
  Sad Panda: "Okay, I'll do that."
  ..a while later..
  Sad Panda: "It was a little faster, but I ended up getting one of the CS guys to help me run it on a Tesla cluster with OpenCL. But I think I can use it on the spiny lumpsucker data I'm collecting."


>I would certainly prefer it if PyPy were 100% compatible with the CPython C API, even if it ran at 80% (maybe even 60%) of the CPython C API speed, because then I wouldn't even have to think. I'd be using PyPy because it's faster overall, and I could do the analyses I want faster.

While part of me agrees with this, if PyPy starts sacrificing performance for CPython compatibility, then pretty soon it'll degenerate into CPython.


Why do the people wrapping stuff in strings have to understand the limitations of RPython? It's "wrap expressions in strings" vs. "do nothing".


numexpr is much simpler than PyPy. I never said it was much simpler than PyPy's incomplete NumPy support.



