PyPy2.7 and PyPy3.5 v5.7 released (morepypy.blogspot.com)
230 points by mattip on March 21, 2017 | 51 comments


This project keeps amazing me. It seems like an R&D lab going for moonshots on the Python language, and yet it achieves incredible practical results.

Does anyone know why it still hasn't managed to get enough traction to become the official path for the language, and why it still needs Mozilla support for funding? My intuition is that it's just too complex. But Haskell, Scala, Swift, and Rust also carry a lot of complexity, so that can't be the only reason.


> Does anyone know why it still hasn't managed to get enough traction to become the official path for the language

I'd wager BDFL prefers the CPython implementation because it's probably simpler. Also, it supports tons of targets and is super simple to build.

It's healthy for a language to have more than one implementation. A big win would be a Linux distro selecting PyPy as its default Python. But IMO many Python scripts that folks use just don't run long enough to see a benefit from PyPy.


> "I'd wager BDFL prefers the CPython implementation because it's probably simpler. Also, it supports tons of targets and is super simple to build."

You're doing good wagering, Guido mentioned just this at several keynotes (source: skim through his PyCon/EuroPython keynotes, there's a good chance he addresses the point or answers a question about it). I'd paraphrase him with: although now a big project with optimizations here and there obscuring the purpose of many parts, CPython remains a boring well-understood "no rocket science" C project. As an example of this, I remember him mentioning how the opcode dispatcher remains a giant `switch` statement rather than something more intricate.
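To make the "no rocket science" point concrete, here is a toy stack-machine dispatcher in Python, a rough sketch only (hypothetical opcodes, nothing like CPython's real instruction set or its actual C code): one loop, one visibly separate branch per opcode, exactly the shape that stays easy to read.

```python
# Toy bytecode interpreter in the "giant switch" style: each branch of the
# if/elif chain stands in for one `case` of CPython's C switch statement.
def run(bytecode):
    stack = []
    pc = 0
    while pc < len(bytecode):
        op, arg = bytecode[pc]
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown opcode: %s" % op)
        pc += 1
    return stack.pop()

# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None)]
```

Nothing clever happens between instructions, which is the trade-off being described: boring, but anyone can follow the control flow.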

That doesn't mean he dislikes or wants to hinder PyPy (or IronPython or Jython or Pyston or ...), just that he's fine with CPython's reach and trade-offs as the "default Python", and if you prefer another Python with different performance/compatibility/featureset/xyz characteristics, you're welcome to grab it :)


But it sounds like he implicitly hinders CPython! It sounds like “we won't merge code I don't understand” and no brave intelligent soul will risk improving the interpreter.


Isn't that the explicit job of a BDFL though? To enforce standards? His happens to be simplicity at all costs, and while some might disagree, I think it's generally a good pursuit; as others state, alternative implementations are available if you really need some JITing black magic or Java integration or green threads or...


What's wrong with "we won't merge code we don't understand"? I think a lot of people should hold code to that standard. I think almost everyone does hold code to that standard.


It's a problem when leadership is not ~~merit~~ engagement based (BDFL) and when said leadership lacks the vision and guts to push Python across new boundaries. I would like to see something more "aggressive" here.


The leadership of Python, in the form of Guido, is certainly merit- and engagement-based. Nothing needs to be 'more aggressive' in Python development. It's perfectly fine for the standard, canonical version of Python to be the version that:

* Is most up-to-date and has all the latest features
* Everyone is most familiar with
* Already is packaged in all operating systems
* Is the most stable
* Is compatible with all C extensions


> I'd wager BDFL prefers the CPython implementation because it's probably simpler. Also, it supports tons of targets and is super simple to build.

At a EuroPython keynote, the BDFL mentioned that he hadn't had a closer look at PyPy (he mentioned downloading it and playing with it for a few minutes), i.e. there is a certain disinterest. Also, remember that the "Zen of Python" (https://www.python.org/dev/peps/pep-0020/#id3) was written about the design principles of the Python interpreter, and PyPy is not exactly the Zen of Python.

Personally, I'd love to see Python 4 to be based entirely on PyPy.


> Does anyone know why it still hasn't managed to get enough traction to become the official path for the language

One reason is that CPython is kept relatively simple on purpose, so it abstains from implementing things like a JIT or other very complex optimizations. The biggest obstacle for PyPy adoption is that it can't serve as a drop-in replacement in the majority of cases, either because it's behind (until now, far behind) in supporting new versions of the language or because a lot of popular packages rely on C extensions. The CPython way of binding to C, on which these packages rely, does not fit PyPy's architecture very well, so PyPy did not really support it at all until recently.


> Does anyone know why it still hasn't managed to get enough traction to become the official path for the language

The people who really care about Python performance (sci. comp & HPC) are the ones driving the (non-)adoption of PyPy. One, the existing solution (CPython + Cython) is pretty mature. Two, they require the holy trinity: Numpy, Scipy and Matplotlib. None of these are supported on PyPy, and only Numpy is even close.


numpy works, scipy & matplotlib are very close. Stop spreading false information please.

That said, the cpython+cython solution is not going away any time soon.


> numpy works, scipy & matplotlib are very close.

Care to elaborate? I agree that numpypy is pretty much complete, though I have not found it to be faster in my limited testing. I was under the impression that a great deal of work would be required to get the other two running, even using CPyExt (which would not be faster, which was the whole point to begin with). I'd love to be proved wrong though.


Real-world use cases are large and messy. Now that the Cython/pandas/NumPy stack works out of the box on PyPy, we can begin to see how often the mix of Python code (jittable, fast) and C code (fast) needs to cross the C-to-Python barrier (slow). There may be some cases, like NLP, where speeding up parsing and I/O can make a difference.


numpy, pandas, and scipy mostly work through cpyext. We're considering how to make them fast now. It would never be faster on large_array + large_array, but it should be faster (by a lot) with element access etc., a bit like Numba.


What is the experience for someone trying it out for the first time and trying to install packages in it? What is the UX for someone building packages and verifying that they work?


I have a performance-critical Python program that I started running in pypy for an extra speedup. It was trivial: I downloaded the package, untarred it, and then ran my program with ~/Downloads/pypy-whateverversion/bin/pypy ./myprogram.py and it worked. Fancier stuff may require more effort, but the basics are really simple.


PIL


Expound, please?

Do you mean to refer to the Global Interpreter Lock as the factor which prevents it from becoming the official path for the language? CPython, the reference implementation, has the GIL as well.

Or the PIL library (though everyone seems to use Pillow now?)


Oops, I meant GIL. I work in scientific computing. In my experience, the only justification I have for using another language (mostly C++) over Python is to get around the interpreter lock, which overcomplicates your life if you want parallelisation. Multiprocessing is just awful, especially if you work with big chunks of data (I do mostly medical imaging). PyPy, even if it's faster, still has the same GIL constraints as CPython, so not really a big advantage IMO.
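A minimal sketch of the constraint being described (my example, not the commenter's code): CPU-bound work split across threads still executes one bytecode stream at a time under the GIL, which is why the usual escape hatch is multiprocessing, and with it the serialization cost of shipping large arrays between processes.

```python
# Two threads doing pure-Python CPU work. The result is correct, but under
# the GIL only one thread runs Python bytecode at a time, so on CPython and
# PyPy alike this gets no parallel speedup.
import threading

def count(n, out, i):
    total = 0
    for _ in range(n):
        total += 1
    out[i] = total

results = [0, 0]
threads = [threading.Thread(target=count, args=(500_000, results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Swapping `threading` for `multiprocessing` buys real parallelism but means pickling the data across process boundaries, which is the "awful with big chunks of data" part.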


We're working on it :-) In fact there are two projects: one is to just remove the GIL, the other is to add STM capabilities.


For my (and I imagine many among us) use case, this is the takeaway:

PyPy2 can now import and run many C-extension packages, among the most notable are Numpy, Cython, and Pandas. Performance may be slower than CPython, especially for frequently-called short C functions. Please let us know if your use case is slow, we have ideas how to make things faster but need real-world examples (not micro-benchmarks) of problematic code.
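A hedged sketch of the pattern that note warns about (my example, not the release's, and it assumes NumPy is installed): one bulk call into C stays cheap even through cpyext, while a Python-level loop making a short C call per element pays the boundary-crossing cost on every iteration.

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# One call into C that does all the work: fine through cpyext.
total = a.sum()

# The problematic shape: many short calls across the Python<->C boundary
# (each a[i] indexing call crosses it once). Truncated to 1000 iterations
# here; in real code this loop is what shows up as "slow" under cpyext.
slow_total = 0.0
for i in range(1000):
    slow_total += a[i]
```

Code shaped like the second loop is exactly the "frequently-called short C functions" case the team is asking for real-world examples of.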


It would probably be better to port packages like Numpy to pure PyPy[0], but of course the CPython extensions compatibility layer is by itself important to increase the adoption of PyPy.

The problem, besides the JIT not being able to inline external functions, is that the CPython structures, and especially the refcounting used in CPython, are problematic [1].

[0]: Actually specifically numpy is being rewritten in PyPy as a RPython Mixed Module: http://doc.pypy.org/en/latest/extending.html#rpython-mixed-m... [1]: http://doc.pypy.org/en/latest/discussion/rawrefcount.html


There's an attempt to create such NumPy fork already: https://bitbucket.org/pypy/numpy


If it's useful at all, PyPy 5.7.0 was noticeably slower than CPython 2.7.11 x64 when I was using SymPy. Haven't tried out the new release though.


Hi

I've tried optimizing SymPy and it's hard: there is a chance that it uses Cython (which will slow things down on PyPy), but also the gains were small (up to 2x) and only noticeable if you run it for a while (at least a few seconds).

SymPy is a bit of an example that while Python as a language is not necessarily slow, since you can make a fast plan for most parts, it breeds a culture of unnecessary meta-programming, tons of copies, and dict lookups that make it REALLY HARD to optimize. ORMs go in the same category.


Did you try the x64 version of PyPy? The x86 version is necessarily much slower (for both CPython and PyPy) if your program is long-heavy. SymPy is actually part of the benchmark suite used by PyPy, and PyPy should be faster if you compare x64 to x64.


Nope, I'm on Windows.


A subset of SymPy can run with symengine for a speedup.


This is amazing! I had no idea we were so close to 3.x and cpyext. Congratulations to the pypy team for their brilliant work.

Perhaps it's almost time for some numpy / pandas stuff in speed.pypy.org :)


The Mac binary from the download page works perfectly for me. I tried a piece of numerical code I have been working on which makes some use of numpy but also uses element-by-element loops. The script takes 2 minutes with CPython, 3 minutes using pypy with NumPy via cpyext, and less than 2 SECONDS with pypy using NumPyPy. Wow!

However, when running the linux binaries on CentOS, they both (32- and 64-bit) fail for me with shared library errors:

./pypy2-v5.7.0-linux32/bin/pypy: error while loading shared libraries: libexpat.so.1: cannot open shared object file: No such file or directory

The 64-bit version gives the same library error but regarding libssl.so.1.0.0. I verified that I do have both expat and openssl installed (using yum), although I don't think that should be needed anyway. Anyone know what is happening here?


From the download page:

"[1]: stating it again: the Linux binaries are provided for the distributions listed here. If your distribution is not exactly [Ubuntu 12.04 and 14.04], it won't work, you will probably see: pypy: error while loading shared libraries: …. Unless you want to hack a lot, try out the portable Linux binaries." https://github.com/squeaky-pl/portable-pypy#portable-pypy-di...


Congratulations and thank you!

One question to PyPy developers: CPython 3.5 [1] introduced math.gcd which uses Lehmer's algorithm [2], are there any plans to include it in PyPy, too?

[1] https://bugs.python.org/issue22486 [2] https://en.wikipedia.org/wiki/Lehmer%27s_GCD_algorithm
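For reference, the function in question (available since CPython 3.5; the claim that it uses Lehmer's algorithm is from the linked issue, which I haven't verified) is easy to exercise on the big integers where such an algorithm would matter, e.g. via the identity gcd(2^a - 1, 2^b - 1) = 2^gcd(a, b) - 1:

```python
import math

# gcd on large integers is where Lehmer's algorithm would pay off over
# plain Euclid; gcd(200, 120) == 40, so this should equal 2**40 - 1.
g = math.gcd(2**200 - 1, 2**120 - 1)
```

Any PyPy3 implementation would need to match this behavior (including `math.gcd(0, n) == n`) regardless of which algorithm it uses internally.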


Hi

Yes, absolutely. This is why it's marked as "beta", so not absolutely all the features are implemented.


What isn't supported by pypy? Thinking in terms of a modern Django app, what should I be looking out for? Is libxml supported? Database drivers like pyscopg?


Very little, IMO. I'm rarely disappointed by whatever I throw at it. I have seen Django apps work correctly before, and I would expect libxml and psycopg to work correctly too.


psycopg2cffi is available to provide Postgres support.


Thanks for that. I just ran a quick django shell test, and fetching 50,000 objects took 6.62 seconds in cpython, and 3.62 seconds in pypy. I haven't done a thorough test of the full app yet, but the basics appear to be working just fine.

Definitely something I'll be looking into more.


My respect for pypy grows with them nearing CPython's current reality (3.x features). This is great news!


What is the current recommended MySQL driver to use with PyPy? And please don't tell me to use Postgres.


Still, use postgres ;) There are a few pure-python drivers you can use if you need that (oursql?).


Will matplotlib ever be supported?


Yes, it "mostly works". There is an issue with graphics backends (you can either save figures to a file or use Jupyter), but otherwise yes.


Whoops, I meant for Windows. I can't get it to install on Windows...


is pyqt supported?


no


PyPy2.7? Python 2 should just die.

Because of these people doing work for the ecosystem, in 10 years we will still see this monstrosity: "for Python 2 do this, and for Python 3 do this".


Well, I'm very sorry you wish we'd stop supporting our users, but this is the reality we'll be stuck in for a while. Python 3 adoption has been speeding up, but the vast majority of old, big projects are still based on Python 2, and those are potential users of PyPy.


Although I disagree with the sentiment that PyPy2/Python2 should die, I also disagree with the PyPy team treating PY3 as a second class citizen.

For this release PyPy2 shipped with binaries for 11 platforms, PyPy3 shipped with 1.


Mozilla helped to solve this problem and work is in progress. Please encourage your company to donate to open source in general (maybe it already does?) and to PyPy in particular.


Thanks for the sentiment. We are major sponsors of PyCon and have donated to PyPy.

What I'm trying to suggest is that the amount they've received to support Python 3 ($250,000 from Mozilla, $66,677 in public donations) is a signal that there is major demand for this.

Could they use the infra that is building binaries for PyPy2 for PyPy3?

I understand where they are coming from ("existing projects are using Python 2"), but I'd like to see them focus more on the future.

I love what PyPy is doing and I want them to succeed, but I can't imagine their success coming out of Python 2. I see them competing more with the modern, performant languages that are currently stealing users from Python (e.g. Go/Rust).

If they focused on the future they could directly compete in that market of projects that are rewriting in other languages for performance.



