This project keeps amazing me. It seems like an R&D lab going for moonshots on the Python language, and yet achieving incredible practical results.
Does anyone know why it still hasn't managed to get enough traction to become the official path for the language, and why it still needs Mozilla support for funding? My intuition is that it's just too complex. But Haskell, Scala, Swift, and Rust also have a lot of complexity, so that can't be the only reason.
> Does anyone know why it still hasn't managed to get enough traction to become the official path for the language
I'd wager the BDFL prefers the CPython implementation because it's probably simpler. Also, it supports tons of targets and is super simple to build.
It's healthy for a language to have more than one implementation. A big win would be a Linux distro selecting PyPy as the default Python. But IMO many Python scripts that folks use just don't run long enough to see a benefit from PyPy.
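To make that concrete, a minimal timing sketch (loop sizes are hypothetical; the crossover point varies by workload):

    import time

    def work(n):
        # tight numeric loop: the kind of code PyPy's JIT eventually compiles
        total = 0
        i = 0
        while i < n:
            total += i * i
            i += 1
        return total

    for n in (10**4, 10**6, 10**8):
        start = time.time()
        work(n)
        print("n=%d took %.3fs" % (n, time.time() - start))

    # Under CPython the cost per iteration is roughly constant; under PyPy
    # the long runs get much cheaper per iteration once the JIT warms up,
    # while a short script mostly pays startup and warmup cost.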
> "I'd wager BDFL prefers the CPython implementation because it's probably simpler. Also, it supports tons of targets and is super simple to build."
You're doing good wagering, Guido mentioned just this at several keynotes (source: skim through his PyCon/EuroPython keynotes, there's a good chance he addresses the point or answers a question about it). I'd paraphrase him with: although now a big project with optimizations here and there obscuring the purpose of many parts, CPython remains a boring well-understood "no rocket science" C project. As an example of this, I remember him mentioning how the opcode dispatcher remains a giant `switch` statement rather than something more intricate.
That doesn't mean he dislikes or wants to hinder PyPy (or IronPython or Jython or Pyston or ...), just that he's fine with CPython reach and trade-offs as the "default python", and if you prefer another Python with different performance/compatibility/featureset/xyz characteristics, you're welcome to grab it :)
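For flavor, here is a toy sketch (in Python, not the actual C) of that "giant switch" dispatch style. The opcodes here are invented for illustration; CPython's real loop lives in ceval.c as a C switch statement:

    def run(code):
        # one loop, one branch per opcode, nothing clever -- the same
        # shape as CPython's switch-based dispatcher, minus the C
        stack = []
        pc = 0
        while pc < len(code):
            op, arg = code[pc]
            if op == "PUSH":
                stack.append(arg)
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "PRINT":
                print(stack[-1])
            else:
                raise ValueError("unknown opcode: %r" % (op,))
            pc += 1

    run([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PRINT", None)])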
But it sounds like he implicitly hinders CPython's own development! It sounds like “we won't merge code I don't understand”, and no brave, intelligent soul will risk improving the interpreter.
Isn't that the explicit job of a BDFL though? To enforce standards? His happens to be simplicity at all costs, and while some might disagree, I think it's generally a good pursuit; as others state, alternative implementations are available if you really need some JITing black magic or Java integration or green threads or...
What's wrong with "we won't merge code we don't understand"? I think a lot of people should hold code to that standard. I think almost everyone does hold code to that standard.
It's a problem when leadership is not ~~merit~~ engagement based (BDFL) and when said leadership lacks the vision and guts to push Python across new boundaries. I would like to see something more “aggressive” here.
The leadership of Python, in the form of Guido, is certainly merit- and engagement-based. Nothing needs to be 'more aggressive' in Python development. It's perfectly fine for the standard, canonical version of Python to be the version that:
* Is most up-to-date and has all the latest features
* Everyone is most familiar with
* Already is packaged in all operating systems
* Is the most stable
* Is compatible with all C extensions
> I'd wager the BDFL prefers the CPython implementation because it's probably simpler. Also, it supports tons of targets and is super simple to build.
At a EuroPython keynote, the BDFL mentioned that he hadn't had a closer look at PyPy (he mentioned downloading it and playing with it for a few minutes). In other words, there is a certain disinterest. Also, remember that the "Zen of Python" (https://www.python.org/dev/peps/pep-0020/#id3) was written about the design principles of the Python interpreter, and PyPy is not exactly the Zen of Python.
Personally, I'd love to see Python 4 be based entirely on PyPy.
> Does anyone know why it still hasn't managed to get enough traction to become the official path for the language
One reason is that CPython is kept relatively simple on purpose, so it abstains from implementing things like a JIT or other very complex optimizations. The biggest obstacle to PyPy adoption is that it can't serve as a drop-in replacement in the majority of cases, either because it's behind (until now, far behind) in supporting new versions of the language, or because a lot of popular packages rely on C extensions. The CPython way of binding to C, which these packages rely on, does not fit PyPy's architecture very well, so PyPy did not really support it at all until recently.
> Does anyone know why it still hasn't managed to get enough traction to become the official path for the language
The people who really care about Python performance (scientific computing & HPC) are the ones driving the (non-)adoption of PyPy. One, the existing solution (CPython + Cython) is pretty mature. Two, they require the holy trinity: NumPy, SciPy, and Matplotlib. None of these are supported on PyPy, and only NumPy is even close.
Care to elaborate? I agree that numpypy is pretty much complete, though I have not found it to be faster in my limited testing. I was under the impression that a great deal of work would be required to get the other two running, even using CPyExt (which would not be faster, which was the whole point to begin with). I'd love to be proved wrong though.
Real-world use cases are large and messy. Now that the Cython/pandas/NumPy stack works out of the box on PyPy, we can begin to see how often the mix of Python code (jittable, fast) and C code (fast) needs to cross the C-to-Python barrier (slow). There may be some cases, like NLP, where speeding up parsing and IO can make a difference.
numpy, pandas (mostly) and scipy (mostly) work through cpyext. We're considering how to make them fast now. It would never be faster on large_array + large_array, but it should be faster (by a lot) for element access etc., a bit like numba.
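Roughly the distinction being drawn, as a hypothetical micro-example of the two access patterns (the release notes quoted further down explicitly ask for real workloads instead, but this shows the shape of the problem):

    import numpy as np

    a = np.random.rand(10**6)
    b = np.random.rand(10**6)

    # one big call into C: dominated by the vectorized C kernel,
    # so PyPy can't meaningfully beat CPython here
    c = a + b

    # many tiny calls: every a[i] crosses the Python<->C boundary
    # (cpyext), which is exactly where a JIT that understands the
    # array layout can win big, numba-style
    total = 0.0
    for i in range(len(a)):
        total += a[i]
    print(total)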
What is the experience for someone trying it out for the first time and trying to install packages in it? What is the UX for someone building packages and verifying that they work?
I have a performance-critical Python program that I started running in pypy for an extra speedup. It was trivial: I downloaded the package, untarred it, and then ran my program with ~/Downloads/pypy-whateverversion/bin/pypy ./myprogram.py and it worked. Fancier stuff may require more effort, but the basics are really simple.
Do you mean to refer to the Global Interpreter Lock as the factor which prevents it from becoming the official path for the language? CPython, the reference implementation, has the GIL as well.
Or the PIL library (though everyone seems to use Pillow now?)
oops, I meant GIL.
I work with scientific computing.
In my experience, the only justification I have for using another language (mostly C++) over Python is to get around the interpreter lock, which overcomplicates your life if you want parallelisation. Multiprocessing is just awful, especially if you work with big chunks of data (I do mostly medical imaging).
PyPy, even if it's faster, still has the same GIL constraints as CPython, so not really a big advantage IMO.
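A minimal sketch of that multiprocessing pain point (array sizes are hypothetical; real imaging volumes are far larger):

    import numpy as np
    from multiprocessing import Pool

    def threshold(volume):
        # trivial per-voxel work -- the real cost is shipping `volume`
        # into and back out of the worker process
        return (volume > 0.5).sum()

    if __name__ == "__main__":
        # each chunk is pickled, piped to a worker, and unpickled;
        # for big arrays that copying often dwarfs the computation
        chunks = [np.random.rand(256, 256, 64) for _ in range(4)]
        pool = Pool(4)
        print(pool.map(threshold, chunks))
        pool.close()
        pool.join()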
For my (and I imagine many among us) use case, this is the takeaway:
> PyPy2 can now import and run many C-extension packages, among the most notable are Numpy, Cython, and Pandas. Performance may be slower than CPython, especially for frequently-called short C functions. Please let us know if your use case is slow, we have ideas how to make things faster but need real-world examples (not micro-benchmarks) of problematic code.
It would probably be better to port packages like Numpy to pure PyPy[0], but of course the CPython extensions compatibility layer is by itself important to increase the adoption of PyPy.
The problem, besides the JIT not being able to inline external functions, is that the CPython object structures, and especially the refcounting used in CPython, are problematic [1].
I've tried optimizing sympy and it's hard - there is a chance that it uses cython (which will slow things down on pypy), but also the gains were small (up to 2x) and only noticeable if you run it for a while (at least a few seconds).
SymPy is a bit of an example of how, while Python as a language is not necessarily slow (you can make most parts fast), it breeds a culture of unnecessary meta-programming, tons of copies, and dict lookups that make it REALLY HARD to optimize. ORMs go in the same category.
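A minimal harness to see that warmup effect, assuming sympy is installed (the expression is arbitrary):

    import time
    import sympy

    x, y = sympy.symbols("x y")

    # time each iteration separately: under PyPy the early iterations
    # include JIT warmup, so only the later ones show the (modest) gain
    for i in range(10):
        start = time.time()
        sympy.expand((x + y + i) ** 15)  # vary the constant to dodge sympy's cache
        print("iter %d: %.3fs" % (i, time.time() - start))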
Did you try the x64 version of PyPy? The x86 version is necessarily much slower (for both CPython and PyPy) if your program is heavy on long/bigint arithmetic. SymPy is actually part of the benchmark suite used by PyPy, and PyPy should be faster if you compare x64 to x64.
The Mac binary from the download page works perfectly for me. I tried a piece of numerical code I have been working on which makes some use of numpy but also uses element-by-element loops. The script takes 2 minutes with CPython, 3 minutes using pypy with NumPy via cpyext, and less than 2 SECONDS with pypy using NumPyPy. Wow!
However, when running the linux binaries on CentOS, they both (32- and 64-bit) fail for me with shared library errors:
./pypy2-v5.7.0-linux32/bin/pypy: error while loading shared libraries: libexpat.so.1: cannot open shared object file: No such file or directory
The 64-bit version gives the same library error but regarding libssl.so.1.0.0. I verified that I do have both expat and openssl installed (using yum), although I don't think that should be needed anyway. Anyone know what is happening here?
"[1]: stating it again: the Linux binaries are provided for the distributions listed here. If your distribution is not exactly [Ubuntu 12.04 and 14.04], it won't work, you will probably see: pypy: error while loading shared libraries: …. Unless you want to hack a lot, try out the portable Linux binaries." https://github.com/squeaky-pl/portable-pypy#portable-pypy-di...
One question to PyPy developers: CPython 3.5 [1] introduced math.gcd, which uses Lehmer's algorithm [2]. Are there any plans to include it in PyPy, too?
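For reference, usage plus a hedged fallback for interpreters that don't have it yet (the fallback is plain Euclid, not Lehmer's algorithm):

    try:
        # CPython >= 3.5; implemented with Lehmer's algorithm
        from math import gcd
    except ImportError:
        def gcd(a, b):
            # simple Euclidean fallback: correct, but without
            # Lehmer's big-integer speedups
            while b:
                a, b = b, a % b
            return a

    print(gcd(1234567890123456789, 987654321))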
What isn't supported by PyPy? Thinking in terms of a modern Django app, what should I be looking out for? Is libxml supported? Database drivers like psycopg?
Very little, IMO. I'm rarely disappointed by whatever I throw at it. I have seen Django apps work correctly before, and I would expect libxml and psycopg to work correctly too.
Thanks for that. I just ran a quick django shell test, and fetching 50,000 objects took 6.62 seconds in cpython, and 3.62 seconds in pypy. I haven't done a thorough test of the full app yet, but the basics appear to be working just fine.
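For anyone wanting to repeat that kind of quick check, a sketch (the app and model names are hypothetical; run it inside manage.py shell under each interpreter):

    # inside `python manage.py shell` and `pypy manage.py shell`
    import time
    from myapp.models import Item  # hypothetical app/model

    start = time.time()
    objs = list(Item.objects.all()[:50000])  # force queryset evaluation
    print("%d objects in %.2fs" % (len(objs), time.time() - start))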
Well, I'm very sorry you wish we'd stop supporting our users, but this is the reality we'll be stuck in for a while. Python 3 adoption has been speeding up, but the vast majority of old, big projects are still based on Python 2, and those are potential users of PyPy.
Mozilla helped to solve this problem and work is in progress. Please encourage your company to donate to open source in general (maybe it already does?) and to PyPy in particular.
Thanks for the sentiment. We are major sponsors of PyCon and have donated to PyPy.
What I'm trying to suggest is that the amount they've received to support Python 3 ($250,000 from Mozilla, $66,677 in public donations) is a signal that there is major demand for this.
Could they use the infra that is building binaries for PyPy2 for PyPy3?
I understand where they are coming from ("existing projects are using Python 2"), but I'd like to see them focus more on the future.
I love what PyPy is doing and I want them to succeed, but I can't imagine their success coming out of Python 2. I see them competing more with the modern, performant languages that are currently stealing users from Python (e.g. Go/Rust).
If they focused on the future they could directly compete in that market of projects that are rewriting in other languages for performance.