What this actually does right now is run some internal Dropbox scripts that exercise an unknown subset of Python + the Python stdlib, plus the perf benchmark.
Unless I am missing something, the stuff you guys achieved has absolutely nothing to do with "self-hosting".
I am curious - how does this compare to pypy, performance wise?
Is there a big win with this approach over pypy?
The use of "self-hosting" was addressed in the post. I think its use is actually right on the mark, but will grant you that it's slightly different from its use in the context of a static compiler written in the target language.
It basically means "able to host its own development". This means something different in different contexts, e.g. JS JITs "self-host" when some part of their builtin code is written in JS. We share that property with JS JITs, but in this particular case it means we can host our own build/test infrastructure. So we're using pyston to develop pyston. Ergo, "self-hosting (qualified)."
---
We're definitely slower than pypy at present - we're only a small fraction faster than cpython after all on our benchmarks, and pypy is scary fast :)
I think we're still in the part of our work where the answer to the "big win with this approach" question is "we don't know." We're confident that we can/will be much faster than we are now, but we have different constraints than pypy, so it will always be something of an apples/oranges comparison.
I see. I think it's a huge stretch :)
That's like Go claiming they are self-hosting because some of their tools are written in Go.
On a related note, I have to say that I still don't understand the point of this project.
Were you guys unhappy with pypy's performance or development?
I understand that in theory, the approach you guys are taking is incompatible with pypy's, but I can't help but wonder how things would play out if Dropbox put its weight behind PyPy. Maybe it could've been the start of every Python programmer's wet dream: Python 2.8.
It says in the blog post that it's 1% faster than CPython (which means that it's as fast as CPython really) on their benchmarks, while PyPy is much faster than that on the same benchmarks.
I remember when Unladen Swallow project was active, they were encountering some issues with LLVM which slowed their progress down. They often had to stop to fix LLVM.
I wonder if the Pyston team got a chance to look at or learn from Unladen Swallow's ideas, or even use any of the code?
A lot of the barriers that Unladen Swallow / Reid and friends ran into were fixed as part of their efforts (e.g. MB limits on emitted code, no gdb support, etc.). Building a dynamic language runtime on top of LLVM is much much easier today than it was when Unladen Swallow tried, and I'd say it's partly thanks to their initial effort.
Right, I should have mentioned that. Though as the WebKit-related patchpoint and other bits happened more recently, I had (still?) assumed that Pyston hasn't gotten to leverage those much yet. Turns out I'm wrong: https://github.com/dropbox/pyston/blob/master/src/codegen/pa...
Currently, Pyston targets Python 2.7, only runs on x86_64 platforms, and only has been
tested on Ubuntu. Support for more platforms -- along with Python 3 compatibility -- is
planned for the future, but this is the initial target due to prioritization constraints.
...but if pypy has taught us anything, surely it's that implementing python3 after you have a working python2 implementation is a seriously huge piece of work.
Especially if you're wholesale copying Unicode-naive cpython code into your code base.
I'm deeply skeptical python 3 support will ever land for this.
Basically, this is 'modernize python 2 and make it faster'; there's been a lot of talk about no one being willing to pick up and maintain a 2.8 version, but this is it, effectively.
Major backer, major new features.
So, lots of good things here, but there's no doubt that it's going to be divisive in the community, and I'm not sure I really support that. ...promising py3 support nebulously at some point in the future doesn't fix anything.
tldr; If you ever plan on supporting python3, do it already. Otherwise don't make fake promises.
It seems the main feature to Python3 over 2.x is that it enables you to beg projects to port to Python3. But Python3 people- it's opensource. Isn't that wonderful?
There's a reason the python 2.x line is 'patch only' by the developers.
Python 3 made fundamental changes to underlying string operations internally for UTF-16, which you can easily argue was a huge mistake, but there you go.
You can't just 'patch python3 support' in. You literally have to rip out anything that uses char * in the code base and replace it with a unicode supported alternative, which is both more complex, and breaks python 2 backwards compatibility.
Pypy is very clever in how they handle this through rpython, which is why they can kind of support both; but randomly dropping cpython 2.x code into the project is completely not forward looking.
I'd be happy with: "We never intend to support python 3, sorry".
If that's the path you want to walk for all the complicated reasons you choose it, fair enough.
These goals don't have to be mutually exclusive. It would be a bit of a gamble for sure but if they had gone with Python3 they may have nudged the community to switch by the time the engine is ready for production some time in the future.
10% of Python devs is a very large number of people and it'll only grow larger. Python 3.4 is very pleasant to work with and library support is pretty good (I'm working on a py3 code base daily.)
The recent Py2 vs Py3 survey over Christmas suggests that approx. 32% of respondents write in Python 3 (increasing on last year), 68% write in Python 2 (decreasing on last year). Py3 usage is up approx. 12% on last year's survey. For personal projects Py2 and Py3 have roughly equal popularity: http://www.randalolson.com/2015/01/30/python-usage-survey-20...
At monthly PyDataLondon meetings I remind the audience to switch to Py3 (a few do each month) as Py2's sunset date is less than 5 years away now.
Yes, but the "survey" has a terrible bias towards people who actually care enough to go and respond to such a survey. It also means overrepresentation of python-dev people and so on. There is a very heavy bias towards Python 3 in such a survey (as opposed to, say, pypi download stats).
Hey fijal. Agreed that the survey has a self-selecting audience, I'd also argue that they're the more forward-thinking folk rather than jobbing background users.
Back in April 2013 (the last time I saw python.org download stats - where did they go?!) I wrote a blog post noting that fresh downloads of Windows Python 3.3 were greater than downloads of Windows Python 2.7, for 3 months running. Windows is useful as Python isn't bundled (unlike e.g. Linux and Mac). I presume this trend has continued but have no firm evidence either way: http://ianozsvald.com/2013/04/15/more-python-3-3-downloads-t...
Does anybody know if they considered a similar thing to what HotPy (2) [0] did?
I'm not anywhere near to understanding the implementation details, but HotPy stood out most when I looked into different Python implementations. Basically, it was a fork of an early development version of CPython 3.3 code, with more bytecodes added (which were easier to optimize than the originals), a tracing engine and some other things I don't understand. The best thing is you get language compatibility for free and it is also binary compatible with CPython extensions.
HotPy is Mark Shannon's research project, he's not making a 'new better Python for the masses', he's using it to test his assumptions about ways CPython could be improved. Ask him about it if you bump into him at a python conference!
Is it expected that Pyston play well with numpy/scipy? This is the biggest barrier that I see towards adoption of PyPy for numerical computing. If Pyston can work with numpy and scipy then it would be a huge accomplishment.
Their focus is on getting the dropbox application working under pyston. It would probably take contributions from someone outside of dropbox to make this a reality. Presumably it will be easier on pyston than with pypy since pyston has designed in better compatibility with the CPython C API.
Not downplaying how important our esteemed BDFL Guido is to Python, but I'll be flat out honest, aiming their sights on Python 2 just further entrenches its eventual place as the COBOL of 2050.
Python 2 is DEAD to me. It should be dead to you. It's Dead with a capital D. Node.js wound up with all this fork nonsense in HALF the time Python 2.7 has been stagnating the Python ecosystem.
Python 2 only code is nuclear waste grade technical debt.
I haven't written a line of Python 2 in 6 months after 2 years of slow decline.
I now use Python 2 vs 3 as an interview question.
I run my tests on 3.4.x, 3.5-dev, and versions of HEAD that pass the Python test suite.
Why does Guido working on Pyston do anything but hurt the future of Python by legitimising the position that it's ok as a community of Python programmers, to keep accruing this technical debt ?
> Python 2 is DEAD to me. It should be dead to you
You mean the kind of DEAD that has thousands of tested libraries supporting it and that runs and makes money in the bank. I like that kind of DEAD. Sign me up.
> Python 2 only code is nuclear waste grade technical debt.
It is nuclear fusion kind of code for me that just keeps making me money.
> I now use Python 2 vs 3 as an interview question.
What could you meaningfully interview about it? "Show me how to read this unicode file"?
> I run my tests on 3.4.x, 3.5-dev, and versions of HEAD that pass the Python test suite.
I run my tests on my code base and make sure they pass and have decent coverage.
> Why does Guido working on Pyston do anything but hurt the future of Python by legitimising the position that it's ok as a community of Python programmers, to keep accruing this technical debt ?
What technical debt? My code is clean and nice looking. There is no technical debt.
But pray tell what are these great features you are using in Python 3 that Python 2 doesn't have and that just call for all this great excitement?
I like me some fancy tuple unpacking and binary literals but not enough to go around yelling Python 2 is DEAD at anyone or to destabilize nice working code for it.
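For anyone who hasn't seen the two features mentioned above, here is a minimal sketch. The variable names are just for illustration. As far as I know, the starred unpacking is Python 3 only, while `0b` literals also work on Python 2.6 and later, so only the first is really a Python 3 selling point.

```python
# Extended iterable unpacking (PEP 3132): Python 3 only.
first, *middle, last = [1, 2, 3, 4, 5]

# Binary literal: the value is written in base 2.
flags = 0b1010

print(first, middle, last)
print(flags)
```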
I recently visited a website whose back end could not parse non-ASCII characters in credit card information. Here I am, willing to spend perfectly good money, and the stone-age website only wants money from people whose names are made from a-z.
Treating text and ASCII as the same thing is bad. It's bad for technological reasons, it's bad for economic reasons, and it's just plain bad. If you think it "runs", it's simply because all those customers who get an "invalid character" error have gone over to a competitor and didn't bother to tell you.
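To make the failure mode concrete, here is a rough sketch in Python 3. Both function names and the sample name are made up for illustration; I have no idea what that website actually runs. The point is only that a back end which assumes ASCII rejects a perfectly valid name, while one that encodes as UTF-8 stores it and gets it back unchanged.

```python
def ascii_only_name(name: str) -> bytes:
    # Hypothetical back end that assumes every name fits in ASCII.
    return name.encode("ascii")


def unicode_safe_name(name: str) -> bytes:
    # Hypothetical back end that stores names as UTF-8 instead.
    return name.encode("utf-8")


name = "Zoë Müller"

try:
    ascii_only_name(name)
    rejected = False
except UnicodeEncodeError:
    # This is the "invalid character" error the customer ends up seeing.
    rejected = True

stored = unicode_safe_name(name)
round_tripped = stored.decode("utf-8")

print(rejected)
print(round_tripped == name)
```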
> If you think it "runs", it's simply because all those customers who get an "invalid character" error have gone over to a competitor and didn't bother to tell you.
Silly, that's not how sales of $500K-$1M products work.
There aren't that many dependencies without py3 support anymore. Of course, if your project uses any then being stuck on py2 is a given.
I still don't think projects should port to python 3 unless they need to be moved forward or continually developed. Greenfield projects should most definitely start on py3 though.
I can't source the comment with a quick googling so take the recollection at face value, but I recall one of either the Pyston team members or Guido himself in an interview saying that he hadn't been involved so far.
As of release 0.2 he was referring to it as being "built by some of my colleagues"[1]. Looking at the list of contributors for the Pyston repo on github[2], I don't see gvanrossum among them.
One difference that was originally highlighted by the Pyston team as being significant is that this is a method based JIT, where PyPy is a tracing JIT. I think it also prioritises support for C extensions, which PyPy does not.