Pyston 0.3: Self-hosting Sufficiency

eiopa · on Feb 24, 2015

What this actually does right now is run some internal Dropbox scripts that exercise an unknown subset of Python + python stdlib, plus the perf benchmark.

Unless I am missing something, the stuff you guys achieved has absolutely nothing to do with "self-hosting".

I am curious - how does this compare to pypy, performance wise? Is there a big win with this approach over pypy?

ctoshok · on Feb 25, 2015

The use of "self-hosting" was addressed in the post. I think its use is actually right on the mark, but will grant you that it's slightly different than its use in the context of a static compiler written in the target language.

It basically means "able to host its own development". This means something different in different contexts, e.g. JS JITs "self-host" when some part of their builtin code is written in JS. We share that property with JS JITs, but in this particular case it means we can host our own build/test infrastructure. So we're using pyston to develop pyston. ergo, "self-hosting (qualified)."

---

We're definitely slower than pypy at present - we're only a small fraction faster than cpython after all on our benchmarks, and pypy is scary fast :)

I think we're still in the part of our work where the answer to the "big win with this approach" question is "we don't know." We're confident that we can/will be much faster than we are now, but we have different constraints than pypy, so it will always be something of an apples/oranges comparison.

eiopa · on Feb 25, 2015

I see. I think it's a huge stretch :) That's like Go claiming they are self-hosting because some of their tools are written in Go.

On a related note, I have to say that I still don't understand the point of this project. Were you guys unhappy with pypy's performance or development?

I understand that in theory, the approach you guys are taking is incompatible with pypy's, but I can't help but wonder how things would play out if Dropbox put their weight behind PyPy. Maybe it could've been the start of every Python programmer's wet dream: Python 2.8.

rian · on Feb 25, 2015

You really could have just said "dream."

eiopa · on Feb 25, 2015

I think "wet dream" captures it better.

vidarh · on Feb 25, 2015

That's the first time I've seen someone use self-hosting in that manner, or in the manner you describe in your examples.

To me it was totally misleading.

illumen · on Feb 25, 2015

The author(s) admitted it was click bait. Not sure why they would do that... it seems strange. Why should we believe anything else they write?

rguillebert · on Feb 25, 2015

It says in the blog post that it's 1% faster than CPython (which means that it's as fast as CPython really) on their benchmarks, while PyPy is much faster than that on the same benchmarks.

Check speed.pypy.org and speed.pyston.org

rdtsc · on Feb 24, 2015

Great work!

I remember when the Unladen Swallow project was active, they were encountering some issues with LLVM which slowed their progress down. They often had to stop to fix LLVM.

Wonder if the Pyston team got a chance to look at or learn from Unladen Swallow's ideas, or even use any of the code?

boulos · on Feb 24, 2015

A lot of the barriers that Unladen Swallow / Reid and friends ran into were fixed as part of their efforts (e.g. MB limits on emitted code, no gdb support, etc.). Building a dynamic language runtime on top of LLVM is much much easier today than it was when Unladen Swallow tried, and I'd say it's partly thanks to their initial effort.

acdha · on Feb 24, 2015

I'd bet they've also benefited considerably from the work done with the WebKit FTLJIT project, too:

https://trac.webkit.org/wiki/FTLJIT

boulos · on Feb 24, 2015

Right, I should have mentioned that. Though as the WebKit-related patchpoint and other bits happened more recently, I had (still?) assumed that Pyston hasn't gotten to leverage those much yet. Turns out I'm wrong: https://github.com/dropbox/pyston/blob/master/src/codegen/pa...

shadowmint · on Feb 25, 2015

Unless I've deeply misunderstood the code on github, this appears to be a python 2 compatible implementation.

I have mixed feelings about that.

coldtea · on Feb 25, 2015

Python 2 isn't going away any time soon. Less than 10% of the community uses 3 according to pip version usages...

shadowmint · on Feb 25, 2015

I know, but the project page says:

    Currently, Pyston targets Python 2.7, only runs on x86_64 platforms, and only has been 
    tested on Ubuntu. Support for more platforms -- along with Python 3 compatibility -- is 
    planned for the future, but this is the initial target due to prioritization constraints.

...but if pypy has taught us anything, surely it's that implementing python3 after you have a working python2 implementation is a seriously huge piece of work.

Especially if you're wholesale copying unicode naive cpython code into your code base.

I'm deeply skeptical python 3 support will ever land for this.

Basically, this is 'modernize python 2 and make it faster'; there's been a lot of talk about no one being willing to pick up and maintain a 2.8 version, but this is it, effectively.

Major backer, major new features.

So, lots of good things here, but there's no doubt that it's going to be divisive in the community, and I'm not sure I really support that. ...promising py3 support nebulously at some point in the future doesn't fix anything.

tldr; If you ever plan on supporting python3, do it already. Otherwise don't make fake promises.

BuckRogers · on Feb 25, 2015

It seems the main feature to Python3 over 2.x is that it enables you to beg projects to port to Python3. But Python3 people- it's opensource. Isn't that wonderful?

Python3 patches welcome.

shadowmint · on Feb 25, 2015

Don't be ridiculous.

There's a reason the python 2.x line is 'patch only' by the developers.

Python 3 made fundamental changes to underlying string operations internally for UTF-16, which, you can easily argue, was a huge mistake, but there you go.

You can't just 'patch python3 support' in. You literally have to rip out anything that uses char * in the code base and replace it with a unicode supported alternative, which is both more complex, and breaks python 2 backwards compatibility.
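To make the bytes-versus-text point concrete, here is a minimal Python 3 sketch. It is only an illustration of the language-level split, not anything taken from the Pyston or CPython sources; the sample name and the helper `mixing_fails` are made up for the example.

```python
# Python 3 keeps text (str) and raw bytes (bytes) as separate types.
# The sample name below is an arbitrary illustration.
name = "Zo\u00eb M\u00fcller"      # str: a sequence of Unicode code points
raw = name.encode("utf-8")         # bytes: roughly what a C `char *` buffer holds


def mixing_fails(text, data):
    """Return True if concatenating str and bytes raises TypeError."""
    try:
        text + data
    except TypeError:
        return True
    return False


print(len(name))                 # number of code points
print(len(raw))                  # number of UTF-8 bytes (larger here)
print(mixing_fails(name, raw))   # str + bytes is rejected in Python 3
```

Code that assumed one byte per character has to pick one of the two types at every boundary, which is why the port touches so much of a C code base.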

Pypy is very clever in how they handle this through rpython, which is why they can kind of support both; but randomly dropping cpython 2.x code into the project is completely not forward looking.

I'd be happy with: "We never intend to support python 3, sorry".

If that's the path you want to walk for all the complicated reasons you choose it, fair enough.

BuckRogers · on Feb 25, 2015

If you want Pyston to support Python3, get ta portin'.

I'm sure Dropbox would appreciate the help more than the soapboxing.

illicium · on Feb 25, 2015

Releasing a new Python VM that only targets Python 2 certainly isn't going to help improve that statistic

BuckRogers · on Feb 25, 2015

Dropbox's interest is in getting things done, not pushing Python3.

illumen · on Feb 25, 2015

Then why aren't they using an existing python jit implementation if they want to get things done? Numba or pypy?

eiopa · on Feb 25, 2015

I was wondering the same thing

BuckRogers · on Feb 26, 2015

http://lmgtfy.com/?q=Why+did+dropbox+create+Pyston%3F

infogulch · on Feb 25, 2015

These goals don't have to be mutually exclusive. It would be a bit of a gamble for sure but if they had gone with Python3 they may have nudged the community to switch by the time the engine is ready for production some time in the future.

baq · on Feb 25, 2015

10% of Python devs is a very large number of people and it'll only grow larger. Python 3.4 is very pleasant to work with and library support is pretty good (I'm working on a py3 code base daily.)

IanOzsvald · on Feb 25, 2015

The recent Py2 vs Py3 survey over Christmas suggests that approx. 32% of respondents write in Python 3 (increasing on last year), 68% write in Python 2 (decreasing on last year). Py3 usage is up approx. 12% on last year's survey. For personal projects Py2 and Py3 have roughly equal popularity: http://www.randalolson.com/2015/01/30/python-usage-survey-20...

At monthly PyDataLondon meetings I remind the audience to switch to Py3 (a few do each month) as Py2's sunset date is less than 5 years away now.

fijal · on Feb 25, 2015

Yes, but the "survey" has a terrible bias towards people who actually care enough to go and respond to such a survey. It also means overrepresentation of python-dev people and so on. There is a very heavy bias towards Python 3 in such a survey (as opposed to say pypi download stats)

IanOzsvald · on Feb 25, 2015

Hey fijal. Agreed that the survey has a self-selecting audience, I'd also argue that they're the more forward-thinking folk rather than jobbing background users.

Back in April 2013 (the last time I saw python.org download stats - where did they go?!) I wrote a blog post noting that fresh downloads of Windows Python 3.3 were greater than downloads of Windows Python 2.7, for 3 months running. Windows is useful as Python isn't bundled (unlike e.g. Linux and Mac). I presume this trend has continued but have no firm evidence either way: http://ianozsvald.com/2013/04/15/more-python-3-3-downloads-t...

What do the PyPI stats say?

kasabali · on Feb 25, 2015

Does anybody know if they considered a similar thing to what HotPy (2) [0] did? I'm not anywhere near to understanding the implementation details, but HotPy stood out most when I looked into different Python implementations. Basically, it was a fork of an early development version of CPython 3.3 code, with more bytecodes added (which were easier to optimize than the originals), a tracing engine and some other things I don't understand. The best thing is you get language compatibility for free and it is also binary compatible with CPython extensions.

Unfortunately its development appears to have ceased.

[0] https://sites.google.com/site/makingcpythonfast/

IanOzsvald · on Feb 25, 2015

HotPy is Mark Shannon's research project, he's not making a 'new better Python for the masses', he's using it to test his assumptions about ways CPython could be improved. Ask him about it if you bump into him at a python conference!

kasabali · on Feb 25, 2015

Thanks, it's more clear now. I misunderstood Pyston's motives and thought it was just making Python faster rather than writing it from scratch.

montecarl · on Feb 24, 2015

Is it expected that Pyston play well with numpy/scipy? This is the biggest barrier that I see towards adoption of PyPy for numerical computing. If Pyston can work with numpy and scipy then it would be a huge accomplishment.

ngoldbaum · on Feb 24, 2015

Their focus is on getting the dropbox application working under pyston. It would probably take contributions from someone outside of dropbox to make this a reality. Presumably it will be easier on pyston than with pypy since pyston has designed in better compatibility with the CPython C API.

dagw · on Feb 25, 2015

If speeding up numpy/scipy code is important to you (and it's certainly important to me), then numba is probably the project on the horizon to follow.

rogerbinns · on Feb 25, 2015

Does the C extension API remain the same? i.e. can I take an existing extension and just recompile it, or does the code have to support a different API?

I ported my main extension to PyPy but had to leave bits of functionality out because of missing parts of the API that CPython had that they didn't.

cpr · on Feb 24, 2015

Is Guido still at Dropbox? Contributing to this effort?

If he is, that would certainly make the effort carry a lot more weight.

techdragon · on Feb 24, 2015

Why?

Not downplaying how important our esteemed BDFL Guido is to Python, but I'll be flat out honest, aiming their sights on Python 2 just further entrenches its eventual place as the COBOL of 2050.

Python 2 is DEAD to me. It should be dead to you. It's Dead with a capital D. Node.js wound up with all this fork nonsense in HALF the time Python 2.7 has been stagnating the Python ecosystem.

Python 2 only code is nuclear waste grade technical debt.

I haven't written a line of Python 2 in 6 months after 2 years of slow decline.

I now use Python 2 vs 3 as an interview question.

I run my tests on 3.4.x, 3.5-dev, and versions of HEAD that pass the Python test suite.

Why does Guido working on Pyston do anything but hurt the future of Python by legitimising the position that it's ok as a community of Python programmers, to keep accruing this technical debt ?

rdtsc · on Feb 25, 2015

> Python 2 is DEAD to me. It should be dead to you

You mean the kind of DEAD that has thousands of tested libraries supporting it and that runs and makes money in the bank. I like that kind of DEAD. Sign me up.

> Python 2 only code is nuclear waste grade technical debt.

It is nuclear fusion kind of code for me that just keeps making me money.

> I now use Python 2 vs 3 as an interview question.

What could you meaningfully interview about it? "Show me how to read this unicode file"?

> I run my tests on 3.4.x, 3.5-dev, and versions of HEAD that pass the Python test suite.

I run my tests on my code base and make sure they pass and have decent coverage.

> Why does Guido working on Pyston do anything but hurt the future of Python by legitimising the position that it's ok as a community of Python programmers, to keep accruing this technical debt ?

What technical debt. My code is clean and nice looking. There is no technical debt.

But pray tell what are these great features you are using in Python 3 that Python 2 doesn't have and that just call for all this great excitement?

I like me some fancy tuple unpacking and binary literals but not enough to go around yelling Python 2 is DEAD at anyone or to destabilize nice working code for it.
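For anyone who hasn't used them, the two features named above look roughly like this. This is a small sketch with arbitrary values; note that binary literals also work in Python 2.6+, while the starred form of unpacking is Python 3 only.

```python
# Extended iterable unpacking (Python 3 only): the starred name
# collects whatever is left over into a list.
first, *middle, last = [1, 2, 3, 4, 5]

# Binary literal (also accepted by Python 2.6 and later).
flags = 0b1010

print(first, middle, last)
print(flags)
```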

belorn · on Feb 25, 2015

I recently visited a website whose back end could not parse non-ascii characters in credit card information. Here I am, willing to spend perfectly good money, and the stone age website only wants money from people whose names are made from a-z.

Treating text and ascii as the same thing is bad. It's bad for technological reasons, it's bad for economical reasons, and it's just plain bad. If you think it "runs", it simply because all those customer who get "invalid character" error has gone over to a competitor and didn't bother to tell you.

lumpypua · on Feb 25, 2015

This is the most insanely strawman argument against python 2 I've ever read. I've written thousands of lines of unicode aware python 2 code.

rdtsc · on Feb 25, 2015

> If you think it "runs", it simply because all those customer who get "invalid character" error has gone over to a competitor and didn't bother to tell you.

Silly, that's not how sales of $500K-$1M products work.

svisser · on Feb 25, 2015

Python 3 has no benefits if your Python 2 code works and many dependencies are not Python 3 compatible yet. Why would a company even port to Python 3?

jsmeaton · on Feb 25, 2015

There aren't that many dependencies without py3 support anymore. Of course, if your project uses any then being stuck on py2 is a given.

I still don't think projects should port to python 3 unless they need to be moved forward or continually developed. Greenfield projects should most definitely start on py3 though.

duaneb · on Feb 25, 2015

...because Python 2 has a shorter lifetime and shitty unicode/string support (compared to 3).

coldtea · on Feb 25, 2015

Python 2 is the kind of dead that everybody uses.

baq · on Feb 25, 2015

arguably the worst kind of dead, then.

influx · on Feb 25, 2015

What is your Python 2 vs 3 interview question?

shadowmint · on Feb 25, 2015

Pyston isn't python 3 compatible.

chucksmash · on Feb 25, 2015

I can't source the comment with a quick googling so take the recollection at face value, but I recall one of either the Pyston team members or Guido himself in an interview saying that he hadn't been involved so far.

As of release 0.2 he was referring to it as being "built by some of my colleagues"[1]. Looking at the list of contributors for the Pyston repo on github[2], I don't see gvanrossum among them.

[1]: https://twitter.com/gvanrossum/status/510154006564335616

[2]: https://github.com/dropbox/pyston/graphs/contributors

phkahler · on Feb 24, 2015

Does Pyston work with any of the GUI toolkits? If not, is it expected to at some point?

andrewchambers · on Feb 24, 2015

Thanks kmod, keep up the good work.

codexon · on Feb 24, 2015

How is this different from pypy?

chrisseaton · on Feb 25, 2015

One difference that was originally highlighted by the Pyston team as being significant is that this is a method based JIT, where PyPy is a tracing JIT. I think it also prioritises support for C extensions, which PyPy does not.

Scarbutt · on Feb 24, 2015

But what if Condoleezza sneaks a trojan somewhere in the pyston code?