Compared speed to Python 3.9 using python-speed for those who want a simpler, more straightforward benchmark. [1]
Basically one can expect an overall 24% increase in performance "for free" in a typical application.
Improvements across the board in all major categories. Seriously impressive.
Stack usage and multiprocessing had the largest performance increases. Even regex saw a 21% increase. Just wow!
And this may be the first Python 3 that is actually faster (by about 5%) than Python 2.7. We've waited 12 years for this... Excited about Python's future!
Python:
def f(x):
    return x * x

v = 0.0
for i in range(100000000):
    v = v + f(i)

Execution time: 15 seconds
---
JavaScript:
function f(x) { return x * x; }
var v = 0.0;
for(var i = 0; i < 100000000; i++) v += f(i);
Execution time: 0.5 seconds
---
So calling a function in a for loop in Python is still 30x slower than in JavaScript, which is a similarly dynamic language. Good to see a 24% increase, but a 30x speedup should still be possible.
I ran this test on Python 3.10, not 3.11, but I assume that if 3.11 had a 30x speedup from the inlined Python-to-Python function calls it would have been mentioned; the fastest speedup listed is 1.96x.
It really is not. Python allows a huge amount of the object model to be overridden and introspected at any point. Javascript doesn't even allow operator overloading.
Indeed comparing with CPython, which the link is about.
Indeed, CPython has no JIT and PyPy does, but CPython is unfortunately usually what you get. Comparing with PyPy would not be fairer, since the goal here is to speed up "vanilla" CPython.
It's unfortunate to me that the default Python interpreter, so widely used for scientific computing, is so much slower than the JS interpreter you get in your web browser, and that we're celebrating +24% benchmark results when 30x faster is known to be possible (e.g. with a JIT).
Just checked, and PyPy has the same speed as JS (effectively instant). You are right that this is not what most people use when working with Python, and there are package incompatibilities.
I am curious, but don't have the time to check: would the result be the same for a regex?
Right, but there's no "standard" JavaScript engine either. Benchmarking V8 against CPython and pointing out a totally expected 30x speed difference isn't insightful.
I still have to disagree. The fact that both slow and fast JS engines exist (recent ones vs. those from before around 2008), and both slow and fast Python engines exist (CPython vs. PyPy), doesn't make it wrong to point out that the vanilla official Python interpreter is much slower than it could be.
Why does one have to be OK with a totally expected speed difference? This is the official interpreter of the language; a slow tool is what you get by default, and the official tool deserves scrutiny.
Just on the python3/python2.7 speed... This one has always killed me on BeagleBones...
`$ time python3 -c "print('hello')"`
I have a handful of utility scripts written in python, and the overhead of starting/stopping is just massive! (Sadly, I don't have one on hand to actually print a metric...)
If your script doesn't need any third-party modules, you can try running it with "python3 -S", which should make the startup significantly faster if there are a lot of modules installed. (Twice as fast on my machine. Running Python from a venv is also somewhat faster than running it directly, but not as fast as with "-S".)
$ time python3 -c "print('hello')"
hello
real 0m0.223s
user 0m0.148s
sys 0m0.047s
and, for the record:
$ python3 bench.py
python-speed v1.3 using python v3.7.3
string/mem: 46673.98 ms
pi calc/math: 98650.32 ms
regex: 38489.41 ms
fibonnaci/stack: 30723.3 ms
multiprocess: 75500.34 ms
total: 290037.35 ms (lower is better)
Might get a little better with some tweaking of performance governors / idle states enabled but, yeah...
Just because 10000 companies are making some random webapps in python doesn't make the 10 local Python apps any less bothersome. This is literally what killed java for end-users.
Interestingly, Java has been improving lately in this regard, thanks to cloud and container use-cases. There was a big regression with Java 9, but since then it's been getting better with each release.
- PEP 659 (Specializing Adaptive Interpreter): the speedup is variable
The interpreter changes work by generating specialized code, so this can have a memory impact, which they are working to offset and expect to cap at 20% more.
- Cheaper Python frames are enabled by making some semantic changes to the language, mostly by removing some dynamic functionality that no one uses
- "Inlined Python function calls" is a bit misleading: Python functions are not inlined, it's the "code that does the bookkeeping for calling a Python function" that is now eliminated for Python-to-Python calls.
- The most important part of the specializing interpreter is the attribute caching, which is quite significant
Source: I work on Pyston, where we do many similar things (but can't make semantic changes to the language)
Not sure what OP meant, but basically the internal frame struct has been significantly simplified to contain only the essential information required at runtime.
If a user wants to get or manipulate debugging information, the old frame object is generated on demand when requested.
Ah! I remember reading something about self-optimizing interpreters, which is pretty cool. I think it's related to what they call "adaptive interpreter".
The interpreter starts out with a generic/slow approach and as it gets more datatype info it uses more specialized and faster implementations.
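For anyone who wants to see this concretely: on 3.11 the dis module can show the specialized ("quickened") instructions after a function has warmed up. A minimal sketch (my own example, not from the comments above):
import dis

def add(a, b):
    return a + b

# run it enough times for the adaptive interpreter to specialize the bytecode
for _ in range(10_000):
    add(1, 2)

# adaptive=True (new in 3.11) prints the specialized instructions,
# e.g. a generic BINARY_OP may show up as BINARY_OP_ADD_INT
dis.dis(add, adaptive=True)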
Don't understand why you are being downvoted, coming from another language, it's an understandable surprise.
The explanation is that Python features heavy dynamism, meaning one can easily replace any part of the system at runtime (at any moment), including builtins and code objects, and even hook into the parser, AST, import mechanism, etc.
This makes inlining a dangerous exercise: say you inline the builtin len() function; how do you know some code that runs later doesn't intend to replace it with a different implementation?
Now, there are ways to implement inlining, but they are not as straightforward as with, say, an ahead-of-time compiler, where you know nothing is going to change afterward.
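A tiny illustration of the problem (my own example, not from the parent comment):
import builtins

def total(items):
    return len(items)   # looks like an obvious candidate for inlining...

print(total([1, 2, 3]))   # 3

# ...but any code running later can rebind the builtin,
# and the next call has to see the new implementation
builtins.len = lambda x: 42
print(total([1, 2, 3]))   # 42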
> The explanation is that Python features heavy dynamism
JavaScript too, any function can be dynamically changed at any time, but its function calls are 30x faster (roughly, I measured it just now with a for loop doing 100 million function calls, which took 0.5s in JS, 15s in python)
This is a realistic scenario if you want to run e.g. a custom statistics function over a large array in Python, so it can be annoying (and yes, numpy can do a lot of this, but that's no reason for the main language to keep encouraging less readable code where you avoid defining separate functions).
> its function calls are 30x faster (roughly, I measured it just now with a for loop doing 100 million function calls, which took 0.5s in JS, 15s in python)
Your benchmark also measures iterators and boxed integers, which were not in the JavaScript version, so it's not clear how much of the difference is due to function call overhead.
(Of course, it certainly doesn't help Python's performance that there's no simple way to do loops without iterators and integers without boxing. It would be nice if a future version could optimize the abstractions away.)
On second thought, I'm not sure. Can you observe the number's object identity like you can in Python? Does JavaScript have to allocate a new Number object for each "i++"?
My point was that the Python version of Aardwolf's function call benchmark [1] is definitely using iterators and boxing all numbers, so it's doing a lot more than just calling a function, and the Java equivalent would look something like this:
Long f(Integer x) {
    return new Long(x.longValue() * x.longValue());
}
// ...
Double v = new Double(0.0);
Iterator<Integer> range = IntStream.range(0, 100000000).iterator();
while (range.hasNext()) {
    Integer i = range.next();
    v = new Double(v.doubleValue() + f(i).doubleValue());
}
Hence the "So it's something that had to be delayed.", and not "this is something impossible".
Javascript has the benefit of having billions of dollars allocated to it from Google/Microsoft/Apple to hire dozens of amazing engineers full time over decades to work on it.
We are talking about a funding difference of six orders of magnitude. Not to mention one is a language that has to stand on its own, while the other has an accidental monopoly on the most popular platform in the world.
> Python has existed since 1994, yet in 2011, the Python Software Foundation budget was less than $40K
Exactly; as usual it is mainly a question of human resources. That's why I advocate for Python switching to GraalPython as the default runtime; it would enable much better performance in the next few years.
Also, I'm pretty sure the CPython devs should ask Google/OpenAI/MicrosoftAI for funding; how many millions can they waste on useless projects while not improving the core bottleneck...
You are asking for the rarest thing of all: a geek and volunteer worker who is also good at dealing with big entities and comfortable with managing money.
> how do you know it's not the intent of a code that runs later to replace it with a different implementation?
I think progress could move forward by adding a compiler flag for "assume no fully free function body replacement", you know, just like in every other mainstream language.
It's not body replacement, it's function reference erasing. There is no way in Python to tell whether something has been replaced or not, since variables are dumb labels and a reference is of the same kind for any object. In fact, there is no particular difference between two objects in Python.
E.g. functions are objects, and any object that defines __call__ can be called like a function. And a function name is just a reference to "something callable" in the namespace mapping (literally a dict in most implementations), which is mutable by definition.
Also, a compiler flag would not help, since most users don't compile the Python VM.
Now, you could add a runtime flag that records everything created for the first time in a namespace and refuses to allow reassignment.
It's possible, but it would break a LOT of things and prevent many patterns.
The latest approach is to put guards in place and assume no replacement, and if a replacement does occur, to locally revert that assumption at that moment.
The process is refined with each iteration, but there is no turnkey solution as you seem to believe.
> I think progress could move forward by adding a compiler flag "assume no fully free function body replacement" you know just like in every other mainstream language.
Ah yes, a compiler flag which literally says “break the language”, that sounds like a great feature which would be used a lot.
> assume no fully free function body replacement
There is no "body replacement": the `len` builtin is looked up by name on every access (in the module's globals, falling through to builtins), and anyone can just rebind it in either place.
I'm just advocating for a compiler flag that says "do reasonable things", just like in every other mainstream language. Python must free itself from these idiosyncrasies if it wants to stay relevant; migrations are not that hard, especially since these patterns seem illegitimate or easy to work around (e.g. use overriding if you want to replace a function, or extension methods). At worst, at least tag the affected functions as dirty via an annotation that tells CPython to locally deoptimize.
I'm not arguing for anything exceptional, just for conventionality and sanity.
I think you're missing how deep the function reference change goes in standard code. What about unittest mocks? Conditional module loading? Enabling/disabling tracing features? I'd bet there's some dynamic function reference assignment that happens during python console startup.
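unittest mocks alone make this a non-starter; replacing functions at runtime is the whole point of them. A made-up example (my own, just to show how ordinary this pattern is):
from unittest import mock
import json

def load_config(path):
    with open(path) as f:
        return json.load(f)

# both open() and json.load are rebound for the duration of the test;
# a "no function replacement" mode would break this everywhere
with mock.patch("builtins.open", mock.mock_open(read_data="{}")):
    with mock.patch("json.load", return_value={"debug": True}):
        print(load_config("settings.json"))   # {'debug': True}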
It's not unreasonable. It's just what the language makes easy and useful. It's still used much less than for example method aliasing in Ruby which has about the same result.
JavaScript doesn't do this, you can replace properties of window at will. I don't think Ruby does it, or Lua. PHP probably does it under some circumstances in modern versions but only because it's remarkably un-dynamic for an interpreted language.
This is normal fare for high-level interpreted languages, for better or worse.
But that means you have to add all sorts of dependency tracking, such that you are able to deoptimise any function affected by mutation on things which were optimised in or out.
This means your complexity increases very fast, very high, before you can have anything which actually works.
If by "just fine" you mean, "with incredible engineering resources funded by a small number of large companies with deep pockets and with some performance cost to check that the optimization is still valid", yes.
One of our scientific collaborators is _literally_ of the view that "One day people will realise what a mistake python 3 was, and go back to Python 2" (which is a verbatim quote).
While this has happened with Perl (AFAIK, people realized what a mistake Perl 6 was, and there's a plan for Perl 7 to mostly go back to Perl 5, which makes it two breaking changes in a row), I see no signs of it happening with Python; instead, every new Python 3 release gets farther from Python 2.
I don't think people considered Perl 6 a "mistake". It's more that they realized that it wasn't a successor to Perl 5 and is instead a separate language. The language itself isn't a mistake, it was treating it as the next version that was.
The Perl community may be attempting to "fix" that mistake with Perl 7, but the damage has been done. I think Perl 6 fractured the community so hard and scared away so many users that both Perl 7 and Raku are likely to be minor languages for the foreseeable future.
If either of them has legs, I think it's likely to be Raku. Because Raku is at least an interesting language with a lot of really interesting, powerful language features.
Perl 5/7 by virtue of its history, is mostly just another dynamic scripting language, but one with a particularly unfriendly syntax. Aside from CPAN and inertia, I think there is relatively little reason to write new Perl 5/7 code when PHP, Python, Ruby, and JavaScript are out there.
That mistake was sad. In the end they succeeded at creating a language that fixed most Perl problems and is actually quite awesome. But at the cost of catastrophic loss of mindshare.
Raku is too slow for no obvious reason, plus every lib has to be rewritten from scratch, including the good and mature ones from Perl.
There are more interesting things happening with Ruby and Python. In fact there are Python libs for obscure stuff like CAN-bus and ISO-TP. Want to talk to a vehicle ECU? It can be done with Python.
There's also a lack of momentum and quite a lot of bikeshedding in the Perl community. There were attempts to modernise Perl by rurban and others, but they were met with unnecessary resistance. Without community support they all ended up as one-man shows. Perl is pretty much a dead end. You are hearing this from someone who still writes Perl code every day for work. My latest proof of concept was done in Ruby and it will probably end up as production code.
I've tried for several iterations until I finally gave up and moved on to Ruby, Racket etc. Moose is quite slow itself. I never use it or anything that depends on it. I use Moo or Mojo::Base instead.
Inline::Perl5 is quite an ugly last-resort solution. And you still need a Perl interpreter, as opposed to calling C code from D, where you just need the libs.
If you gave up on it, why do you keep saying it is slow, when you have no current information about that?
Also, when you're talking about slow: why is it that any performant Perl module actually has most of its logic written in XS (aka C)? So I think it shows quite a bit of chutzpah to call Inline::Perl5 "an ugly last resort solution", whereas a *lot* of upstream CPAN modules rely on the hack that is XS to make them performant. To give you an example: the pure Raku version of Text::CSV is more than 2x as fast as the pure Perl version of Text::CSV.
I thought the Perl situation was a little more nuanced.
Although Perl 6 was meant to be a Python 3000 type thing, it was spun out into its own language (Raku). Perl 7 will continue the 5.x lineage, but with saner defaults. Thus, any code that’s around now should run, but the interpreter’s name might change.
(I think…I mostly keep up with Perl for nostalgia’s sake).
> Although Perl 6 was meant to be a Python 3000 type thing, it was spun out into its own language (Raku).
The problem for them is that Perl 6 actually had an official release (and IIRC, more than one) under the Perl 6 name; the rename of the language to Raku (which IIRC was originally the name of just one VM for running Perl 6 programs) came later. So anyone who managed to keep up with the latest release of the Perl language would have two huge breaking changes (Perl 5 to Perl 6 and Perl 6 to Perl 7), the second one undoing most of the changes of the first. (AFAIK, PHP avoided all that by deciding to go back before officially releasing PHP 6, so anyone who was following the latest release just jumped directly from PHP 5 to PHP 7, avoiding the breaking changes which had been planned for PHP 6.)
I get how you (and perhaps others) might think it's as you say. But I can tell you're not saying what you've said based on knowing it to be true but guessing it to be so. And while your guess isn't a surprising one given natural assumptions due to the names "Perl", "Perl 5", "Perl 6", and "Perl 7", it doesn't correspond to what has actually happened.
Anyone keeping up with the latest release of the Perl language has had near zero breakage for decades. (Indeed Perl has a well deserved reputation for having an outstanding track record in this regard compared to almost all other mainstream PLs.) I personally see every likelihood Perl 7 will extend that track record, though of course my crystal ball prognostications are necessarily based purely on what I see.
No one using P6 or Raku had a huge breaking change from Perl 5. No one using P6/Raku will have another one going to Perl 7.
If you presume Raku and Perl are different languages you'll get the essence of what has actually happened so far, and seems likely to be more or less true for the rest of this decade at least.
Incorrect, Raku(do) is not the VM. It's MoarVM (or JVM but that one is trailing behind). Rakudo is the reference implementation.
Also, you should think in terms of Perl 5 -> 7. Raku is the language formerly known as Perl 6. Perl 7 isn't undoing anything; it will be the successor to Perl 5. Perl 7 should have been 5.32 with saner defaults, but of course it's now going to take another 20 years of bikeshedding until this happens.
It needed to be done; there were several choices in python 1/2 holding the language back, that all but necessitated breaking changes. Mostly with string/bytes/unicode handling. And the non-breaking route would have been a long term pain.
They didn't decide to break over print() and cosmetic changes, it runs way deeper.
I'll assume this is a good faith question and answer it by quoting the "What's new in Python 3.11?" page (linked below the quote).
> During a Python function call, Python will call an evaluating C function to interpret that function’s code. This effectively limits pure Python recursion to what’s safe for the C stack.
> In 3.11, when CPython detects Python code calling another Python function, it sets up a new frame, and “jumps” to the new code inside the new frame. This avoids calling the C interpreting function altogether.
> Most Python function calls now consume no C stack space. This speeds up most of such calls. In simple recursive functions like fibonacci or factorial, a 1.7x speedup was observed. This also means recursive functions can recurse significantly deeper (if the user increases the recursion limit). We measured a 1-3% improvement in pyperformance.
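A toy way to see the recursion-depth point (my own example; numbers vary by platform, and raising the limit this far on 3.10 and earlier risks blowing the C stack):
import sys

def depth(n=0):
    try:
        return depth(n + 1)
    except RecursionError:
        return n

print(depth())            # ~1000 with the default recursion limit

sys.setrecursionlimit(50_000)
print(depth())            # on 3.11 this mostly just works, since plain
                          # Python-to-Python calls no longer consume C stack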
BTW it'd be nice if Python implemented support for tail-call optimization or, even better, a growable/segmented stack, in order to make arbitrary Python functions stack-overflow safe.
e.g. https://gcc.gnu.org/wiki/SplitStacks#:~:text=Split%20Stacks%....
With Python it's possible to do tail-call optimization yourself. See this all-time great Stack Overflow answer and the linked repo for details (the actual code is quite short): https://stackoverflow.com/a/18506625.
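The core idea is a trampoline; this is a generic sketch of my own, not the decorator from that answer:
def trampoline(fn, *args):
    # keep calling thunks until we get a non-callable result,
    # so the stack never grows past one frame per "bounce"
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def countdown(n):
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)   # the tail call, deferred as a thunk

print(trampoline(countdown, 1_000_000))   # no RecursionError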
If Python doesn't depend on the C stack anymore (for most of its work), that's basically what you'll have for free.
Python's own stackframes are heap-allocated and chained (so each function has its own stack, in essence), so its use of a single unified allocation space for stack is already an implementation detail of the interpreter.
Other languages do this by sacrificing correctness. If a() calls b(), which calls c(), and c() raises an exception, Python will show you the correct callstack. In other languages the callstack will differ depending on whether the compiler decided to inline c() or b() into a(). Same thing with TCO: it destroys callstacks and is therefore incorrect. If one prefers correctness over performance one should use Python. If one doesn't mind sacrificing correctness to gain performance one can use other languages.
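Concretely, what "the correct callstack" means here (a small example of my own):
import traceback

def a(): return b()
def b(): return c()
def c(): raise ValueError("boom")

try:
    a()
except ValueError:
    traceback.print_exc()   # shows a -> b -> c; no frame is inlined away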
I'm not sure which language you mean, but Java has observable call stacks and inlines just fine. With a JIT, deoptimizations are possible. For AOT languages it may happen, though stacks there are mostly meaningful only with debug symbols.
The vast majority of languages that support inlining do not reconstruct callstacks. Java is one of few exceptions and, like Python, it does not support tco.
Python is so slow that, ironically, I don't care that much about performance improvements: if you are writing any performance-sensitive part of your software in Python you are screwed anyway, and 50% faster code isn't going to help that much.
Python is a joy to use and is used by a large number of apps as a tool of choice to get things done. Almost all of "retail" web crawling and machine learning in the world runs on Python.
In practice it is not slow at all. Plus, the development iteration speed is probably second to none among programming languages.
If I would have my kid learn two programming languages it would be HTML and Python.
I never said that python is a bad language. In fact it is my favorite language and I probably have written more python code than any other language.
That has nothing to do with it being slow, though. And as I said, all that machine learning code relies on numpy (which relies on LAPACK, which is not written in Python) or on highly optimized CUDA kernels (which, again, are not written in Python).
Which is the optimal split! Write application/composing code in Python and highly performance sensitive parts in some FFI language (nowadays probably Rust?). It's not great if you do performance sensitive stuff all the time, but amazing if you just want to build something.
I understand and somewhat agree with your point; however, it would also be nice to have the ability to build something performant with a language like Python. But this is not really possible today, so the narrative that Python is for prototyping will keep holding.
In practice, you're either not using Python (sleeping, while waiting for IO operations) or not using Python (C/Fortran/whatever libraries for heavier lifting: Pandas, numpy, PyTorch, etc).
I'd rather get things done in Go (or Rust). Better standard library, sane package management, dumb easy concurrency & parallelism, easy to distribute, compiles to a binary.
I already knew about gonum. That's not numpy for Go. Have you tried it? It's not really a replacement for numpy. And the API looks like they barfed a bunch of line noise into the top-level namespace.
By the time I've typed in the code in https://www.gonum.org/post/intro_to_stats_with_gonum/, numpy has already computed the mean.
Not a compelling replacement. I spoke to the gonum developers when they first created it and told them they were wasting their time, because the Go leadership intentionally made their language a "systems language", not a "scientific language".
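For what it's worth, the numpy side of that comparison is about this much code (hypothetical data, just to show how little ceremony it needs):
import numpy as np

x = np.random.default_rng(0).normal(size=1_000_000)
print(x.mean(), x.std())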
Honestly, it feels inaccurate to call any modern programming language slow. Most are remarkably performant in the vast majority of cases. I'd be quick to concede that if you're going for the bare minimum of latency, Python is likely not the right choice, but it feels a bit unfair to say Python is slow.
I will say that I agree with the sentiment of your comment: if your main focus is speed, you won’t be using Python, and if you’re using Python, you likely don’t care about speed.
> Most are remarkably performant in the vast majority of cases.
If your metric is execution time, then this might be true. Many programs are fast enough that the user doesn't care, or the impact on overall execution time is slight.
But if your metric is comparison with other languages, this is measurably false [1]. Even Python emulated in the browser is faster than CPython in many cases [2]. And this doesn't really give a complete picture, since many CPython libraries don't actually use Python, because pure-Python implementations of most anything are too slow. They use Python as glue to call out to compiled libraries. But then, this is also the main use case of CPython: glue for not-Python.
While I am a believer in writing performance sensitive code in performant programming languages, I would argue that it matters.
A lot of numerical software is Python glue code written around compiled kernels (written in C++, Rust, Fortran, etc) but the runtime of that glue code usually does impact the total runtime in a non-negligible way. So all wins are good to take.
Except when the algorithmic and data-structure improvements to your program, enabled by the higher programming productivity, have a better payoff than spending the time on low-level optimizations in other stacks.
Could the interpreter have a "restricted" mode and a "dynamic" mode, where restricted mode can be faster because it skips certain lookups? And then if your code _engages_ in a "dynamic" behavior, it then drops out of "restricted" mode? (this could also be in a class by class basis too).
Django and friends might always have to be in "dynamic" mode, but possibly you could allow for some complex logic to run quickly if you use subset of the language. (e.g., like RPython but with less overhead to set up)
It could, but detecting whether it has to “drop out” has a performance impact. Engineering such a feature without horrendously impacting performance is complex.
Also, users would not like to see performance drop permanently only because they redefined some function (e.g. to log its arguments in a debugging session), so they'll expect the system to (eventually) re-optimize code using the new state.
Languages such as JavaScript and Java do this kind of thing (Java not because programs can redefine what len means, but because the JITter makes assumptions such as “there’s only one implementation of interface Foo” or “the Object passed to this function always is an integer”), but I think both have it easier to detect the points in the code where they need to change their assumptions.
I also guess both have had at least an order of magnitude more development effort poured into them.
> It could, but detecting whether it has to “drop out” has a performance impact.
I would be OK with a flag that raised an exception/halted if the restricted dynamism was encountered, if it meant appreciable performance gains.
For every project I've worked on, and every project I've really become familiar with, the "magic" that requires these crazy levels of dynamism can relatively easily be avoided, with more "standard" interfaces and possibly a very slight increase in complexity presented to the library user.
I used to play with Python magic frequently, but eventually realized my motivation was just some meta code-flex game I was playing, and it's almost certainly never worth it if other developers are involved.
That's more or less what PEP 659 tries to achieve, albeit without the flag.
Given that nearly everything in Python is an object and can be modified/patched at any time, this dynamic adaptation is probably the best one can get without something like numba or cython, which "know" more about specific blocks of code and can compile them down.
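For contrast, a sketch of the numba route mentioned above (assuming numba is installed; the function and numbers are made up):
from numba import njit

@njit   # compiled to machine code on first call for this argument type
def total(n):
    s = 0.0
    for i in range(n):
        s += i * i
    return s

print(total(10_000_000))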
Is there some headline reason for these improvements?
I remember reading about the GILectomy, and how actually many of the improvements, aimed at making GILectomy feasible, make single-threaded Python runs faster too. There was even the trepidation that PSF might accept these changes, but still say no to GILectomy. Is this at all related?
Microsoft is funding the work with Mark Shannon / GVR / Eric Snow. The project is called "Faster Cpython". Pretty excited for the 3.12 work, where they plan to introduce JIT compilation I believe?
I'm not sure about the progress on the new form of GIL removal, but as far as I've seen it's a separate effort.
> Pretty excited for the 3.12 work, where they plan to introduce JIT compilation I believe?
"A JIT, according to Shannon, will probably not arrive until 3.13 at the earliest, given the amount of lower-hanging fruit that is still to be worked on. The first step towards a JIT, he explained, would be to implement a trace interpreter, which would allow for better testing of concepts and lay the groundwork for future changes."
For absolutely critical hot-path code this obviously won't be enough, but subinterpreters with memory arenas are a really solid model for safe concurrency, and faster than multiprocess IPC.
Removing GIL naively would decrease single thread performance. Every project aiming at removing GIL failed because it could not get performance comparable with GILed Python.
It's a fork of Python 3.9 that takes out the GIL and introduces optimisations to speed up both single- and multi-threaded execution (since the bar set by the PSF is that no-GIL implementations must be at least as fast as single-threaded programs under the GIL). He ends up with a net 10% speed improvement.
If he does these optimisations, and also doesn't remove the GIL, the performance boost is even larger. So, depending on how you look at it, it's either:
- A bunch of optimisations, plus a GILectomy which slows Python down, or
- A bulk change that removes GIL and speeds things up
Since these improvements were in a similar ballpark, my fear was that the improvements would be taken from the branch with the GIL left in place...
Removing the GIL is one idea (and, as you point out, it is not working very well). When optimizing, do not depend on 'that one cool trick' to fix everything. In this case it looks like they are removing extra work, doing work once, and keeping a copy around (caching).
Why would it decrease single thread performance? How is python different than other languages that support native full-fledged multi-threading, eg Java, Go, C#?
A big part is that Python uses reference counting GC. Java, Go, C# all use tracing GC. Py_INCREF and Py_DECREF are responsible for inc/decreasing the reference count, and are not atomic. The GIL ensures refcount safety by allowing only one thread access to changing refcount. The naive approach to parallelization would require locking each ref inc/dec. There are some more sophisticated approaches (thanks to work by Sam Gross et al) that avoid a mutex hit for every inc/dec.
Tracing GC does not run into this problem. Why Python doesn't use tracing GC is not something I am qualified to answer.
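The refcounting itself is visible from pure Python, for what it's worth (a small illustration of my own; the actual increments happen in C via Py_INCREF/Py_DECREF):
import sys

x = []
print(sys.getrefcount(x))   # 2: the variable x plus the temporary argument

y = x                        # aliasing bumps the count (Py_INCREF in C)
print(sys.getrefcount(x))   # 3

del y                        # dropping a reference decrements it (Py_DECREF)
print(sys.getrefcount(x))   # 2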
I am by no means knowledgeable on the topic, but Swift has a similar problem domain and, AFAIK, only uses atomic refcounts for objects that "escape" a given thread. Is there a reason something like that wouldn't work for Python as well?
Python made its C API visible, so things like reference counting are widely observed by C libraries that interop with Python. This makes changes much harder, since you can't alter implementation details that programs rely on.
I'd like to see some kind of financial marketplace for this kind of speedup.
There are tens of thousands of people who wish their python code would run a bit quicker. Many of those stand to earn/save actual money if the code was quicker, so would be happy to pay some of that towards making optimisations.
If that could be pooled together, with some kind of "$100k per 1% speed up on this set of benchmarks" metric, then developers could get properly paid for the work, and everyone would walk away happy.
As it is, everyone wishes it was faster, but realises that they alone can't pay a developer to make a dent, so nobody does it.
Such a system works to an extent, but I suspect the globally economically optimal amount of effort to put into these widely used opensource projects is far higher than the actual effort put in.
I remember noticing that Python 3 interpreter startup was significantly slower than Python 2 a couple of years ago. I wonder if these improvements have reversed the situation.
The CPython developers have historically never prioritised performance. Simplicity and readability of the implementation have typically been higher priorities. In fact, that's a key quality of Python in general: development speed matters more than runtime performance.
PHP had a similar performance binge about six years ago, which is why it tends to run circles around similar-class interpreted languages these days.
There was no serious money invested until recently in Python performance.
Imagine what could happen if billions of dollars were invested in performance like it was done for C/C++/Java/Javascript in aggregate by various companies.
As more Python is being run in datacenters that starts to make sense, since 1% of CPU usage improvement can mean tens of millions of dollars per year in power costs.
Quick search shows PyPy received 1.5 mil euros from EU. It also received 200 k dollars from Mozilla. That's enough to fund about 10 developers for a year.
From what I remember IronPython had Microsoft hire something like 5 developers for a few years.
I mean, I guess technically it started as an "educational language", but much of its early-ish development was aimed at turning it into an enterprise language that could rival Java. Guido worked for Zope Corporation from 2000-2003, and Zope is a big hairy enterprisey CMS. 10-15 years ago, people at Google toyed for a long time with the idea of making Python fast enough so they could just use it for everything, cf. Unladen Swallow.
That said, it's true that Python has ended up being used in places that were not envisioned by its original creators, but the implication that it is therefore a bad fit for those settings does not automatically follow. Ruby was never meant to be a "web language" but many seem to enjoy using it that way.
Erlang/OTP but you are probably asking about the usability once you have 100+ developers working on a project so it doesn't quite fit in. Even Go is questionable here when compared to Java, C# and even C++.
C# indeed. But even JavaScript has a bearable JIT nowadays. Also, no need for the vague enterprisey connotation; only features and performance are needed, which e.g. Scala provides and Go does not (feature-wise).
Oh, and Swift! I just mean that lots of popular languages were historical accidents when it comes to their use on large-scale projects. It's pretty rare (less so now, it seems) for someone to go out and invent a language specifically for large-scale repos; "toy language gets retrofitted for scale" is the more common narrative.
Yes, historically this has been true, but we are past that time; there's no need for each new language to duplicate its VM, its JIT, its GC... GraalVM solves it all and brings ecosystem interop.
In terms of actual compute speed, Python is still significantly slower, and although 3.11's changes will help quite a bit, V8 is also just insane.
In terms of I/O speed, uvloop (https://magic.io/blog/uvloop-blazing-fast-python-networking/) can beat Node quite handily, so if you're more concerned with being able to handle requests than doing anything major during those requests, Python might be comparable.
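For reference, adopting uvloop is close to a one-liner on top of ordinary asyncio code; a minimal echo-server sketch (my own, assuming uvloop is installed):
import asyncio
import uvloop

async def handle(reader, writer):
    data = await reader.read(1024)   # trivial echo handler
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

uvloop.install()   # swap in uvloop's event loop policy
asyncio.run(main())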
Thanks a lot, hmm kinda disappointing, except for fibonnaci/stack.
Was it in native (binary) or JIT form? Also, GraalVM EE has additional optimizations.
(I'm assuming you used GraalVM 2022.1 edition)
Anyway thanks for sharing :)
GraalJS has ~99% support for JS language features; as for GraalPython, it's unclear to me what's missing, but it should reach feature parity in the next few years. It already supports modern Python versions, so that's a good sign.
-----
python-speed v1.3 using python v3.9.2
string/mem: 2400.67 ms
pi calc/math: 2996.1 ms
regex: 3201.59 ms
fibonnaci/stack: 2487.13 ms
multiprocess: 812.37 ms
total: 11897.85 ms (lower is better)
-----
python-speed v1.3 using python v3.11.0
string/mem: 2234.78 ms
pi calc/math: 2667.84 ms
regex: 2548.81 ms
fibonnaci/stack: 1149.57 ms
multiprocess: 480.25 ms
total: 9081.25 ms (lower is better)
-----
[1] https://github.com/vprelovac/python-speed