Compared speed to Python 3.9 using python-speed for those who want a simpler, more straightforward benchmark. [1]
Basically one can expect an overall 24% increase in performance "for free" in a typical application.
Improvements across the board in all major categories. Seriously impressive.
Stack usage and multiprocessing had the largest performance increases. Even regex saw a 21% increase. Just wow!
And this may be the first Python 3 that is actually faster (by about 5%) than Python 2.7. We've waited 12 years for this... Excited about Python's future!
Python:
def f(x):
    return x * x

v = 0.0
for i in range(100000000):
    v = v + f(i)

Execution time: 15 seconds
---
JavaScript:
function f(x) { return x * x; }
var v = 0.0;
for(var i = 0; i < 100000000; i++) v += f(i);
Execution time: 0.5 seconds
---
So calling a function in a for loop in Python is still 30x slower than in JavaScript, which is a similarly dynamic language. Good to see a 24% increase, but a 30x speedup should still be possible.
I ran this test on Python 3.10, not 3.11, but I assume that if 3.11 had a 30x speedup from the inlined Python-to-Python function calls it would have been mentioned; the fastest speedup listed is 1.96x.
It really is not. Python allows a huge amount of the object model to be overridden and introspected at any point. Javascript doesn't even allow operator overloading.
Indeed comparing with CPython, which the link is about.
Indeed, CPython has no JIT and PyPy does, but CPython is unfortunately usually what you get. Comparing with PyPy would not be fairer, since the goal here is to speed up "vanilla" CPython.
It's unfortunate to me that the default Python interpreter, so widely used for scientific computing, is so much slower than the JS interpreter you get in your web browser, and that we're celebrating +24% benchmark results when 30x faster is known to be possible (e.g. with a JIT).
Just checked, and PyPy has the same speed as JS (effectively instant). You are right that this is not what most people use when working with Python, and there are package incompatibilities.
I am curious, but don't have the time to check: would the result be the same for a regex?
Right, but there's no "standard" JavaScript engine either. Benchmarking V8 against CPython and pointing out a totally expected 30x speed difference isn't insightful.
I still have to disagree. The fact that both slow and fast JS engines exist (recent ones vs. those from before around 2008), and both slow and fast Python engines exist (CPython vs. PyPy), doesn't make it wrong to point out that the vanilla official Python interpreter is much slower than it could be.
Why does one have to be OK with a totally expected speed difference? This is the official interpreter of the language; a slow tool is what you get by default, and the official tool deserves scrutiny.
Just on the python3/python2.7 speed... This one has always killed me on BeagleBones...
`$ time python3 -c "print('hello')"`
I have a handful of utility scripts written in python, and the overhead of starting/stopping is just massive! (Sadly, I don't have one on hand to actually print a metric...)
If your script doesn't need any third-party modules, you can try running it with "python3 -S", which should make the startup significantly faster if there are a lot of modules installed. (Twice as fast on my machine. Running Python from a venv is also somewhat faster than running it directly, but not as fast as with "-S".)
$ time python3 -c "print('hello')"
hello
real 0m0.223s
user 0m0.148s
sys 0m0.047s
and, for the record:
$ python3 bench.py
python-speed v1.3 using python v3.7.3
string/mem: 46673.98 ms
pi calc/math: 98650.32 ms
regex: 38489.41 ms
fibonnaci/stack: 30723.3 ms
multiprocess: 75500.34 ms
total: 290037.35 ms (lower is better)
Might get a little better with some tweaking of performance governors / idle states enabled but, yeah...
Just because 10000 companies are making some random webapps in python doesn't make the 10 local Python apps any less bothersome. This is literally what killed java for end-users.
Interestingly, Java has been improving lately in this regard, thanks to cloud and container use-cases. There was a big regression with Java 9, but since then it's been getting better with each release.
- PEP 659 (Specializing Adaptive Interpreter): the speedup is variable
The interpreter changes work by generating specialized code, so this can have a memory impact, which they are working to offset and expect to cap at 20% more.
- Cheaper Python frames are enabled by making some semantic changes to the language, mostly by removing some dynamic functionality that no one uses
- "Inlined Python function calls" is a bit misleading: Python functions are not inlined, it's the "code that does the bookkeeping for calling a Python function" that is now eliminated for Python-to-Python calls.
- The most important part of the specializing interpreter is the attribute caching, which is quite significant
Source: I work on Pyston, where we do many similar things (but can't make semantic changes to the language)
Not sure what OP meant, but basically the internal frame struct has been significantly simplified to contain only the essential information required at runtime.
If a user wants to get or manipulate debugging information, the old frame object is generated on demand when requested.
Ah! I remember reading something about self-optimizing interpreters, which is pretty cool. I think it's related to what they call "adaptive interpreter".
The interpreter starts out with a generic/slow approach and as it gets more datatype info it uses more specialized and faster implementations.
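For anyone who wants to see this concretely: on 3.11 the dis module can show the specialized ("quickened") instructions after a function has warmed up. A minimal sketch (my own example, not from the comments above):
import dis

def add(a, b):
    return a + b

# run it enough times for the adaptive interpreter to specialize the bytecode
for _ in range(10_000):
    add(1, 2)

# adaptive=True (new in 3.11) prints the specialized instructions,
# e.g. a generic BINARY_OP may show up as BINARY_OP_ADD_INT
dis.dis(add, adaptive=True)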
Don't understand why you are being downvoted, coming from another language, it's an understandable surprise.
The explanation is that Python features heavy dynamism, meaning one can easily replace any part of the system at runtime (at any moment), including builtins and code objects, and even hook into the parser, AST, import mechanism, etc.
This makes inlining a dangerous exercise: say you inline the builtin len() function; how do you know some code that runs later doesn't intend to replace it with a different implementation?
Now, there are ways to implement inlining, but they are not as straightforward as with, say, an ahead-of-time compiler, where you know nothing is going to change afterward.
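A tiny illustration of the problem (my own example, not from the parent comment):
import builtins

def total(items):
    return len(items)   # looks like an obvious candidate for inlining...

print(total([1, 2, 3]))   # 3

# ...but any code running later can rebind the builtin,
# and the next call has to see the new implementation
builtins.len = lambda x: 42
print(total([1, 2, 3]))   # 42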
> The explanation is that Python features heavy dynamism
JavaScript too, any function can be dynamically changed at any time, but its function calls are 30x faster (roughly, I measured it just now with a for loop doing 100 million function calls, which took 0.5s in JS, 15s in python)
This is a realistic scenario if you want to run e.g. a custom statistics function over a large array in Python, so it can be annoying (and yes, numpy can do a lot of this, but that's no reason for the main language to keep encouraging less readable code where you avoid defining separate functions).
> its function calls are 30x faster (roughly, I measured it just now with a for loop doing 100 million function calls, which took 0.5s in JS, 15s in python)
Your benchmark also measures iterators and boxed integers, which were not in the JavaScript version, so it's not clear how much of the difference is due to function call overhead.
(Of course, it certainly doesn't help Python's performance that there's no simple way to do loops without iterators and integers without boxing. It would be nice if a future version could optimize the abstractions away.)
On second thought, I'm not sure. Can you observe the number's object identity like you can in Python? Does JavaScript have to allocate a new Number object for each "i++"?
My point was that the Python version of Aardwolf's function call benchmark [1] is definitely using iterators and boxing all numbers, so it's doing a lot more than just calling a function, and the Java equivalent would look something like this:
Long f(Integer x) {
    return new Long(x.longValue() * x.longValue());
}
// ...
Double v = new Double(0.0);
Iterator<Integer> range = IntStream.range(0, 100000000).iterator();
while (range.hasNext()) {
    Integer i = range.next();
    v = new Double(v.doubleValue() + f(i).doubleValue());
}
Hence the "So it's something that had to be delayed.", and not "this is something impossible".
Javascript has the benefit of having billions of dollars allocated to it from Google/Microsoft/Apple to hire dozens of amazing engineers full time over decades to work on it.
We are talking about a funding difference of six orders of magnitude. Not to mention one is a language that has to stand on its own, while the other has an accidental monopoly on the most popular platform in the world.
> Python has existed since 1994, yet in 2011, the Python Software Foundation budget was less than $40K
Exactly; as usual it is mainly a question of human resources. That's why I advocate for Python switching to GraalPython as the default runtime; it would enable much better performance in the next few years.
Also, I'm pretty sure the CPython devs should ask Google/OpenAI/MicrosoftAI for funding; how many millions can they waste on useless projects while not improving the core bottleneck...
You are asking for the rarest thing of all: a geek and volunteer worker who is also good at dealing with big entities and comfortable with managing money.
> how do you know it's not the intent of a code that runs later to replace it with a different implementation?
I think progress could move forward by adding a compiler flag for "assume no fully free function body replacement", you know, just like in every other mainstream language.
It's not body replacement, it's function reference erasing. There is no way in Python to tell whether something has been replaced or not, since variables are dumb labels and a reference is of the same kind for any object. In fact, there is no particular difference between two objects in Python.
E.g. functions are objects, and any object that defines __call__ can be called like a function. And a function name is just a reference to "something callable" in the namespace mapping (literally a dict in most implementations), which is mutable by definition.
Also, a compiler flag would not help, since most users don't compile the Python VM.
Now, you could add a runtime flag that records everything created for the first time in a namespace and refuses to allow reassignment.
It's possible, but it would break a LOT of things and prevent many patterns.
The latest approach is to put guards in place and assume no replacement, and if a replacement does occur, to locally revert that assumption at that moment.
The process is refined with each iteration, but there is no turnkey solution as you seem to believe.
> I think progress could move forward by adding a compiler flag "assume no fully free function body replacement" you know just like in every other mainstream language.
Ah yes, a compiler flag which literally says “break the language”, that sounds like a great feature which would be used a lot.
> assume no fully free function body replacement
There is no "body replacement": the `len` builtin is looked up by name on every access (in the module's globals, falling through to builtins), and anyone can just rebind it in either place.
I'm just advocating for a compiler flag that says "do reasonable things", just like in every other mainstream language. Python must free itself from these idiosyncrasies if it wants to stay relevant; migrations are not that hard, especially since these patterns seem illegitimate or easy to work around (e.g. use overriding if you want to replace a function, or extension methods). At worst, at least tag the affected functions as dirty via an annotation that tells CPython to locally deoptimize.
I'm not arguing for anything exceptional, just for conventionality and sanity.
I think you're missing how deep the function reference change goes in standard code. What about unittest mocks? Conditional module loading? Enabling/disabling tracing features? I'd bet there's some dynamic function reference assignment that happens during python console startup.
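unittest mocks alone make this a non-starter; replacing functions at runtime is the whole point of them. A made-up example (my own, just to show how ordinary this pattern is):
from unittest import mock
import json

def load_config(path):
    with open(path) as f:
        return json.load(f)

# both open() and json.load are rebound for the duration of the test;
# a "no function replacement" mode would break this everywhere
with mock.patch("builtins.open", mock.mock_open(read_data="{}")):
    with mock.patch("json.load", return_value={"debug": True}):
        print(load_config("settings.json"))   # {'debug': True}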
It's not unreasonable. It's just what the language makes easy and useful. It's still used much less than for example method aliasing in Ruby which has about the same result.
JavaScript doesn't do this, you can replace properties of window at will. I don't think Ruby does it, or Lua. PHP probably does it under some circumstances in modern versions but only because it's remarkably un-dynamic for an interpreted language.
This is normal fare for high-level interpreted languages, for better or worse.
But that means you have to add all sorts of dependency tracking, such that you are able to deoptimise any function affected by mutation on things which were optimised in or out.
This means your complexity increases very fast, very high, before you can have anything which actually works.
If by "just fine" you mean, "with incredible engineering resources funded by a small number of large companies with deep pockets and with some performance cost to check that the optimization is still valid", yes.
One of our scientific collaborators is _literally_ of the view that "One day people will realise what a mistake python 3 was, and go back to Python 2" (which is a verbatim quote).
While this has happened with Perl (AFAIK, people realized what a mistake Perl 6 was, and there's a plan for Perl 7 to mostly go back to Perl 5, which makes it two breaking changes in a row), I see no signs of it happening with Python; instead, every new Python 3 release gets farther from Python 2.
I don't think people considered Perl 6 a "mistake". It's more that they realized that it wasn't a successor to Perl 5 and is instead a separate language. The language itself isn't a mistake, it was treating it as the next version that was.
The Perl community may be attempting to "fix" that mistake with Perl 7, but the damage has been done. I think Perl 6 fractured the community so hard and scared away so many users that both Perl 7 and Raku are likely to be minor languages for the foreseeable future.
If either of them has legs, I think it's likely to be Raku. Because Raku is at least an interesting language with a lot of really interesting, powerful language features.
Perl 5/7 by virtue of its history, is mostly just another dynamic scripting language, but one with a particularly unfriendly syntax. Aside from CPAN and inertia, I think there is relatively little reason to write new Perl 5/7 code when PHP, Python, Ruby, and JavaScript are out there.
That mistake was sad. In the end they succeeded at creating a language that fixed most Perl problems and is actually quite awesome. But at the cost of catastrophic loss of mindshare.
Raku is too slow for no obvious reason, plus every lib has to be rewritten from scratch, including the good and mature ones from Perl.
There are more interesting things happening with Ruby and Python. In fact there are Python libs for obscure stuff like CAN-bus and ISO-TP. Want to talk to a vehicle ECU? It can be done with Python.
There's also a lack of momentum and quite a lot of bikeshedding in the Perl community. There were attempts to modernise Perl by rurban and others, but they were met with unnecessary resistance. Without community support they all ended up as one-man shows. Perl is pretty much a dead end. You are hearing this from someone who still writes Perl code every day for work. My latest proof of concept was done in Ruby and it will probably end up as production code.
I've tried for several iterations until I finally gave up and moved on to Ruby, Racket etc. Moose is quite slow itself. I never use it or anything that depends on it. I use Moo or Mojo::Base instead.
Inline::Perl5 is quite an ugly last-resort solution. And you still need a Perl interpreter, as opposed to calling C code from D, where you just need the libs.
If you gave up on it, why do you keep saying it is slow, when you have no current information about that?
Also, when you're talking about slow: why is it that any performant Perl module actually has most of its logic written in XS (aka C)? So I think it shows quite a bit of chutzpah to call Inline::Perl5 "an ugly last resort solution", whereas a *lot* of upstream CPAN modules rely on the hack that is XS to make them performant. To give you an example: the pure Raku version of Text::CSV is more than 2x as fast as the pure Perl version of Text::CSV.
I thought the Perl situation was a little more nuanced.
Although Perl 6 was meant to be a Python 3000 type thing, it was spun out into its own language (Raku). Perl 7 will continue the 5.x lineage, but with saner defaults. Thus, any code that’s around now should run, but the interpreter’s name might change.
(I think…I mostly keep up with Perl for nostalgia’s sake).
> Although Perl 6 was meant to be a Python 3000 type thing, it was spun out into its own language (Raku).
The problem for them is that Perl 6 actually had an official release (and IIRC, more than one) under the Perl 6 name; the rename of the language to Raku (which IIRC was originally the name of just one VM for running Perl 6 programs) came later. So anyone who managed to keep up with the latest release of the Perl language would have two huge breaking changes (Perl 5 to Perl 6 and Perl 6 to Perl 7), the second one undoing most of the changes of the first. (AFAIK, PHP avoided all that by deciding to go back before officially releasing PHP 6, so anyone who was following the latest release just jumped directly from PHP 5 to PHP 7, avoiding the breaking changes which had been planned for PHP 6.)
I get how you (and perhaps others) might think it's as you say. But I can tell you're not saying what you've said based on knowing it to be true but guessing it to be so. And while your guess isn't a surprising one given natural assumptions due to the names "Perl", "Perl 5", "Perl 6", and "Perl 7", it doesn't correspond to what has actually happened.
Anyone keeping up with the latest release of the Perl language has had near zero breakage for decades. (Indeed Perl has a well deserved reputation for having an outstanding track record in this regard compared to almost all other mainstream PLs.) I personally see every likelihood Perl 7 will extend that track record, though of course my crystal ball prognostications are necessarily based purely on what I see.
No one using P6 or Raku had a huge breaking change from Perl 5. No one using P6/Raku will have another one going to Perl 7.
If you presume Raku and Perl are different languages you'll get the essence of what has actually happened so far, and seems likely to be more or less true for the rest of this decade at least.
Incorrect, Raku(do) is not the VM. It's MoarVM (or JVM but that one is trailing behind). Rakudo is the reference implementation.
Also, you should think in terms of Perl 5 -> 7. Raku is the language formerly known as Perl 6. Perl 7 isn't undoing anything; it will be the successor to Perl 5. Perl 7 should have been 5.32 with saner defaults, but of course it's now going to take another 20 years of bikeshedding until this happens.
It needed to be done; there were several choices in python 1/2 holding the language back, that all but necessitated breaking changes. Mostly with string/bytes/unicode handling. And the non-breaking route would have been a long term pain.
They didn't decide to break over print() and cosmetic changes, it runs way deeper.
I'll assume this is a good faith question and answer it by quoting the "What's new in Python 3.11?" page (linked below the quote).
> During a Python function call, Python will call an evaluating C function to interpret that function’s code. This effectively limits pure Python recursion to what’s safe for the C stack.
> In 3.11, when CPython detects Python code calling another Python function, it sets up a new frame, and “jumps” to the new code inside the new frame. This avoids calling the C interpreting function altogether.
> Most Python function calls now consume no C stack space. This speeds up most of such calls. In simple recursive functions like fibonacci or factorial, a 1.7x speedup was observed. This also means recursive functions can recurse significantly deeper (if the user increases the recursion limit). We measured a 1-3% improvement in pyperformance.
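A toy way to see the recursion-depth point (my own example; numbers vary by platform, and raising the limit this far on 3.10 and earlier risks blowing the C stack):
import sys

def depth(n=0):
    try:
        return depth(n + 1)
    except RecursionError:
        return n

print(depth())            # ~1000 with the default recursion limit

sys.setrecursionlimit(50_000)
print(depth())            # on 3.11 this mostly just works, since plain
                          # Python-to-Python calls no longer consume C stack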
BTW it'd be nice if Python implemented support for tail-call optimization or, even better, a growable/segmented stack, in order to make arbitrary Python functions stack-overflow safe.
e.g. https://gcc.gnu.org/wiki/SplitStacks#:~:text=Split%20Stacks%....
With Python it's possible to do tail-call optimization yourself. See this all-time great Stack Overflow answer and the linked repo for details (the actual code is quite short): https://stackoverflow.com/a/18506625.
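The core idea is a trampoline; this is a generic sketch of my own, not the decorator from that answer:
def trampoline(fn, *args):
    # keep calling thunks until we get a non-callable result,
    # so the stack never grows past one frame per "bounce"
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def countdown(n):
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)   # the tail call, deferred as a thunk

print(trampoline(countdown, 1_000_000))   # no RecursionError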
If Python doesn't depend on the C stack anymore (for most of its work), that's basically what you'll have for free.
Python's own stackframes are heap-allocated and chained (so each function has its own stack, in essence), so its use of a single unified allocation space for stack is already an implementation detail of the interpreter.
Other languages do this by sacrificing correctness. If a() calls b(), which calls c(), and c() raises an exception, Python will show you the correct callstack. In other languages the callstack will differ depending on whether the compiler decided to inline c() or b() into a(). Same thing with TCO: it destroys callstacks and is therefore incorrect. If one prefers correctness over performance one should use Python. If one doesn't mind sacrificing correctness to gain performance one can use other languages.
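Concretely, what "the correct callstack" means here (a small example of my own):
import traceback

def a(): return b()
def b(): return c()
def c(): raise ValueError("boom")

try:
    a()
except ValueError:
    traceback.print_exc()   # shows a -> b -> c; no frame is inlined away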
I'm not sure which language you mean, but Java has observable call stacks and inlines just fine. With a JIT, deoptimizations are possible. For AOT languages it may happen, though stacks there are mostly meaningful only with debug symbols.
The vast majority of languages that support inlining do not reconstruct callstacks. Java is one of few exceptions and, like Python, it does not support tco.
Python is so slow that, ironically, I don't care that much about performance improvements: if you are writing any performance-sensitive part of your software in Python you are screwed anyway, and 50% faster code isn't going to help that much.
Python is a joy to use and is used by a large number of apps as a tool of choice to get things done. Almost all of "retail" web crawling and machine learning in the world runs on Python.
In practice it is not slow at all. Plus, the development iteration speed is probably second to none among programming languages.
If I would have my kid learn two programming languages it would be HTML and Python.
I never said that python is a bad language. In fact it is my favorite language and I probably have written more python code than any other language.
That has nothing to do with it being slow, though. And as I said, all that machine learning code relies on numpy (which relies on LAPACK, which is not written in Python) or on highly optimized CUDA kernels (which, again, are not written in Python).
Which is the optimal split! Write application/composing code in Python and highly performance sensitive parts in some FFI language (nowadays probably Rust?). It's not great if you do performance sensitive stuff all the time, but amazing if you just want to build something.
I understand and somewhat agree with your point; however, it would also be nice to have the ability to build something performant with a language like Python. But this is not really possible today, so the narrative that Python is for prototyping will keep holding.
In practice, you're either not using Python (sleeping, while waiting for IO operations) or not using Python (C/Fortran/whatever libraries for heavier lifting: Pandas, numpy, PyTorch, etc).
I'd rather get things done in Go (or Rust). Better standard library, sane package management, dumb easy concurrency & parallelism, easy to distribute, compiles to a binary.
I already knew about gonum. That's not numpy for Go. Have you tried it? It's not really a replacement for numpy. And the API looks like they barfed a bunch of line noise into the top-level namespace.
By the time I've typed in the code in https://www.gonum.org/post/intro_to_stats_with_gonum/, numpy has already computed the mean.
Not a compelling replacement. I spoke to the gonum developers when they first created it and told them they were wasting their time, because the Go leadership intentionally made their language a "systems language", not a "scientific language".
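For what it's worth, the numpy side of that comparison is about this much code (hypothetical data, just to show how little ceremony it needs):
import numpy as np

x = np.random.default_rng(0).normal(size=1_000_000)
print(x.mean(), x.std())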
Honestly, it feels inaccurate to call any modern programming language slow. Most are remarkably performant in the vast majority of cases. I'd be quick to concede that if you're going for the bare minimum of latency, Python is likely not the right choice, but it feels a bit unfair to say Python is slow.
I will say that I agree with the sentiment of your comment: if your main focus is speed, you won’t be using Python, and if you’re using Python, you likely don’t care about speed.
> Most are remarkably performant in the vast majority of cases.
If your metric is execution time, then this might be true. Many programs are fast enough that the user doesn't care, or the impact on overall execution time is slight.
But if your metric is comparison with other languages, this is measurably false [1]. Even Python emulated in the browser is faster than CPython in many cases [2]. And this doesn't really give a complete picture, since many CPython libraries don't actually use Python, because pure-Python implementations of most anything are too slow. They use Python as glue to call out to compiled libraries. But then, this is also the main use case of CPython: glue for not-Python.
While I am a believer in writing performance sensitive code in performant programming languages, I would argue that it matters.
A lot of numerical software is Python glue code written around compiled kernels (written in C++, Rust, Fortran, etc) but the runtime of that glue code usually does impact the total runtime in a non-negligible way. So all wins are good to take.
Except when the algorithmic and data-structure improvements to your program, enabled by the higher programming productivity, have a better payoff than spending the time on low-level optimizations in other stacks.
Could the interpreter have a "restricted" mode and a "dynamic" mode, where restricted mode can be faster because it skips certain lookups? And then if your code _engages_ in a "dynamic" behavior, it then drops out of "restricted" mode? (this could also be in a class by class basis too).
Django and friends might always have to be in "dynamic" mode, but possibly you could allow for some complex logic to run quickly if you use subset of the language. (e.g., like RPython but with less overhead to set up)
It could, but detecting whether it has to “drop out” has a performance impact. Engineering such a feature without horrendously impacting performance is complex.
Also, users would not like to see performance drop permanently only because they redefined some function (e.g. to log its arguments in a debugging session), so they'll expect the system to (eventually) re-optimize code using the new state.
Languages such as JavaScript and Java do this kind of thing (Java not because programs can redefine what len means, but because the JITter makes assumptions such as “there’s only one implementation of interface Foo” or “the Object passed to this function always is an integer”), but I think both have it easier to detect the points in the code where they need to change their assumptions.
I also guess both have had at least an order of magnitude more development effort poured into them.
> It could, but detecting whether it has to “drop out” has a performance impact.
I would be OK with a flag that raised an exception/halted if the restricted dynamism was encountered, if it meant appreciable performance gains.
For every project I've worked on, and every project I've really become familiar with, the "magic" that requires these crazy levels of dynamism can relatively easily be avoided, with more "standard" interfaces and possibly a very slight increase in complexity presented to the library user.
I used to play with Python magic frequently, but eventually realized my motivation was just some meta code-flex game I was playing, and it's almost certainly never worth it if other developers are involved.
That's more or less what PEP 659 tries to achieve, albeit without the flag.
Given that nearly everything in Python is an object and can be modified/patched at any time, this dynamic adaptation is probably the best one can get without something like numba or cython, which "know" more about specific blocks of code and can compile them down.
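For contrast, a sketch of the numba route mentioned above (assuming numba is installed; the function and numbers are made up):
from numba import njit

@njit   # compiled to machine code on first call for this argument type
def total(n):
    s = 0.0
    for i in range(n):
        s += i * i
    return s

print(total(10_000_000))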
Is there some headline reason for these improvements?
I remember reading about the GILectomy, and how actually many of the improvements, aimed at making GILectomy feasible, make single-threaded Python runs faster too. There was even the trepidation that PSF might accept these changes, but still say no to GILectomy. Is this at all related?
Microsoft is funding the work with Mark Shannon / GVR / Eric Snow. The project is called "Faster Cpython". Pretty excited for the 3.12 work, where they plan to introduce JIT compilation I believe?
I'm not sure about the progress on the new form of GIL removal, but as far as I've seen it's a separate effort.
> Pretty excited for the 3.12 work, where they plan to introduce JIT compilation I believe?
"A JIT, according to Shannon, will probably not arrive until 3.13 at the earliest, given the amount of lower-hanging fruit that is still to be worked on. The first step towards a JIT, he explained, would be to implement a trace interpreter, which would allow for better testing of concepts and lay the groundwork for future changes."
For absolutely critical hot-path code this obviously won't be enough, but subinterpreters with memory arenas are a really solid model for safe concurrency, and faster than multiprocess IPC.
Removing GIL naively would decrease single thread performance. Every project aiming at removing GIL failed because it could not get performance comparable with GILed Python.
It's a fork of Python 3.9 that takes out the GIL and introduces optimisations to speed up both single- and multi-threaded execution (since the bar set by the PSF is that no-GIL implementations must be at least as fast as single-threaded programs under the GIL). He ends up with a net 10% speed improvement.
If he does these optimisations, and also doesn't remove the GIL, the performance boost is even larger. So, depending on how you look at it, it's either:
- A bunch of optimisations, plus a GILectomy which slows Python down, or
- A bulk change that removes GIL and speeds things up
Since these improvements were in a similar ballpark, my fear was that the improvements would be taken from the branch with the GIL left in place...
Removing the GIL is one idea (and, as you point out, it is not working very well). When optimizing, do not depend on 'that one cool trick' to fix everything. In this case it looks like they are removing extra work, doing work once, and keeping a copy around (caching).
Why would it decrease single thread performance? How is python different than other languages that support native full-fledged multi-threading, eg Java, Go, C#?
A big part is that Python uses reference counting GC. Java, Go, C# all use tracing GC. Py_INCREF and Py_DECREF are responsible for inc/decreasing the reference count, and are not atomic. The GIL ensures refcount safety by allowing only one thread access to changing refcount. The naive approach to parallelization would require locking each ref inc/dec. There are some more sophisticated approaches (thanks to work by Sam Gross et al) that avoid a mutex hit for every inc/dec.
Tracing GC does not run into this problem. Why Python doesn't use tracing GC is not something I am qualified to answer.
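The refcounting itself is visible from pure Python, for what it's worth (a small illustration of my own; the actual increments happen in C via Py_INCREF/Py_DECREF):
import sys

x = []
print(sys.getrefcount(x))   # 2: the variable x plus the temporary argument

y = x                        # aliasing bumps the count (Py_INCREF in C)
print(sys.getrefcount(x))   # 3

del y                        # dropping a reference decrements it (Py_DECREF)
print(sys.getrefcount(x))   # 2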
I am by no means knowledgeable on the topic, but Swift has a similar problem domain and, AFAIK, only uses atomic refcounts for objects that "escape" a given thread. Is there a reason something like that wouldn't work for Python as well?
Python made its C API visible, so things like reference counting are widely observed by C libraries that interop with Python. This makes changes much harder, since you can't alter implementation details that programs rely on.
I'd like to see some kind of financial marketplace for this kind of speedup.
There are tens of thousands of people who wish their python code would run a bit quicker. Many of those stand to earn/save actual money if the code was quicker, so would be happy to pay some of that towards making optimisations.
If that could be pooled together, with some kind of "$100k per 1% speed up on this set of benchmarks" metric, then developers could get properly paid for the work, and everyone would walk away happy.
As it is, everyone wishes it was faster, but realises that they alone can't pay a developer to make a dent, so nobody does it.
Such a system works to an extent, but I suspect the globally economically optimal amount of effort to put into these widely used opensource projects is far higher than the actual effort put in.
I remember noticing that Python 3 interpreter startup was significantly slower than Python 2 a couple of years ago. I wonder if these improvements have reversed the situation.
The CPython developers have historically never prioritised performance. Simplicity and readability of the implementation have typically been higher priorities. In fact, that's a key quality of Python in general: development speed matters more than runtime performance.
PHP had a similar performance binge about six years ago, which is why it tends to run circles around similar-class interpreted languages these days.
There was no serious money invested until recently in Python performance.
Imagine what could happen if billions of dollars were invested in performance like it was done for C/C++/Java/Javascript in aggregate by various companies.
As more Python is being run in datacenters that starts to make sense, since 1% of CPU usage improvement can mean tens of millions of dollars per year in power costs.
Quick search shows PyPy received 1.5 mil euros from EU. It also received 200 k dollars from Mozilla. That's enough to fund about 10 developers for a year.
From what I remember IronPython had Microsoft hire something like 5 developers for a few years.
I mean, I guess technically it started as an "educational language", but much of its early-ish development was aimed at turning it into an enterprise language that could rival Java. Guido worked for Zope Corporation from 2000-2003, and Zope is a big hairy enterprisey CMS. 10-15 years ago, people at Google toyed for a long time with the idea of making Python fast enough so they could just use it for everything, cf. Unladen Swallow.
That said, it's true that Python has ended up being used in places that were not envisioned by its original creators, but the implication that it is therefore a bad fit for those settings does not automatically follow. Ruby was never meant to be a "web language" but many seem to enjoy using it that way.
Erlang/OTP but you are probably asking about the usability once you have 100+ developers working on a project so it doesn't quite fit in. Even Go is questionable here when compared to Java, C# and even C++.
C# indeed. But even JavaScript has a bearable JIT nowadays. Also, no need for the vague enterprisey connotation; only features and performance are needed, which e.g. Scala provides and Go does not (feature-wise).
Oh, and Swift! I just mean that lots of popular languages were historical accidents when it comes to their use on large-scale projects. It's pretty rare (less so now, it seems) for someone to go out and invent a language specifically for large-scale repos; "toy language gets retrofitted for scale" is the more common narrative.
Yes, historically this has been true, but we are past that time; there's no need for each new language to duplicate its VM, its JIT, its GC... GraalVM solves it all and brings ecosystem interop.
In terms of actual compute speed, Python is still significantly slower, and although 3.11's changes will help quite a bit, V8 is also just insane.
In terms of I/O speed, uvloop (https://magic.io/blog/uvloop-blazing-fast-python-networking/) can beat Node quite handily, so if you're more concerned with being able to handle requests than doing anything major during those requests, Python might be comparable.
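For reference, adopting uvloop is close to a one-liner on top of ordinary asyncio code; a minimal echo-server sketch (my own, assuming uvloop is installed):
import asyncio
import uvloop

async def handle(reader, writer):
    data = await reader.read(1024)   # trivial echo handler
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

uvloop.install()   # swap in uvloop's event loop policy
asyncio.run(main())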
Thanks a lot, hmm kinda disappointing, except for fibonnaci/stack.
Was it in native (binary) or JIT form? Also, GraalVM EE has additional optimizations.
(I'm assuming you used GraalVM 2022.1 edition)
Anyway thanks for sharing :)
GraalJS has ~99% support for JS language features; as for GraalPython, it's unclear to me what's missing, but it should reach feature parity in the next few years. It already supports modern Python versions, so that's a good sign.
-----
python-speed v1.3 using python v3.9.2
string/mem: 2400.67 ms
pi calc/math: 2996.1 ms
regex: 3201.59 ms
fibonnaci/stack: 2487.13 ms
multiprocess: 812.37 ms
total: 11897.85 ms (lower is better)
-----
python-speed v1.3 using python v3.11.0
string/mem: 2234.78 ms
pi calc/math: 2667.84 ms
regex: 2548.81 ms
fibonnaci/stack: 1149.57 ms
multiprocess: 480.25 ms
total: 9081.25 ms (lower is better)
-----
[1] https://github.com/vprelovac/python-speed