Python, while slow, is "fast enough" because it's super easy to add native objects for anything that's too slow. It's kind of an inversion of the usual scripting-language layout, where fast native code loads slow snippets of script to provide customization, and tbh that's probably the better approach for most situations. Write the 99% of your application that's not performance critical in something quick and easy and flexible, and then optimize the heck out of the remaining 1% that matters.
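For a concrete (if hedged) sketch of that split: ctypes is one low-ceremony way to bolt a native hot path onto Python glue. The libhot.so library and its dot() function below are made up for illustration, assuming something like `cc -shared -O2 -fPIC hot.c -o libhot.so` has been run first:

    # Hypothetical native hot path: hot.c defines
    #   long dot(const long *a, const long *b, size_t n);
    import ctypes

    lib = ctypes.CDLL("./libhot.so")
    lib.dot.restype = ctypes.c_long
    lib.dot.argtypes = (ctypes.POINTER(ctypes.c_long),
                        ctypes.POINTER(ctypes.c_long),
                        ctypes.c_size_t)

    def dot(a, b):
        # Thin Python wrapper: marshal two lists into C arrays and call in.
        n = len(a)
        ArrayT = ctypes.c_long * n
        return lib.dot(ArrayT(*a), ArrayT(*b), n)

    print(dot([1, 2, 3], [4, 5, 6]))   # 32, if the C side does what its name says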
(Not a card carrying Python fan or anything, haven't used it in a decade, I just like this architecture.)
> IOW, the question is not whether Python is "fast" or "slow", but whether it is "fast enough" or "too slow" for a given user.
100% this. There isn't REALLY any such thing as good/bad/fast/slow/cheap/expensive, there's just "meets spec better" / "doesn't meet spec as well". Everything is relative to your requirements.
I think one thing that makes many Python programs "fast enough" is that people have already provided a huge pile of native objects in the form of the OS and its trimmings. But this is true for other languages as well. If I were writing in C, I'd never try to rewrite any portion of the OS unless I had some otherwise insurmountable issue to overcome.
Quite a lot of programs are almost entirely bopping along from one OS or numpy call to the next, with a tiny bit of business logic thrown in. You've got a better chance of making your users happy by thinking about that business logic at a high level, than trying to optimize code that's only doing 1% of the actual work.
My use for compiled code is when there's no OS, and every nanosecond matters, e.g., working with microcontrollers.
This works well for applications where you have that kind of split, but there are applications where most of the time is spent scattered around a lot of the code - and if you are in that situation in Python and need to go faster, it's an awkward place to be.
Yeah, when none of the code is tight inner loops, all of it is. Not much to do then but reimplement your program in a more efficient language... then call it from a Python script. ;)
This really depends on your perspective. Sure, it's a lot slower than running a native binary. But it's still fast enough for interactive tools that you run often. If you compare that to java tools, for example, which take seconds if not dozens of seconds to start (gradle, I'm looking at you!), python is far better.
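A rough way to put a number on the interpreter-startup part, if you want to measure it on your own machine (absolute numbers vary a lot by platform and Python build):

    import subprocess, sys, time

    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)   # bare interpreter start/stop
    print(f"python -c 'pass' took {time.perf_counter() - start:.3f}s")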
gradle has pretty much the worst ergonomics of any developer tool I've ever used[1], so that's a pretty low bar.
[1] e.g. "Something went wrong. Re-run with --debug or --info. Now here's 5000 lines of useless stack traces that won't help at all." Never mind an ecosystem of plugins where it's not clear where the DSL ends and the API starts, which makes it virtually impossible to locate the actual source of an error message, if you get one at all.
This time last year, I was deep into writing Gradle plugins at work.
Gradle's entire architecture is the best argument I have ever seen for why you should never use a big blob of shared mutable state when all you really needed to do was pass discrete values up and down the call stack.
Compare with Bazel. I appreciate why people bounce off of many of its strictures. But when you just look at the architecture and what it takes to develop for it, it's amazing how much easier it is to grok, while achieving a similar level of flexibility.
> chg's very existence is because we need hg to be a native binary in order to avoid Python startup overhead. If hg weren't a Python script, we wouldn't need chg to be a separate program.
I suspect that, back in the day, Bitbucket (which was originally hg-only) might have won out over GitHub if Git hadn't also had the huge publicity boost that comes with being written by Linus.
It's not always the sin it's made out to be. It's a matter of risk and cost. If there's a high risk performance will be an issue, and optimising later will have a high cost (or optimising early has a low cost), then it makes sense.
Yes and no. One of the biggest disadvantages of python is that the way people typically optimize python code is to
1. rewrite in numpy
2. rewrite in Cython (or other python compiler)
3. rewrite in C++
The problem with this workflow is that it means you end up rewriting large chunks of your code over several years in systems that have fairly different idioms. If you suspect you might be performance-bottlenecked in the future, writing your code in a faster (but still productive) language from the start can be a lot better, because you can then improve performance incrementally without multiple full rewrites.
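As a minimal illustration of the idiom shift in step 1, here's the same root-mean-square computation written the plain-Python way and the numpy way (a made-up example, not from the article):

    import numpy as np

    # Pure-Python idiom: explicit loop over elements.
    def rms_py(xs):
        total = 0.0
        for x in xs:
            total += x * x
        return (total / len(xs)) ** 0.5

    # numpy idiom: whole-array operations, no visible loop.
    def rms_np(xs):
        a = np.asarray(xs, dtype=np.float64)
        return float(np.sqrt(np.mean(a * a)))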
I work in python all the time, and I’ve still not found a situation where this happened. Sometimes you might have specific components that you might want to write in other languages, but this is really rare unless you’re manually writing really computationally intensive or time critical procedures. Which hopefully you should know going into the project. Like if you’re rolling your own in-line audio processing you know up front that python is a bad choice. But any CRUD-like app? Python is plenty fast, especially if you’re relying on c implementations of heavy-lifting stuff.
It's a great prototyping language. Sometimes the final product will use a better algorithm because it was found easily in Python then ported to bare metal language.
(Thinking into the wild here…) Wonder if Python could be sped up by not being loaded from scratch on every call, i.e. something more like a long-running service with careful namespace management. I'd think something that reuses what's already in RAM would be much faster than a cold start.
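That's roughly the shape of what chg does for hg, as mentioned above: pay the import/startup cost once in a long-lived process, then hand each invocation to a cheap forked child. A very rough stdlib sketch; the port, the heavy imports, and the command handling are all hypothetical:

    import socketserver

    import json      # stand-ins for whatever heavy imports your tool needs
    import sqlite3

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            cmd = self.rfile.readline().decode().strip()
            # dispatch to already-imported code instead of starting python anew
            self.wfile.write(f"ran: {cmd}\n".encode())

    if __name__ == "__main__":
        # each connection is handled in a fork of the warm parent process
        with socketserver.ForkingTCPServer(("127.0.0.1", 7777), Handler) as srv:
            srv.serve_forever()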
Frankly, I would find it much easier to write the naive C version than this "moderately" optimized Python version in the article although I am much more familiar with Python than with C.
I really hate these kinds of articles because they don't DO anything useful in the sample programs.
Grab a web page and parse it for some data. Suddenly, C looks like trash. The network latency negates any speed advantage from C and the fact that C has to manually manage memory everywhere makes your C code look like a dumpster fire relative to the Python code. You can always pick a task that favors one language or the other.
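For concreteness, that kind of task looks roughly like this in Python; the URL is a placeholder, and network latency rather than language speed dominates the runtime:

    from html.parser import HTMLParser
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links.extend(v for k, v in attrs if k == "href" and v)

    html = urlopen("https://example.com").read().decode("utf-8", "replace")
    parser = LinkCollector()
    parser.feed(html)
    print(parser.links)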
What people forget is that fast/slow ALSO includes "time to develop the code". If I can write the Python code in 1 hour and the C code in 10 hours, the code has to be a lot slower or be run a lot more times before the time wasted writing the C code pays off.
And if he'd applied the same optimizations to the C version, it would have trounced the optimized Python version. I really don't like it when authors unfairly penalize their "not preferred" solution -- it makes me feel like the entire time spent reading the article was nothing more than listening to pointless chest thumping by the author. The whole article was barely about the actual compiling of Python code.
I've been adding types to SQLAlchemy code for some months now, and there are lots of scenarios where you end up on mypy's issue tracker (which is very often) and see "oh, use a # type: ignore to deal with this occasional use case" as official advice. And here we see a compiler that considers all "# type: ignore" comments to be bugs and will eventually refuse to compile them. That's simply not realistic for the way Python typing works right now, and I am skeptical.
Some interesting tidbits on python code and compilation performance
1) During AoC I mostly used cpython and numpy. One day my code was running too slow so I fired up pypy. It ran even slower. Turns out numpy and pypy do not play well together in terms of performance.
2) I use MATLAB at work, and had some code that was slower than I liked despite efforts to profile and optimise. I used the Coder tool to compile it to C and got a 32x speedup. I was surprised, as MATLAB has a JIT, so I didn't expect a 32x performance delta, but there it is.
Advent of Code. It's a yearly coding challenge in December. Increasingly difficult problems each day, and an overall very fun way to brush up on algorithms or learn a new language.
Honestly for a language that’s supposed to be simple and easy to use Python seems really complicated and hard to understand. Anything big or performance critical should probably not be written in Python - and yet people are doing it. And then we end up with a house of cards with this project built on top of the very shaky foundation of mypy.
I think Python is simple to use but difficult to scale. It can be useful to think of programming languages as tools. Assembly is the workbench where tools are made and raw materials are handled. C++ is like your standard set of machining tools. Java is like a bag of tools, hammers, handsaws. Python is like an electric nail gun, or perhaps a gas chainsaw. It does many of the things the other languages say that you can do directly out of the box and often in a manner that’s more sensible to people initially using a tool.
But when it comes to building a house and all you have is a chainsaw? You’re gonna have some very cool log cabins, and if we go outside the analogy for a moment: code that uses tricks to accomplish what others would do with better building, architecture, or execution.
Now that said, how many people do you know who might go off and build a log cabin vs. a modern home on their own?
It's because Python is both simple and easy to use, and complicated and hard to understand.
It's a Janus of a programming language. There are lots of complexities that alternately tie software engineers up in knots and let them go into raptures of blissful hackery. But they generally aren't all that visible in the face it presents to data scientists and devops.
And it's hard to make sweeping statements about "big or performance critical". A couple of times in the past few years I've seen or been involved in projects that successfully replaced big Java applications with Python implementations that were 1% the SLOC and had better performance. They were special situations, to be sure, though more so, I think, in terms of explaining the performance improvement than the cost savings.
I've been writing some python the last few days. Today I did something terrible. I like to believe it was out of wisdom rather than ignorance. I needed to do some nuanced bit manipulations, and so I built up a string of ascii 0's and 1's, sliced the string, and then converted back to integers.
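For anyone wondering, the trick looks something like this (the field offsets are made up), with the bit-shift version it would eventually be rewritten to shown alongside:

    value = 0b1011_0110

    # String route: format to '0'/'1' characters, slice, convert back.
    bits = format(value, "08b")          # '10110110', MSB first
    field_str = int(bits[2:5], 2)        # string indices 2..4 == bits 5..3

    # Bit-shift route: mask and shift, no intermediate string.
    field_shift = (value >> 3) & 0b111

    assert field_str == field_shift == 0b110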
I worked on a codebase many years ago when I was an intern. That code would read frames off a CAN bus in a car. For some reason, it could only handle about 100 frames per second before running out of CPU. It turned out someone had done the same; they turned every 8 byte CAN payload into a string of 64 characters, sliced the relevant fields out, and then converted back to integers. There were about 2000 frames per second coming in, and it was just way too inefficient.
In my usage, it will maybe never happen, outside of a unit test I wrote for it. I am really tempted to rewrite it tomorrow to use bit shifts...
Reminds me of that Tao of Programming koan (if I remember it rightly) that went something like:
The novice, in his frustration, struck the side of his computer. The master walked over and asked what he was doing. The novice exclaimed, "my computer is not working and I do not know why!" The master admonished him, saying "you cannot solve the problem by striking the computer without knowing what is wrong." Then the master struck the side of the computer, and it worked flawlessly.
(I love this one because one time I was doing a gig and my computer refused to boot. Suspecting that the vibration from being lugged into the car and taken for a long drive had unseated something slightly, I gave it a sharp smack and power cycled it, and it started up fine.)
> A novice was trying to fix a broken Lisp machine by turning the power off and on.
> Knight, seeing what the student was doing, spoke sternly: “You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”
> Knight turned the machine off and on.
> The machine worked.
> I built up a string of ascii 0's and 1's, sliced the string, and then converted back to integers.
How about a list of integers:
$ txr
This is the TXR Lisp interactive listener of TXR 274.
Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
This could be the year of the TXR desktop; I can feel it!
1> (digits 37)
(3 7)
2> (digits 37 2)
(1 0 0 1 0 1)
3> (reverse *2)
(1 0 1 0 0 1)
4> (mapcar (op - 1) *3)
(0 1 0 1 1 0)
5> (poly 2 *4)
22
> Today I did something terrible. I like to believe it was out of wisdom rather than ignorance. I needed to do some nuanced bit manipulations, and so I built up a string of ascii 0's and 1's, sliced the string, and then converted back to integers.
At this point, I think maybe Python ceases to be the correct tool for the purpose. There comes a point where trying to squeeze every ounce of performance out of something that is, relatively speaking, not focusing on raw performance is just stubbornness. Write the slow part in C/C++/Rust/Zig/Nim or something and be done with it instead of making your Python unreadable. At least your native code can be idiomatic instead of being filled with arcane tricks.
Not to be shit-eating, but I never understood why people use python instead of Go other than for ML teams. I program mostly in Java and Go so would love a perspective.
Probably for the same reasons people use Java and Go instead of C. It's just more convenient sometimes. But sometimes not.
In terms of building a team at a company? In my experience it's been arbitrary. Manager / Lead has experience in language X, decides to hire people who also know it. Or all the other teams are already using language X. Or Manager / Lead has heard "X language is good at Y" and decides to go with that. Or there's simply 10x more engineers (and cheaper) available for language X than Y.
The times I've seen a language picked for a particular purpose:
- Perl/Python used for web apps. It's interpreted so you can just upload your source code and refresh your browser. Faster and simpler than having to compile/package it, lots of useful frameworks and modules, lots of developers.
- Erlang/OTP for telecom. Kind of on the nose, but there you are.
Yeah, I get that; it's the same reason I question the choice of JS frameworks on some of the teams I've been on: it's just what the developers were familiar with.
Go compiles so fast and reads cleaner IMO that I never saw the benefit of Python. I always thought debugging Python was clunky, which could be unfamiliarity with interpreted languages. Also, the few times I needed to use an ODBC driver with Python left a bad taste in my mouth. But overall I get what you're saying; my new team writes all of their lambdas in Python, so I'm going to have to learn.
I use mostly Python, with a little Golang in my job and my reasoning is:
1. Most of what I am doing is calling other programs, often over ssh. The speed difference between the two is going to be tiny (where it is not, I use Golang)
2. Python is a more ergonomic language to work in a lot of the time. Wrapping a series of steps in a try block and then handling the errors in one go (very common in the things I am doing) is just easier to write, easier to maintain, and easier to read than what I have to do in Golang. And there are a ton of modules in Python that are just easier to work with than in Golang. And the resistance to the 'while... else' and 'for... else' patterns just saddens me. And the 'for' implementation in Golang often makes me want to tear my hair out... why make it hard to do this by reference?
3. Even in places where Golang should be much better, sometimes it is a challenge. For example when I am trying to parallelize something, but want to only have N number of workers going at a time. In Python I just use the worker pattern and I am done, I can even feed from one set of workers to another. In Golang I have to use a limited channel, and be careful that I block on that channel before I do anything that is going to eat a lot of memory (otherwise I have protected the CPU, but not memory resources). It feels like I am fighting the language there.
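A minimal sketch of that bounded-worker-pool setup on the Python side; fetch_one is a hypothetical stand-in for whatever each worker actually does:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def fetch_one(host):
        ...   # ssh / HTTP / whatever per-host work goes here

    hosts = ["a.example", "b.example", "c.example"]

    # at most 8 tasks in flight at a time; results collected as they finish
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(fetch_one, h): h for h in hosts}
        for fut in as_completed(futures):
            print(futures[fut], fut.result())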
For what it's worth, go is still "new" in my book. I'm seeing 2012 for the first 1.0 public release. In contrast, we've had python since 1991. There's a large number of programmers who haven't had a chance to be exposed to go. I have used it, and enjoyed it. I'd expect, but haven't confirmed, that python has a larger collection of libs (especially long tail).
Although, looking now, I'm seeing a short support window for go releases. That can hurt uptake.
Like go 1.18 was released 2022-03-15 and will EOL Q1 2023.
Lots of projects want long term support so that churn isn't desired. I'm curious how teams handle that.
> Not to be shit-eating, but I never understood why people use python instead of Go
I've got a few sysadmin type scripts I wrote in perl that I'm currently rewriting in python because other people will use and need to update those things and they will be more likely to know python than perl. I can't imagine they'll be more likely to know Go either.
You can use numpy just fine. Unlike Cython, mypyc doesn’t give you a way of accessing numpy’s C API (because you’re basically just writing Python code).
I would advise against optimizing your Python programs for speed. Fighting against a language's nature never leads to good results, it's much easier and better to use a faster language if (or where) you need the speed.
This is even more relevant if you are still learning the language. Focus on learning it, and leave arcane always changing implementation details for after you know it well.
Or rather, focus on the big things. Algorithmic complexity etc.
Also for all languages where you want to do performance work: Learn to use a profiler, so you can find the things that matter. And then selectively look what you can improve there. Even in Python, some hacks or slightly unergonomic patterns in a really hot loop can be worth a lot.
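Concretely, the lowest-friction place to start is the stdlib profiler; "myscript.py" and suspect() below are placeholders:

    # From the command line:
    #   python -m cProfile -s cumulative myscript.py
    #
    # Or from inside the code, for just the suspect function:
    import cProfile
    import pstats

    def suspect():
        return sum(i * i for i in range(1_000_000))

    cProfile.run("suspect()", "prof.out")
    pstats.Stats("prof.out").sort_stats("cumulative").print_stats(10)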
There is currently a "Faster CPython"[1] project going on that seeks to improve the speed of Python. With Python 3.11 later this year there will be several speed improvements targeting common uses of Python, which might make some of these existing speedup tricks irrelevant. Or at least less relevant.
- Get the overall system structure in Python. Get the architectural design right, and the big-O stuff right.
- If there are bottlenecks, re-code those directly in C, cython, or similar. Or better yet, find libraries.
Python is great for expressing high-level operations and system design. It is also very easy to integrate with native code. I've never had much happiness in optimizing Python itself beyond that. Broadly speaking, code falls into three categories:
- Most code: Instant. Performance doesn't matter. Python
- Some code: Big-O(lifespan of the universe). Not worth building.
- Narrow slice of stuff in between: Don't do in Python, but use Python as the glue.
There are plenty of things for which Python is not slow, but plenty of people using Python don't know a lot about programming and end up with a slow result.
A few pieces of advice:
- the right algo will go a long, long way.
- know your data structures. E.g. assigning to a slice is ridiculously fast (even while unpacking), memory views may save a lot on byte-heavy workloads, heapq and deque are underrated, etc. Also check out https://wiki.python.org/moin/TimeComplexity for the big-O complexity of common operations on Python's built-in types, to understand what you pay for.
- know the stdlib. collections, itertools and functools all contain incredible gems.
- delegate. Python is a fantastic glue language, use it for what it's good at. Your database, your numpy arrays, your cache are all amazing at what they do, no matter the language. Let them do the heavy work.
- don't kill good performance out of ignorance (see the sketch after this list). I regularly see people casting a generator to a list, iterating over a dataframe row by row, calling readlines() on a file, or doing something else that destroys the otherwise excellent performance of their program.
- know the ecosystem. There are some very good, fast libs out there: diskcache, sortedcontainers, scipy, uvloop...
- use threads to avoid blocking a GUI, multiprocessing to share work between CPUs, and asyncio to speed up network operations. Each tool has a sweet spot. But threads are underrated: they work well for a couple of hundred parallel network operations, and most C libs will actually release the GIL, so they can use several CPUs more often than you'd think. Also, use pools if you can, shared_memory (3.8+), or mmap.
- sometimes the dirty solution is just faster, like shelling out to ffmpeg with subprocess.
- the more recent the feature, the slower it tends to be. Sure, I love statistics, pathlib and dataclasses; in regular code they are great. On a bottleneck, however, they are very slow.
- the array module is not supposed to speed up the code, only save memory. But sometimes it does.
- printing to the terminal is limiting. Sometimes your program is doing fine and the display is what's preventing it from going faster. At the very least, check how often you flush.
- comprehensions are faster than alternatives.
- pre-allocating lists and dicts can help.
- measure. The austin profiler is your friend.
- rewriting the hot path in a faster language is likely more interesting than writing the whole program in it. A bit of nim or rust is easy to call from python.
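To make the readlines() point above concrete, here's the difference between materializing a file and streaming it; "big.log" is a placeholder:

    # Reads the whole file into a list of strings up front:
    with open("big.log") as f:
        errors = sum(1 for line in f.readlines() if "ERROR" in line)

    # Streams it line by line instead, in constant memory:
    with open("big.log") as f:
        errors = sum(1 for line in f if "ERROR" in line)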
A big thank you! This is an amazing list of things to look into and keep in mind. Many points are obscure to me, but I'm going to print this and keep it at hand. Thanks again.
Thanks for writing this! I just reproduced your experiment and could get the ~10x performance gain (0.42/0.02) on my mac m1 (I had to uninstall typing - which hopefully gets fixed soon). I think this is a great way of attacking both type checking, as well as a meaningful speedup in the CI cycle.
I sort of wish the precomputed template string was just a hard coded string literal. There's a lot of code dedicated to that, and you only need to understand the output it computes.
Other than that, I think the python code is quite readable and understandable with the context provided in the post.
I played around with cython once. It was pretty cool! It couldn't handle simple list comprehensions properly, though, and would cause fun crashes like reference counting `None` down to 0 and then garbage collecting it.
Does anyone have any personal experience in applying mypyc on type-hinted, but otherwise not specially optimized python code? I'm especially interested in what kind of performance speed-ups could be achieved.
After reading the article last night, I spent the whole evening (about 5 hours) getting one of my DOM libraries to compile with mypyc. It's a hacky codebase with little to no type hinting. I had to rewrite a fair bit to appease mypy and in most cases just used 'Any'. It still has runtime issues and is buggy, but I got at least a 2x speed increase on rendering a single node... https://github.com/byteface/htmlx/blob/mypyc/benchmarks.md
However, I was unable to compile from my Mac and could only compile using Linux, but that could be because I'm using an older version of the dev tools.
mypy compiles itself with mypyc, for about a 3.5-4x speed up. black very recently started shipping mypyc compiled binaries, for about a 2x speed up (iirc).
A big portion of python's slowness to start up is that 'import' actually evaluates any code at the top level of the file it's importing.
If you just have function and class definitions, this isn't too bad, but when you start doing things like setting up caches, reading files, and testing whether or not there's a GPU at the top level, you can add several seconds to the startup and balloon your memory usage. One wrong import somewhere in your code base, and your application will crawl to a standstill on launch.
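The usual mitigation is to push the expensive setup out of module top level and do it lazily. A toy sketch of the pattern (the giant dict stands in for whatever cache, file read, or GPU probe would otherwise run at import time):

    import functools

    # Slow pattern: built at import time, so every importer pays for it.
    #   BIG_TABLE = {i: i * i for i in range(10_000_000)}

    @functools.lru_cache(maxsize=None)
    def big_table():
        # Built the first time it is actually needed, not at import.
        return {i: i * i for i in range(10_000_000)}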
He says there's no inlining or unrolling of code, but isn't precompute_template (which pregenerates the output) basically inlining plus unrolling over the 3*5 combinations?
It really is a shame that "it's fast enough" is repeated without saying what for. It really is fast enough for scripting logic, but so many other aspects would benefit immensely if more effort were put into speeding up the standard interpreter.
I find Python to be a great language for describing business logic, and honestly, within 10 years it should end up being at most 2x slower than JavaScript.
"Mypyc is currently alpha software. It’s only recommended for production use cases with careful testing, and if you are willing to contribute fixes or to work around issues you will encounter."
At least for now, I wouldn't use it in production.
The obvious catch is that the compiled binary is not portable. You need to compile and deploy for each platform separately. You don't need to worry about platforms at all when deploying/sharing pure Python modules.
At least this is honest.
No matter what the script does, no matter how fast the libraries, the Python interpreter has a slow startup time.
On multiple occasions I have seen people commenting on HN argue that Python is not slow.
I think for these commenters Python is "fast enough".
For others, like me, it may not be "fast enough".
IOW, the question is not whether Python is "fast" or "slow", but whether it is "fast enough" or "too slow" for a given user.
For some users, it might not be fast enough. And that's fine.