I would advise against optimizing your Python programs for speed. Fighting against a language's nature never leads to good results; it's much easier and better to use a faster language if (or where) you need the speed.
This is even more relevant if you are still learning the language. Focus on learning it, and leave arcane, ever-changing implementation details for after you know it well.
Or rather, focus on the big things: algorithmic complexity, etc.
Also, for any language where you want to do performance work: learn to use a profiler, so you can find the things that matter. Then look selectively at what you can improve there. Even in Python, some hacks or slightly unergonomic patterns in a really hot loop can be worth a lot.
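To make the profiler point concrete, here is a minimal sketch using the standard library's cProfile and pstats modules. The functions slow_sum_of_squares and main are made-up names for illustration, not anything from this thread, and the workload is deliberately naive so that something shows up in the report.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Hypothetical hot spot: builds a full list, then sums it.
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def main():
    return slow_sum_of_squares(200_000)


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and keep the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

For a whole script, running `python -m cProfile -s cumulative your_script.py` gives a similar report without touching the code (your_script.py being a placeholder name).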
There is currently a "Faster CPython"[1] project going on that seeks to improve the speed of Python. With Python 3.11 later this year there will be several speed improvements targeting common uses of Python, which might make some of these existing speedup tricks irrelevant, or at least less relevant.
- Get the overall system structure in Python. Get the architectural design right, and the big-O stuff right.
- If there are bottlenecks, re-code those directly in C, Cython, or similar. Or better yet, find libraries.
Python is great for expressing high-level operations and system design. It is also very easy to integrate with native code. I've never had much happiness in optimizing Python itself beyond that. Broadly speaking, code falls into three categories:
- Most code: Instant. Performance doesn't matter. Use Python.
- Some code: Big-O(lifespan of the universe). Not worth building.
- Narrow slice of stuff in between: Don't do in Python, but use Python as the glue.
There are plenty of things for which Python is not slow, but people using Python don't know a lot about programming, and end up with a slow result.
A few pieces of advice:
- the right algo will go a long, long way.
- know your data structures. E.g., assigning to a slice is ridiculously fast (even while unpacking), memoryviews may save a lot on byte-heavy workloads, heapq and deque are underrated, etc. Also check out https://wiki.python.org/moin/TimeComplexity for the big-O costs of common operations on Python's built-in types, to understand what you pay for.
- know the stdlib. collections, itertools and functools all contain incredible gems.
- delegate. Python is a fantastic glue language, use it for what it's good at. Your database, your numpy arrays, your cache are all amazing at what they do, no matter the language. Let them do the heavy work.
- don't kill good performance through ignorance. I regularly see people casting a generator to a list, iterating over a dataframe, calling readlines() on a file or doing something else that destroys the otherwise excellent performance of their program.
- know the ecosystem. There are some very good fast libs out there: diskcache, sortedcontainers, scipy, uvloop...
- use threads to avoid blocking a GUI, multiprocessing to share work between CPUs, and asyncio to speed up network operations. Each tool has a sweet spot. But threads are underrated: they work well for a couple of hundred parallel network operations, and most C libs will actually release the GIL, so they can use several CPUs more often than you'd think. Also, use pools if you can, shared_memory in 3.8 or mmap.
- sometimes the dirty solution is just faster, like calling ffmpeg in a subprocess.
- The more recent, the slower. Sure, I love statistics, pathlib and dataclasses. In regular code they are great. On a bottleneck, however, they are very slow.
- the array module is not supposed to speed up the code, only save memory. But sometimes it does.
- printing to the terminal is limiting. Sometimes your program is doing fine, and the display is preventing it from going faster. At least check the flush.
- comprehensions are faster than alternatives.
- pre-allocating lists and dicts can help.
- measure. The austin profiler is your friend.
- rewriting the hot path in a faster language is likely more interesting than writing the whole program in it. A bit of Nim or Rust is easy to call from Python.
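As a rough illustration of the "know your data structures" and "measure" points above, here is a small sketch comparing list.pop(0) with deque.popleft(), plus heapq.nlargest for picking top items without a full sort. The function names and the size N are arbitrary choices for the example, and the timings will vary by machine, so treat the numbers as indicative only.

```python
import heapq
import timeit
from collections import deque


def drain_list(n):
    # list.pop(0) shifts every remaining element: O(n) per pop.
    items = list(range(n))
    total = 0
    while items:
        total += items.pop(0)
    return total


def drain_deque(n):
    # deque.popleft() is O(1) per pop.
    items = deque(range(n))
    total = 0
    while items:
        total += items.popleft()
    return total


def top_three(values):
    # heapq.nlargest avoids sorting the whole input.
    return heapq.nlargest(3, values)


N = 20_000  # arbitrary size, large enough to show a gap
list_time = timeit.timeit(lambda: drain_list(N), number=5)
deque_time = timeit.timeit(lambda: drain_deque(N), number=5)
print(f"list.pop(0):     {list_time:.4f}s")
print(f"deque.popleft(): {deque_time:.4f}s")
print(top_three([5, 1, 9, 3, 7, 8]))  # [9, 8, 7]
```

Both drain functions compute the same result; only the cost of removing from the front differs, which is the kind of thing the TimeComplexity wiki page spells out.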
A big thank you! This is an amazing list of things to look into and keep in mind. Many points are obscure to me, but I'm going to print this and keep it at hand. Thanks again.
A book, a website, a cheat sheet or even a MOOC or part of a MOOC?