Author here. Thank you all! Really glad to see this on the front page!
There is a second part that may be of interest: it implements a Python tracer that shows the side effects of each line as it is executed (the effect of all the various load and store instructions). The objective was to get something similar to the set -x built-in of the bash shell: https://github.com/MoserMichael/pyasmtool/blob/master/tracer...

And it's all part of this advanced Python course: https://github.com/MoserMichael/python-obj-system/blob/maste... (well, I am still working on it)
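Roughly, the set -x idea boils down to something like this with sys.settrace; a bare-bones sketch, not the full tracer from the repo:

```python
import sys

def trace_lines(frame, event, arg):
    # Print every executed line together with the current local variables,
    # roughly in the spirit of bash's set -x.
    if event == "line":
        print(f"{frame.f_code.co_name}:{frame.f_lineno} locals={frame.f_locals}")
    return trace_lines

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(trace_lines)
demo(3)
sys.settrace(None)
```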
And I am also looking for a job again ;-( I need a new job in April, so here is my LinkedIn profile. I also do C++ and Java/Scala. Available on-site in the Tel Aviv area, considering remote-only jobs anywhere else. https://www.linkedin.com/in/michael-moser-32211b1/ E-mail address is in my HN profile.
We're also in the Tel Aviv area (but remote is an option if you prefer) and we've been looking for someone like you who explains technical topics in simple terms.
We have some low-level stuff (e.g. I wrote a Python debugging tool for Kubernetes which injects debugpy into target processes using gdb [1])... but also a lot of higher-level stuff in our Python framework for k8s automation.
Tried to give you a shoutout on Twitter but you aren't on there. Tried to give you a shoutout on LinkedIn but it won't let me mention your profile. :) Good luck with your search!
An important thing that is a bit buried in this text: Python bytecode changes in every version, sometimes just a little, sometimes a lot. So Python 3.10's bytecode instruction set differs from 3.9's, and 3.11's will differ from 3.10's.
> I was surprised to learn that many bytecode instructions changed in minor releases of the runtime!
That's also how Dropbox used to obfuscate their client back when it was written in Python. They would ship only .pyc files, which are just bytecode, but they would shuffle the opcode numbering around, map multiple numbers to the same opcode, etc. Then they would also stream-encrypt the .pyc file and hide the key inside of it.
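For a sense of how little is pinned down: the opcode numbering is just a lookup table baked into the interpreter, visible from the stdlib (a quick illustration, not Dropbox's actual scheme):

```python
import opcode

# A patched CPython build can renumber this table, which makes its shipped
# .pyc files meaningless to a stock interpreter.
print(opcode.opmap["LOAD_CONST"])                 # name -> number
print(opcode.opname[opcode.opmap["LOAD_CONST"]])  # number -> name
```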
Though playing around with this offline is not exactly difficult either: you can just invoke `dis.dis(codestring)` in the interactive interpreter, or use `dis.dis` as a decorator when defining a function (it prints the bytecode as the function is defined, though note the decorated name then becomes None, since `dis.dis` returns None).
Sadly `-mdis` requires feeding a file by path or data through stdin, so for mucking around it’s not the best.
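For interactive mucking around, something like this is usually enough (a quick sketch):

```python
import dis

def add(a, b):
    return a + b

dis.dis(add)          # disassemble a function object
dis.dis("a + b * 2")  # or a source string, compiled on the fly
```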
> If you are upgrading or downgrading the python interpreter, then you probably should also delete all __pycache__ folders, these folders hold the binary files that hold the compiled bytecode instructions, but you can't be sure that these will work after a version change!
This is incorrect. Compiled bytecode files are versioned alongside the interpreter: the cache file name carries a version tag and the file itself starts with a version-specific magic number. When CPython finds a __pycache__/*.pyc file that doesn't match the running interpreter, it simply ignores it and recompiles from source, so stale caches won't cause any problems.
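You can see the versioning machinery from the stdlib (a small illustration):

```python
import importlib.util

# The version-specific bytes that every .pyc written by this interpreter starts with.
print(importlib.util.MAGIC_NUMBER)

# The cache path also embeds the version tag, e.g. __pycache__/example.cpython-310.pyc on 3.10,
# so different interpreter versions don't even share cache files.
print(importlib.util.cache_from_source("example.py"))
```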
> the stack is maintained separately per each function object
Can someone elaborate on this? Having separate stacks makes sense for coroutines, but does this mean that a normal Python function call allocates a private stack for that function?
All it means is that Python bytecode is stack-based: most instructions pop their arguments from, and push their results onto, an operand stack, in contrast with register-based VMs.
When implementing a VM it makes sense to store the call stack and the operand stack separately so that you don't have to mix types. You probably don't want to allow a function to uncontrollably modify operands in lower frames, as in most cases that would be either a bug or a vulnerability. Having a separate operand stack for each frame also makes any kind of analysis much easier. A call instruction can then be viewed as a fat instruction which pops some number of arguments and pushes a single result back.
Once you restrict cross-frame operand stack access, whether it's stored in a single array or in multiple arrays becomes an implementation detail. Many other VMs do more or less the same: the JVM, AVM2 (Flash), CIL (C#). It doesn't necessarily mean that the stacks are separate after JIT compilation, but from the perspective of the bytecode instructions the operand stacks are separate.
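You can see the per-frame operand stack reflected in CPython's code objects (a small illustration):

```python
import dis

def f(x):
    return x * 2 + 1

# Every code object records the maximum operand-stack depth its frame needs;
# the frame is given that much stack space when the function is called.
print(f.__code__.co_stacksize)

# LOAD_FAST/LOAD_CONST push operands, the binary ops pop two and push one back.
dis.dis(f)
```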
CPython compiles Python source code to bytecode, but it never compiles the bytecode to machine code. Instead it interprets the bytecode, reading one instruction at a time and dispatching through what is basically a giant switch statement with a case for every possible opcode.
A JIT would compile the bytecode to machine code and then run that directly (at least for frequently executed code paths). There is no "switch" anymore: each bytecode instruction has already been replaced by the corresponding machine code.
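To make the "giant switch" concrete, here is a toy interpreter loop in the same style (a sketch, nothing like the real ceval.c):

```python
def run(code):
    # Dispatch loop: look at each instruction and branch to the handler for it.
    stack = []
    for op, arg in code:
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "RETURN_VALUE":
            return stack.pop()

print(run([("LOAD_CONST", 2), ("LOAD_CONST", 3),
           ("BINARY_ADD", None), ("RETURN_VALUE", None)]))  # -> 5
```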
> Is there any reason why official python doesn't have any JIT option?
A desire to keep the implementation simple and approachable (relatively speaking), as well as to avoid issues like performance cliffs.
Also, the C API has historically been extremely broad and has exposed what amount to implementation details, and keeping all of that working properly with a JIT is difficult (at least for anything but a simplistic, macro-ish JIT).
PyPy is not fully compatible with CPython: you won't always get the same behaviour, and the CPython C API is not guaranteed to be fully supported. So I'm not sure that having a JIT that is fully compatible is easy.
Most terms in language implementation are fuzzy, but just-in-time compilation most often refers to switching from interpreting bytecode to generating machine code (and running that) in specific spots, after having analyzed the currently running bytecode for a while.
"Classical" (again, every term is fuzzy) JIT compilers either do this machine code compilation after seeing a good candidate _entire function_ or a good candidate _section of code within a function_. Good candidates are often areas of code that are executed a large number of times and with consistent internals (e.g. iterating from 0 to 10000 with variables inside that have provably fixed types).
But there are infinite variations of JIT compilation.
In any case, CPython doesn't do that switch from bytecode to generated machine code. PyPy does, as do V8 and the JVM and so on.
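The "executed a large number of times" part is basically just counting; a toy sketch of the trigger only (the hard part, actually emitting machine code, is elided):

```python
import functools

HOT_THRESHOLD = 1000  # made-up number; real JITs tune this carefully

def count_calls(fn):
    # Toy illustration of hot-spot detection: count executions and flag the
    # function once it crosses a threshold. A real JIT would compile the
    # bytecode (or a trace through it) to machine code at that point.
    calls = 0
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        nonlocal calls
        calls += 1
        if calls == HOT_THRESHOLD:
            print(f"{fn.__name__} looks hot; a JIT would compile it now")
        return fn(*args, **kwargs)
    return wrapper
```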
Why didn't the Python committee opt for a compiled system (like PyPy) when they moved to the 3.0 series (and had to break backward compatibility anyway)?
You are right. I think that Python is trying to be as expressive and succinct as possible; a runtime like PyPy is very difficult to change, and it would therefore make it much more difficult to evolve the language.
Oh I see what you mean, I misread the sentence. Switching between language backends has nothing to do with compatibility. Yup makes sense. They can swap implementations at any time.
The point really is that PyPy has some compatibility issues with the C API, I think mostly because of the garbage collector. This has less to do with whether you compile or interpret the bytecode, yes.