This reminds me of an SO user who was eventually banned because he (or possibly she, we will never know) had a hobby of writing fairly detailed questions that looked totally innocent on the surface but got strange once you started digging into them. Those who suffered through one would eventually realize the whole thing was masterfully crafted to waste your time. There was once an excellent collection of these, but I can't find it now, and for the life of me I wish I could.
What you did will make inputting a 4 yield a 5, but it won't change the behaviour of outputting a 4. (And it will only turn an input 4 into a 5 if your interpreter checks for words before checking for numbers, which is not universally the case.)
Is there really anything unexpected here? I would have thought it obvious that importing a library executes that library's code at import time. And that you can do strange things, especially when messing with the underlying C structures, should also be clear.
Even in C itself, you can do the same in shared libraries. It's a very important functionality that the library can run some init code when it is being loaded. And you can do all kinds of strange things, modifying some random memory.
The CPython source is really a bit of a joy to look through. The lowest level stuff is really tough, but generally it's all relatively straightforward.
Though with the specializing compiler work, some stuff is... less obvious. But I still generally find it straightforward when I want to know some detail about how the language works.
You get a hint that Python's import is not a simple "add name to scope" when importing a builtin module opens a web browser:
import antigravity
As others have mentioned, every line of code not inside a function/class body gets executed on import, unless guarded with `if __name__ == '__main__':` (only true when executing the script directly with `python xxx.py`). A related catch: functions' default arguments are also evaluated just once, when the `def` statement runs at import time, not on each call.
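A minimal sketch of that default-argument catch (hypothetical function name):

```python
def append_item(item, bucket=[]):
    # `bucket` defaults to ONE list, created when the `def` statement
    # executed at import time -- not a fresh list per call.
    bucket.append(item)
    return bucket

first = append_item(1)
second = append_item(2)
# Both calls mutated the same shared default list: first is second,
# and it now holds [1, 2].
```

The usual fix is `bucket=None` with `if bucket is None: bucket = []` inside the function body.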
And this is why optimizing Python code is really hard. When at runtime you can change almost any aspect of the language it's virtually impossible to give a semantics for Python code beyond "run it and find out".
While optimizing Python code is indeed really hard, this is not a good example of why.
It uses implementation-specific details which are outside of the scope of anything to do with Python semantics.
It's roughly equivalent to:
#include <stdio.h>

int nine = 9;

int main(void) {
    printf("nine = %d\n", nine);
    return 0;
}

/* in a library */
__attribute__((constructor))
static void sneaky(void) {
    int *n = (int *) &nine;
    *n = 8;
}
Your hyperbole simply isn't true, as demonstrated by the many static code analysis tools for Python. They can't handle all cases, certainly, but they demonstrate it's mostly possible to give semantics for Python code without running it.
I dunno, the fact that integers are handled by (evidently mutable) references has to make optimization really difficult.
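That shared-reference behaviour is visible from plain Python; a quick sketch (object identity of ints is a CPython implementation detail, not a language guarantee):

```python
# CPython caches the small ints (-5..256): any computation producing
# one of them hands back the same shared object.
small_a = int("100")
small_b = int("300") - 200
assert small_a is small_b      # identical cached object (CPython detail)

big_a = int("257")             # outside the cache: a fresh object each time
big_b = int("257")
assert big_a is not big_b
```

(`int("...")` is used to keep the compiler from folding both literals into one constant.)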
You don't have to be "sneaky" for this to bite you with Python. Maybe it looks obvious when stated in a bare-bones fashion but this bug was not easy to track down in a larger code base:
i = 1
incr_by_1 = lambda x : x + i
i = 4
incr_by_4 = lambda x : x + i
i is a reference in both, so incr_by_1 and incr_by_4 are equivalent at this point. If anyone assigns to i, the behaviour of both will change.
In most languages, integers are values, so an optimiser has a chance to (for example) replace incr_by_1 with a single CPU increment instruction, but it can't do that here, as the value bound to i must be fetched at call time according to Python semantics.
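For comparison, the usual Python idiom to freeze the value at definition time is a default argument (same names as the snippet above):

```python
i = 1
incr_by_1 = lambda x, i=i: x + i   # default evaluated NOW: captures 1
i = 4
incr_by_4 = lambda x, i=i: x + i   # captures 4

# incr_by_1(0) -> 1, incr_by_4(0) -> 4: later assignments to the
# outer i no longer affect either lambda.
```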
Agreed! Optimizing Python code is indeed really hard, and the lack of const and ability to describe capture semantics don't help.
To be fair, the equivalent in C++ is:
#include <iostream>

int main() {
    int i;
    i = 1;
    auto incr_by_1 = [&](int x) { return x + i; };
    i = 4;
    auto incr_by_4 = [&](int x) { return x + i; };
    std::cout << incr_by_1(0) << " and " << incr_by_4(0) << std::endl;
    return 0;
}
which prints "4 and 4". Replace the first [&] with [i] and it prints "1 and 4".
A Python implementation also can't replace incr_by_1 with a single CPU increment instruction because it doesn't know the type of x.
That's still a far cry from being unable to give semantics for Python code without running it.
Not really. Stuff like this gets shown around from time to time as a massive "gotcha" for a few languages, but it's really just the nature of boxed primitives and interning (i.e., shared objects for literal values).
This isn't a bug, it's absolutely expected behaviour. The author has just dressed it up in a blog post to make something of it, but anyone who has written a python library will know that code that isn't in a function (including function default arguments) gets evaluated on import. You don't need a C-extension to do that part. Then he messes with some internals, which isn't surprising either since python's philosophy is very much "internals are available - caveat emptor but if you want to mess with them go nuts".
That’s clever, but illustrates something not widely appreciated:
When you import a module, Python executes it. For instance, `def` isn’t syntax that says “hey compiler, this is a function!” It’s a statement that’s executed at runtime to define a function. You can put any code you want at the top level of a module and it’ll get executed when the Python interpreter gets to that line.
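A small self-contained sketch of that, building a throwaway on-disk module (hypothetical names):

```python
import pathlib
import sys
import tempfile

# Write a module whose top level does more than define things.
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "toplevel_demo.py").write_text(
    "print('this runs at import time')\n"
    "answer = 6 * 7              # a plain statement, executed on import\n"
    "def get_answer():           # `def` is a statement too: it runs now\n"
    "    return answer\n"
)
sys.path.insert(0, tmp)

import toplevel_demo  # executes the file top to bottom, printing as it goes
```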
IMO Python imports behave like the bash source command.
This is why people use the `if __name__ == "__main__"` guard; most people apply it in all their scripts even if they don't know the reason why.
It's a feature, not a bug, IMO. You can use importing a .py file as a singleton hack. You can also use `importlib.reload` to re-load a module, clearing it of any runtime overrides.
Python imports run the "module script", collect the resulting top-level names into a module object, and make that object available to the module that ran `import`. The module object is cached in `sys.modules`, so importing the module another time will not run the code again.
Python imports are much more principled than sourcing bash though. They are executed in a new namespace, and subsequent imports reference that namespace directly instead of re-evaluating the code.
C extensions don't significantly change matters because the module is still constructed by procedural C code.
It’s definitely a feature! Just one that’s often not understood. If you include a file in a C program, that code’s just sitting there until you call it (more or less, yada yada #define, etc.). If you import a Python module, it executes the code in it. That code is typically a set of statements that defines functions and classes, but it could be anything.
This reminds me of one of my internet white whales. It's very similar to this, but it goes through a ton of different ways that a single `import malevolent` or something along those lines can completely change how normal Python works. Mentioning it here in case anyone remembers it!
Because every Python object also contains a reference count (which needs to be modified whenever the object is passed around), a `const PyObject*` is effectively useless.
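That refcount traffic is easy to observe from plain Python; a quick sketch:

```python
import sys

obj = object()
before = sys.getrefcount(obj)   # counts obj itself plus the call's argument
alias = obj                     # binding one more name bumps the count by one
after = sys.getrefcount(obj)
# after == before + 1: even "read-only" handling of an object
# mutates its header, which is why const would buy nothing here.
```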
This special casing just hadn't been implemented yet. But since it is an interesting optimization, more so with multi-interpreter or no-GIL Python, the developers are actually introducing immortal objects in Python 3.12 to avoid counting references on some objects (PEP 683 has been accepted).
CPython caches the small integers, and this is just grabbing references to the cached 8 and 9 and then altering the value held inside each cached object.
The author could have skipped the C extension and used the ctypes module to munge the bits.
There's no guarantee that any other version of Python would use the same caching or the same structure layout, and it certainly couldn't link against the same C library.
So, yeah, it's implementation-specific.
A fun little adventure into how things work for the author, though.
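A sketch of that ctypes variant, assuming a typical 64-bit, non-free-threaded CPython build (the 24-byte offset into `PyLongObject` is an implementation detail and will differ on other builds):

```python
import ctypes

# PyLongObject layout on a common 64-bit CPython build:
#   8 bytes refcount + 8 bytes type pointer + 8 bytes size/tag word,
#   then the 30-bit digits. So the first digit sits at id(x) + 24.
DIGIT_OFFSET = 24  # assumption; not portable across builds or versions

x = 1000000  # deliberately outside the small-int cache, to limit the blast radius
ctypes.c_uint32.from_address(id(x) + DIGIT_OFFSET).value = 7
# x now compares equal to 7: we rewrote the digit stored in the object
```

Doing the same to a cached small int, as the article does, corrupts every use of that value in the whole interpreter.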
This seems to be a C extension so will likely only work with CPython, but since Python is so dynamic you can do all sorts of weird things just using plain Python.
https://codegolf.stackexchange.com/questions/28786/write-a-p...
Really shows you the skeletons hiding in some languages. My favorite is Haskell, which will happily do what you tell it to.