Understanding how a given CPU (plus the rest of the computer hardware) works does not suffice to understand what is going on when a particular program is running. For that, you need to read the program, or an execution trace, or both: something along those lines that is specific to the program being run.
This is the wrong analogy. The transformer block is a bunch of code and weights: a set of instructions laying out which operations to run on which numbers. The optimizer changes the weights to minimize a loss function during training, and then the code implementing a forward pass simply runs during inference. That's what it is doing. It's not doing something else.
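To make the training-versus-inference distinction concrete, here is a minimal PyTorch-style sketch. The toy architecture, shapes, and hyperparameters are illustrative assumptions, not any particular transformer:

```python
import torch
import torch.nn as nn

# Toy stand-in for a model: layers and sizes are placeholders.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Training: the optimizer nudges the weights to reduce the loss.
x, target = torch.randn(8, 16), torch.randn(8, 16)
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()   # gradients of the loss w.r.t. the weights
    optimizer.step()  # only the weights change here

# Inference: the same forward-pass code runs, with the weights frozen.
model.eval()
with torch.no_grad():
    y = model(x)
```

The point of the sketch is just that the only thing training changes is the weights; the forward-pass code that runs at inference is fixed.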
If the argument is that a model is a function approximator, then it certainly isn't approximating some function that performs worse at the task at hand, and it certainly isn't approximating a function we can describe in a few hundred words.
There is pretty good reason for that. If the function could be described explicitly in a few hundred words, it would be extremely unlikely that we'd have seen a jump in capability with model size.