Humans are unreliable, but we are also under normal circumstances thoroughly and continually grounded in an external world whose mechanics we interact with, make predictions about, and correct our beliefs about.

The specific way we train coding assistants, by next-token prediction, would also be an incredibly difficult setting for a human to produce code in.

Suppose you were dropped off in a society of aliens whose perceptual, cultural and cognitive universe is meaningfully different from our own; you don't have a grounding in concepts of what they're trying to _do_ with their programs. You receive a giant dump of reams and reams of source code, in their unfamiliar script, where none of the names initially mean anything to you. In the pile of training material handed to you, you might find some documentation about their programming language, but it's written in their (foreign, weird to you) natural language, and is mixed in with everything else. You never get a teacher who can answer questions, never get access to an IDE/REPL/interpreter/debugger/compiler, never get to _run_ a program on different inputs to see its outputs, never get to add a log line to peek at the program's internal state, etc. After a _lot_ of training, you can often predict the next symbol in a program text. But shouldn't we _expect_ you to be "unreliable"? You don't have the ability to run checks against the code you produce! You don't get a warning if you use a variable that doesn't exist! You just produce _tokens_, and get no feedback.
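To make the "variable that doesn't exist" point concrete, here is a minimal Python sketch. The snippet in `generated` is a made-up example of plausible-looking output, not taken from any real model; the misspelled name `totl` is my own invention for illustration.

```python
# Hypothetical snippet of the kind a next-token predictor might emit:
# it reads plausibly, but `totl` is a name that was never defined.
generated = """
def average(values):
    total = sum(values)
    return totl / len(values)
"""

# Compiling succeeds: the text is well-formed Python, so nothing complains.
code = compile(generated, "<generated>", "exec")

# Defining the function succeeds too; the bad name is only looked up at call time.
namespace = {}
exec(code, namespace)

# Only actually running the function surfaces the problem.
try:
    namespace["average"]([1, 2, 3])
    error = None
except NameError as exc:
    error = exc

print(type(error).__name__, error)
```

A human with an interpreter gets this feedback within seconds. In the scenario above, nothing ever tells you that `totl` was wrong.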

To the degree humans are reliable at coding, it's because we can simulate what program execution will do, at a level of abstraction which we vary in a task-dependent way. You can mentally step through every line in a program carefully if you need to. But you can also choose to trust some abstraction and, as long as that abstraction holds, skip the steps you infer cannot affect the attribute or condition you care about. The most important parts of your attention are on _what the program does_. This is fully hidden in the next-token-prediction scenario, which is totally focused on _what tokens are used to write the program_.
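A small sketch of the gap between _what tokens are used_ and _what the program does_. The three functions are hypothetical examples I made up: two differ by a single character but behave differently, while the third looks nothing like the first yet agrees with it on every input tried here.

```python
# Two functions whose text differs by one character...
def in_range_a(x, lo, hi):
    return lo <= x < hi

def in_range_b(x, lo, hi):
    return lo <= x <= hi

# ...and one written quite differently from in_range_a.
def in_range_c(x, lo, hi):
    return x in range(lo, hi)

# Run all three on the same integer inputs and compare behaviour, not text.
inputs = [(x, 0, 3) for x in range(-1, 5)]
a = [in_range_a(*args) for args in inputs]
b = [in_range_b(*args) for args in inputs]
c = [in_range_c(*args) for args in inputs]

print(a == c)  # textually distant, same results on these inputs
print(a == b)  # textually adjacent, different result at x == hi
```

Judged by tokens alone, `in_range_a` and `in_range_b` are near neighbours. Judged by execution, `in_range_a` and `in_range_c` are the pair that match, at least on these integer inputs.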