
> Is this a naive fantasy on my part, or actually possible?

Possible, yes, desirable, no.

The issue I have with all these end-to-end models is that they are a massive regression. Practitioners fought tooth and nail to get programmers to acknowledge correctness and security concerns.

Mathematicians and computer scientists developed theorem provers to tackle the correctness part. Practitioners proposed methodologies like BDD and "Clean Code" to help with stability and reliability (in the sense of actually matching requirements, now and in the future).

AI systems throw all this out of the window: a black box gets thrown at the wall and whatever sticks is kept. Unit tests will never be proof of correctness; as Dijkstra put it, testing can show the presence of errors, but never their absence.
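To make that concrete, here is a deliberately contrived, hypothetical example: a test suite that passes cleanly while the implementation is still wrong for inputs the tests never exercise.

```python
def is_even(n):
    # Hypothetical buggy implementation: correct only for non-negative n
    return n % 2 == 0 if n >= 0 else False

# These unit tests pass, so they reveal no error...
assert is_even(4) is True
assert is_even(3) is False

# ...but they cannot show the absence of errors: -2 is even,
# yet the function returns False for every negative input.
print(is_even(-2))  # False, although the correct answer is True
```

A generated test suite shifted onto the same black box has exactly this blind spot: it can only sample the input space.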

You'd only shift the burden from the implementation (i.e. the program) to the tests. What you actually want is a theorem prover that proves functional correctness, in conjunction with integration tests that demonstrate the runtime behaviour where needed (e.g. profiling), plus references that link the implementation to the requirements.
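For contrast with the sampling nature of tests, a theorem prover establishes a property for every possible input. A minimal sketch in Lean 4 (the theorem name here is my own; `List.reverse_reverse` is the standard-library lemma it reuses):

```lean
-- Proved for all lists l over any type α, not just for test cases:
theorem reverse_involutive {α : Type} (l : List α) :
    l.reverse.reverse = l :=
  List.reverse_reverse l
```

Once this checks, no input can violate the property; a unit test suite could only ever spot-check it.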

The danger lies in the fact that we already have a hard time getting security issues and bugs under control in software we should be able to understand (i.e. fellow humans wrote and designed it). Now imagine trying to locate and fix a bug in software that was synthesised by some elaborate black box emitting inscrutable code, in the absence of any documentation and without references to requirements.


