Perhaps you've omitted some important context here, or you're using an extremely restricted definition of "works"? The interesting and hard question with software is not "did it compile" but rather "did it meet the never-clearly-articulated needs of the user"...
I would agree that it is a primary goal of software engineering to move as much as possible into the category of automatic verification, but we're a long, long way from 99%.
I think that antirez is technically correct in that the space of code that won't compile is vastly larger than the space of code that will, so saying '99%' sort of makes sense.
But that doesn't capture the fact that, of the code that does compile, far more of it fails to do what we want at runtime than actually does what we want.
And even among the code that does what we want at runtime, far more of it works only most of the time than works 100% of the time.
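To make that hierarchy concrete, here's a toy Rust sketch (mine, not antirez's): all three functions compile, one is always wrong at runtime, one is right only most of the time, and one is right for every valid input.

    // All three compile; only one is correct for every valid input.

    // Compiles, but never does what we want: wrong operator.
    fn average_wrong(a: i32, b: i32) -> i32 {
        (a - b) / 2
    }

    // Compiles and works *most* of the time, but overflows when
    // a + b exceeds i32::MAX (panic in debug builds, wraparound in release).
    fn average_mostly(a: i32, b: i32) -> i32 {
        (a + b) / 2
    }

    // Correct for every pair with 0 <= a <= b (e.g. array indices):
    // the intermediate value can no longer overflow.
    fn average_correct(a: i32, b: i32) -> i32 {
        a + (b - a) / 2
    }

    fn main() {
        assert_eq!(average_mostly(2, 4), 3); // fine on typical inputs
        assert_eq!(average_correct(i32::MAX - 1, i32::MAX), i32::MAX - 1);
        // average_mostly(i32::MAX - 1, i32::MAX) would overflow here.
    }

The middle case is essentially the classic binary-search midpoint overflow that sat in the JDK for years: "compiles and passes casual tests" is a weak proxy for "works".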
The interesting thought experiment this prompted for me: I'd be more likely to trust LLM code in C# or Rust than in assembly or Ruby, simply because a stricter compiler rejects a whole class of hallucinations before anything runs.
Which makes me wonder ... can LLMs write working Idris or ATS code?
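For a flavor of why the stricter compiler buys trust, here's a toy sketch (find_user is a made-up stand-in): a nil-handling mistake in Ruby only surfaces at runtime, while Rust refuses to compile until the missing case is handled.

    fn find_user(id: u32) -> Option<&'static str> {
        if id == 1 { Some("alice") } else { None }
    }

    fn main() {
        // Ruby's `find_user(42).upcase` parses fine and raises
        // NoMethodError at runtime when the result is nil.
        // Rust rejects the equivalent until the None case is handled:
        match find_user(42) {
            Some(name) => println!("{}", name.to_uppercase()),
            None => println!("no such user"),
        }
    }

Dependent types push even more of the spec into won't-compile territory, which is exactly what makes the Idris/ATS question interesting.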
I've seen people put untested AI hallucinations up for review, complete with nonexistent function names, that passed CI only because the code was behind debug defines.
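A minimal Rust analog of that failure mode (assumed names; the C version is the same trick with #ifdef DEBUG): the hallucinated call is stripped from release builds before the compiler ever tries to resolve the name, so a release-only CI pipeline stays green.

    fn main() {
        // `log_debug_state` exists nowhere. A debug build fails to
        // compile, but `cargo build --release` strips the statement
        // before name resolution and happily succeeds.
        #[cfg(debug_assertions)]
        log_debug_state();

        println!("release CI stays green");
    }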
I've seen others refer to nonexistent APIs while discussing migration to a new major version of a library: "Sure, that's easy, we just replace this function with this new one."
Now imagine all the subtler bugs that are harder to spot.