> this erudite fool is at our disposal and answers all the questions asked of them,
Yes, but I have to double-check every answer. And that, for me, greatly mitigates or entirely negates their utility. Of what value is a pocket calculator that only gets the right answer 75% of the time, when you don't know ex ante which 75%?
- I can read the code, and reading code is faster than writing it.
- I can also tell the LLM to write tests for the code it wrote, and I can validate that the tests are valid.
- LLMs are also valuable in introducing me to concepts and techniques I would never have had exposure to. For example, if I explain a problem to it, it will bring up technologies or terms I never considered because I simply didn't know about them. I can then research those technologies to decide whether they are actually the right approach.
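A minimal sketch of the kind of check I mean, assuming a hypothetical `clamp` function the LLM wrote: the tests are short enough that validating them by reading is quick.

```c
#include <assert.h>

/* Hypothetical LLM-written function under test. */
int clamp(int v, int lo, int hi) {
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Reading these three lines to confirm they encode the spec
 * is much faster than writing clamp() myself. */
void test_clamp(void) {
    assert(clamp(5, 0, 10) == 5);   /* in range: unchanged */
    assert(clamp(-3, 0, 10) == 0);  /* below range: pinned to lo */
    assert(clamp(42, 0, 10) == 10); /* above range: pinned to hi */
}
```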
Perhaps you've omitted some important context here, or you're using an extremely restricted definition of "works"? The interesting and hard question with software is not "did it compile" but rather "did it meet the never-clearly-articulated needs of the user"...
I would agree that it is a primary goal of software engineering to move as much as possible into the category of automatic verification, but we're a long, long way from 99%.
I think that antirez is technically correct in that there is a vast amount of code that will not compile compared to the amount of code that will compile. So saying '99%' sort of makes sense.
But that doesn't capture the fact that of the code that compiles there is a vast amount of code that doesn't do what we want to happen at runtime compared to the code that does do what we want to happen.
And beyond that, there is a vast amount of code that does what we want only most of the time at runtime, compared to the code that does what we want 100% of the time.
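The classic illustration of both gaps is leap-year logic: the buggy version below compiles cleanly and even does what we want most of the time, failing only on century years.

```c
#include <stdbool.h>

/* Compiles cleanly, and is right for most years, but the logic
 * is wrong: it ignores the century rule, so 1900 is reported
 * as a leap year. */
bool is_leap_year_buggy(int year) {
    return year % 4 == 0;
}

/* What we actually wanted at runtime. */
bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}
```

No compiler, in any language, distinguishes these two functions; only a test (or a user) hitting the year 1900 will.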
The interesting thought experiment that came to me when thinking about this was that I would be more likely to trust LLM code in C# or Rust than I would be to trust LLM code in assembly or Ruby.
Which makes me wonder ... can LLMs write working Idris or ATS code?
I've seen people put untested AI hallucinations up for review, with nonexistent function names, passing CI just because the calls were behind debug defines.
I've seen some refer to nonexistent APIs while discussing migration to a new major version of a library. "Sure, that's easy, we should just replace this function with this new one."
Imagine all those more subtle bugs that are harder to spot.