It's always POC apps in js or python, or very small libraries in other popular languages with good structure from the start. There are ways to make them somewhat better in other cases (automated testing/validation/linting being a big one), but for the type of thing that 95% of developers are doing day to day (working on a big, sprawling code base where none of those attributes apply), it's not close to being there.
The tools really do shine where they're good though. They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.
I say this as someone that uses the tools every day. The only explanation that makes sense to me is that the "you don't get it, they're amazing at everything" people just aren't working on anything even remotely complicated. Or it's confirmation bias that they're only remembering the good results - as we saw with last week's study on the impact of these tools on open source development (perceived productivity was up, real productivity was down). Until we start seeing examples to the contrary, IMO it's not worth thinking that much about. Use them at what they're good at, don't use them for other tasks.
LLMs don't have to be "all or nothing". They absolutely are not good at everything, but that doesn't mean they aren't good at anything.
The bad news is that mostly, as far as we can see, that doubling of performance also requires (at least) doubling of resource usage, plus we're getting close to a point where planetary resources for doubling LLM resources are getting kind of low...
This species is going extinct. I finally accepted that when my dad died rather than change his lifestyle, despite being warned 10000x. My mom survived a heart attack, saw what happened to my dad, still hasn't changed her lifestyle.
It absolutely counts as a POC app until it's production grade, deployed, being used by people, maintained over time, etc.
This doesn't mean that it's not useful, or that you shouldn't be happy with what the LLM built. I also had Claude Code build me a web app for my own personal use in Rust this week. It's very useful to me. But it is 100% of POC/MVP quality, and always will be, because the code that it created is abjectly awful and I would never be able to scale it into a real world service without rewriting 50+% of it.
> They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.
Sorry, but this is just not true.
I'm using agents with a totally idiosyncratic code base of Haskell + Bazel + Flutter. It's a stack that is so quirky and niche that even Google hasn't been able to make it work well despite all their developer talent and years of SWEs pushing for things like Haskell support internally.
With agents I'm easily 100x more productive than I would be otherwise.
I'm just starting on a C++ project, but I've already done at least 2 weeks worth of work in under a day.
Share the codebase and what you're doing or, I'm sorry, you're just another example of what I laid out above.
If you honestly believe that "agents" are making you better than Goole SWEs then you severely need to take a step back and reevaluate, because you are wrong.
Hold the phone. So, Google, with its legions of summa cum laude engineers, can't make this stack work well, but your AI agent is nailing it into next week? Seriously, show me the way, so I too may find AI enlightenment.
The tools really do shine where they're good though. They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.
I say this as someone that uses the tools every day. The only explanation that makes sense to me is that the "you don't get it, they're amazing at everything" people just aren't working on anything even remotely complicated. Or it's confirmation bias that they're only remembering the good results - as we saw with last week's study on the impact of these tools on open source development (perceived productivity was up, real productivity was down). Until we start seeing examples to the contrary, IMO it's not worth thinking that much about. Use them at what they're good at, don't use them for other tasks.
LLMs don't have to be "all or nothing". They absolutely are not good at everything, but that doesn't mean they aren't good at anything.