
It's always POC apps in js or python, or very small libraries in other popular languages with good structure from the start. There are ways to make them somewhat better in other cases (automated testing/validation/linting being a big one), but for the type of thing that 95% of developers are doing day to day (working on a big, sprawling code base where none of those attributes apply), it's not close to being there.

The tools really do shine where they're good though. They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.

I say this as someone who uses the tools every day. The only explanation that makes sense to me is that the "you don't get it, they're amazing at everything" people just aren't working on anything even remotely complicated. Or it's confirmation bias and they're only remembering the good results, as we saw with last week's study on the impact of these tools on open source development (perceived productivity was up, real productivity was down). Until we start seeing examples to the contrary, IMO it's not worth thinking that much about. Use them for what they're good at, don't use them for other tasks.

LLMs don't have to be "all or nothing". They absolutely are not good at everything, but that doesn't mean they aren't good at anything.



I like them for refactoring and “explain this massive codebase please”. Basically polishing or investigating things that already work.

But I think we should expect the scope of LLM work to improve rapidly in the next few years.

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...


The bad news is that mostly, as far as we can see, that doubling of performance also requires (at least) doubling of resource usage, plus we're getting close to a point where planetary resources for doubling LLM resources are getting kind of low...


This species is going extinct. I finally accepted that when my dad died rather than change his lifestyle, despite being warned 10000x. My mom survived a heart attack, saw what happened to my dad, still hasn't changed her lifestyle.


Hmm, I got Claude Opus to build me a game in Rust. I don’t think it really counts as POC app any more at that point.


It absolutely counts as a POC app until it's production grade, deployed, being used by people, maintained over time, etc.

This doesn't mean that it's not useful, or that you shouldn't be happy with what the LLM built. I also had Claude Code build me a web app for my own personal use in Rust this week. It's very useful to me. But it is 100% of POC/MVP quality, and always will be, because the code that it created is abjectly awful and I would never be able to scale it into a real world service without rewriting 50+% of it.


> They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.

Sorry, but this is just not true.

I'm using agents with a totally idiosyncratic code base of Haskell + Bazel + Flutter. It's a stack that is so quirky and niche that even Google hasn't been able to make it work well despite all their developer talent and years of SWEs pushing for things like Haskell support internally.

With agents I'm easily 100x more productive than I would be otherwise.

I'm just starting on a C++ project, but I've already done at least 2 weeks' worth of work in under a day.


I’m going to ask what I’ve asked the last person here who said they are “10-20x” more productive:

If you’re really that much more productive, why don’t you quit your job and vibecode 10 iOS apps (in your case that would be 50 to 100 proportionally)?


Because money? Even if you can quickly build them it’s pointless if you can’t sell them. And Claude cannot help with that.


Share the codebase and what you're doing or, I'm sorry, you're just another example of what I laid out above.

If you honestly believe that "agents" are making you better than Google SWEs then you seriously need to take a step back and reevaluate, because you are wrong.


Hold the phone. So, Google, with its legions of summa cum laude engineers, can't make this stack work well, but your AI agent is nailing it into next week? Seriously, show me the way, so I too may find AI enlightenment.


What do you mean “with agents”?


I've been using mainly gemini-cli and am starting to play around with claude code.


Are you referring to those as agents or do you mean spinning separate/multiple agents out of sessions on them?



