This is awesome overall. A valiant attempt at using AI to create something real. My number one criticism of AI demos is that they are too simple and leave so much flexibility that they could be anything. Once you start adding constraints, the prompts become a lot trickier, making it harder to get the desired output.
One observation so far is that there is real effort in configuring the agentic workflow and getting it to work even semi-consistently, meaning there is always tweaking and experimentation for the specific use case. This isn't trivial work, and arguably it's a tradeoff, since all that effort could be spent coding the game directly. With enough skill it would take about the same time, with better quality, to code it by hand (with AI assistance where needed, of course, but it would be human written).
This just means that the real value so far with AI is in letting non-programmers do coding, but it comes at the cost of real setup effort and lower quality in the end.
My experience is the same. In short, agents cannot plan ahead, or plan at a high level. This means they have a blind spot for design. Since they cannot design properly, it limits the kind of projects that are viable to smaller scopes (not sure exactly how small, but in my experience, extremely small and simple). Anything that exceeds this abstract threshold has a good chance of being a net negative, with most of the code being unmaintainable, inextensible, and unreliable.
Anyone who claims AI is great is not building a large or complex enough app, and when it works for their small project, they extrapolate to all possibilities. So because their example was generated from a prompt, it's incorrectly assumed that any prompt will also work. That doesn't necessarily follow.
The reality is that programming is widely underestimated. The perception is that it's just syntax in a text file, but it's really more like a giant abstract machine with moving parts. If you don't see the giant machine with moving parts, chances are you are not going to build good software. For AI to do this, it would require strong reasoning capabilities that let it derive logical structures, along with long-term planning and simulation of this abstract machine. I predict that if AI can do this then it will be able to do every single other job, including physical jobs, as it would be able to reason within a robotic body in the physical world.
To summarize, people are underestimating programming, using their simple projects to incorrectly extrapolate to any possible prompt, and missing the hard part of programming which involves building abstract machines that work on first principles and mathematical logic.
>Anyone who claims AI is great is not building a large or complex enough app
I can't speak for everyone, but lots of us fully understand that the AI tooling has limitations and realize there's a LOT of work that can be done within those limitations. Also, those limitations are expanding, so it's good to experiment to find out where they are.
Conversely, it seems like a lot of people are saying that AI is worthless because it can't build arbitrarily large apps.
I've recently used the AI tooling to make a docusign-like service and it did a fairly good job of it, requiring about a day's worth of my attention. That's not an amazingly complex app, but it's not nothing either. Ditto for a calorie tracking web app. Not the most complex app, but companies are making legit money off them, if you want a tangible measure of "worth".
Right, it has a lot of uses. As a tool it has been transformative on many levels. The question is whether it can actually multiply productivity across the board for any domain and at production level quality. I think that's what people are betting on, and it's not clear to me yet that it can. So far that level looks more like a tradeoff. You can spend time orchestrating agents, gaining some speedup at the cost of quality, or you can use it more like a tool and write things "manually", which is a lot higher quality.
> Anyone who claims AI is great is not building a large or complex enough app
That might be true for agentic coding (caveat below), but AI in the hands of expert users can be very useful - "great" - in building large and complex apps. It's just that it has to be guided and reviewed by the human expert.
As for agentic coding, it may depend on the app. For example, Steve Yegge's "beads" system is over a quarter million lines of allegedly vibe-coded Go code. But developing a CLI like that may be a sweet spot for LLMs, since it doesn't have all the messiness of typical business system requirements.
> For example, Steve Yegge's "beads" system is over a quarter million lines of allegedly vibe-coded Go code. But developing a CLI like that may be a sweet spot
I haven't looked into it deeply, but I've seen people claiming to find it useful, which is one metric of success.
Agentic vibe coding maximalists essentially claim that code quality doesn't matter if you get the desired functionality out of it. Which is not that different from what a lot of "move fast and break things" startups also claim, about code that's written by humans under time, cost, and demand pressure. [Edit: and I've seen some very "sloppy and poorly implemented" code in those contexts, as well as outside software companies, in companies of all sizes. Not all code is artisanally handcrafted by connoisseurs such as us :]
I'm not planning to explore the bleeding edge of this at the moment, but I don't think it can be discounted entirely, and of course it's constantly improving.
I'd say it is a success at being useful, but yeah it does seem like the code itself has been a bit of a mess.
I've used a version that had a bd stats and a bd status that both had almost the same content in slightly different formats. Later versions appear to have made them an alias for the same thing. I've also had a version where the daemon consistently failed to start and there were no symptoms other than every command taking 5 seconds. In general, the optimization with the daemon is a questionable choice. It doesn't really need to be _that_ fast.
And yet, even after all of that it still has managed to be useful and generally fairly reliable.
For anything above a simple app, it becomes a tradeoff that needs to be carefully tuned so that you get the most out of it and it doesn't end up being a waste of time. For many use case and domain combinations this is a net positive, but it's not yet consistent across everything.
From my experience it's better at some domains than others, and also better at certain kinds of app types. It's not nearly as universal as it's being made out to be.
The main problem that I'm seeing is that software design is underappreciated and underestimated. To the extent there is AI hype it is driven by this blind spot. Software isn't just a bunch of text. Software is logical structures that form moving parts that interlock and function based on a ton of requirements and specs of the target hardware.
So far AI has shown it cannot understand this layer of software. There are studies of how LLMs derive their answers to technical questions, and it is not based on first principles or logical reasoning, but rather on sparse representations derived from training data. As a result they can answer extremely difficult questions that are well represented in the training data but fail miserably on the simplest kinds of questions, e.g. simple addition of two ten-digit numbers.
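To make that failure mode concrete, here is a minimal sketch of the kind of probe this refers to. The `ask_model` callable is hypothetical, a stand-in for whatever chat interface you use; the point is that the ground truth is trivial to compute exactly:

```python
# Illustrative only: ask_model is a hypothetical stand-in for any chat API.
import random

def probe_addition(ask_model, digits=10):
    a = random.randrange(10**(digits - 1), 10**digits)
    b = random.randrange(10**(digits - 1), 10**digits)
    expected = a + b  # exact answer, by construction
    reply = ask_model(f"What is {a} + {b}? Reply with only the number.")
    return reply.strip() == str(expected)
```

A system reasoning from first principles would never miss this; one that pattern-matches against training data can, once the operands get long enough.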
This is what the article is talking about with small teams with new projects being more productive. Chances are these small teams have small enough problems and also have a lot more flexibility to produce software that is experimental and doesn't work that well.
I am also not surprised the hype exists. The software industry does not value software design, and instead optimizes its codebases so it can scale by adding an army of coders who produce a ton of duplicate logic and unnecessary complexity. This goes hand in hand with how LLMs work, so the transition is seamless.
I mostly agree with you, especially on software design being underappreciated. A lot of what slows teams down today isn’t typing code, it’s reasoning about systems that have accreted over time. I am thinking about implicit contracts, historical decisions, and constraints that live more in people’s heads than in the code itself.
Where I’d push back slightly is on framing this primarily as an LLM limitation. I don’t expect models to reason from first principles about entire systems, and I don’t think that’s what’s missing right now. The bigger gap I see is that we haven’t externalised design knowledge in a way that’s actionable.
We still rely on humans to reconstruct intent, boundaries, and "how work flows" every time they touch a part of the system. That reconstruction cost dominates, regardless of whether a human or an AI is writing the code.
I also don’t think small teams move faster because they’re shipping lower-quality or more experimental software (though that can be true). They move faster because the design surface is smaller and the work routing is clear. In large systems, the problem isn’t that AI can’t design; it’s that neither humans nor AI are given the right abstractions to work with.
Until we fix that, AI will mostly amplify what already exists: good flow in small systems, and friction in large ones.
Good points. Design requires more creativity than implementation from specs, and AI is missing something here, which hampers its creativity, if it even has anything analogous to it.
I suspect this is also related to agency, and to why we need to spell things out in the prompt and run multiple agents in a loop, not to mention MoE and CoT, none of which would be needed if the model could sustain a single prompt until it is finished, creating its own subgoals and reevaluating accordingly. Agency requires creativity, and right now that's the human part, whether it's judging the output or orchestrating the models.
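To spell out what "sustain a single prompt" would mean, here's a purely hypothetical sketch; `plan`, `execute`, and `replan` don't correspond to any real API, they name capabilities that today get simulated through external orchestration:

```python
# Hypothetical sketch of a self-directed agent loop; no real API is implied.
def pursue(goal, model):
    subgoals = model.plan(goal)            # the model decomposes the goal itself
    while subgoals:
        current = subgoals.pop(0)
        result = model.execute(current)
        # reevaluate: revise the remaining subgoals in light of what just happened
        subgoals = model.replan(goal, result, subgoals)
    return model.summarize(goal)
```

Today the outer loop, the decomposition, and the reevaluation are all done by the human or by harness code around the model, which is exactly the point.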
But they are supporting a lot of our tasks: research, intelligence reports, content writing, website and CLI improvements, among others.
I'd say the key to trust and to being able to delegate is the alignment of the agents with your goals.
This post explains the difference between the 2025 data agents (only data retrievers) and what's possible today (agents that act based on trusted data).
I have a solution, but I don't think companies care about this level of meta-analysis of how people work together. They think they do, but in reality they just care about optics, and the status quo culture weighs heavily toward continuing in the same direction, largely dictated by industry "standards".
In essence, estimates are useless. There should only be deadlines and the engineers' confidence in hitting them. To the extent there are estimates, they should be an external observation on the part of PMs and POs, based not only on past history but also on knowledge of how each team member performs. This of course only works if engineers are ONLY focusing on technical tasks, not creating tickets or doing planning. The main point of failure, in an abstract sense, is making people estimate or analyze their own work, which comes with a bias. That bias needs to be eliminated, and at the same time engineers should be given the opportunity to optimize their workflows and maximize their output.
TLDR: engineers should focus only on strictly technical work, because that lets them optimize within their domain, while other roles (PM, PO, or whoever) should be creating tasks and estimating. Of course this doesn't happen, because there are hard biases in the industry that are hard to break.