I'm not sure why this is an argument against 'frameworks' per se; if we were sure the code LLMs generate was the best possible, we might as well use assembly, no, since that would give the best performance? But we generally don't, because we still need to validate, verify and read the code. And in that, there is still some value in using a framework, since the code generated with one is likely, on the whole, to be shorter and simpler than code generated without one. On top of that, because it's simpler, I've found there's less scope for the LLM to go off and do something strange.
To a degree, but most enterprise-focused software has differential pricing. Often that pricing isn't public, so different companies get different quotes.
The other thing is bringing in knowledge about what other customers in the same field want. For business-focused software this can be a boon: customers often can't really envision the solution to their problem. It's the quote attributed to Henry Ford: "If I had asked people what they wanted, they would have said faster horses."
Until a given company decides they need access control for their contractors that's different from their employees, etc. etc. etc. I've seen it all before with internal, often data-scientist-written applications that they then try to scale out, only to run into a security nightmare and a lack of internal support for developing and taking them forward. Usually these things fizzle out when someone leaves and it stops working.
Most people who've been in a business SaaS environment know that writing the software is the relatively easy part, except in very difficult technical domains. The sales cycle, renewals and solution engineering for businesses are the majority of the work, and that's going nowhere.
I've run into quite a bit of the "tell it to do something in a certain way" problem: it does that at first, but after a few messages of corrections and pointers it forgets the constraint.
> it does that at first, but after a few messages of corrections and pointers it forgets the constraint.
Yup, most models suffer from this. Everyone is raving about million-token context windows, but none of the models can actually get past 20% of that and still give responses as high quality as the very first message.
My whole workflow right now is basically composing prompts outside of the agent, letting it run with them, and if something is wrong, restarting the conversation from zero with a rewritten prompt (roughly the loop sketched below). None of that "No, what I meant was ..."; instead I rewrite the prompt so the agent essentially solves it without any back and forth, precisely because of the issue you mention.
Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.
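For what it's worth, that restart-from-zero loop can be scripted. A minimal sketch, under assumptions: the agent command below (claude -p as a stand-in) is just a placeholder for whatever one-shot CLI invocation you use, and prompt.md is a hypothetical file where the prompt lives outside the agent.

    # restart_from_zero.py - illustrative sketch of the "rewrite and rerun" workflow
    import subprocess
    from pathlib import Path

    PROMPT_FILE = Path("prompt.md")   # prompt kept outside the agent, edited between runs
    AGENT_CMD = ["claude", "-p"]      # placeholder: any agent CLI that accepts a one-shot prompt

    def run_once() -> str:
        """Start a fresh conversation with the current prompt; no follow-up corrections."""
        prompt = PROMPT_FILE.read_text()
        result = subprocess.run(AGENT_CMD + [prompt], capture_output=True, text=True)
        return result.stdout

    if __name__ == "__main__":
        print(run_once())
        # If the output is wrong, encode the missing constraint in prompt.md
        # and rerun from scratch instead of correcting the agent mid-conversation.

The point isn't the script itself; it's that the prompt file, not the conversation, becomes the thing you iterate on.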
LLMs do a cool parlour trick; all they do is predict “what should the next word be?” But they do it so convincingly that in the right circumstances they seem intelligent. But that’s all it is; a trick. It’s a cool trick, and it has utility, but it’s still just a trick.
All these people thinking that if only we add enough billions of parameters during training and enough tokens of context, then eventually it'll actually understand the code and make sensible decisions? These same people perhaps also believe that if Penn and Teller cut enough ladies in half on stage, they'll eventually be great doctors.
Yes, agreed. I find it interesting that people say they're building these huge multi-agent workflows, since the projects I've tried it on are not necessarily huge in complexity. I've tried a variety of different things re: instruction files, etc. at this point.
So far, I haven't seen any demonstration of those kinds of multi-agent workflows ending up with code that doesn't fall over itself within days or weeks. Most efforts so far seem to have been focused on producing as much code as possible, as fast as possible, while what I'd like to see, if anything, is the opposite of that.
Any time people start talking about their own "multi-agent orchestration platforms" (or whatever) and I ask for a demonstration of what the actual code looks like, they either haven't shared anything (yet), don't care at all what the code actually looks like, and/or the code is a horrible vibe-slopped mess that is mostly nonsense.
Been experimenting with the same flow as well; it's sort of the motivation behind this project: to streamline the generate code -> detect gaps -> update spec -> implement loop.
Curious to hear if you're still seeing code degradation over time?
That's a strange name. Why? It's more like an "iterate and improve" loop; "Groundhog Day" to me would imply "the same thing over and over", but then you're really doing something wrong if that's your experience. You need to iterate on the initial prompt if you want something better/different.
Create an AGENTS.md that says something like, "when I tell you to do something in a certain way, make a note of this here".
The only catch is that you need to periodically review it because it'll accumulate things that are not important, or that were important but aren't anymore.
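A sketch of what that might look like; the exact wording and the example notes below are purely illustrative, not a prescribed format:

    # AGENTS.md
    When I tell you to do something in a certain way, record that instruction
    under "Standing notes" below and follow it in future sessions.

    ## Standing notes
    - Always use the repo's existing logger instead of print().
    - Don't reformat files you didn't otherwise touch.

The "Standing notes" section is the part that accumulates cruft, so that's what the periodic review is for.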
The exact opposite is true in some places: being a massive asshole is often rewarded, because those people are able to bully others into doing what they want. Only it's dubbed "influencing". The higher up you go, the more toxic and defensive the political landscape becomes.