Do they do requirements gathering? Like talking to stakeholders and getting their input on what the feature should be, translating business jargon into domain terms?
No.
Do they do the analysis? Removing specs that conflict with each other, validating what's possible in the technical domain and in the business domain?
No.
Do they help with design? Helping come up with the changes that impact the current software the least, fit into the current architecture, and stay maintainable in the future?
All they do is pattern matching on your prompt and the weights they have. Not a true debate or weighing options based on the organization's context.
Do they help with coding?
A lot if you're already experienced with the codebase and the domain. But that's the easiest part of the job.
Do they help with testing? Coming up with test plans, writing test code, running them, analysing the output of the various tools, and producing a cohesive report of the defects?
I don't know, as I haven't seen any demo on that front.
Do they help with maintenance? Taking the same software and making changes to keep it churning on new platforms, through dependency updates and bug fixes?
No demo so far.
Why do you think any of these should be a challenge for, say, O3/O3 pro?
You pretty much just have to ask and give them access to do these things. Talking to a stakeholder and translating jargon and domain terms? Trivial. They can churn through specs and find issues; none of that seems particularly odd to ask of a decent LLM.
> Do they help with testing? Coming up with test plans, writing test code, running them, analysing the output of the various tools, and producing a cohesive report of the defects?
This is pretty standard in agentic coding setups. They'll fix up broken tests, and fix up code when it doesn't pass the tests. They can add debug statements and run the code to find issues, break code down to minimal examples to see what works, and then build back up from there.
> Do they help with maintenance? Taking the same software and making changes to keep it churning on new platforms, through dependency updates and bug fixes?
Yes - dependency updates are probably the easiest. Have it read the changelogs and the new API docs, look at the failing tests, and iterate until they pass.
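As a concrete illustration (not a claim about any particular product), here's a minimal sketch of that iterate-until-green loop. `ask_llm` is a stand-in for whatever model API or agent harness you use, and the pytest/git commands are assumptions about the project's tooling:

    import subprocess

    def ask_llm(prompt: str) -> str:
        # Stand-in for whatever model/agent API you use (hypothetical).
        raise NotImplementedError

    def run_tests() -> tuple[bool, str]:
        # Run the test suite and capture its output (assumes pytest).
        result = subprocess.run(["pytest", "-x"], capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def apply_patch(diff: str) -> None:
        # Apply a unified diff (assumes the model returns a clean git diff).
        subprocess.run(["git", "apply", "-"], input=diff, text=True, check=True)

    def update_dependency(changelog: str, max_rounds: int = 5) -> bool:
        for _ in range(max_rounds):
            passed, output = run_tests()
            if passed:
                return True
            # Feed the changelog and the failing test output back to the model
            # and ask for a concrete patch.
            diff = ask_llm(
                "The dependency's changelog:\n" + changelog
                + "\n\nThe tests now fail with:\n" + output
                + "\n\nReply with a unified diff that fixes the breakage."
            )
            apply_patch(diff)
        return False

Real agent harnesses wrap this kind of loop with better diff handling and sandboxing, but the shape is the same.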
These things are progressing surprisingly quickly, so if your experience of them is from 2024, it's quite out of date.
> Do they do requirements gathering? Like talking to stakeholders and getting their input on what the feature should be, translating business jargon into domain terms?
No.
Why not? This is a translation problem so right up its alley.
Give it tool access to communicate directly with stakeholders (via email or chat) and put it in a loop to work with them until the goal is reached (stakeholders are happy). Same as a human would do.
And of course it will still need some steering by a "manager" to make sure it's building the right things.
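A minimal sketch of what that loop could look like; `ask_llm`, `send_to_stakeholder`, and `wait_for_reply` are hypothetical placeholders for your model API and chat/email integration, not real libraries:

    def ask_llm(prompt: str) -> str:
        # Placeholder for whatever model API you use (hypothetical).
        raise NotImplementedError

    def send_to_stakeholder(message: str) -> None:
        # Placeholder for a chat/email integration (hypothetical).
        raise NotImplementedError

    def wait_for_reply() -> str:
        # Placeholder for a chat/email integration (hypothetical).
        raise NotImplementedError

    def gather_requirements(initial_request: str, max_rounds: int = 10) -> str:
        # Iterate with the stakeholder until they sign off on a spec.
        conversation = ["Stakeholder request: " + initial_request]
        for _ in range(max_rounds):
            # Ask the model for either clarifying questions or a draft spec
            # written in the team's domain terms.
            draft = ask_llm(
                "From the conversation below, either list clarifying questions "
                "(prefixed with QUESTIONS:) or write a precise spec in our "
                "domain terms.\n\n" + "\n".join(conversation)
            )
            send_to_stakeholder(draft)
            reply = wait_for_reply()
            conversation.append("Stakeholder: " + reply)
            if not draft.startswith("QUESTIONS:") and "approved" in reply.lower():
                return draft  # "stakeholders are happy"
        raise RuntimeError("No agreement reached; escalate to a human")

The "manager" steering mentioned above is the part that reviews the approved spec before anything actually gets built.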
> Why not? This is a translation problem so right up its alley.
Translating a sign can be done with a dictionary. Translating a document is often a huge amount of work due to cultural differences, so you cannot make a literal translation of sentences. And sometimes terms don't map to each other. That's when you start to use metaphors (and footnotes).
Even in the same organization, the same term can mean different things. As humans we don't mind when terms have several definitions and the correct one is contextual. But software is always context-free: everything is fixed at its inception, and the variables govern flow, not the instructions themselves (an "eval" instruction, data as code, is dangerous for a reason).
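To make the "eval" aside concrete, a small Python example of why treating data as code is dangerous, while treating it as data keeps the instructions fixed:

    # Data treated as data: the field can only ever be parsed as a number.
    def safe_total(price_field: str) -> float:
        return float(price_field)

    # Data treated as code: the same field can now do anything the process can.
    def unsafe_total(price_field: str) -> float:
        return eval(price_field)  # "__import__('os').remove(...)" would run too

    print(safe_total("19.99"))    # 19.99
    print(unsafe_total("19.99"))  # also 19.99, but only because the input is benign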
So the whole process is going from something ambiguous and context-dependent to something that isn't. And we do this by eliminating incorrect definitions. Tell me how an LLM is going to help with that when it has no sense of what is correct and what is not (i.e., judging truthfulness).
No, they don't do requirements gathering; they also don't cook my food or wash my clothes. Some things are out of scope for an LLM.
Yes, they can do analysis, identify conflicting specs, etc., especially with a skilled human in the loop.
Yes, they help with design, though this works best if the operator has sufficient knowledge.
The LLM can help significantly by walking through the code base, explaining parts of it in variable depth.
Yes, agentic LLMs can easily write tests, run them, validate the output (again, best used with an experienced operator so that anti-patterns are spotted early).
From your posts I gather you have not yet worked with a strong LLM in an agentic harness, which you can think of as an almost general-purpose automation solution that can either handle, or heavily support, most if not all of the points you have mentioned.
I mean, if you "program" (prompt) them to do that stuff, then yeah, they'll do that. But you have to consider the task just like if you handed it over to a person with absolutely zero previous context, and explain what you need from the "requirements gathering", and how it should handle that.
None of the LLMs handle any of those things by themselves, because that's not what they're designed for. They're programmable things that output text, that you can then program to perform those tasks, but only if you can figure out exactly how a human would handle it, and you codify all the things we humans can figure out by ourselves.
> But you have to consider the task just like if you handed it over to a person with absolutely zero previous context,
Which no one does. Even when hiring someone, there's the basic premise that they know how they should do the job (interns are there to learn, not to do). And then they are trained for the particular business context, with a good incentive to learn well and then do the job well.
You don't just suddenly wake up and find yourself at an unknown company being asked to code something for a Jira task. And if you do find yourself in such a situation, the obvious thing is to figure out what's going on, not "Sure, I'll do it".
I don't understand the argument. I haven't said humans act like that; what I said is how you have to treat LLMs if you want to use them for things like that.
If you're somehow under the belief that LLMs will (or should) magically replace a person, I think you've built the wrong understanding of what LLMs are and what they can do.
I interact with tools and with people. When with people, there's a shared understanding of the goal and the context (aka alignment, as some people like to call it). With tools, there's no such context needed. Instead I need reproducible results and clear output. And if it's something that I can automate, I need it to follow my instructions closely.
LLMs are obviously tools, but their parameter space is so huge that it's difficult to provide enough to ensure reliable results. With prompting, we get unreliable answers, but with agents, you have actions being taken based on those unreliable answers. We had that before with people copying and pasting from LLM output, but now the same action is being automated. And then there's the feedback loop, where the agent is taking input from the same thing it has altered (often wrongly).
So it goes like this: ambiguous query -> unreliable information -> agents acting -> unreliable result -> unreliable validation -> final review (which is often skipped). And then the loop repeats.
While with normal tools: ambiguous requirement -> detailed specs -> formal code -> validation -> report of divergence -> review (which can be skipped). There are issues in the process (which give us bugs), but we can pinpoint where we went wrong and fix the issue.
I'm sorry, I'm very lost here. Are you responding to the wrong comment or something? Because I don't see how any of that is connected to the conversation from here on up.
>>> But you have to consider the task just like if you handed it over to a person with absolutely zero previous context, and explain what you need from the "requirements gathering", and how it should handle that
The most similar thing is software, which is a list of instructions we give to a computer alongside the data that forms the context for a particular run. It then processes that data and gives us a result. The basic premise is that these instructions need to be formal so that they become context-free. The whole context is the input to the code, and you can use the code whenever.
Natural language is context-dependent. And the final result depends on the participants. So what you want is a shared understanding so that instructions are interpreted the same way by every participant. Someone (or the LLM) coming in with zero context is already a failure scenario. But even with the context baked into every participant, misunderstandings will occur.
So what you want is a formal notation, which removes ambiguity. It's not as flexible or as expressive as natural language, but it's very good at sharing instructions and information.
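A small illustration of the "whole context is the input to the code" point, using made-up pricing rules: the first version leans on ambient context the way natural language does, the second passes every definition in explicitly, so it means the same thing wherever and whenever it runs:

    from datetime import date

    CURRENT_DISCOUNT = 0.10  # ambient state shared across the program

    # Context-dependent: the result changes with hidden state and the calendar,
    # much like a sentence whose meaning depends on who says it and when.
    def price_with_discount(base: float) -> float:
        if date.today().month == 12:  # hidden seasonal rule
            return base * (1 - CURRENT_DISCOUNT - 0.05)
        return base * (1 - CURRENT_DISCOUNT)

    # Context-free: every definition the function relies on is passed in,
    # so the instruction is fixed and only the inputs vary.
    def price_with_discount_explicit(base: float, discount: float,
                                     seasonal_extra: float) -> float:
        return base * (1 - discount - seasonal_extra)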
This is true, but they have helped prepare me with good questions to ask during those meetings!
> Do they do the analysis? Removing specs that conflict with each other, validating what's possible in the technical domain and in the business domain?
Yes, I have had LLMs point out missing information or conflicting information in the spec. See above about "good questions to ask stakeholders."
> Do they help with design? Helping come up with the changes that impact the current software the least, fit into the current architecture, and stay maintainable in the future?
Yes.
I recently had a scenario where I had a refactoring task that I thought I should do, but didn’t really want to. It was cleaning up some error handling. This would involve a lot of changes to my codebase, nothing hard, but it would have taken me a while, and been very boring, and I’m trying to ship features, not polish off the perfect codebase, so I hadn’t done it, even though I still thought I should.
I was able to ask Claude: “Hey, how expensive would this refactoring be? How many methods would it change? What do the before/after diffs look like in a simple affected place, and in one of the more complex affected places?”
Previously, I had to use my hard-won human intuition to make the call about implementing this or not. It’s very fuzzy. With Claude, I was able to very quickly quantify that fuzzy notion into something at least close to accurate: 260 method signatures. Before and after diffs look decent. And this kind of fairly mechanical transformation is something Claude can do much more quickly and just as accurately as I can. So I finally did it.
That I shipped the refactoring is one point. But the real point is that I was able to quickly focus my understanding of the problem, and make a better, more informed decision because of it. My gut was right. But now I knew it was right, without needing to actually try it out.
> Not a true debate or weighing options based on the organization's context.
This context is your job to provide. They will take it into account when you provide it.
> Do they help with coding?
Yes.
> Do they help with testing? Coming up with test plans, writing test code, running them, analysing the output of the various tools, and producing a cohesive report of the defects?
Yes, absolutely.
> Do they help with maintenance? Taking the same software and making changes to keep it churning on new platforms, through dependency updates and bug fixes?
No.