Ingesting documents and searching your org docs in natural language with an internal assistant sounds more like a good use case for RAG [1]. Agents are best when you need to autonomously plan and execute a series of actions [2]. You can combine the two, but knowing when to do so depends on the use case.
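To make that concrete, baseline RAG in its simplest form is just: embed your docs, retrieve the most similar ones at query time, and stuff them into the prompt. A minimal sketch, assuming the sentence-transformers library; the docs, model choice, and prompt format are placeholders:

```python
# Minimal baseline RAG sketch (sentence-transformers assumed; docs, model
# choice, and the prompt format are placeholders, not recommendations).
from sentence_transformers import SentenceTransformer
import numpy as np

docs = [
    "Box A and Box B must fit into Box 1 with at least 1in clearance.",
    "Project X lesson learned: order containers before the packing phase.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs)  # (n_docs, dim) array

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k docs most similar to the query (cosine similarity)."""
    q = model.encode([query])[0]
    sims = doc_vecs @ q / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

context = "\n".join(retrieve("What clearance does Box 1 need?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
# prompt then goes to the LLM; no training or fine-tuning involved.
```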
I really like the OpenAI approach and how they outlined the thought process of when and how to use agents.
In this case, the agent would also need to learn from new events, such as project lessons learned.
Just curious: can a RAG [1] system actually learn from new situations over time in this kind of setup, or is it purely pulling from what's already there?
Especially with a client, be careful with the word choices around "learning". When using LLMs, agents, or RAG, the system isn't learning (yet); it's making a decision based on the context you provide. Most models are a fixed snapshot. If you provide up-to-date information, it will give you an output based on that.
"Learning" happens when initially training the llm or arguably when fine-tuning. Neither of which are needed for your use case as presented.
Thanks for the clarification, really appreciate it. It helps frame things more precisely.
In my case, there will be a large amount of initial data fed into the system as context. But the client also expects the agent to act more like a smart assistant or teacher, one that can respond to new, evolving scenarios.
Without getting into too much detail, imagine I feed the system an instruction like: “Box A and Box B should fit into Box 1 with at least 1" clearance.” Later, a user gives the agent Box A, Box B, and now adds Box D and E, and asks it to fit everything into Box 1, which is too small. The expected behavior would be that the agent infers that an additional Box 2 is needed to accommodate everything.
So I understand this isn't "learning" in the training sense, but rather pattern recognition and contextual reasoning based on prior examples and constraints.
Basically, I should be saying "contextual reasoning" instead of "learning."
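For what it's worth, the box check itself might not even need to rely on the model's arithmetic: it could be a deterministic tool the agent calls, with the LLM deciding what to do when the check fails (e.g., propose Box 2). A rough sketch, where all names and the 1D length simplification are invented for illustration:

```python
# Hypothetical fit-checking tool the agent calls instead of doing arithmetic
# in-model. Simplification: "fit" is a 1D length check plus clearance; a
# real packer would do 3D bin packing.
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    length_in: float

def fits(items: list[Box], container: Box, clearance_in: float = 1.0) -> bool:
    """True if every item plus the required clearance fits in the container."""
    return sum(b.length_in for b in items) + clearance_in <= container.length_in

def containers_needed(items: list[Box], container: Box,
                      clearance_in: float = 1.0) -> int:
    """Greedy estimate of how many identical containers are required."""
    count, used = 1, clearance_in
    for item in sorted(items, key=lambda b: b.length_in, reverse=True):
        if used + item.length_in > container.length_in:
            count, used = count + 1, clearance_in
        used += item.length_in
    return count

items = [Box("A", 4), Box("B", 4), Box("D", 3), Box("E", 3)]
box1 = Box("Box 1", 10)
if not fits(items, box1):  # the agent's cue to propose Box 2
    print(f"Doesn't fit; need {containers_needed(items, box1)} containers.")
```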
The LLM has no memory that carries over from your initial instructions to your later ones.
In practice you have to send the entire conversation history with every prompt, so think of it as appending to an expanding list of rules that you send every time.
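In code, that pattern looks roughly like this (a sketch using the OpenAI Python client; the model name and the rule text are placeholders):

```python
# Sketch: "memory" is just re-sending the whole history on every call.
# Uses the OpenAI Python client; model name and messages are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

history = [
    {"role": "system",
     "content": "Rule: Box A and Box B must fit in Box 1 with 1in clearance."},
]

def ask(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    resp = client.chat.completions.create(
        model="gpt-4o",    # placeholder model name
        messages=history,  # the ENTIRE history, every single call
    )
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

ask("Can Box D and Box E also go in Box 1?")
# The model only "remembers" the clearance rule because we sent it again.
```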
What you're attempting to do, integrating an agent into your business, is difficult. It is, however, relatively easy to fake: just set up a quick RAG tool, plug it into your LLM, and you're done. From the outside, the only difference between a quick-n-dirty integration and a much more robust approach will be in the numbers. One will be more accurate than the other, but you need to actually measure performance to establish that as a fact and not just a vibe.
First piece of advice: build up a dataset and measure performance as you develop your agent. Or just don't, and deliver what the hype demands.
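In its simplest form, that dataset is just labeled question/expected-fact pairs and a pass rate. A sketch (`answer_fn` stands in for whatever pipeline you're testing; the substring grading is deliberately crude, so swap in stricter matching or an LLM judge):

```python
# Naive eval harness sketch: run a pipeline over a labeled dataset and
# report a pass rate instead of a vibe. answer_fn is your RAG/agent pipeline.

dataset = [
    {"q": "What clearance does Box 1 require?", "must_contain": "1in"},
    {"q": "Which boxes belong in Box 1?",       "must_contain": "Box A"},
    # ...grow this from real user questions as you go
]

def evaluate(answer_fn) -> float:
    passed = 0
    for case in dataset:
        answer = answer_fn(case["q"])
        # Substring grading is crude; replace with exact match or an LLM judge.
        if case["must_contain"].lower() in answer.lower():
            passed += 1
    return passed / len(dataset)

# score_v1 = evaluate(quick_and_dirty_rag)   # hypothetical pipelines
# score_v2 = evaluate(graph_rag_pipeline)
# Now "v2 is better than v1" is a number, not a feeling.
```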
As for advice, and looking at what other commenters left: if you want to do this seriously, I'd recommend hiring someone who has already done this kind of integration, at least as a consultant. Someone whose first reflex won't be to just tell you LLMs are fixed and can't learn, but who will add that this isn't a limitation, since RAG pipelines are better suited for this task than fine-tuning [1].
Also, RAG isn't a monolithic solution; there are many, many variations. For your use case, I'd consider more elaborate solutions than baseline RAG, such as GraphRAG [2]. For the box problem above, you might want to integrate symbolic reasoning tools such as Prolog, or use reasoning models and develop your own reinforcement learning environments. Needless to say, all of these aspects need to be carefully balanced and optimized to work together, and you need to follow a benchmark/dataset-centric approach to developing your solution. For this, consider frameworks designed to optimize LLM/agentic workflows as a whole [3][4].
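To give a flavor of the GraphRAG idea: instead of retrieving top-k text chunks, you retrieve the subgraph of facts around the entities a query mentions. A toy sketch (the triples and the naive substring entity matching are invented for illustration; real GraphRAG builds the graph with an LLM over your corpus):

```python
# Toy GraphRAG-style retrieval: pull the facts connected to entities the
# query mentions, instead of top-k text chunks. Data is invented.
from collections import defaultdict

triples = [
    ("Box A", "fits_in", "Box 1"),
    ("Box B", "fits_in", "Box 1"),
    ("Box 1", "requires_clearance", "1in"),
    ("Project X", "lesson_learned", "order containers early"),
]

graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))
    graph[o].append((f"inverse_{p}", s))

def retrieve_context(query: str) -> list[str]:
    """Return all facts attached to entities mentioned in the query."""
    facts = []
    for entity in list(graph):
        if entity.lower() in query.lower():  # naive entity matching
            facts += [f"{entity} {p} {o}" for p, o in graph[entity]]
    return facts

print(retrieve_context("Can Box D also fit in Box 1?"))
# -> Box 1's clearance rule and current contents, which then go into the
#    prompt as context for the LLM.
```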
Shit is complex really.
[1] https://arxiv.org/abs/2505.24832 tells us generalization happens in LLMs once their capacity for memorization is saturated, which might explain why fine-tuning has been less effective than RAG so far.
Sound advice and much appreciated.
In this case, I might team up with someone to help me add this feature to my SaaS. But I’ll definitely dive deeper into the subject. Thanks for the info and the links!
There's also (of course) agentic RAG, especially if your data comes from many different types of sources and you have some context/memory set up that the agent relies on. In actuality, with a lot of context there isn't much "learning" needed.
Incorporating more data or new data into the RAG pool is a form of “learning”, but in general agents don’t “learn” unless you give them a journal or allow them to modify their own prompt.
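A sketch of that journal pattern: new events get appended to a store and retrieved as context on later queries, so behavior changes without any weights changing (the keyword scoring here is purely illustrative; production systems would use a vector store):

```python
# Journal-style "learning": new events are appended to a store and retrieved
# as context later, so behavior changes without any weights changing. The
# keyword scoring is purely illustrative; use a vector store in production.

journal: list[str] = []

def remember(event: str) -> None:
    """Append a new event/lesson so future queries can retrieve it."""
    journal.append(event)

def recall(query: str, k: int = 3) -> list[str]:
    """Crude relevance: count words shared between query and entry."""
    q_words = set(query.lower().split())
    return sorted(journal,
                  key=lambda e: len(q_words & set(e.lower().split())),
                  reverse=True)[:k]

remember("Lesson from Project X: Box 1 was too small; check volume first.")
remember("Lesson from Project Y: clearance rules differ per client.")

context = recall("Will Box D and Box E fit in Box 1?")
# The Project X lesson now lands in the prompt; that's the entire "learning"
# mechanism in this setup.
```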
[1] https://www.willowtreeapps.com/craft/retrieval-augmented-gen...
[2] https://www.willowtreeapps.com/craft/building-ai-agents-with...