Good fun, thanks! Found a cute cat in Buenos Aires. It seems you have quite a lot of activity already. Is that all from HN in the last two hours, or from something before this?
Any ideas how to solve the problem of agents lacking common sense?
I have found, when using agents to verify agents, that the verifying agent might observe something a human would immediately find off-putting and obviously wrong, yet it raises no flags for the smart-but-dumb agent.
To clarify, you are using the "fast brain, slow brain" pattern? Maybe an example would help.
Broadly speaking, we see people experiment with this architecture a lot, often with a great deal of success. Another approach would be an agent-orchestrator architecture with an intent-recognition agent that routes to different sub-agents.
Obviously, endless cases are possible in production, and the best approach is to build your evals using that data.
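The routing pattern described above can be sketched in a few lines. Everything here is illustrative: in practice `classify_intent` would be an LLM call, and the sub-agents would be full agents rather than plain functions.

```python
# Minimal sketch of an orchestrator that routes to sub-agents based on
# a recognised intent. All names are hypothetical stand-ins.

def classify_intent(message: str) -> str:
    """Stub intent classifier; in a real system this would be an LLM call."""
    text = message.lower()
    if "refund" in text:
        return "billing"
    if "error" in text:
        return "support"
    return "general"

def billing_agent(message: str) -> str:
    return f"[billing] handling: {message}"

def support_agent(message: str) -> str:
    return f"[support] handling: {message}"

def general_agent(message: str) -> str:
    return f"[general] handling: {message}"

SUB_AGENTS = {
    "billing": billing_agent,
    "support": support_agent,
    "general": general_agent,
}

def orchestrate(message: str) -> str:
    intent = classify_intent(message)
    # Fall back to the general agent for unrecognised intents.
    return SUB_AGENTS.get(intent, general_agent)(message)

print(orchestrate("I'd like a refund please"))
```

The useful property of this shape is that each sub-agent gets a narrow prompt and toolset, which is also where eval data from production naturally slots in.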
Training is overkill at this point, imo. I have seen agents work quite well with a feedback loop, some tools, and prompt optimisation. Are you doing fine-tuning on the models when you say training?
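The feedback-loop idea can be sketched like this. It's a sketch, not a definitive implementation: `generate` stands in for an LLM call, and `check` for whatever verifier (another agent, tests, a linter) you have available.

```python
# Hypothetical feedback loop: generate an answer, let a checker critique it,
# and retry with the critique fed back in, up to max_rounds attempts.

def run_with_feedback(task, generate, check, max_rounds=3):
    answer, feedback = None, None
    for _ in range(max_rounds):
        answer = generate(task, feedback)
        ok, feedback = check(answer)
        if ok:
            return answer
    return answer  # best effort after max_rounds

# Toy stand-ins so the loop is runnable: the checker wants "concise",
# and the generator folds any feedback into its next attempt.
def toy_generate(task, feedback):
    return f"{task} ({feedback})" if feedback else task

def toy_check(answer):
    if "concise" in answer:
        return True, None
    return False, "make it concise"

print(run_with_feedback("summarise the doc", toy_generate, toy_check))
```

No weights change here, which is the point: the "learning" lives in the prompt and the loop, and you can get surprisingly far before fine-tuning pays off.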
That's only visible when it's not your personal conversation (you can't interact with someone else's). In a way it's designed to be distracting so you know how to start your own conversation.
The building of the visualiser was less interesting to me than the result and your conclusion. I agree that finding new ways to ingest the structure and logic of software would be very useful, and I like your solution. Is there a way to test it out?
Yes, at the moment it's an issue of cost. I can't use the best models because it is not affordable. Hopefully as performance improves over the years this will become less of an issue. Maybe I can build in a websearch to verify info though...
I hear you. Yes, I think "seeding" an LLM with docs or other learning material is one of the fundamentals of effectively and efficiently using it for learning, maybe you can build more in that direction?
Yeah, I actually started there. https://dev.rebrain.gg has the old version up.
You upload a source and it generates questions from it. However, when showing it to friends I found the barrier to usage was too high, as most people don't have a source ready. But I think adding it as an option would be pretty cool and doable.
Re the more focused feedback: I totally agree about the questioning styles. In the prompt I ask it not to use so many multiple-choice questions, but I think it's addicted / the conversation history skews the context.
I'm going to introduce a settings panel (easily accessible during the conversation), which will let you move to "chat mode" (to discuss instead of being asked questions), and also configure the types of questions you're asked and their ratio (if I can get the LLM to oblige). I'm also going to see if I can come up with some question formats beyond multiple choice, free-form, and multi-select (which the LLM doesn't use much).