One problem we've had developing autonomous SWE agents (https://github.com/ai-christianson/RA.Aid) is that open models just haven't performed anywhere near Sonnet at controlling the agent. Our experience is echoed by many other agent devs, and you can see it for yourself if you try DeepSeek (V3 or R1) vs. Sonnet in any agentic product.
Do you think that your training setup could help train these models to be better at agentic work?
Cool repo! Agreed, OSS models are still lagging, but they're definitely catching up!
So with GRPO and reinforcement learning, OSS model creators now have one more tool to make their models much better: we no longer need vast amounts of labeled CoT data, just questions and answers, and we let RL / GRPO figure out the CoT itself, guided by a reward function.
So yes, it can definitely help with agentic workloads!
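To make that concrete, here's a minimal sketch of what such a run can look like with trl's GRPOTrainer (which Unsloth builds on). The dataset, model checkpoint, XML format, and reward functions below are illustrative assumptions, not the exact recipe:

```python
import re
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical Q&A data: no CoT labels, just questions and final answers.
train_dataset = Dataset.from_list([
    {"prompt": "What is 13 * 7?", "answer": "91"},
    {"prompt": "What is 48 / 6?", "answer": "8"},
])

# Reward 1: did the model wrap its output in the expected XML tags?
def xml_format_reward(completions, **kwargs):
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

# Reward 2: does the text inside <answer> match the reference answer?
# Extra dataset columns (here "answer") are passed through as kwargs.
def correctness_reward(completions, answer, **kwargs):
    rewards = []
    for c, a in zip(completions, answer):
        m = re.search(r"<answer>(.*?)</answer>", c, re.DOTALL)
        rewards.append(2.0 if m and m.group(1).strip() == a else 0.0)
    return rewards

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # any causal LM checkpoint
    reward_funcs=[xml_format_reward, correctness_reward],
    args=GRPOConfig(
        output_dir="grpo-out",
        num_generations=4,        # completions sampled per prompt, compared within the group
        max_completion_length=256,
    ),
    train_dataset=train_dataset,
)
trainer.train()
```

The key point is that the reward functions only score the final output (format + correctness); the model discovers whatever CoT earns higher reward within each sampled group on its own.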
Great to see Unsloth here! How long did the training process take?
Also, a different version of the same original Colab couldn't get a 135M model to learn the XML tags, so do you think 8B is the minimum model size for this?