This tool looks really powerful, thanks for the link!
One thing I've been personally really intrigued by is the possibility of using self-play and adversarial learning as a way to advance beyond our current stage of imitation-only LLMs.
Any RL training setup needs a strong rules-based framework for measuring the quality and correctness of solutions. I think that skidl could be a really nice framework to include in an RL-trained LLM's curriculum!
I've written down a bunch of thoughts [1] on using games or code-generation in an adversarial training setup, but I could see circuit design being a good training ground as well!
* [1] https://github.com/HanClinto/MENTAT