That is still inference: it is using a model produced by the RL process. The RL process is what used the cost function to layer another round of training onto the model. Any online/continual learning would have to use a different algorithm than classical LLM training or RL. You can think of RL as a revision step, but it still happens offline; online/continual learning remains a very difficult problem in ML.
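To make the distinction concrete, here is a toy sketch of that offline-then-frozen split. Everything in it is made up for illustration (a two-response "policy", a hand-written cost function, a REINFORCE-style update); it is not how any real LLM RL pipeline is implemented, just the shape of the idea: the cost function is only consulted during the offline training phase, never at inference time.

```python
import math
import random

random.seed(0)

# A toy "policy": unnormalized log-preferences over two canned responses.
logits = {"good_answer": 0.0, "bad_answer": 0.0}

def softmax(ls):
    m = max(ls.values())
    exps = {k: math.exp(v - m) for k, v in ls.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def cost(action):
    # Stand-in cost function (lower is better): no labels, just a score.
    return 0.0 if action == "good_answer" else 1.0

# --- Offline "RL" phase: the cost function drives weight updates. ---
lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    action = random.choices(list(probs), weights=probs.values())[0]
    reward = -cost(action)  # reward is just negated cost
    # REINFORCE-style update: nudge log-probs in the direction of reward.
    for a in logits:
        grad = (1.0 if a == action else 0.0) - probs[a]
        logits[a] += lr * reward * grad

# --- Inference phase: weights are frozen; the cost function is never called. ---
def infer():
    probs = softmax(logits)
    return max(probs, key=probs.get)

print(infer())  # the trained policy now prefers "good_answer"
```

An online/continual learner would keep running the update loop while serving requests; here, as in standard LLM deployment, training and serving are two separate phases.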
Provide a cost function (vs labels) and have it argue itself to greatness as measured by that cost function?
I believe that's what GP meant by "respond": providing a cost function, not telling GPT it was wrong.