Yeah, if I understand correctly, the AI will create its own internal reasoning language through RL. In R1-Zero the thinking was already a strange mix of languages; they corrected that for R1 to keep the reasoning useful for humans.
Not trying to be ironic, but it would be interesting to see what the passage below would look like in that strange mixed form:
"If the model's actions involve generating tokens (like in language models), then optimizing these token outputs to maximize reward could lead the model to develop a consistent, efficient way of using tokens that's specific to the problem domain. This might look like a DSL because the tokens are used in a structured, perhaps abbreviated or symbolic way that's efficient for the task, not necessarily human-readable but effective for the model's internal processing."