Is the model using budget forcing?

Szpadel · 2025-03-06T15:33:45

I don't understand why you would force a "Wait" when the model wants to output </think>.

Why not just decrease the probability of </think>? If the model really wants to finish, it could overpower the penalty in cases where the question is really simple. It would also let the model express its next thought more freely.

rahimnathwani · 2025-03-07T02:56:05

  why not just decrease </think> probability?

Hugging Face's transformers library supports something similar to this. You set a minimum length, and until that length is reached, the end-of-sequence token has no chance of being output.

https://github.com/huggingface/transformers/blob/51ed61e2f05...
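A minimal sketch of that minimum-length rule in plain Python, assuming the logits for the next token arrive as a list indexed by token ID. The function names and toy token IDs are hypothetical; as far as I know the library's own version is the `MinLengthLogitsProcessor` class, which does the same thing on tensors.

```python
import math
from typing import List

NEG_INF = float("-inf")

def min_length_mask(logits: List[float], cur_len: int,
                    min_length: int, eos_token_id: int) -> List[float]:
    """Return a copy of `logits` with the EOS logit set to -inf while the
    sequence is shorter than `min_length`. This is a hard mask: after the
    softmax, P(EOS) is exactly 0, not just small."""
    out = list(logits)
    if cur_len < min_length:
        out[eos_token_id] = NEG_INF
    return out

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; exp(-inf - m) evaluates to 0.0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-token vocabulary where token 2 plays the role of EOS.
logits = [2.0, 0.5, 3.0]
early = softmax(min_length_mask(logits, cur_len=5, min_length=10, eos_token_id=2))
later = softmax(min_length_mask(logits, cur_len=10, min_length=10, eos_token_id=2))
print(early[2], later[2])
```

Before the minimum length, EOS gets probability exactly zero no matter how large its logit is; once `cur_len` reaches `min_length`, the logits pass through untouched.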

S1 does something similar to put a lower limit on its reasoning output. End of thinking is represented by the <|im_start|> token followed by the word 'answer'. IIRC the code dynamically adds <|im_start|> to, and removes it from, the list of suppressed tokens.
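A rough sketch of that dynamic suppression idea as a greedy decode loop. Everything here is hypothetical: the token IDs, the toy model, and the helper names are made up for illustration and are not taken from the S1 code. For simplicity the loop appends the 'answer' token itself once <|im_start|> is emitted, instead of letting the model generate it.

```python
from typing import Callable, List, Set

# Hypothetical token IDs for a toy vocabulary; real IDs depend on the tokenizer.
IM_START = 0  # stands in for <|im_start|>
ANSWER = 1    # stands in for the word "answer"
WORD = 2      # any ordinary reasoning token

def suppressed_tokens(thinking_len: int, min_thinking: int) -> Set[int]:
    """Suppress <|im_start|> (the start of the end-of-thinking delimiter)
    until the reasoning has at least `min_thinking` tokens, then lift it."""
    return {IM_START} if thinking_len < min_thinking else set()

def greedy_think(next_logits: Callable[[List[int]], List[float]],
                 min_thinking: int, max_steps: int) -> List[int]:
    tokens: List[int] = []
    for _ in range(max_steps):
        logits = list(next_logits(tokens))
        # Recompute the suppressed set every step, so the token is added
        # and later removed as the thinking length crosses the budget.
        for t in suppressed_tokens(len(tokens), min_thinking):
            logits[t] = float("-inf")  # hard mask: probability exactly 0
        nxt = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        tokens.append(nxt)
        if nxt == IM_START:
            tokens.append(ANSWER)  # "<|im_start|>answer" ends the thinking phase
            break
    return tokens

def eager_model(tokens: List[int]) -> List[float]:
    """Toy model that always prefers to stop thinking immediately."""
    return [5.0, -10.0, 1.0]

print(greedy_think(eager_model, min_thinking=3, max_steps=10))
```

With this toy model, `min_thinking=3` forces three reasoning tokens before the delimiter (`[2, 2, 2, 0, 1]`), while `min_thinking=0` stops immediately (`[0, 1]`).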

Both of these approaches set the probability to zero, not something small like you were suggesting.
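To make the difference concrete, here is a small sketch comparing a finite logit penalty on </think> (the softer approach suggested in the question) with the hard mask. The logits and the token ID are invented for illustration, not measured from any model.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; a -inf logit becomes probability 0.0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def penalize(logits: List[float], token_id: int, penalty: float) -> List[float]:
    """Subtract `penalty` from one token's logit. A finite penalty divides that
    token's unnormalized weight by exp(penalty): smaller, but never zero.
    penalty=math.inf reproduces the hard mask (logit becomes -inf)."""
    out = list(logits)
    out[token_id] -= penalty
    return out

END_THINK = 0  # hypothetical ID for </think>

simple_q = [8.0, 1.0, 1.0]  # model is very sure it is done thinking
hard_q = [1.5, 1.0, 1.0]    # model is only mildly inclined to stop

for name, logits in [("simple", simple_q), ("hard", hard_q)]:
    for penalty in (0.0, 2.0, math.inf):
        p = softmax(penalize(logits, END_THINK, penalty))[END_THINK]
        print(f"{name:6} penalty={penalty:>4}: P(</think>) = {p:.3f}")
```

With a penalty of 2.0, the "simple" case still ends thinking with probability of roughly 0.99, while the "hard" case drops from about 0.45 to about 0.10; the infinite penalty drives both to exactly zero. That is the trade-off under discussion: a confident model can overpower a soft penalty, but never a hard mask.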

rosspackard · 2025-03-06T13:14:41

I suspect it does use budget forcing. The word "alternatively" also shows up frequently, and it tends to appear at points where a </think> tag could logically have been placed.