My understanding is that top_k and top_p are two different methods of filtering candidate tokens during inference. top_k=30 considers only the 30 most probable tokens when selecting the next token to generate, and top_p=0.95 considers the smallest set of top tokens whose cumulative probability reaches 0.95. I assumed you'd only need to pick one of them.
Edit: Looks like both work together. "Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)"
Not quite sure how this is implemented - maybe one is preferred over the other when there are enough interesting tokens!
They both work on a list of tokens sorted by probability. top_k keeps a fixed number of tokens; top_p keeps the top tokens until the sum of their probabilities passes the threshold p. So, for example, if the top two tokens have probabilities 0.5 and 0.4, then top_p=0.9 would stop selecting right there, since the cumulative sum has reached 0.9.
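A minimal sketch of the two filters, assuming the probabilities are already sorted in descending order; the function names are illustrative, not any engine's actual API:

    import numpy as np

    def top_k_filter(probs, k):
        """Keep only the k highest-probability tokens (probs sorted descending)."""
        return probs[:k]

    def top_p_filter(probs, p):
        """Keep tokens until their cumulative probability reaches p (probs sorted descending)."""
        cumulative = np.cumsum(probs)
        # First index where the running sum reaches p; keep everything up to and including it.
        cutoff = int(np.searchsorted(cumulative, p)) + 1
        return probs[:cutoff]

    # Toy distribution matching the example above: top two tokens at 0.5 and 0.4.
    probs = np.array([0.5, 0.4, 0.05, 0.03, 0.02])
    print(top_p_filter(probs, 0.9))  # -> [0.5 0.4], cumulative sum hits 0.9 and selection stops
    print(top_k_filter(probs, 3))    # -> [0.5 0.4 0.05]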
Both can be chained together, and some inference engines let you change the order of the token filtering, so you can apply p before k, etc. (alongside all the other sampling parameters, like repetition penalty, removing the top token, DRY, etc.). Each filtering step readjusts the remaining probabilities so they always sum to 1, as in the sketch below.
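A hedged sketch of what a k-then-p chain with renormalization between steps might look like; the function names and the default k/p values are assumptions for illustration, not any particular engine's implementation:

    import numpy as np

    def renormalize(probs):
        """Rescale the surviving probabilities so they sum to 1 again."""
        return probs / probs.sum()

    def sample_chained(probs, k=30, p=0.95, rng=None):
        """Apply top-k, then top-p, renormalizing between filters, then sample."""
        rng = rng or np.random.default_rng()
        order = np.argsort(probs)[::-1]           # token ids sorted by probability, descending
        sorted_probs = probs[order]

        kept = renormalize(sorted_probs[:k])      # top-k filter, then renormalize
        cutoff = int(np.searchsorted(np.cumsum(kept), p)) + 1
        kept = renormalize(kept[:cutoff])         # top-p filter, then renormalize again

        choice = rng.choice(len(kept), p=kept)    # sample among the survivors
        return order[choice]                      # map back to the original token id

Reversing the two filter calls gives you the p-before-k order some engines expose as a configurable pipeline.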
https://github.com/ollama/ollama/blob/main/docs/modelfile.md...