And given that the inference code has access to the entire probability vector, it's the logical place to put this filtering... OpenAI and other LLM APIs probably don't want to return the entire token probability vector to the user because it's a lot of data: with vocabularies of tens of thousands of tokens, that's tens of thousands of floats per generated token, which is presumably why the API only exposes the top few logprobs. That said, it wouldn't surprise me if Microsoft has such access as part of their deal, given the obviously superior position it would put them in vs. regular API customers.
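To make the point concrete, here's a minimal sketch (not any provider's actual code; the function name and ban list are hypothetical) of what server-side filtering over the full vector could look like: ban certain token ids by zeroing them out before sampling, something the API caller never sees.

```python
import numpy as np

def filter_and_sample(logits, banned_ids, rng=None):
    """Mask banned token ids in the full logit vector, then sample.

    Because this runs inside inference, it sees every entry of the
    vocabulary-sized vector -- unlike an API client, who only gets
    the sampled token and maybe a handful of top logprobs.
    """
    rng = rng or np.random.default_rng(0)
    masked = logits.astype(float).copy()
    masked[list(banned_ids)] = -np.inf     # banned tokens get probability 0
    probs = np.exp(masked - masked.max())  # stable softmax over survivors
    probs /= probs.sum()
    token = rng.choice(len(probs), p=probs)
    return token, probs

# Tiny 4-token "vocabulary"; ban token 0 even though it has the top logit.
logits = np.array([2.0, 1.0, 0.5, -1.0])
token, probs = filter_and_sample(logits, banned_ids={0})
```

The filtering is a one-line mask here precisely because the code holds the whole vector; doing the same thing client-side is impossible when the API returns only the sampled token.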