And I got it. When the algorithm sees a word that is already in the list, it discards the word in the list first. Then it adds the word again with the same probability as any other word.
This ensures that only the last occurrence of each word can appear in the final list. The last occurrences of all words therefore end up in the final list with the same probability, and earlier occurrences are always removed: if not earlier, then when the next occurrence is seen.
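To make the discard-then-re-add step concrete, here is a minimal sketch of how I read it. The names `see_word`, `kept` and `p` are my own, and using a set for "the list" is an assumption, not something the algorithm's description prescribes.

```python
import random

def see_word(kept, word, p, rng=random):
    """Process one occurrence of `word` with the current keep-probability `p`.

    `kept` is a set standing in for "the list" (hypothetical representation).
    """
    # Forget any earlier occurrence first, so it cannot survive to the end.
    kept.discard(word)
    # Then admit this occurrence with the same probability as any other word.
    if rng.random() < p:
        kept.add(word)

kept = {"fish"}
see_word(kept, "fish", 0.5)  # "fish" is now present with probability 0.5
```

Whether "fish" was in the set beforehand has no effect on the outcome, which is the whole point: only the most recent occurrence decides.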
If the input is known to be large, there is no reason to start by adding every element. The algorithm can treat the first round like any other and start out with a _p0_ that is smaller than 1.
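A rough sketch of what starting with a smaller _p0_ could look like. Everything beyond the discard-and-re-add step is my assumption: the function name, the `capacity` limit, the rule of thinning the list by coin flips and halving the probability when it overflows, and the final estimate of list size divided by the current probability.

```python
import random

def estimate_distinct(words, capacity, p0=1.0, seed=None):
    """Estimate the number of distinct words in `words` (illustrative sketch).

    `p0` is the keep-probability used in the first round; p0=1.0 means
    every element is added at first. All names here are hypothetical.
    """
    if not 0.0 < p0 <= 1.0:
        raise ValueError("p0 must be in (0, 1]")
    rng = random.Random(seed)
    kept = set()
    p = p0
    for word in words:
        kept.discard(word)            # drop any earlier occurrence
        if rng.random() < p:          # re-admit with the current probability
            kept.add(word)
        while len(kept) > capacity:   # assumed overflow rule: start a new round
            kept = {w for w in kept if rng.random() < 0.5}
            p /= 2
    return len(kept) / p

words = ["a", "b", "a", "c", "b"]
exact = estimate_distinct(words, capacity=10)            # p0 = 1, no overflow
rough = estimate_distinct(words, capacity=10, p0=0.5)    # first round at 1/2
```

With `p0=1.0` and a capacity that is never exceeded, the result is simply the exact distinct count. With a smaller `p0` the list fills more slowly from the start, at the cost of some variance on small inputs.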