I had the same problem with the same paragraph and still don’t quite get it.
Unfortunately I struggle to follow the detailed explanation you gave… since you seem to understand it… can you confirm that they really do mean to throw away the word in the list they just found?
Eg
ACT
I
SCENE
Elsinore
A
platform
before
the
castle
FRANCISCO
at
his
post.
Enter
to
him
If buffer max is 16, I am supposed to randomly half it first?
Act
Scene
Elisnore
Platform
Before
The
His
Post
Now what? The next words are “Bernado Bernado who’s there”
The key insight is that words should appear in your scratch space with equal probability, no matter how often they appear in the source text. If you have a scratch space of size one, then the sequence of "apple" x 16 + "banana" x 1 should have equal chances of of the scratch space containing [apple] or [banana] at the end of the sequence, at least averaged over all 17 permutations.
One way to achieve this result is to make the scratch space memory-free. Rather than think of it as "remove the word with probability x", think of it as "always remove the word, then re-add it with probability (1-x)".
Unfortunately I struggle to follow the detailed explanation you gave… since you seem to understand it… can you confirm that they really do mean to throw away the word in the list they just found?
Eg
ACT I SCENE Elsinore A platform before the castle FRANCISCO at his post. Enter to him
If buffer max is 16, I am supposed to randomly half it first?
Act Scene Elisnore Platform Before The His Post
Now what? The next words are “Bernado Bernado who’s there”