Hacker News new | past | comments | ask | show | jobs | submit login

I had the same problem with the same paragraph and still don’t quite get it.

Unfortunately I struggle to follow the detailed explanation you gave… since you seem to understand it… can you confirm that they really do mean to throw away the word in the list they just found?

Eg

ACT I SCENE Elsinore A platform before the castle FRANCISCO at his post. Enter to him

If buffer max is 16, I am supposed to randomly half it first?

Act Scene Elisnore Platform Before The His Post

Now what? The next words are “Bernado Bernado who’s there”




Yes, you're supposed to throw it away.

The key insight is that words should appear in your scratch space with equal probability, no matter how often they appear in the source text. If you have a scratch space of size one, then the sequence of "apple" x 16 + "banana" x 1 should have equal chances of of the scratch space containing [apple] or [banana] at the end of the sequence, at least averaged over all 17 permutations.

One way to achieve this result is to make the scratch space memory-free. Rather than think of it as "remove the word with probability x", think of it as "always remove the word, then re-add it with probability (1-x)".


Aaaah, remove and the re-add (maybe) makes a bit more sense..

So if it is not in the list you just add it, right? Actually is that right? Won’t the list fill up to the max again at some point like this?

So if so, I add Bernardo. Now the very next word is Bernardo so I remove the last Bernardo and maybe re-add it based on a 50% chance.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: