So how do they determine whether a user has viewed a post already? I would think...

hrshtr · on May 27, 2017

That's true. I'm thinking that Nazar is more like a spam filter that monitors user behavior.

kchandra · on May 27, 2017

Pretty much, yeah.

lucasschm · on May 27, 2017

Bloom filters? They have false positives but no false negatives.
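
For anyone unfamiliar, here is a minimal sketch of the idea, not anything Reddit is known to run. The class name, the `might_contain` method, the bit-array size, and the SHA-256-derived hash positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: membership test with false positives, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_bits and num_hashes are illustrative defaults, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


viewed = BloomFilter()
viewed.add("user:42")
print(viewed.might_contain("user:42"))  # an added item is always reported
```

A `False` answer is always right, so a filter like this can safely say "this user has not viewed the post"; a `True` answer may occasionally be wrong.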

jimmaswell · on May 28, 2017

Why can't they just associate a list of viewed posts with each user, or a list of users that viewed a post with each post, and check that? I don't get why this needs any consideration.

sethammons · on May 28, 2017

They addressed your second point in the article. On a popular post, you would be storing several megabytes of data to capture/relate each unique user that visited. That gets expensive at scale. HLL takes that down to a few kilobytes, less than 1% of the original size.

For your first suggestion, you would have to do a very expensive lookup. You couldn't cache it effectively due to the requirement of near-real-time stats. You could improve lookup time using columnar storage, but the performance and memory usage would be nowhere near as nice as with HLL.

Problems are harder at scale.
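
To make the size difference concrete, here is a rough sketch of a HyperLogLog counter next to an exact set. It is a simplified textbook version, not the implementation from the article: the class name, the choice of SHA-256 as the hash, and the precision `p=12` (4096 one-byte registers) are all assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog: approximate distinct count in fixed memory."""

    def __init__(self, p=12):
        self.p = p                       # number of index bits (assumed value)
        self.m = 1 << p                  # number of registers
        self.registers = bytearray(self.m)

    def add(self, item):
        # Take 64 bits of a SHA-256 digest as the hash value.
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
exact = set()
for i in range(100_000):
    user_id = f"user:{i}"
    hll.add(user_id)
    hll.add(user_id)      # repeat views do not change the estimate
    exact.add(user_id)

print(len(exact), round(hll.count()), len(hll.registers), "bytes of registers")
```

The exact set grows with every new viewer, while the register array stays at 4096 bytes no matter how many IDs go in. The trade-off is that `count()` is an estimate; with 4096 registers the typical error is on the order of a couple of percent.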

eropple · on May 28, 2017

I've had a "phases of computing" article percolating for a while to this end. Problems aren't just harder at scale; they actively change their observable properties because of the stressors involved and where they crop up.

eropple · on May 28, 2017

Have you stopped to think how many users that is and how many posts?

Viewing a single thread could require five hundred associations.

jimmaswell · on May 28, 2017

And it already requires reading five hundred comments.