
So how do they determine whether a user has already viewed a post? I would have thought unique counting is done with the HyperLogLog counter, but the article says this decision is made by the Nazar system, which doesn't use the HyperLogLog counter in Redis.
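
For anyone wondering what the HyperLogLog counter actually does: Redis exposes it as PFADD/PFCOUNT. Below is a minimal sketch of the idea, not Redis's implementation (no sparse encoding, only the basic small-range correction). The class name and p=14 are my own choices, though Redis also uses 2^14 registers. Each item is hashed, the top p bits pick a register, and the register keeps the longest run of leading zeros seen. Adding the same user twice can never change the estimate, which is why HLL answers "how many unique viewers?" but not "has this user viewed it?".

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not Redis's code)."""

    def __init__(self, p: int = 14):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers (16384 for p=14)
        self.registers = [0] * self.m   # each fits in 6 bits -> 12 KB if packed
        # Bias-correction constant from the HLL paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item: str) -> int:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                # top p bits choose the register
        rest = x & ((1 << width) - 1)   # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank  # duplicates can never raise this

    def count(self) -> int:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)
```

With 2^14 registers the standard error is about 1.04/sqrt(16384), roughly 0.8%, regardless of how many viewers there are.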


That's true. I think Nazar is more like a spam filter that monitors user behavior.


Pretty much, yeah.


Bloom filters? They have false positives but no false negatives.
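
A rough sketch of how that could work, assuming one Bloom filter per post keyed by user ID (my assumption, not something the article describes). A false positive means a genuine first view is occasionally treated as a repeat, so views get slightly undercounted but never double-counted. `bloom_size` uses the standard sizing formulas m = -n ln(p) / (ln 2)^2 and k = (m/n) ln 2; the k positions come from double hashing a single SHA-256 digest.

```python
import hashlib
import math


def bloom_size(n: int, fp_rate: float) -> tuple[int, int]:
    """Bits (m) and hash count (k) for n items at the target false-positive rate."""
    m = math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False -> definitely never added; True -> probably added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

For 10,000 viewers at a 1% false-positive rate this works out to about 96,000 bits (roughly 12 KB) and 7 hashes, but unlike HLL the size still grows linearly with the number of viewers.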


Why can't they just associate a list of viewed posts with each user, or a list of users who viewed each post, and check that? I don't see why this needs any special consideration.
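
For reference, the naive exact approach being proposed (a set of viewer IDs per post) is something like this minimal in-memory sketch; the class and method names are hypothetical. It is exact and simple, but memory grows with every unique viewer of every post.

```python
from collections import defaultdict


class ExactViewCounter:
    """Exact unique-view counting: one set of user IDs per post."""

    def __init__(self):
        self._viewers: defaultdict[str, set[str]] = defaultdict(set)

    def record_view(self, post_id: str, user_id: str) -> bool:
        """Return True if this is the user's first view of the post."""
        viewers = self._viewers[post_id]
        if user_id in viewers:
            return False
        viewers.add(user_id)
        return True

    def unique_views(self, post_id: str) -> int:
        return len(self._viewers.get(post_id, ()))
```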


They addressed your second point in the article. On a popular post, you would be storing several megabytes of data to record each unique user who visited. That gets expensive at scale. HLL brings that down to a few kilobytes, less than 1% of the original size.
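
Back-of-the-envelope numbers, assuming each user ID is stored as a raw 64-bit integer with zero per-entry overhead (real sets and Redis keys cost more) versus a dense Redis HyperLogLog of 2^14 six-bit registers. The point is that HLL is a fixed 12 KB, so it only drops below 1% of the exact set once a post has more than about 154,000 unique viewers.

```python
ID_BYTES = 8                  # assumption: 64-bit user IDs, no overhead
HLL_REGISTERS = 2 ** 14       # register count used by Redis's dense encoding
HLL_BITS_PER_REGISTER = 6


def exact_set_bytes(unique_viewers: int) -> int:
    return unique_viewers * ID_BYTES


def hll_dense_bytes() -> int:
    return HLL_REGISTERS * HLL_BITS_PER_REGISTER // 8  # 12288 bytes


for viewers in (10_000, 100_000, 1_000_000):
    exact = exact_set_bytes(viewers)
    ratio = hll_dense_bytes() / exact
    print(f"{viewers:>9,} viewers: exact ~{exact / 1e6:.2f} MB, "
          f"HLL {hll_dense_bytes() / 1024:.0f} KB ({ratio:.2%} of exact)")
```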

As for your first suggestion, you would need a very expensive lookup. You couldn't cache it effectively because of the requirement for near-real-time stats. You could speed up lookups with columnar storage, but the performance and memory usage would be nowhere near as good as with HLL.

Problems are harder at scale.


I've had a "phases of computing" article percolating for a while along these lines. Problems aren't just harder at scale; they actively change their observable properties because of the stressors involved and where they crop up.


Have you stopped to think how many users that is and how many posts?

Viewing a single thread could require five hundred associations.


And it already requires reading five hundred comments.



