Hacker News | lucasschm's comments

Bloom filters? They have false positives but no false negatives.
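To make the false-positive / no-false-negative property concrete, here is a minimal sketch of a Bloom filter. The class name, the bit-array size, and the use of salted SHA-256 as the hash family are all assumptions chosen for illustration, not taken from any particular library.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hashing scheme are assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True may be a false positive; False is always correct,
        # because adding an item sets every one of its bits.
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
for word in ["apple", "banana", "cherry"]:
    bloom.add(word)

print(bloom.might_contain("apple"))  # an added item is never reported missing
```

Bits are only ever set, never cleared, so an added item can never be reported as absent. A false positive happens when other items happen to have set all of the bits that an unseen item maps to.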


You are correct, but HyperLogLog uses many buckets, each tracking the longest run of zeros it has seen, in order to avoid the problem of outliers. I recently studied these probabilistic algorithms and wrote a notebook with code and plots to show their performance: https://github.com/lucasschmidtc/Probabilistic-Algorithms/bl...
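A rough sketch of the bucket idea, for anyone who wants to see it in code. This is not the notebook's implementation: the function name, the SHA-256-based 64-bit hash, and the default of 1024 buckets are assumptions, and only the small-range correction is included.

```python
import hashlib
import math


def hll_estimate(items, bucket_bits=10):
    """Approximate the number of distinct items (illustrative sketch)."""
    num_buckets = 1 << bucket_bits
    rest_bits = 64 - bucket_bits
    registers = [0] * num_buckets

    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash value
        bucket = h >> rest_bits                # top bits choose the bucket
        rest = h & ((1 << rest_bits) - 1)      # remaining bits
        # Rank = number of leading zeros in the remaining bits, plus one.
        rank = rest_bits - rest.bit_length() + 1
        registers[bucket] = max(registers[bucket], rank)

    # Bias-correction constant for 128 or more buckets.
    alpha = 0.7213 / (1 + 1.079 / num_buckets)
    # The harmonic mean across buckets damps the effect of one outlier.
    raw = alpha * num_buckets ** 2 / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting.
    empty = registers.count(0)
    if raw <= 2.5 * num_buckets and empty:
        return num_buckets * math.log(num_buckets / empty)
    return raw


estimate = hll_estimate(f"user{i}" for i in range(50_000))
print(round(estimate))  # should land near 50000; exact value depends on the hash
```

With a single register, one unlucky hash with a long run of zeros would inflate the estimate by a large factor. Splitting the stream across many registers and combining them with a harmonic mean keeps any one outlier from dominating.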


Thanks for sharing that!

Just skimmed through it and it seems pretty interesting. I'll read it more in depth later.


No problem. If there are mistakes or a segment is not clear, let me know.


Thanks for the write up, Lucas. It was very intuitive and I learnt a lot.

I noticed that you used 5000 buckets to store the frequency of 7000 non-unique words in the section on 'Counting Bloom Filters'. How is that better than using 7000 buckets and a uniformly distributed hash function, which would maintain frequencies perfectly? We would be using fewer buckets by an order of magnitude in a real-world implementation to save memory.
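The memory trade-off in question can be sketched as follows. This is an illustration only, not the notebook's code: the class name, the salted SHA-256 hashing, and the word counts (7000 occurrences of 300 distinct words, 100 counters) are made-up assumptions. Frequencies are read as the minimum of an item's counters, the same idea a count-min sketch uses.

```python
import hashlib
from collections import Counter


class CountingBloomFilter:
    """Illustrative counting filter; parameters are assumptions."""

    def __init__(self, num_counters, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _positions(self, item):
        # A set, so two hashes hitting the same counter increment it once.
        return {
            int.from_bytes(
                hashlib.sha256(f"{seed}:{item}".encode()).digest()[:8], "big"
            ) % len(self.counters)
            for seed in range(self.num_hashes)
        }

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def estimate(self, item):
        # Collisions only add to counters, so the minimum can
        # overestimate the true frequency but never underestimate it.
        return min(self.counters[pos] for pos in self._positions(item))


# Hypothetical workload: 7000 occurrences drawn from 300 distinct words.
words = [f"word{i % 300}" for i in range(7000)]
true_counts = Counter(words)

small = CountingBloomFilter(num_counters=100)
for w in words:
    small.add(w)

overestimated = sum(small.estimate(w) > c for w, c in true_counts.items())
print(f"{overestimated} of {len(true_counts)} words overestimated")
```

With fewer counters than distinct words, collisions are unavoidable and the estimates drift upward. An exact table avoids that, but it needs one slot per distinct word plus the keys themselves.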


Yeah, I should have given more thought to that number. Updated the example for N=300. Thanks


Thanks for sharing, guy! Interesting repo.


For this subject I would like to offer this counter view: Why China bears are wrong: An interview with Andy Rothman (http://supchina.com/sinica/china-bears-wrong-interview-andy-...)

Since someone is likely to mention the ghost cities, I recommend this video: https://www.youtube.com/watch?v=AyBBQ-wF87M&list=PLxh5xkC0W-...


Ghost cities are weird: first they talk about the ghost cities, then others say the ghost cities are filling up. If you actually visit, say, Ordos New Town, you'll really get that, no, those ghost cities really exist.

Some will fill up, like Pudong did, and I bet Tianjin's new financial district will too. But those in areas with little economic hope in the near term (Ordos and dying coal) really aren't going to fill up before the buildings become substantially run-down (given Chinese concrete overbuilding to make use of unskilled migrant labor, these buildings require a lot of maintenance and will look decrepit sooner rather than later).


I wonder what percentage of all buildings in China are empty. On my visit I saw plenty of empty skyscrapers, especially next to decaying homes where people still lived.


Local governments have ways of telling, e.g. by electricity usage. You can also try counting the lights on at night to get an idea of apartment occupancy (a fun pastime at the apartment complex I used to live in). Someone definitely knows, but you can be damned sure that this information is considered "state secrets."


Given that this guy benefits financially from investor sentiment, I would take his advice with a very large grain of salt.


It's a catch-22: people who know the most about a market are almost certainly invested.


Recently I found out that my knowledge about probabilistic algorithms was quite lacking, so I decided to write a Jupyter notebook about them. I think some of you will find them as interesting as I did. And if there are any mistakes, let me know.

