You are correct, but HyperLogLog uses many buckets, each tracking the longest run of leading zeros it has seen, precisely to dampen the effect of outliers. I recently studied these probabilistic algorithms and put together a notebook with code and plots to show their performance: https://github.com/lucasschmidtc/Probabilistic-Algorithms/bl...
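To make the bucket idea concrete, here is a minimal sketch of a HyperLogLog-style estimator. It is not taken from the notebook; the function name `hll_estimate`, the choice of SHA-1 as the hash, and the default of 2^10 buckets are all assumptions made for illustration.

```python
import hashlib
import math


def hll_estimate(items, p=10):
    """Estimate the number of distinct items using 2**p buckets (registers).

    Hypothetical helper for illustration; assumes p >= 7 so the usual
    bias-correction constant applies.
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Take 64 bits of a SHA-1 digest as the hash value (an assumption;
        # any well-mixed 64-bit hash would do).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                  # first p bits pick the bucket
        rest = h & ((1 << (64 - p)) - 1)     # remaining bits feed the zero count
        # rank = number of leading zeros in the remaining bits, plus one
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    # Harmonic mean across buckets: one unlucky bucket cannot dominate.
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction (linear counting) while many buckets are empty.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    words = [f"user-{i}" for i in range(50000)]
    print(round(hll_estimate(words)))
```

With a single counter, one hash with an unusually long run of zeros would inflate the estimate by a large factor. Splitting the stream across many buckets and combining them with a harmonic mean is what keeps such outliers in check.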
Thanks for the write-up, Lucas. It was very intuitive and I learnt a lot.
I noticed that you used 5000 buckets to store the frequency of 7000 non-unique words in the section on 'Counting Bloom Filters'. How is that better than using 7000 buckets and a uniformly distributed hash function, which would maintain frequencies perfectly? We would be using fewer buckets by an order of magnitude in a real-world implementation to save memory.
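One thing worth checking on the question above: a uniformly distributed hash into as many buckets as keys is not collision-free, so it would not keep frequencies perfectly unless it were a perfect hash built for that exact key set. A rough sketch, assuming 7000 distinct keys (the actual number of distinct words is not stated) and using hypothetical helpers `bucket_of` and `occupied_buckets`:

```python
import hashlib


def bucket_of(key, n_buckets):
    """Map a string key to a bucket with a well-mixed hash (SHA-1 here, an assumption)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def occupied_buckets(n_keys, n_buckets):
    """Count how many buckets receive at least one of n_keys synthetic keys."""
    return len({bucket_of(f"word-{i}", n_buckets) for i in range(n_keys)})


if __name__ == "__main__":
    n = 7000
    used = occupied_buckets(n, n)
    # Every key beyond the first in a bucket shares its counter with another key.
    print(f"{used} of {n} buckets used, {n - used} keys collide with an earlier key")
```

For n keys hashed uniformly into n buckets, the expected share of occupied buckets is about 1 - 1/e, roughly 63%, so a sizeable fraction of keys ends up sharing a counter.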
Ghost cities are weird: first they talk about the ghost cities, then others say the ghost cities are filling up. If you actually visit, say, Ordos New Town, you'll really get that, no, those ghost cities really exist.
Some will fill up, like Pudong did, and I bet Tianjin's new financial district will too. But those in areas with little economic hope in the near term (Ordos and dying coal) really aren't going to fill up before the buildings become substantially run-down (given Chinese concrete overbuilding to make use of unskilled migrant labor, these buildings require a lot of maintenance and will look decrepit sooner rather than later).
I wonder what percentage of all buildings in China are empty. On my visit I saw plenty of empty skyscrapers, especially next to decaying homes where people were still living.
Local governments have ways of telling, e.g. by electricity usage. You can also try counting the lights on at night to get an idea of apartment occupancy (a fun pastime at the apartment complex I used to live in). Someone definitely knows, but you can be damned sure that this information is considered "state secrets."
Recently I found out that my knowledge of probabilistic algorithms was quite lacking, so I decided to write a Jupyter notebook about them. I think some of you will find them as interesting as I did. And if there are any mistakes, let me know.