Hacker News new | past | comments | ask | show | jobs | submit login

>Wouldn't these two questions have different answers?

Yes, though in practice they're close.

To answer the actual question asked. We can say that there is a length n, such that all sequences S, of length n, cannot be contained in our total bits transferred, B. This is a consequence of the pigeonhole principle.

Naively, this would be B = n * S, and S = 2^n. So B = n * 2^n. Note that we can actually compress things more though, if we assume that in the worst case every window in the internet bit-corpus is unique.

That this is possible isn't immediately obvious, but consider

01, 00110, 0001011100, 0000100110101111000, 000001000110010100111010110111110000 (which, interestingly, isn't in OEIS, but related sequences: https://oeis.org/A052944 and https://oeis.org/A242323 are).

These bitstrings contain, perfectly overlapping, every bitstring of length n, for n from 1-5. In general, it takes 2^n + n - 1 bits to convey every possible bitstring, if you manage to overlap them perfectly (if someone can prove this, please do. I thought it was grey code/hamiltonian path over hypercube related, but I unconvinced myself of that).

EDIT: Someone else mentioned De Brujin sequences, which these are. And they are based on hamiltonian paths, although not over hypercubes :(. And my sequence is on OEIS, as https://oeis.org/A166315, just converted to decimal. /EDIT

So the real answer is just B = 2^n + n - 1, but for n > 15 or so, n - 1 is so small we can ignore it. In other words, our length n is just log(B). Given the assumption in the article of 3.4067e22 bits, the base 2 log of that is...74.85. This is exactly 1 more than what the article says is the point where you'll have seen half the messages. This isn't a coincidence.

Which raises an interesting question that I leave as an exercise to the reader:

Why is the length for which you cannot have conveyed all bit strings exactly 1 bit more than the point where you've conveyed about half of them?




You’re giving the range for a 50% to 100% chance of non-transmission.

But in the spirit of the original question, I’d suppose we want to find n such that the probability of non-transmission is 2^-n, and the expected number of such transmissions is 1? Is that number similarly close?




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: