
As the article notes, encryption does make a huge difference here. For global web use we've reached something like 70% of sessions

https://letsencrypt.org/stats/#percent-pageloads

However, that doesn't cover 70% of bytes. For example, software updates are often downloaded over HTTP (hopefully with digital signatures!). Debian and Debian-derived distributions distribute most of their package updates over HTTP, authenticated with PGP signatures.

Most of those packages are nonetheless compressed, which increases the variety in bit sequences, but then most of the downloads are of identical compressed files, which decreases the variety.

On the other hand, video streaming is often encrypted now, though still sometimes not. Even when it isn't encrypted for confidentiality or authenticity, it's often encrypted in order to impose DRM restrictions. In any case, it's usually also heavily compressed, which again increases the variety of bit sequences transmitted, even for unencrypted video.

To give a rough guess, I think the combination of encryption and compression means that the author is roughly correct with regard to information being transmitted today. Even when different people watch the same video on YouTube, YouTube is separately encrypting each stream, and nonrandom packet headers outside of the TLS sessions (and other mostly nonrandom associated traffic like DNS queries and replies) represent a very small fraction of this traffic.

It might be interesting to try to repeat the calculation if we made different assumptions so that compression was in wide use (hence the underlying messages are nearly random) but encryption wasn't (hence very large numbers of messages are byte-for-byte identical to each other). Can anybody give a rough approach to estimating the impact of that assumption?
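
To sketch one possible approach (every number here is a made-up assumption, not a measurement): model a catalog of distinct files with Zipf-like popularity, count every download as traffic, but count only the first copy of each file as new information. Something like:

  # Back-of-envelope toy model, all assumptions: equal-sized files,
  # Zipf popularity with exponent 1, compressed (so bytes look random)
  # but unencrypted (so repeat downloads are byte-identical).
  n_files = 10**6          # hypothetical number of distinct files
  total_downloads = 10**9  # hypothetical number of downloads

  weights = [1.0 / r for r in range(1, n_files + 1)]   # Zipf popularity
  total_w = sum(weights)
  downloads = [total_downloads * w / total_w for w in weights]

  served_at_least_once = sum(1 for d in downloads if d >= 1)
  novel_fraction = served_at_least_once / total_downloads
  print("roughly {:.2%} of bytes carry new information".format(novel_fraction))

Under those toy numbers almost everything is a repeat; the interesting question is how the answer moves as you vary the catalog size, the popularity exponent, and the file-size distribution.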




Even if every single bit of traffic was encrypted, a huge portion of that would still be identical - TCP headers, for one thing, would share a lot of common bits for each packet. Bandwidth reporting often ignores those, though, so I'm not sure about the data source for world bandwidth.
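
As a toy illustration (hand-built headers with made-up field values, not a capture): two TCP headers from the same flow differ mainly in the sequence number and checksum, so most header bytes repeat packet to packet.

  import struct

  def tcp_header(seq, ack, window, checksum):
      # src port, dst port, seq, ack, data offset + flags, window, checksum, urgent ptr
      return struct.pack("!HHIIHHHH", 443, 51514, seq, ack,
                         (5 << 12) | 0x018, window, checksum, 0)

  h1 = tcp_header(1000, 555, 64240, 0x1a2b)
  h2 = tcp_header(2460, 555, 64240, 0x3c4d)
  same = sum(a == b for a, b in zip(h1, h2))
  print(same, "of", len(h1), "header bytes identical")  # 16 of 20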


Wouldn’t some combination of unique session keys, PFS algorithms, or block encryption modes make this false? When would the headers encrypt to the same ciphertext unless you were using ECB mode with the same key?
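
For what it's worth, a minimal sketch with the pyca/cryptography package (throwaway keys and data, purely illustrative): an identical plaintext block only repeats in the ciphertext when both the key and the mode state are reused, e.g. ECB with one key; a per-session key or a per-message nonce breaks the repetition.

  import os
  from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

  block = b"0123456789abcdef"  # the same 16-byte "header" sent twice

  def encrypt(key, mode):
      enc = Cipher(algorithms.AES(key), mode).encryptor()
      return enc.update(block) + enc.finalize()

  key = os.urandom(16)
  print(encrypt(key, modes.ECB()) == encrypt(key, modes.ECB()))    # True: repeats show
  print(encrypt(key, modes.CTR(os.urandom(16))) ==
        encrypt(key, modes.CTR(os.urandom(16))))                   # False: fresh nonces
  print(encrypt(os.urandom(16), modes.ECB()) ==
        encrypt(os.urandom(16), modes.ECB()))                      # False: fresh keys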


I think you misinterpreted cortesoft's point, which I could try to make clearer:

"Even if every single bit of payload traffic was encrypted, a huge portion of the traffic actually sent over the wire would still be identical - TCP headers, for one thing, would share a lot of common bits for each packet."

So I think you're in agreement here.


The original question was “sent over the Internet”, not “sent over the wire”.


Anything in the IP header (perhaps excluding the TTL and checksum) and below is sent over the Internet.


If you’re looking at TCP headers, you might as well go all the way to the PHY layer. Your TCP headers will get encoded by an error-correcting code (good, they’ll look random again), but then the data will be split into frames, and each frame begins with an identical preamble sequence of bits.


> good, they’ll look random again

They might look random, but they contain the same amount of entropy as the original data and are longer. Also, the encoding is deterministic.


The code itself is deterministic, but usually the bits are interleaved and scrambled with a long, time-varying pseudorandom sequence. If you pass a short repetitive sequence in, it will look random coming out.
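
A toy additive scrambler (assuming the common PRBS7 polynomial x^7 + x^6 + 1; real PHYs use longer, standardized sequences) shows the effect: a short repetitive input XORed with the LFSR output looks pseudorandom, and the same LFSR state recovers it exactly.

  def prbs7(seed=0x7F):
      # 7-bit LFSR, taps for x^7 + x^6 + 1
      state = seed
      while True:
          bit = ((state >> 6) ^ (state >> 5)) & 1
          state = ((state << 1) | bit) & 0x7F
          yield bit

  repetitive = [0, 1] * 32                      # short repetitive input
  lfsr = prbs7()
  scrambled = [b ^ next(lfsr) for b in repetitive]
  print("".join(map(str, repetitive)))
  print("".join(map(str, scrambled)))           # looks pseudorandom

  lfsr = prbs7()                                # same seed -> same sequence
  recovered = [b ^ next(lfsr) for b in scrambled]
  assert recovered == repetitive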


That's interesting. I know a bit about ECC in general, but not too much about what the current state of implementations is. Do you know what the codes in use are called?


The scrambler's actually a separate block, usually immediately before or shortly after the encoder. It's common in wireless modems, which I'm more familiar with, but it seems like it's rare in wireline PHYs. Ethernet apparently uses an 8b10b encoding to maintain a similar number of 0s and 1s.

I can't speak on entropy, but turbo codes use an internal interleaver and the codewords can be quite long, so a short repetitive input sequence turns into a very random-looking output sequence. As you mentioned, though, two identical input sequences would still map to identical output sequences.


All of this just means that the author's estimate is an upper bound.



