
Strongly agreed - Google's Project Zero helped them immensely and without them it would have continued to grow worse.

The length of time this went for was already disastrous. From the Cloudflare blog post, "The earliest date memory could have leaked is 2016-09-22".

As an example of how destructive it might have been beyond leaked information, note that Internet Archive have spent considerable time archiving the election and its aftermath.

donaldjtrump.com is served via Cloudflare.

Chances are it wasn't a domain that was leaking information - though we don't know as there's no publicly accessible list and no way to tell if they had the buffer overrun features active.

If it was, however, the Internet Archive now has a horrible choice on its hands: wipe data from a domain that will be of historical interest to posterity, or try to sanitize the data by removing the PII leaked via Cloudflare.

History will already look back at the early digital age with despair - most content will be lost or locked in arcane digital formats. Imagine having to explain that historical context was made worse as humanity deleted it after accidentally mixing PII into random web pages on the internet -_-



One of us misunderstands what happened.

> Chances are it wasn't a domain that was leaking information - though we don't know as there's no publicly accessible list and no way to tell if they had the buffer overrun features active.

As far as my understanding goes, domains that had the features on were leaking data from other domains. So it's near impossible to tell who was affected.


You understand it right, but I think you draw the wrong conclusions. If a site didn't have said features active, information from it could still have leaked via other sites that had them on, and could be in the archives of those other sites (so it should still rotate secrets where necessary). But the archive of the page itself can be kept safely, since without the buggy features no leaked data from other sites was included in it.


Ah, I get it now, thanks for explaining.


The key bit is that your page needs to have been in memory in that process for it to leak. So malformed pages that use the affected services would leak whatever other pages happened to be in memory on those servers.


Wouldn't this be hard to exploit from an attacker's point of view? I mean, there's no way to know what data was in RAM at the time; it's at best a blind attack.


If I understand correctly, you could literally keep smashing f5 on an affected page and get a different chunk of memory every time. An attacker could have potentially collected a lot very quickly with a simple script.
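
For illustration, a minimal sketch of the kind of simple script meant here, in Python; the URL and request count are placeholders, and this only demonstrates the "refresh repeatedly and keep every response" idea described above:

    # Hypothetical sketch: repeatedly fetch one affected page and keep each
    # response body, since every response could contain a different slice of
    # leaked memory. The URL and count are placeholders.
    import requests

    URL = "https://affected-site.example/some-page"  # placeholder

    for i in range(1000):
        resp = requests.get(URL, headers={"Cache-Control": "no-cache"})
        # Save the raw body; scanning it for secrets would happen offline.
        with open(f"response_{i:04d}.html", "wb") as f:
            f.write(resp.content)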


That's, from what I understand, completely correct. Not only that, but because of the nature of the flaw, it's not clear to me that the attackers would be generating any real anomalies by doing so. They're just fetching a particular pattern of otherwise non-notable web pages.


But it's a blind attack that is much more likely to hit big players.

And seeing as Cloudflare has some of the biggest pipes on the net, it would be easy to saturate a pipe to gather as much data as possible from any sites you can find that are affected.


Can't they archive the content (but not expose it), and then figure out how to filter out the leaked data later? No history need be lost.


To some degree that's what happens. Internet Archive and Common Crawl use WARC files and deleting an individual entry or set of entries from them can be markedly difficult. It's easiest to mark it with some manner of tombstone for later handling.
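
As a rough sketch of what that filtering/tombstoning could look like (this assumes the warcio Python library; the domain list and file names are made up, not anything the Archive or Common Crawl actually use):

    # Sketch: copy a WARC, dropping records for affected domains and logging
    # them as "tombstones" for later handling. Assumes the warcio library;
    # AFFECTED_DOMAINS and the file names are hypothetical.
    from warcio.archiveiterator import ArchiveIterator
    from warcio.warcwriter import WARCWriter

    AFFECTED_DOMAINS = {"affected-site.example"}  # placeholder

    with open("input.warc.gz", "rb") as inp, \
         open("filtered.warc.gz", "wb") as out, \
         open("tombstones.txt", "w") as tombstones:
        writer = WARCWriter(out, gzip=True)
        for record in ArchiveIterator(inp):
            uri = record.rec_headers.get_header("WARC-Target-URI") or ""
            if any(domain in uri for domain in AFFECTED_DOMAINS):
                # Don't copy the record; note its URI for later handling.
                tombstones.write(uri + "\n")
                continue
            writer.write_record(record)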

The complication comes with the edge cases they'd need to face and @eastdakota's call to "get the crawl team to prioritize the clearing of their caches"[1]. This is also work they now need to do due to Cloudflare's blunder.

For Internet Archive and Common Crawl, these aren't caches, they're historical information. You can't just blow that away - but you also can't serve it if it has PII in it. Either they need to find all the needles and filter/tombstone them - which we'd expect to be very difficult given leaked information is still sitting in Google and Bing's cache now - or wipe/prevent access to the affected domains.

Wiping the donaldjtrump.com archives would be historically painful, but even temporarily blocking access to the domain would be problematic.

Finally, and most importantly, the fact that non-profit projects need to worry about how to excise such information is ridiculous anyway. Excising it is non-trivial and may well be destructive if there's even a minor bug in the processing. Having looked at much of the web, I can say it's hard to tell what's rubbish and what isn't :)

(Fun example: I had an otherwise high-quality Java library for HTML processing crash during a MapReduce job because someone used a phone number as a port (i.e. url:port). The internet is weird. The fact it works at all is a mystery ;))
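
A quick Python illustration of the same failure class, with a made-up URL: a phone number pushes the port outside the valid 0-65535 range, which many parsers turn into an exception.

    # Illustration only (not the Java library in question): a phone number
    # used as a port is outside 0-65535, so the parser raises on access.
    from urllib.parse import urlsplit

    url = "http://example.com:8005551234/contact"  # made-up URL

    parts = urlsplit(url)
    try:
        print(parts.port)
    except ValueError as e:
        # Python 3.6+ raises "Port out of range 0-65535" here; crawl code
        # has to treat the port as untrusted input like everything else.
        print("bad port:", e)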

[1]: https://news.ycombinator.com/item?id=13721644


One way Cloudflare could mitigate the damage they caused is by coughing up some money to support the non-profits doing cleanup work on Cloudflare's behalf.

Polluter pays.


It seems like Google and Bing should be able to sue Cloudflare for the hours they're spending cleaning up its mess.


They decided to cache the internet; as soon as they copy it to their servers, it's their problem, imho.


Cloudflare seems to be of the opinion they deserve priority treatment. Personally I think Google, Bing and others are mainly doing this to protect the users of Cloudflare powered websites, not Cloudflare.


Well yeah, it has nothing to do with helping out Cloudflare, but however it got there, random creds are being served to the world from their product. No specific criticism of their efforts to fix it, just that they bear some responsibility to make it inaccessible from their servers as fast as possible, even if it's still cached in a million other spots. Shitty task, but it comes with the territory.


robots.txt


it's not my fault, robots.txt made me do it!


To be fair, access to the archives of donaldjtrump.com could be blocked anyway if the owners at some point decide to add a robots.txt blocking the Wayback Machine.
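
For reference, the Wayback Machine has historically honored exclusions addressed to its ia_archiver user agent, so something along these lines would do it (a hypothetical robots.txt, not the site's actual file):

    # Hypothetical robots.txt; ia_archiver is the user agent the Wayback
    # Machine has historically checked for exclusions.
    User-agent: ia_archiver
    Disallow: /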


>>While I am thankful to the Project Zero team for their informing us of the issue quickly, I'm troubled that they went ahead with disclosure before Google crawl team could complete the refresh of their own cache.

Why the help was not 100% appreciated...



