
Strongly agreed - Google's Project Zero helped them immensely and without them it would have continued to grow worse.

The length of time this went for was already disastrous. From the Cloudflare blog post, "The earliest date memory could have leaked is 2016-09-22".

As an example of how destructive it might have been beyond leaked information, note that Internet Archive have spent considerable time archiving the election and its aftermath.

donaldjtrump.com is served via Cloudflare.

Chances are it wasn't a domain that was leaking information - though we don't know as there's no publicly accessible list and no way to tell if they had the buffer overrun features active.

If it was, however, the Internet Archive now has a horrible choice on its hands: wipe data from a domain that will be of historical interest to posterity, or try to sanitize the data by removing the PII leaked via Cloudflare.

History will already look back at the early digital age with despair - most content will be lost or locked in arcane digital formats. Imagine having to explain that historical context was made worse as humanity deleted it after accidentally mixing PII into random web pages on the internet -_-



One of us misunderstands what happened.

> Chances are it wasn't a domain that was leaking information - though we don't know as there's no publicly accessible list and no way to tell if they had the buffer overrun features active.

As far as my understanding goes, domains that had the features on were leaking data from other domains. So it's near impossible to tell who was affected.


You understand it right, but I think you draw the wrong conclusions. If a site didn't have said features active, information from it could still have leaked via other sites that had them on, and could be in the archives of those other sites (so it should still rotate secrets where necessary). But the archive of the page itself can be kept safely, since without the buggy features no leaked data from other sites was included in it.


Ah, I get it now, thanks for explaining.


The key bit is that your page needs to have been in memory in that process for it to leak. So malformed pages that use the affected services would leak whatever other pages happened to be in memory on those servers.


Wouldn't this be hard to exploit from an attacker's point of view? I mean, there's no way to know what data was in RAM at the time; it's at best a blind attack.


If I understand correctly, you could literally keep smashing f5 on an affected page and get a different chunk of memory every time. An attacker could have potentially collected a lot very quickly with a simple script.
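
For illustration, a minimal sketch of the kind of simple script meant here, in Python; the URL and request count are placeholders, and this only demonstrates the "refresh repeatedly and keep every response" idea described above:

    # Hypothetical sketch: repeatedly fetch one affected page and keep each
    # response body, since every response could contain a different slice of
    # leaked memory. The URL and count are placeholders.
    import requests

    URL = "https://affected-site.example/some-page"  # placeholder

    for i in range(1000):
        resp = requests.get(URL, headers={"Cache-Control": "no-cache"})
        # Save the raw body; scanning it for secrets would happen offline.
        with open(f"response_{i:04d}.html", "wb") as f:
            f.write(resp.content)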


That's, from what I understand, completely correct. Not only that, but because of the nature of the flaw, it's not clear to me that the attackers would be generating any real anomalies by doing so. They're just fetching a particular pattern of otherwise non-notable web pages.


But it's a blind attack that is much more likely to hit big players.

And seeing as Cloudflare has some of the biggest pipes on the net, it would be easy to saturate a pipe to gather as much data as possible from any sites you can find that are affected.


Can't they archive the content (but not expose it), and then figure out how to filter out the leaked data later? No history need be lost.


To some degree that's what happens. Internet Archive and Common Crawl use WARC files and deleting an individual entry or set of entries from them can be markedly difficult. It's easiest to mark it with some manner of tombstone for later handling.
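
As a rough sketch of what that filtering/tombstoning could look like (this assumes the warcio Python library; the domain list and file names are made up, not anything the Archive or Common Crawl actually use):

    # Sketch: copy a WARC, dropping records for affected domains and logging
    # them as "tombstones" for later handling. Assumes the warcio library;
    # AFFECTED_DOMAINS and the file names are hypothetical.
    from warcio.archiveiterator import ArchiveIterator
    from warcio.warcwriter import WARCWriter

    AFFECTED_DOMAINS = {"affected-site.example"}  # placeholder

    with open("input.warc.gz", "rb") as inp, \
         open("filtered.warc.gz", "wb") as out, \
         open("tombstones.txt", "w") as tombstones:
        writer = WARCWriter(out, gzip=True)
        for record in ArchiveIterator(inp):
            uri = record.rec_headers.get_header("WARC-Target-URI") or ""
            if any(domain in uri for domain in AFFECTED_DOMAINS):
                # Don't copy the record; note its URI for later handling.
                tombstones.write(uri + "\n")
                continue
            writer.write_record(record)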

The complication comes with the edge cases they'd need to face and @eastdakota's call to "get the crawl team to prioritize the clearing of their caches"[1]. This is also work they now need to do due to Cloudflare's blunder.

For Internet Archive and Common Crawl, these aren't caches, they're historical information. You can't just blow that away - but you also can't serve it if it has PII in it. Either they need to find all the needles and filter/tombstone them - which we'd expect to be very difficult given leaked information is still sitting in Google and Bing's cache now - or wipe/prevent access to the affected domains.

Wiping the donaldjtrump.com archives would be historically painful, but even temporarily blocking access to the domain would be problematic.

Finally, and most importantly, the fact that non-profit projects need to worry about how to excise such information is ridiculous anyway. Excising it is non-trivial and may well be destructive if there's even a minor bug in the processing. Having looked at much of the web, I can say it's hard to tell what's rubbish and what isn't :)

(Fun example: I had an otherwise high-quality Java library for HTML processing crash during a MapReduce job because someone used a phone number as a port (i.e. url:port). The internet is weird. The fact it works at all is a mystery ;))
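
A quick Python illustration of the same failure class, with a made-up URL: a phone number pushes the port outside the valid 0-65535 range, which many parsers turn into an exception.

    # Illustration only (not the Java library in question): a phone number
    # used as a port is outside 0-65535, so the parser raises on access.
    from urllib.parse import urlsplit

    url = "http://example.com:8005551234/contact"  # made-up URL

    parts = urlsplit(url)
    try:
        print(parts.port)
    except ValueError as e:
        # Python 3.6+ raises "Port out of range 0-65535" here; crawl code
        # has to treat the port as untrusted input like everything else.
        print("bad port:", e)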

[1]: https://news.ycombinator.com/item?id=13721644


One way Cloudflare could mitigate the damage they caused is by coughing up some money to support the non-profits doing cleanup work on Cloudflare's behalf.

Polluter pays.


It seems like Google and Bing should be able to sue Cloudflare for the hours they're spending cleaning up its mess.


They decided to cache the internet; as soon as they copy it to their servers, it's their problem, imho.


Cloudflare seems to be of the opinion they deserve priority treatment. Personally I think Google, Bing and others are mainly doing this to protect the users of Cloudflare powered websites, not Cloudflare.


Well yeah, it has nothing to do with helping out Cloudflare, but however it got there, random creds are being served to the world from their product. No specific criticism of their efforts to fix it, just that they bear some responsibility to make it inaccessible from their servers as fast as possible, even if it's still cached in a million other spots. Shitty task, but it comes with the territory.


robots.txt


it's not my fault, robots.txt made me do it!


To be fair, access to the archives of donaldjtrump.com could be blocked anyway if the owners at some point decide to add a robots.txt blocking the Wayback Machine.
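
For reference, the Wayback Machine has historically honored exclusions addressed to its ia_archiver user agent, so something along these lines would do it (a hypothetical robots.txt, not the site's actual file):

    # Hypothetical robots.txt; ia_archiver is the user agent the Wayback
    # Machine has historically checked for exclusions.
    User-agent: ia_archiver
    Disallow: /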


>>While I am thankful to the Project Zero team for their informing us of the issue quickly, I'm troubled that they went ahead with disclosure before Google crawl team could complete the refresh of their own cache.

Why the help was not 100% appreciated...



