
You don't get it, guys: when Google scrapes the web, downloads everyone's data, and serves up parts of it with sponsored ads next to it in searches, it's OK because they are Google. But if you scrape their data, it's not OK, because you're not Google. Once you understand this it makes perfect sense.


In case this is meant to be a serious comment, there's a standard mechanism called the robots.txt file to tell crawlers you don't want them to scrape your website. You don't have to let them if you don't want to.
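To make the mechanism concrete, here is a minimal sketch (in Python, using the standard library's urllib.robotparser; the URL and user-agent are hypothetical placeholders) of how a well-behaved crawler checks robots.txt before fetching a page:

    # Sketch: a polite crawler consults robots.txt before crawling.
    # The site and user-agent below are placeholders, not real services.
    from urllib import robotparser

    ROBOTS_URL = "https://example.com/robots.txt"
    USER_AGENT = "ExampleBot"

    rp = robotparser.RobotFileParser()
    rp.set_url(ROBOTS_URL)
    rp.read()  # fetch and parse the site's robots.txt

    page = "https://example.com/some/profile"
    if rp.can_fetch(USER_AGENT, page):
        print("Allowed to crawl:", page)
    else:
        print("Disallowed by robots.txt:", page)

A crawler that respects the file simply skips anything can_fetch reports as disallowed; nothing technically forces it to, which is what the replies below get into.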


Not to be argumentative here, I'm seriously asking: is there anything keeping them from doing so anyway, and just not publishing it?


Yes, everyone who ran a server on the internet would know, and make a big stink about it.


Well, that makes sense.


Except archive.org doesn’t obey robots.txt files any more [1], and they also ignore requests to remove content.

[1] https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...


They don't obey robots.txt files posted after-the-fact by domain hoarders that have zilch to do with the original content. This is entirely proper on their part.


Archive.org is not Archive Team.


I doubt they ignore DMCA requests.


But Google often violates copyright by showing so much of your data on the search results page that users might not even need to visit your page to get what they need... I'm surprised nobody has sued Google over that yet (or maybe they did and I missed it).


The problem with your line of thinking is that even that can be manipulated with meta tags and what have you (oEmbed, etc.), and that's what Google would argue in court.
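If the comment means the robots meta directives (such as nosnippet, max-snippet, or noarchive) that let a publisher limit how much of a page search engines display, here is a rough Python sketch, standard library only and with a hypothetical URL, that pulls those directives out of a page's HTML:

    # Sketch: read a page's <meta name="robots"> directives,
    # e.g. nosnippet / max-snippet / noarchive. URL is a placeholder.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class RobotsMetaParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.directives = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name", "").lower() == "robots":
                content = attrs.get("content", "")
                self.directives += [d.strip() for d in content.split(",") if d.strip()]

    page_url = "https://example.com/article"
    html = urlopen(page_url).read().decode("utf-8", errors="replace")

    parser = RobotsMetaParser()
    parser.feed(html)
    print(parser.directives)  # e.g. ['noarchive', 'nosnippet']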


What can be manipulated with meta tags?


How does that make it any less hypocritical for Google and others, who vacuum up everyone's data for free and monetize it, to ban or API-restrict others?


It's not even clear that Google is specifically doing that at this point, but ultimately a service like G+ is quite expensive to run. It's a bit weird to suggest it's public property because Google has an unrelated product it does make available without direct monetary cost to most of the world.


Google and other tech companies shouldn't be banning services that do productive things with their data, especially not when that data was cheaply collected or volunteered to them. Bandwidth is not expensive. Not saying that's what happened here, though it very likely could be.


I don't understand why you think a problem that was only there briefly and went away in less than an hour is proof positive of a deliberate policy action.



