
You don't get it, guys: when Google scrapes the web, downloads everyone's data, and serves up parts of it with sponsored ads next to it in searches, it's OK because they are Google. But if you scrape their data, it's not OK, because you're not Google. Once you understand this it makes perfect sense.


In case this is meant to be a serious comment, there's a standard mechanism called the robots.txt file to tell crawlers you don't want them to scrape your website. You don't have to let them if you don't want to.
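To make the mechanism concrete, here is a minimal sketch (in Python, using the standard library's urllib.robotparser; the URL and user-agent are hypothetical placeholders) of how a well-behaved crawler checks robots.txt before fetching a page:

    # Sketch: a polite crawler consults robots.txt before crawling.
    # The site and user-agent below are placeholders, not real services.
    from urllib import robotparser

    ROBOTS_URL = "https://example.com/robots.txt"
    USER_AGENT = "ExampleBot"

    rp = robotparser.RobotFileParser()
    rp.set_url(ROBOTS_URL)
    rp.read()  # fetch and parse the site's robots.txt

    page = "https://example.com/some/profile"
    if rp.can_fetch(USER_AGENT, page):
        print("Allowed to crawl:", page)
    else:
        print("Disallowed by robots.txt:", page)

A crawler that respects the file simply skips anything can_fetch reports as disallowed; nothing technically forces it to, which is what the replies below get into.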


Not to be argumentative here, I'm seriously asking: is there anything keeping them from doing so anyway, and just not publishing it?


Yes, everyone who ran a server on the internet would know, and make a big stink about it.


Well, that makes sense.


Except archive.org doesn’t obey robots.txt files any more [1], and they also ignore requests to remove content.

[1] https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...


They don't obey robots.txt files posted after-the-fact by domain hoarders that have zilch to do with the original content. This is entirely proper on their part.


Archive.org is not Archive Team.


I doubt they ignore DMCA requests.


But Google often violates copyright by showing so much of your data on the search results page that users might not even need to visit your page to get what they need... I'm surprised nobody has sued Google over that yet (or maybe they did and I missed it).


The problem with your line of thinking is that even that can be manipulated with meta tags and what have you (oEmbed, etc.), and that's what Google would argue in court.
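If the comment means the robots meta directives (such as nosnippet, max-snippet, or noarchive) that let a publisher limit how much of a page search engines display, here is a rough Python sketch, standard library only and with a hypothetical URL, that pulls those directives out of a page's HTML:

    # Sketch: read a page's <meta name="robots"> directives,
    # e.g. nosnippet / max-snippet / noarchive. URL is a placeholder.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class RobotsMetaParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.directives = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name", "").lower() == "robots":
                content = attrs.get("content", "")
                self.directives += [d.strip() for d in content.split(",") if d.strip()]

    page_url = "https://example.com/article"
    html = urlopen(page_url).read().decode("utf-8", errors="replace")

    parser = RobotsMetaParser()
    parser.feed(html)
    print(parser.directives)  # e.g. ['noarchive', 'nosnippet']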


What can be manipulated with meta tags?


How does that make it any less hypocritical for Google and others, who vacuum up everyone's data for free and monetize it, to ban or API-restrict others?


It's not even clear that Google is specifically doing that at this point, but ultimately a service like G+ is quite expensive to run. It's a bit weird to suggest it's public property because Google has an unrelated product it does make available without direct monetary cost to most of the world.


Google and other tech companies shouldn't be banning services that do productive things with their data, especially not when that data was cheaply collected or volunteered to them. Bandwidth is not expensive. Not saying that's what happened here, though it very likely could be.


I don't understand why you think a problem that was only there briefly and went away in less than an hour is proof positive of a deliberate policy action.



