
Did you actually read your link? That's not at all what it says.


To be clear, Google stopped supporting robots.txt noindex a few years ago.

Combined with the fact that Google might list your site [based only on third-party links][1], robots.txt isn't an effective way to remove your site from Google's results.

Sorry, could have been clearer.

[1]: https://developers.google.com/search/docs/advanced/robots/in...
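For anyone skimming: a plain Disallow rule is the part of robots.txt that still works, but it only controls crawling. A minimal sketch (the /private/ path is just a placeholder):

  # robots.txt: blocks crawling of matching URLs, but does not remove them from the index
  User-agent: *
  Disallow: /private/

If the URL is linked from somewhere else, Google can still index it without ever fetching it.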


This page has a little more detail: https://developers.google.com/search/docs/advanced/crawling/...

"If other pages point to your page with descriptive text, Google could still index the URL without visiting the page. If you want to block your page from search results, use another method such as password protection or noindex. "


>noindex in robots meta tags: Supported both in the HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.

Seems clear enough to me
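For reference, the two supported forms look roughly like this; the header variant is useful for non-HTML files like PDFs. In the page's HTML:

  <meta name="robots" content="noindex">

Or as an HTTP response header:

  X-Robots-Tag: noindex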


Quote from the linked article:

“For those of you who relied on the noindex indexing directive in the robots.txt file, which controls crawling, there are a number of alternative options:”

The first option is the meta tag. It does mention an alternative directive for robots.txt, however.


What about blocking Googlebot by its IPs, combined with the user agent? Wouldn't that stop the crawlers?

Google crawler IPs: https://www.lifewire.com/what-is-the-ip-address-of-google-81...
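Blocking by user agent is straightforward; here's a minimal sketch assuming an nginx server (note this only stops crawling, it doesn't remove already-linked URLs from the index):

  # inside a server/location block: refuse requests whose user agent claims to be Googlebot
  if ($http_user_agent ~* "googlebot") {
      return 403;
  }

Blocking by IP is trickier, since the crawler's address ranges change over time and the list would need to be kept up to date.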


That will stop the crawlers, but you could still show up in the search results because of links from other pages. From GP:

> If other pages point to your page with descriptive text, Google could still index the URL without visiting the page



