
I think that is unlikely, because SEO specialists would then generate and send such data to boost their own sites' rankings.

But Google could use this to discover new URLs that are not linked anywhere.



Some URLs are private by design, so I think it would be an awfully bad idea to discover new URLs in this way.


There have been cases where such private URLs ended up in search engines. You should protect such URLs with authorization, or at least hide them behind a POST form, or block them with robots.txt.
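For example, a minimal robots.txt that asks crawlers to stay out of a private area could look like this (the /private/ path is just a placeholder):

  User-agent: *
  Disallow: /private/

Keep in mind that robots.txt only stops well-behaved crawlers; it is not access control, and the file itself advertises which paths exist.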

Also, if a user follows a link from such a page, the URL is leaked via the Referer header, so it is not secure anyway.
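If such pages do link out, a rough sketch of suppressing the referrer (standard HTML, nothing site-specific assumed; example.com is a placeholder) is:

  <!-- page-wide: send no Referer on outbound navigation -->
  <meta name="referrer" content="no-referrer">

  <!-- or per link -->
  <a href="https://example.com/" rel="noreferrer">external link</a>

The server can also send a Referrer-Policy: no-referrer response header, but none of this replaces proper authorization.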


I feel robots.txt is the most effective way to prevent a private site from being crawled by Google, followed by an inline <meta name="robots" content="noindex"> tag.
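To make the distinction concrete (example.com and the paths are placeholders): robots.txt controls whether Googlebot fetches pages at all, while noindex controls whether a fetched page ends up in the index, so Googlebot can only see a noindex tag on pages it is allowed to crawl:

  # https://example.com/robots.txt — stop crawling entirely
  User-agent: *
  Disallow: /

  <!-- or, per page, allow crawling but keep the page out of the index -->
  <meta name="robots" content="noindex">

For non-HTML resources, the same directive can be sent as an X-Robots-Tag: noindex HTTP response header.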


Chrome by default sends whatever you type in the address bar to Google search, so unless you disable that, private URLs are sent to Google and can then be crawled (although if a correct robots.txt is set up for those URLs, Googlebot would stop at that point).

I believe it is this setting (I have it disabled along with other options, though I believe only this one sends data back) [1].

[1] https://i.snag.gy/cmkGzp.jpg



