Hacker News

Oh hey, I thought I was the only one. lucb1e.com and another site of mine are also not indexed, though I blocked Google based on the user agent string, so it doesn't get page data or non-HTML files from my server. I introduced this when they were pushing that AMP thing: https://lucb1e.com/?p=post&id=130 It doesn't impact me personally, but it impacts other people on the internet, and I figured it was the only thing I could do to try to diversify this market (since I myself had already switched to another search engine).

There are zero other restrictions on my site. Use any search engine other than google. Or don't, up to you.
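The user-agent blocking described above can be sketched roughly like this — a hypothetical handler, not the commenter's actual setup, with an illustrative (not exhaustive) list of Google crawler substrings:

```python
# Hypothetical sketch of blocking Google by User-Agent string.
# The substrings below are illustrative; Google documents more crawler names.
GOOGLE_UA_SUBSTRINGS = ("Googlebot", "AdsBot-Google", "Google-InspectionTool")

def is_google_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like a Google crawler."""
    return any(s in user_agent for s in GOOGLE_UA_SUBSTRINGS)

def handle_request(headers: dict) -> int:
    """Return an HTTP status code: 403 for Google crawlers, 200 otherwise."""
    ua = headers.get("User-Agent", "")
    return 403 if is_google_crawler(ua) else 200
```

In practice this check would sit in the web server config (e.g. a user-agent match rule) rather than application code, but the logic is the same.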



That’s a good idea, but Google sometimes crawls without the Googlebot user agent, so it’s not going to be 100 percent foolproof.

You’d be better off blocking all of the IP addresses that Google crawls from. There are published lists of those.

When I used to cloak website content and serve certain content only to Google, the only reliable way was IP cloaking, because Google crawls using “partners”, such as Comcast IPs.

So if you really want to get your site out of the index, serve up the page with a noindex meta tag, or noindex in the X-Robots-Tag response header, based on Google’s IP addresses.
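A minimal sketch of that IP-based approach using Python’s `ipaddress` module. The range below is one commonly cited Googlebot block, used here purely as an illustrative assumption — Google publishes its full, current crawler ranges, and a real setup should load those:

```python
import ipaddress

# Illustrative assumption: one well-known Googlebot range. A real deployment
# would load Google's published crawler IP list instead of hardcoding this.
GOOGLE_CRAWLER_RANGES = [ipaddress.ip_network("66.249.64.0/19")]

def is_google_ip(client_ip: str) -> bool:
    """True if client_ip falls inside any known Google crawler range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in GOOGLE_CRAWLER_RANGES)

def extra_headers(client_ip: str) -> dict:
    """Response headers to add: a noindex directive only for Google crawlers."""
    if is_google_ip(client_ip):
        return {"X-Robots-Tag": "noindex"}
    return {}
```

Everyone else sees the page unchanged; only requests from the listed ranges get the noindex header.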


Hey! Googler here!

We don't use our hardware located on partner networks to do indexing. Those machines are searching for malware and serving some YouTube videos and Play Store downloads.


You forgot to add the word "currently"


"Because google crawls using "partners", such as using Comcast IPs."

Is this different from when others use proxies to evade access controls?



