
>There's a big difference too in that when you publish a website, you intend for it to be aggregated.

You're framing this as an absolute and that's just not true. Were it the case, robots.txt wouldn't exist and I wouldn't be using it.



I’m not framing it as an absolute. I’m generalising. I accept edge cases exist, but my point stands for the majority of use cases.

Also, you shouldn’t really be using robots.txt any more unless it’s a simple “Disallow: /”, because, ironically, bad actors use it to decide which URIs to hit. If you have content up that you want to limit access to, you’re much better off putting it behind an auth layer (even if it’s just simple HTTP auth + fail2ban).
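To illustrate the point about robots.txt doubling as a map, here is a minimal sketch of how little effort it takes to turn the file into a list of paths to probe. The function name and the sample file contents (including the /admin/ and /backups/ paths) are made up for illustration, and the parsing is deliberately naive rather than a full implementation of the robots.txt format.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path found in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # Field names are matched case-insensitively.
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:
                paths.append(value)
    return paths


# Hypothetical robots.txt contents, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /backups/  # hypothetical path
Allow: /
"""

print(disallowed_paths(SAMPLE))
```

Anything listed this way is readable by anyone who fetches the file, which is why a blanket "Disallow: /" gives away nothing while a list of specific paths does.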



