It's time to brush up robots.txt (lucaskostka.com)
4 points by greatNespresso on Sept 2, 2023 | 1 comment


To the extent that robots.txt still works, see "Block the Bots that Feed “AI” Models by Scraping Your Website": https://neil-clarke.com/block-the-bots-that-feed-ai-models-b...

You can also block user agents directly at the web server, for example in nginx: https://www.xmodulo.com/block-specific-user-agents-nginx-web...
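
If you'd rather do the check in application code than in the web server, the same idea is a case-insensitive substring match on the User-Agent header. This is only a rough sketch, not what the linked article does: the names `BLOCKED_TOKENS` and `is_blocked` are made up for illustration, and the sample header strings are invented, not real crawler UAs.

```python
# Tokens to look for in the User-Agent header (the three UAs listed in this comment).
BLOCKED_TOKENS = ("CCBot", "ChatGPT-User", "GPTBot")


def is_blocked(user_agent):
    """Return True if the User-Agent header contains any blocked token.

    Hypothetical helper: matching is a case-insensitive substring test,
    and a missing header (None or empty) is treated as not blocked.
    """
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    # Invented header values, for illustration only.
    print(is_blocked("ExampleCrawler (compatible; GPTBot/1.0)"))
    print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/117.0"))
```

A request handler would answer 403 when `is_blocked` returns True. Keep in mind this only stops crawlers that identify themselves honestly.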

The UAs I'm aware of:

    CCBot
    ChatGPT-User
    GPTBot
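
For the robots.txt route, here is a minimal sketch that generates the rules for those three UAs and sanity-checks them with Python's standard `urllib.robotparser`. The helper names (`build_robots_txt`, `is_allowed`) and the example.com URL are my own placeholders, and the check only tells you what a compliant crawler should do, not what any given bot actually does.

```python
from urllib.robotparser import RobotFileParser

# The crawler UA tokens listed above.
BLOCKED_AGENTS = ["CCBot", "ChatGPT-User", "GPTBot"]


def build_robots_txt(agents):
    """Build a robots.txt body that disallows the whole site for each agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt, agent, url="https://example.com/"):
    """Ask the stdlib parser whether `agent` may fetch `url` under these rules.

    The default URL is a placeholder, not a real site of mine.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_robots_txt(BLOCKED_AGENTS)
    print(rules)
    for agent in BLOCKED_AGENTS + ["Googlebot"]:
        print(agent, "allowed:", is_allowed(rules, agent))
```

Agents that are not listed stay allowed, because the generated file has no `User-agent: *` section.
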
Does anyone know of a resource that tracks these crawlers' UA strings?



