
Having a copy of your data "reclaims" it only in terms of access; it does not reclaim control over it.

Once you upload data to many of these services, in most cases you give them a permanent license to use it however they want.

And if you have your own website, nothing prevents Clearview AI [or some equivalent company] from crawling it and indexing your photos into their facial recognition database. I don't think those companies care at all about robots.txt.
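
For what it's worth, robots.txt is purely an honor system: a polite crawler chooses to check it before fetching anything. A minimal Python sketch (the site and crawler name are placeholders):

  from urllib.robotparser import RobotFileParser

  # A well-behaved crawler checks robots.txt before fetching anything.
  rp = RobotFileParser("https://example.com/robots.txt")  # hypothetical site
  rp.read()

  if rp.can_fetch("SomeCrawler", "https://example.com/photos/"):
      print("robots.txt allows this path")
  else:
      # Nothing technically stops a crawler from fetching anyway --
      # the check above is entirely voluntary.
      print("disallowed, but only by convention")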



That's absolutely true. The GDPR gives you a right to request deletion, but it's not a great defense against companies like Clearview harvesting your data for other purposes. Hopefully the legal framework will continue to get stronger around that.

In the meantime, we can still have a LOT of fun by pulling our data back into systems that let us run our own queries.
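
For example, a minimal sketch assuming a hypothetical JSON export (the kind of takeout file most services offer), loaded into SQLite so you can query it yourself:

  import json
  import sqlite3

  # Hypothetical export: a list of {"date": ..., "text": ...} records.
  rows = json.load(open("export.json"))

  db = sqlite3.connect("mydata.db")
  db.execute("CREATE TABLE IF NOT EXISTS posts (date TEXT, text TEXT)")
  db.executemany("INSERT INTO posts VALUES (:date, :text)", rows)
  db.commit()

  # Now it's your database: query it however you like.
  for date, text in db.execute(
      "SELECT date, text FROM posts ORDER BY date DESC LIMIT 5"
  ):
      print(date, text)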


It's not super-difficult to add HTTPS and HTTP basic auth (or similar) to websites these days.

That's enough to keep automated scrapers from getting into the private parts of your website.
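
Roughly like this, as a stdlib-only Python sketch. The credentials and cert paths are placeholders; in practice you'd more likely configure this in your web server and get the certificate from Let's Encrypt:

  import base64
  import ssl
  from http.server import HTTPServer, SimpleHTTPRequestHandler

  USERNAME, PASSWORD = "me", "s3cret"  # placeholder credentials
  EXPECTED = "Basic " + base64.b64encode(
      f"{USERNAME}:{PASSWORD}".encode()).decode()

  class AuthHandler(SimpleHTTPRequestHandler):
      def do_GET(self):
          # Reject any request without the right credentials -- bots included.
          if self.headers.get("Authorization") != EXPECTED:
              self.send_response(401)
              self.send_header("WWW-Authenticate", 'Basic realm="private"')
              self.end_headers()
              return
          super().do_GET()

  httpd = HTTPServer(("", 8443), AuthHandler)

  # Wrap the socket in TLS so credentials aren't sent in the clear.
  # cert.pem/key.pem are placeholder paths to your certificate and key.
  ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
  ctx.load_cert_chain("cert.pem", "key.pem")
  httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
  httpd.serve_forever()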


HTTPS is not a deterrent for bots. Authentication is.

Unless the bot honestly identifies itself as such in its User-Agent header, you cannot distinguish a browser from a bot.
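
The User-Agent is just a header the client chooses to send. Sketch of a scraper claiming to be a browser (target URL is a placeholder):

  import urllib.request

  # A scraper can claim to be any browser it likes; the server only
  # sees the header, not what actually sent the request.
  req = urllib.request.Request(
      "https://example.com/",  # hypothetical target
      headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) "
                             "AppleWebKit/537.36 (KHTML, like Gecko) "
                             "Chrome/120.0 Safari/537.36"},
  )
  html = urllib.request.urlopen(req).read()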


Right. That's why I said HTTPS and auth.

Auth to keep the bots out, and HTTPS because it's free and easy to add, and keeps network admins from sniffing your username and password.
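
Worth spelling out why: Basic auth only base64-encodes the credentials, it doesn't encrypt them, so anyone watching plain-HTTP traffic can recover them instantly. Sketch with a made-up captured header:

  import base64

  # Hypothetical Authorization header captured from unencrypted traffic.
  sniffed = "Basic bWU6czNjcmV0"
  print(base64.b64decode(sniffed.split()[1]).decode())  # -> me:s3cret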



