
Crawling and archiving everything, including personal writings, has a chilling effect. It is the same situation people are seeing with social media, where the past remains to haunt the present and none of our future leaders are using it without a mask. It was most surprising to people when some libraries decided 'published' meant anything put on the WWW or posted to Usenet. It seemed a grasp for funding and a bid to stay relevant in an age where information was moving out of published media and into opinions virtually scrawled on a toilet door. The stuff I needed to get removed from the Australian National Library's archive is exactly the sort of stuff that shouldn't be in there, directly against the statutory rights and mission, and the sort of thing that could be pointed to when you wanted to defund the project. Because some twit thought meaningful Australian published materials meant anything under a .au top-level domain, all the dross hoovered up by IA is in there, including all the stuff since removed because it was in nobody's interest or was causing harm. And it was a pain in the arse.



I'm sorry you had some issues with the National Library of Australia's collections. I've never been an expert on Australian law, & it's been a while since (when I was at IA a decade+ ago) I worked with that library. But the impression I had at that time was that their governing law & budget, as dictated by the Australian legislature, required them to collect broadly, & deeply, from the `.au` domain-names. So it seemed a compulsory part of their "statutory rights & mission" then, rather than "against" such things. Their governing laws & strategies may have been modified since then, with experience – which is the point of trying, observing, and correcting in new, murkier frontiers of tradition, technology, and law.

On the larger issues, & specific to the Internet Archive:

You should assume there are several other larger "dark" web archives, by nations and large private organizations, collected without the awareness or available remedies of the Internet Archive's or various national libraries' public efforts. There are also uncountable other private and ad-hoc collections. Depending on what kinds of harms you expect from retained copies of older writings, these may be far larger threats than any holdings of an open, public, correctable non-profit library.

I would emphasize that anyone (like a web host or app) who gave any authors, especially the young & net-novices, the impression that something would stay private, or recallable, after being placed on a public webserver, at a published link, and open to browsing by all, did those authors a disservice by mis-informing them of risks, and the best-practices for preserving privacy.

That the Archive's well-identified, blockable crawlers sometimes surprise people with what they collect, and then make available for lookup, helps correct that misunderstanding, both for individuals and the wider culture. Any "chilling effect" is unfortunate, but it's inherent to the web technology & practices of many independent actors. It's more documented than created by the Archive's own activities. And further – at least with respect to the Wayback Machine – the surprise availability is then fairly straightforward to undo, and prevent from recurring.

The broader risk that anything on the web – once offered to the public – will remain available from others persists no matter what the Archive does. Those concerned about such risks should take extra privacy-preserving steps, because blocking the Archive's crawls, or correcting the Wayback Machine, only limits this one polite, above-ground actor.



