
Could be! However, making a direct attack on individual privacy should never have been an option. To make matters worse, the logic of, "We did this to government and military websites, so now we're going to roll it out everywhere" was quite broken for the time and remains so.

There are examples of how this works in a healthy way. Martin Manley is one scenario that comes to mind, where he overtly opted-in to having an archive stored about him upon his death: https://martin-manley.eprci.com/




Neither 'flaunting privacy' nor 'direct attack on individual privacy' are fair descriptions of any of the Archive's web collection policies.

People who freely publish information, to the worldwide public, on the 'World Wide Web' should reasonably expect all sorts of entities to collect, save, analyze, & repurpose that info, unless they take specific steps to discourage such access & use.

The Archive's crawlers identify themselves, and collect things that are publicly linked, or specifically nominated-for-collection by library patrons or partners. Except in some focused specialized collection projects, they don't "log in" as any user, only visiting & collecting what's published freely to any anonymous person/organization/process.

For material needing more privacy, websites always have the option to block any and all unwanted visitors/crawlers with a wide variety of standard techniques, like requiring logins or simple challenges that automated crawlers won't pass.

And, as your linked articles report, the process for a later exclusion by request is pretty quick and simple. (The 2nd post concludes: "So, hats off to the Internet Archive for making the process smooth and relatively painless.") And, such exclusion does not require any sort of "DMCA request".


This is victim blaming. In my jurisdiction, you retain copyright in any information you publish, even to the worldwide public. This means I can reasonably expect entities to collect, save, analyze and repurpose that info within reason, and without specific steps to discourage access & use. This is why there are laws such as 'fair use' and 'satire', because we wanted to extend what is considered reasonable use of public works. But redistributing copyrighted works without permission? Legally actionable, if you have the money and lawyers and access to the necessary courts. If this were software, such as free software license violations, people in this forum would be calling for the lawyers to nuke them from orbit.

Thankfully DMCA should make the removal process easier now, especially in situations where control over the domain has been lost or the content is hosted by a third party. Although last I saw there were still artificial barriers, such as needing to list every single individual page needing to be taken down. But this is after the fact, after you discovered your reasonable expectations and privacy have been violated. And then you have to track down the other copies that IA illegally distributed your now-private and copyrighted information to, such as a few libraries around the world with similar projects.


I'm talking about the unfair allegation of privacy violations, here.

Note that when the Archive shares crawled content with other libraries, those other libraries often have their own legal right to collect, preserve, and make-available that data even stronger than the Archive's rights via fair use, implied-license, library privileges, and other grounds. For example, many of the Archive's partners in government libraries, archives, & educational institutions have a statutory right & mission to collect copies of everything 'published', including via the world-wide-web, in their sphere of national interest.

As to what some unstated jurisdiction might consider "within reason", I prefer to think they'll find reasonable what I find to be reasonable – the IA's crawling policies – unless & until some actual governing authority finds otherwise in a clearly applicable/legible decision.

See my root post (ggggggp): in a vital, evolutionary, true-law-made-on-the-ground civilization, what actually winds up as "within reason" depends on the real implementations & multi-decade demonstrations of how things can beneficially work, as much or more than any copyright loyalist's strict reading of older statutory laws.


Crawling and archiving everything, including personal writings, is a chilling effect. It is the same situation people are seeing with social media, where the past remains to haunt the present and none of our future leaders are using it without a mask. It was most surprising to people when some libraries decided 'published' meant anything put on the WWW or posted to Usenet. It seemed a grasp for funding and an attempt to stay relevant in an age where information was moving out of published media and into opinions virtually scrawled on a toilet door. The stuff I needed to get removed from the Australian National Library's archive is exactly the sort of stuff that shouldn't be in there, directly against the statutory rights and mission, and the sort of thing that could be pointed to when you wanted to defund the project. Because some twit thought meaningful Australian published materials meant anything under a .au top level domain, all the dross hoovered up by IA went in, including all the stuff since removed because it is in nobody's interest or causing harm. And it was a pain in the arse.


I'm sorry you had some issues with the National Library of Australia's collections. I've never been an expert on Australian law, & it's been a while since (when I was at IA a decade+ ago) I worked with that library. But the impression I had at that time was that their governing law & budget, as dictated by the Australian legislature, required them to collect broadly, & deeply, from the `.au` domain-names. So it seemed a compulsory part of their "statutory rights & mission" then, rather than "against" such things. Their governing laws & strategies may have been modified over time since with experience – which is the point of trying, observing, correcting in new murkier frontiers of tradition, technology, and law.

On the larger issues, & specific to the Internet Archive:

You should assume there are several other larger "dark" web archives, by nations and large private organizations, collected without the awareness or available-remedies of the Internet Archive's or various national library public efforts. There are also uncountable other private and ad-hoc collections. Depending on what kinds of harms you expect from retained copies of older writings, these may be far larger threats than any holdings of an open, public, correctable non-profit library.

I would emphasize that anyone (like a web host or app) who gave any authors, especially the young & net-novices, the impression that something would stay private, or recallable, after being placed on a public webserver, at a published link, and open to browsing by all, did those authors a disservice by mis-informing them of risks, and the best-practices for preserving privacy.

That the Archive's well-identified, blockable crawlers sometimes surprise people with what they collect, and then make-available for lookup, helps correct that misunderstanding, both for individuals and the wider culture. Any "chilling effect" is unfortunate, but it's inherent to the web technology & practices of many independent actors. It's more so documented, than created, by the Archive's own activities. And further – at least with respect to the Wayback Machine – the surprise availability is then fairly straightforward to undo, and prevent from recurring.

The broader risk that anything on the web – once offered to the public – will remain available from others persists no matter what the Archive does. Those concerned about such risks should take extra privacy-preserving steps, because blocking the Archive's crawls, or correcting the Wayback Machine, only limits this one polite, above-ground actor.


You are arguing about copyright in a thread discussing accusations of privacy violations.


There is an overlap in the two. Copyright can be used as a defense against folk who believe, "Everything on the internet not behind authentication is commons". Often these folks point to books, magazines, etc in reference to their argument, which is certainly bad faith, but that's why copyright arguments come up.

A reference to one such comment in this thread: https://news.ycombinator.com/item?id=32150193


Wait, why are books, magazines, newspapers, newsletters, pamphlets, & flyers a bad faith analogy?

Those are exactly what hundreds-of-years of copyright law, by explicit statute and court interpretation, have addressed. The precedents for private actors, and especially noncommercial entities like libraries & schools, to retain those copies, and to a large extent, reshare/redisplay them, are very strong.

Further, by design, every delivery of content across the web necessarily creates copies at every network node, and perhaps multiple proxies/caches, on the way to the web browser. The web browser necessarily creates & displays a copy – and normally keeps one, at least for a little while for user convenience. Anyone choosing to use core web protocols has already implicitly authorized lots of necessary copying.

Why wouldn't the recipients of such display-copies, and especially non-profit libraries, have on the web the same assumed right to keep/transfer/format-shift/redisplay that freely-delivered copy, in the same way they've always had the right to do with copyrighted books/magazines/newspapers/newsletters/pamphlets/flyers?

If copyright maximalists & DRM fans want a new right to remotely recall/destroy such copies – indefinitely, retroactively, and unlike the traditional copyright balancing-of-interests – they should make the case to lawmakers & courts for that, or use the technical measures already built-into the web for expressing such limits, and opting-out of the web's and copyright's defaults. You shouldn't let them simply assert that right without reasoning or a case for why it's better than tradition. Nor, allege criminality or 'bad-faith' against people just using the worldwide-web as it was designed, and enjoying readers' rights as they've been traditionally interpreted.


> Wait, why are books, magazines, newspapers, newsletters, pamphlets, & flyers a bad faith analogy?

Because they're comparing an individual's blog or Twitter profile to those things, to which they are not analogous. Not to mention you use rhetoric like this:

> If copyright maximalists & DRM fans

It just goes to show the juvenile nature that some people will stoop to in order to prove their point. In this case, "some people" is you. Not everybody out here are the little demons you've dreamed up in your soul; most of the people responding on this thread are just privacy advocates who have seen how these policies go wrong, often first-hand. A little further down the thread someone makes a very salient point about the queer community and how these tools are used in unmasking.

> Nor, allege criminality or 'bad-faith' against people just using the worldwide-web as it was designed, and enjoying readers' rights as they've been traditionally interpreted.

The internet as a technological invention did not arrive with legalities already paved. They were very much in flux and have been in flux. It's okay if you don't like that, but asserting that the internet was created with commons as the default is junk; that's a very US-American law that has dominated cultural perception. Meanwhile, on the other side of the pond, we have countries figuring out nuanced ways to implement the right to be forgotten - notice none of that legislation is geared towards large corporations, it's focused on individuals.

---

Getting to the rest of your argument that wasn't distastefully written: I do agree that it shouldn't be a copyright free for all, but you can't have the Internet Archive and other folks creating weapons. There should be limits on both ends and I don't think those exist.

One really obvious limit is stop treating government entities and individuals the same. Stop treating large publishers and individuals the same. The former have immense resources to coordinate their communication and undergo thorough review processes, so their publications are more well thought out. Most blogs and social media posts are not nor do they have the same level of impact. Privacy advocates wouldn't need a copyright crutch if people could summon enough humanity and empathy to understand that. That would separate privacy advocates from copyright trolls on this issue.

---

Another edit

This is rich: https://twitter.com/gojomo http://xavvy.com/

You used to work for archive.org? That might be a thing you should call out in discussion.

Some posts:

- http://gojomo.blogspot.com/2001/01/

- http://gojomo.blogspot.com/2000/08/

- http://gojomo.blogspot.com/2002/07/

- http://news.oreilly.com/2008/06/gordon-mohr-takes-us-inside-... (This link looks dead)

- http://www.wired.com/news/business/0,1367,42438,00.html (This link looks dead, but was summarized by you as, "about the tug-of-war between personal privacy and copyright enforcement, March, 2001.")


The internet has given everyone the tools to publish a personal newsletter/blog/profile-page, just as only the few could do so earlier. It not only allows essentially-free publishing to the whole public, but also extreme narrowcasting with any level of access-control one desires.

You've not made a case that personal writings should be treated any differently, on either copyright or privacy grounds, nor that the law does treat them any differently.

You've made unsupportable allegations of "flaunting privacy" or "direct attack on individual privacy", and accused those who simply reason from copyright-history as making "bad faith" arguments. And, you are asking for roughly the same level of expansive copyright interpretation – not at all a feature of the jurisdictions where the Internet Archive primarily operates – that copyright maximalists and DRM advocates do.

Just as you misrepresented the Archive's process for exclusion as requiring a DMCA request – even though your own links complimented the Archive's "straightforward" process – you're now confusing an imaginary claim of "commons" (no rights) versus my narrower claim of traditional balance, fair use, and implied licenses.

And if you think the straightforward, sympathetic, norm-respecting noncommercial policies of the San Francisco-based Internet Archive are a threat to queer and other often-persecuted lifestyles – rather than the opaque data-collection efforts of hundreds of other unobserved entities, platforms, apps, & persistent threats, up to and including actual nation-states – I believe you've made a dangerous category error. The Wayback Machine is a friendly canary reminding people of the risk and responsive to their concerns; others represent the fatal dangers of privacy blowback.

Thanks for promoting my Twitter & old blog posts!

My current & former affiliations are well-disclosed across my web presences - and I often mention my once-upon-a-time Archive involvement here on HN if more directly commenting on Archive details, as opposed to broad principles involved.

But I've not been full-time there for about a decade – and the specific blogspot posts you've chosen to highlight actually predate my tenure at IA. I don't speak for IA nowadays, only myself, as myself.

Yes, my work history is congruent with my beliefs about privacy & copyright on the internet, and prominently disclosed. (My jobs don't dictate my views; my views dictate my jobs.)

Your broken links are, thankfully, available at the Wayback Machine:

2008 "Gordon Mohr Takes Us Inside the Internet Archives" https://web.archive.org/web/20080619045327/http://news.oreil...

2001 "Security Fears for Peers" https://web.archive.org/web/20010331094133/http://www.wired....

How heavenly it might be if only every paid employee (and potentially, compensated advocate) of big tech, big copyright, big nation-state, big regulation, and big ideologies – as they pile-on the votes & comments here & elsewhere – were similarly open about their affiliations!


> How heavenly it might be if only every paid employee (and potentially, compensated advocate) of big tech, big copyright, big nation-state, big regulation, and big ideologies – as they pile-on the votes & comments here & elsewhere – were similarly open about their affiliations!

Luckily I'm not paid to talk about privacy. While there's probably some people who monetize privacy they're usually looked at negatively. Apple is a good case study of that. The closest you could get to saying that I'm paid to talk about privacy is my work on cryptography orchestration, but that was not built to be monetized - it was built to protect information and put users in control.

> My current & former affiliations are well-disclosed across my web presences...

> But I've not been full-time there for about a decade

> Yes, my work history is congruent with my beliefs about privacy & copyright on the internet, and prominently disclosed. (My jobs don't dictate my views; my views dictate my jobs.)

Doesn't really matter, disclose your affiliations - especially for the kind of wild statements you make (eg: comparing thread commenters to "big tech, big copyright, big nation-state, big regulation, and big ideologies".)

> Just as you misrepresented the Archive's process for exclusion as requiring a DMCA request – even though your own links complimented the Archive's "straightforward" process –

I didn't misrepresent it. https://medium.com/wednesday-genius/how-to-remove-your-websi... Quite literally, the most expedient way to get them to remove content is to frame it as a DMCA. Just because the changed process is "easy" or "straight forward" right now, doesn't mean that won't change on a dime. I already noted they removed a web norm and replaced it with email.

> you're now confusing an imaginary claim of "commons" (no rights) versus my narrower claim of traditional balance, fair use, and implied licenses.

I agree, commons is the wrong term, "fair use" is what IA legally rides on.

> And if you think the straightforward, sympathetic, norm-respecting noncommercial policies of the San Francisco-based Internet Archive are a threat to queer and other often-persecuted lifestyles

IA is part of the larger problem, I'm not playing whackamole with giant businesses acting badly, or as you put it pushing boundaries for some imaginary libertarian-esque greater good. Regulation will solve anyone who wants to host or do business on US soil.

> The Wayback Machine is a friendly canary reminding people of the risk and responsive to their concerns; others represent the fatal dangers of privacy blowback.

I am literally speechless at this logic. The idea that doing harm is somehow a canary for larger potential harm and is worth continuing to do is awful reasoning. IA can make their services less harmful without harming their larger mission, I've also proposed ways to do that that you have not responded to.

> You've made unsupportable allegations of "flaunting privacy" or "direct attack on individual privacy"...

I supported both of those statements. They took a self-service, automated system that is a web norm (which you apparently like) and replaced it with emails for DMCA takedowns. You can disagree with me, but they're not unsupported.

> ...accused those who simply reason from copyright-history as making "bad faith" arguments

I reasoned that people like you know that your comparison to organized, for-profit publishers is not cogent. Every time you respond I become more confident of that assertion, especially when you accuse me of being a shill for some "big copyright" conspiracy.

Lastly, probably the most salient point I've ever heard:

> "Copyright holders aren't going to be happy with Freenet and Gnutella," Mohr said. "They are going to want to start monitoring people at the ISP level, and that means there is going to be a coming war between individual privacy versus network security."

Ironically, years later you went to work on what would end up becoming a privacy-eroding tool. I wish I was talking to the gojomo of back then; I think there'd have been a much more productive conversation than this has been.


Your other link (https://jonathanwthomas.net/how-to-get-your-website-out-of-t...) explicitly and accurately reported, "You don’t need to file a DMCA notice or anything (but you can if you want to go nuclear)." Your insistence on sloppily referring to the process as such, and further your unwarranted fear it might "change on a dime", are odd slurs to insistently deploy.

> Doesn't really matter, disclose your affiliations - especially for the kind of wild statements you make (eg: comparing thread commenters to "big tech, big copyright, big nation-state, big regulation, and big ideologies".)

My current & prior affiliations are well-disclosed – better than yours, it seems.

And I wasn't "comparing" commenters to those things, I was pointing out: many pseudonymous voters, & commenters, here are often in the employ of, & de facto advocates for, the very topics of contentious discussion – such as Google, the US federal government, the entertainment industry, activist organizations, etc – with no disclosure.

So to harp on my open book of work history is again odd.

> I agree, commons is the wrong term, "fair use" is what IA legally rides on.

Indeed, it was unfair of you to characterize my arguments, or separately any rationales used by the Internet Archive, as involving a simplistic 'commons' assertion.

> I've also proposed ways to [make their services less harmful] that you have not responded to.

I've not noticed these proposals, just vague platitudes about "trying to establish consensus in good faith first" – what does that mean? – or an example of one person (Manley) who consciously chose to publish a life-archive along with their suicide note (!). How exactly does that outlier case convert to actionable policies for a web library, that legitimately seeks to be as comprehensive about publicly-published web materials as traditional printed-material libraries?

> Regulation will solve anyone who wants to host or do business on US soil.

No, it can't 'solve', and barely even helps, because the most serious threats are from governments who themselves use regulations to force the violation of privacy – as with KYC rules, or demands for intercept capabilities – and from aggressive sub-state actors whose activities are largely invisible to regulators.

You're still alleging non-specific "harm" from the Archive without examples or magnitudes.

If the Archive helps make people aware that what they've published persists in the public record unless they take conscious other steps, lets them correct the storage that surprised them, and leads them towards the truly-reliable practices for avoiding unwanted disclosure/persistence, it's done a better job than those promising safety from superficial 'regulation' that doesn't actually limit most threats.

Archiving publicly-offered web pages isn't a "privacy-eroding tool" no matter how many times you repeat that allegation - it's a tool for cultural memory, honest history, and teaching people the ground realities of privacy (or lack thereof) in these new media. Directing ire at the Archive's well-behaved collection activities is getting angry at the smoke alarm, not the fire.


Copyright is a mechanism used to protect privacy in these situations. When you don't have copyright, you are stuck needing a court to protect your privacy. Copyright is also what you are required to prove in order to get stuff taken down by IA when the content is not obviously illegal or personally identifiable information (or at least it was when I last needed to deal with it).


With respect, I fail to see how a public website is a privacy matter.


Information on a public website is public until it is taken down or the information changed. The Internet Archive removes an individual's control over when the information remains public. This is privacy. We might be caught naked, and we can't unsee what has been seen, but it is a basic human instinct to draw the curtains and contain further damage. Perfectly innocent individuals suffer because the IA rules are designed around edge cases where public figures try to hide misdeeds.


If you print a magazine you also don't get to recall all copies if you change your mind about something. Giving individuals this kind of control over others' ability to freely share information is dangerous because it is easily abused to hide information that is in the public's interest, and that is not an edge case at all. Making a decision to publish something on the public web is hardly analogous to being caught naked, even if you may come to regret either.

If anything, the IA should be more reluctant to remove information without a court decision.


> The Internet Archive removes an individual's control over when the information remains public.

And that's a good thing in the vast majority of cases. Unless we're talking about sensitive information that was published without the consent of the person in question, all public information should remain public forever.


In my experience, it is the vast minority of cases. Most of the content of the IA is not in the public interest, now or in the future. It is crap. It is noise. It is the contents of the Internet at a point in time. Actual information is the wheat in the chaff, and why you need search engines to find it. We know this, because of the Usenet archives that are intermittently available. Almost completely useless apart from people having a giggle at how the Internet used to be, a quick browse and search for naughty words. And a few gems in the mountain of noise, in such dire need of curation people hardly know it exists and barely justifiable enough for libraries to keep it alive.


Agreed, bulk collection gets dominated by crap, which individually has little value.

But there's some absolutely essential priceless diamonds hidden in the crap. And they can't be found/known at the time of collection: only with the future development of other events & knowledge do they become retroactively evident. So you've got to collect & preserve as much as you practically can, or else great things are lost forever.

Further, even the mounds/magnitudes of crap can turn out to be important for understanding the past. Ads that annoyed readers at the time help communicate how people, & businesses, & technology were really operating – not just the self-serving stories people craft later. The most-fumbling and awkward early uses of a new medium – hypertext, or RealAudio, or Shockwave Flash, or whatever – reveal enduring lessons about the evolution of technology & culture, including roads-not-taken that could still hold promise.

This shouldn't surprise us. Much of what we know of past civilizations comes from archeologists studying trash dumps that, via dumb luck, were well-preserved.

So if you tell me, "the Wayback Machine is a giant unedited trash heap of the internet", my response is: "Yes! That's the point! You get it!"


Some people discover much too late that there are some things they wish they could take back. Often before trying to get a better job or when trying to escape an abuser. Given the ramping up of attacks (legal and otherwise) on queer people, this is going to be a huge issue over the next decade or so.



