Hacker News
You can help Anna's Archive by seeding torrents (annas-archive.org)
193 points by FabHK 10 months ago | 114 comments



Anna’s Archive, a mirror (arguably the successor) of Library Genesis and Sci-Hub, is asking for help seeding over 500 TB of books and papers for long-term preservation. Currently, 40% of that is seeded by fewer than 4 nodes.

Note: This is 30M+ books, 100M+ papers. Depending on your philosophy and jurisdiction, you might be stealing a few billion $. Or not.

> By seeding these torrents, you help preserve humanity’s knowledge and culture. These torrents represent the vast majority of human knowledge that can be mirrored in bulk.


Are LibGen and Sci-Hub at risk of coming down? I know it's a game of cat and mouse, but I was under the assumption that they were both doing fine, relatively speaking. And what makes Anna's Archive better/different?


If nothing else, investment diversification.

Z-Library was significantly attacked recently. There was a huge takedown in 2022, and I'm finding reports as of March 2024 as well:

2024: "FBI Carries Out Fresh Round of Z-Library Domain Name Seizures" <https://torrentfreak.com/fbi-carries-out-fresh-round-of-z-li...>

2022: "Z-Library operators arrested, charged with criminal copyright infringement" <https://www.theregister.com/2022/11/18/zlibrary_operators_ar...>

The Z-Library defendants were Russian nationals, but they were arrested in Argentina.

The first Torrentfreak article mentions two other actions in Spring and November 2023.

Sci-Hub has played cat-and-mouse with domains for years, and AFAIU has still withheld posting new scientific articles given pending litigation/prosecution in India.

Another Torrentfreak article from Nov 2023 gives an update on numerous issues with text liberation sites, including Anna's Archive, Sci-Hub, and Z-Library. At the time, the Anna's Archive twitter account had been "wiped out", so much for that platform's "free speech" stance.

<https://torrentfreak.com/copyright-piracy-news-brief-1-extra...>


Disclaimer: not an expert in any of this, just stumbled across that page on Anna's Archive.

Sci-Hub is down (as in, you can't search/view/download papers from sci-hub.se), AFAIK.

Anna's Archive aggregates into a unified database material from Libgen.rs, Sci-Hub, Libgen.li, Z-Library, Internet Archive Controlled Digital Lending, and DuXiu 读秀. (See: https://annas-archive.org/datasets )



What are the most favorable jurisdictions to run seeding out of, mitigating risk of copyright related reporting and account closure (assuming virtual compute)?


Ukraine, Russia, Moldova, Romania, Vietnam, Africa. Often anywhere near those too


How about favorable places given an anonymous virtual credit card that you bought on a spamming-services forum? (As in, if you don't have to worry about being identified, and it's just a question of wanting the seedbox itself to be unlikely to get taken down)


A lot of English-advertising hosts from the regions in my comment are going to advertise DMCA-ignore, freedom of speech, and/or privacy, and you basically can't go wrong with any of them. Seedbox-as-a-service providers specifically are generally going to be fine too, although many of those are hosted in the Netherlands, which is slowly becoming less friendly to copyright infringement.

One specific host I'll mention is vsys.host (UA), which Anna's Archive uses too, but they're not going to be the cheapest option.


Turkey?


I think Turkey would be fine but I don't know or have anything to base it off of


Probably Sweden.

Edit: I knew Sweden is a piracy hub but I was under the wrong impression regarding how Sweden legally sees piracy. Yes this is where TPB was raided, but also this is where TPB launched, and there seems to be a certain ideology in Sweden that propels sites like TPB. Either way, please stay safe online.


Isn't that where The Pirate Bay folks were prosecuted?

https://en.wikipedia.org/wiki/The_Pirate_Bay_trial


Absolutely not for anything large scale like this.


Russia


> By seeding these torrents, you help preserve humanity’s knowledge and culture. These torrents represent the vast majority of human knowledge that can be mirrored in bulk.

I am having a hard time reading that claim as anything other than a bad-faith justification. Torrent nodes are not at all a good way to "preserve humanity's knowledge and culture".

EDIT: I'm no longer convinced I'm correct.


I only quoted a small bit from their website. Immediately below the quoted bit is this explanation:

> These torrents are not meant for downloading individual books. They are meant for long-term preservation. With these torrents you can set up a full mirror of Anna’s Archive, using our source code and metadata (which can be generated or downloaded as ElasticSearch and MariaDB databases).

I'm not an expert in any of this, but it doesn't strike me as a bad-faith justification of anything.


Can you elaborate on why you believe it’s in bad faith?


As a PSA, one additional reason to seed is that Anna accidentally doxxed herself via GitHub. So it’s worth preserving the archive on the basis that we should expect {the centralized portion of} it to disappear within the next couple years.

I was sad to see that happen, but it’s important to be objective and plan future actions accordingly.

(And sure, there’s always the chance that some random person on GitHub just so happens to be named Anna and is an archival enthusiast, but a jury of one’s peers may find that it passes the reasonable doubt threshold.)

My legal troubles with books3 weighed on me pretty heavily, and I wasn’t even the target. Yet. I can only imagine what it feels like to be waiting for an indictment.

There ought to be some sort of protection for preserving books in bulk. No one is going to read two million books. But of course, one could also argue that having a readily available archive is harming the economic profitability of the works, on the basis that content licensing for AI is now a multimillion-dollar industry. It’s weird, because it feels like important work, rather than criminal — someone should put into words exactly what the distinction is.


They did deny involvement so it'll be interesting to see what happens (you probably know this though) https://torrentfreak.com/key-defendant-in-annas-archive-laws...

>It’s weird, because it feels like important work, rather than criminal — someone should put into words exactly what the distinction is.

Important work can be criminal


> Important work can be criminal

Yeah, illegal/criminal merely means that it's against the law and will be prosecuted if it can be pinned to you.

As an outlandishly severe example: helping Jews in Nazi Germany was criminal.


I think you’ve stumbled upon a paradox where the data is worth preserving because of its scarcity, but once it is no longer scarce, its value diminishes along with the priority to preserve it and make it discoverable. This is similar in some ways to the antique/collectibles market and its U-shaped curve. The issue is that the data can become so devalued that we create a scarcity in the future. It’s one of those self-regulating systems.


Link to the doxxing incident? When / how did this happen?

Edit: I'm asking as I find no mention of this elsewhere: notably not on Torrentfreak (which tends to keep on top of such things), Anna's subreddit, her blog, or her Telegram channel.


Maybe this is it? https://torrentfreak.com/key-defendant-in-annas-archive-laws...

> On information and belief, in addition to her extensive online presence, she has a GitHub (a software code hosting platform) account called, "anarchivist," and she developed a repository for a python module for interacting with OCLC's WorldCat® Affiliate web services.


Yeah. There was some HN discussion about it when the story broke too; maybe someone can dig it up on HN Search.

I think the civil suit will probably be dismissed due to lack of evidence, but it’s likely the police have probable cause to start a surveillance warrant. At that point, either she’ll stop all activity (which it doesn’t seem like she’s doing) or it’s inevitable they’ll catch her in the act.

The slim hope is that maybe it’s not her, or that police won’t pour a lot of resources into surveilling her. Maybe she’ll get lucky. But given the aggressive ways law enforcement has gone after people for e.g. scraping AT&T’s website, I wouldn’t bet on it.


Thanks, and I'm hoping likewise.

Here's the 2-month-old discussion on the Torrentfreak article:

<https://news.ycombinator.com/item?id=40143549>

And that seems to be the only HN submission with > 5 points that matches this description since 1-Apr-2024:

<https://hn.algolia.com/?dateEnd=1718327852&dateRange=custom&...>

A similar search for "OCLC" turns up nothing.


I think the Library of Congress already does preservation; it seems to be legally required to give the LoC two copies of every published work in the US (https://www.copyright.gov/circs/circ7d.pdf).

Preservation of works is very obviously not why Anna's Archive is asking for torrent seeders. Seeding is for distribution and availability, not preservation. It would be more honest to say "preservation of ML training fair use access".


> Preservation of works is very obviously not why Anna's Archive is asking for torrent seeders.

Can you elaborate on that? They write:

"These torrents are not meant for downloading individual books. They are meant for long-term preservation. With these torrents you can set up a full mirror of Anna’s Archive, using our source code and metadata (which can be generated or downloaded as ElasticSearch and MariaDB databases)."


I'm no longer convinced I'm correct.


Preservation, unfortunately, is not access, not only for the vast majority of US citizens but for people worldwide who are locked out of the proprietary document system.

Even such half-aborted systems such as Hathi Trust permit downloads only by the page, even from out-of-copyright works, which is absolutely infuriating. Full access to the Hathi archives (that is, in-copyright works, which cannot otherwise be viewed at all) is restricted to college libraries only, not public libraries generally.

The Internet Archive, LibGen, Z-Library, Sci-Hub, Anna's Archive, and other similar efforts are really the only viable means of access to much of the world's published information, whether inside, outside, or straddling extant copyright law. Which of course was written and lobbied for by existing copyright holders.

I'd much rather we burnt down that law than our true libraries.

As to Anna's Archive and the question of preservation vs. access: AFAIU the full archive is already available and stored in multiple fully independent copies. As such it will all but assuredly survive even extreme legal attacks, let alone other threats. Torrent seeding provides access to those works; as a means of making the archive available and useful, it is key to the archive's function and service, but (probably) not to its survival.


Interesting.

IIRC, libgen used IPFS for preservation efforts.

Anna's Archive (seemingly the successor) appears to have migrated to BitTorrent.

I wonder what motivated the move?

Edit: asking as someone who works daily on building p2p software. We've abandoned mainline BitSwap (IPFS) in our work for similar reasons as the rest of the rust-libp2p community, but haven't found a particularly good "successor" protocol for a generalized use case yet. We are currently using our own ad-hoc hand-rolled chunking/transfer protocol as needed.


Anna here.

Libgen still uses torrents primarily for preservation. It also hosts on IPFS but that is more for access, and there are very few IPFS seeders.

We tried IPFS for a bit but found it not stable and usable enough for preservation purposes. We're closely watching IPFS development and hope that it will get there, since it would be wonderful to merge the preservation and access use cases in one system.


That's wild (in a good/interesting way) to me.

I've found the BitTorrent protocol tries to be more suited to accessing popular data on-demand (i.e. streaming a popular file) vs. archival.

IPFS' BitSwap protocol strikes me as trying to be optimized for longer-term preservation (higher latency time to first byte in exchange for more resilient pinning/discovery/propagation of rare data).

It's cool you're observing the opposite. I've had a growing suspicion that both protocols haven't quite realized the benefits they were hoping to get from the trade-offs they made in their transfer/discovery protocols.

Would love to compare notes at some point if you'd be open to it.

We've been playing around with both BitTorrent and IPFS. Some of the datasets we are working towards supporting are approaching the scale you work at (100TB archives).

Ultimately both BitTorrent and IPFS have fallen short for me when trying to seed 100TB datasets.

I've got a hunch that we're going to need to roll a new protocol to tackle these larger datasets that merges some of HTTP's, BitTorrent's, and IPFS' approaches to sharing content.

I have a personal R&D list for pushing a file-sharing protocol past the 100TB limit (not in any particular order):

* Better chunking using a mix of:

  * Rolling hashes

  * File boundary splitting

  * (should enable deduplication of identical files across nonhomogeneous archives, and allow for adding content to an archive without losing the existing seeders)

  * (inspired by prior art in container storage: https://github.com/hinshun/ipcs)
* "online" deterministic archive formats w/ detached metadata

  * Ability to share a directory as an archive, or partial slice of an archive, without having to generate the archive on disk. (Announce a "tarball" like archive on the DHT without having to generate it by being able to generate the "chunks" on demand from the directory)

  * Detach the manifest containing the archive's contents from the archive, so you can download/parse the manifest without downloading the full dataset. (You can use this to find which chunks a specific file is in, so you can download a single file from a 1TB archive and the client can seed that file back to the network as part of the archive.)

  * Chunking of manifest files for large datasets, since the manifest itself might grow to many GBs in size (manifest resolution inspired by IPLD's data structure)

  * Normalize file metadata in the archive header so timestamps etc. don't muck up your CIDs

  * Deterministic ordering of files in an archive
* Chunking/Transfers/Announcing/Discovering

  * Support increasing chunk sizes for large files past 1MB. A 100TB dataset w/ 1MB chunks requires ~209M CIDs just for the chunks; that's a lot of load on the DHT and a lot of work on the seeding node to keep the data available.

  * Support interruptible/resumable/recoverable downloads from peers using something similar to HTTP Range header semantics

  * Merge BitTorrent's DHT query approach w/ IPFS' DHT query approach, asking connected peers for CIDs and tit-for-tat reciprocity while simultaneously hedging your bet by kicking off the slower DHT traversal to find more peers
* Connectivity

  * Bringing mobile devices and browser tabs into the fold as first class peers that can both download and seed content

  * (i.e. WebRTC: https://github.com/libp2p/rust-libp2p/tree/master/examples/browser-webrtc)

  * (proof-of-concept NAT hole punching appliance for end-users: https://github.com/retrohacker/turn-it-up)

Thank you for everything you do
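The rolling-hash chunking idea in the first bullet can be sketched as follows. This is a minimal illustration assuming a gear-style rolling hash (the approach FastCDC uses), not the commenter's actual design; `cdc_chunks`, `chunk_ids`, and the size constants are hypothetical. Because each cut decision depends only on the last 64 bytes, inserting data near the start of an archive changes the first chunk or two and leaves the remaining chunks with the same content hashes, so existing seeders still hold them.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not any project's real settings).
MIN_CHUNK = 2 * 1024   # never cut before this many bytes
MAX_CHUNK = 64 * 1024  # always cut once a chunk reaches this many bytes
AVG_BITS = 13          # cut when the top 13 hash bits are zero (~1 in 8192)
MASK64 = (1 << 64) - 1

# Gear table: one pseudo-random 64-bit value per byte value, seeded for reproducibility.
_rng = random.Random(2024)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add: a byte's contribution is shifted out after 64 steps,
        # so h depends only on (at most) the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        if (h >> (64 - AVG_BITS)) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_ids(data: bytes) -> list[str]:
    """Content address (SHA-256 hex) of every chunk, in order."""
    return [hashlib.sha256(c).hexdigest() for c in cdc_chunks(data)]
```

For scale on the chunk-count bullet: 100 TiB split into 1 MiB chunks is 104,857,600 leaf chunks, and a binary Merkle tree over n leaves has 2n - 1 nodes, roughly 209.7M, which is presumably where the ~209M figure comes from.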


I suspect it's partly the required capacity. This project is far beyond what even the largest private trackers could host, but if anyone can come close to keeping this alive when the copyright mafia comes, it's the torrent community.


Nexus - another very large archive - is using IPFS. But in my experience Bittorrent works a lot better at this scale. The IPFS UX is full of papercuts, when it isn't outright bugging out or crumbling under the size of the dataset.


IIRC, Nexus is using Iroh[0] instead:

> Starting with v0.3.0, Iroh is a ground-up reimagination of the InterPlanetary File System (IPFS) focused on performance.

Also see, A New Direction for Iroh[1].

[0] https://www.iroh.computer/docs/

[1] https://www.n0.computer/blog/a-new-direction-for-iroh/


I'm guessing the decision comes down to ease of use for people participating in mirroring. My understanding is that IPFS tends to require more infrastructure and still requires someone to pin the data.

Many bittorrent clients let you click a button to continue seeding the data over time.


I want to help but I can't seem to get it working; maybe someone can help me.

I want to give 5GB (don't have much storage), so I put "Max TB:0.005" and "Type:URLs".

It gave me this URL:

https://annas-archive.org/dyn/generate_torrents?max_tb=0.005...

which contains these two torrent files:

https://annas-archive.org/dyn/small_file/torrents/external/l...

https://annas-archive.org/dyn/small_file/torrents/external/l...

I added those torrents to Transmission. One is 5GB and the other 4MB; the 5GB one is not downloading/seeding, while the 4MB one downloaded and is seeding:

https://imgur.com/a/80k3y1D

Any help?


Anna here. Thanks for the broadcast.

We're soon releasing another few hundred TB of completely unique materials (mostly books, also lots of magazines), so help with preserving all of this is sorely needed.

Much, much thanks to everyone who has contributed in one way or another already!


Okay so is there somewhere on the Internet that I can get a VPS with crypto? Because I'm not going to use traditional payment methods to access this.

EDIT: Thank you to everyone for your recommendations. I shall find an anon seedbox. I don't mind if it's nuked. I imagine I'll just pay per month and losing one month's spend won't hurt.


There are some server hosts that you can pay for with a locally bought Paysafecard. They mostly cater to young gamers who want to host private servers but don't necessarily have access to the banking system yet. They will, however, very likely nuke your server the second they get a DMCA notice, so you need a VPN set up on that server too. Several providers offer Bitcoin payment, and with Mullvad you can buy scratch-off tickets on Amazon, which would be the most anonymous option. Of course, setup has to be done on public Wi-Fi and requires some custom configuration.

Alternatively, there's /r/seedboxes over on reddit where most vendors accept bitcoin and get you a complete setup.


A couple lists:

https://kycnot.me/?t=service (not just hostings)

https://bitcoin-vps.com/ (an extensive list for ones that accept BTC, most accept other coins too)

There are hundreds of VPS hosters that accept crypto, but the important part is that a lot of them are not happy about abuse reports either way, so you'll probably have to use a VPN (like Mullvad) on the VPS itself to not get it suspended :)


I highly recommend https://mullvad.net/


Sorry, maybe I wasn't clear. I'm not storing this data locally. I want to seed it from the Internet without a connection to me except for SSH. Essentially, how can I get a seedbox with crypto.

Though it does strike me now that any such service will be primarily used by criminals.


>Though it does strike me now that any such service will be primarily used by criminals.

Just think of them as "anti-state actors"

Getting a small VPS with crypto and tunneling from your home seems like a better pathway to me?


I'd recommend SeedHost in that case!


I don’t think they sell VPSes though


Is it even possible to do this effectively using Mullvad after they removed support for forwarded ports?

[1] https://news.ycombinator.com/item?id=36113215


You're looking for a "seedbox" which is essentially the torrenting equivalent of a VPS


The service you're after, I think, is a seedbox. Feral Hosting is one such service that accepts payment via crypto.

They're not a VPS in the traditional sense, but they give you a slice of a server, with a torrent client pre-installed.


Mullvad will even let you pay with cash (stuff dollars into an envelope).


That is the mentality that keeps the world down.

Perhaps Great Depression 2.0 will set our priorities straight.


Njalla has VPS and VPNs.

I'm a happy VPN customer, they've been excellent.


I'm told Anna's Archive forces a javascript Cloudflare human-browser check on the visitor.

Given how much data Cloudflare (and other similar giants providing this service) has, this pretty much identifies the person to them.

Any plans to change that to a more privacy-protecting solution?


We'd welcome a Merge Request on our Gitlab for a Captcha implementation that is as good as Cloudflare but can be done without any external dependencies. If anyone cares enough to make that, we'll merge it!


Your mistrust is not unreasonable. However, I think if state actors/rightsholders/CF were serious about monitoring user activity covertly, they could simply volunteer to host a mirror/CDN for Anna. And that would also expose paying users.


From what I gather, in some jurisdictions downloading copyrighted material is legal (eg via HTTPS from a website), or at any rate not prosecuted, while uploading (offering to download) it (eg while downloading it via BitTorrent) is illegal (and prosecuted).


Not an expert on laws/jurisdictions; but data is collected, stored forever, and in some places is (legally) on sale. And laws get changed quickly based on lobbying.


Would their plan of keeping the entire archive in torrents alleviate that concern? Like, if a person sourced an index from somewhere privacy-centered, they could directly download from the torrents in whatever private method they wish, right?


I'd expect the number of users from browsers to always be orders of magnitude larger than torrent users.


I have this mentality problem where I don't like using my VPN except when I specifically need to.

To permanently seed some of these torrents I'd have to keep it on all the time. Would I have to keep my desktop running 24/7 too?


I used to be there, until I discovered Mullvad, which is so fast and reliable that there’s no downside to using it permanently. The only time I ever notice is when, very rarely, a particular node is flagged by some website (most often Reddit) as problematic and I have to switch to a different one.


> I have this mentality problem where I don't like using my VPN except when I specifically need to.

This is extremely suboptimal. "I only hide my activity when it's worth hiding" paints a giant target on your back. VPNs as a matter of course protect you from all manner of anti-consumer tactics, and if you don't obfuscate all of your traffic, it tips off surveilling parties to only focus on the subset of traffic that routes through a VPN.


"I don't want to receive DMCA notices passed through my ISP because it might disrupt my service" type use cases don't imply you also care about general privacy concerns or other use cases of VPNs too.


And I'm expressing to OP that they should, especially given the repeal of net neutrality and exposed NSA dragnet programs. Unless we just think a social credit system will magically never make its way to the US and impact every portion of our lives. Imagine paying higher health insurance premiums because your parents browsed websites labeled as unhealthy.


I think this is a fine soapbox but also that stating it in this "I think you should care because" style you just did is both better and different than starting by asserting the conclusion from your stance. It's not only more conversational but easier for those who don't already hold your stance to understand how your claim applies to their situation. That's all my example was trying to clarify, not necessarily argue against that people should/shouldn't consider or ultimately care.

Thanks for the clarification of your stance on VPNs and privacy, I appreciate the insight into why one should care.


I appreciate the feedback, I'll take that into consideration!


I agree with you. I was more or less describing a negative trait of mine for why I haven't participated in seeding information thus far.

I'm definitely already on a health insurance bad persons list. My employer forwarded my name, along with those of other employees terminated over "that thing", to the FBI!


What a load of malarkey. I'm convinced the US will never get free healthcare because healthcare is used as leverage to keep the middle and wealthy classes in check while also oppressing the lower class.


You don't have to seed 24/7. It's OK as long as (parts of) torrents get seeded regularly.


Exactly. If you seed it a couple hours once a week that still ensures that whoever wants the torrent can get it.


> Would have to keep my desktop running 24/7 too?

You could build a seed box out of an old ARM board running the Transmission daemon, with a USB key mounted read-only to avoid wear; power draw would be just a few watts and total cost could be less than 50 bucks. The desktop would be needed only to add torrents or change configuration through its web interface, although Transmission also has remote-control apps for phones and tablets. If the router permits it, QoS rules can let the seed box use all available bandwidth at a lower priority than other machines on the LAN, so that it never clogs the network, which comes in handy for online gaming, for example.

https://transmissionbt.com/
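For remote control without the web UI, Transmission exposes a JSON RPC endpoint. The following is a minimal client sketch, assuming the daemon's default `/transmission/rpc` endpoint on port 9091 with RPC authentication disabled; `build_request` and `rpc_call` are hypothetical helper names. The daemon answers the first request with HTTP 409 and an `X-Transmission-Session-Id` header that later requests must echo, which is what the retry loop handles.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Assumption: transmission-daemon with RPC on its default port 9091 and no
# RPC authentication; change this URL for your own seed box.
RPC_URL = "http://127.0.0.1:9091/transmission/rpc"
SESSION_HEADER = "X-Transmission-Session-Id"


def build_request(method: str, arguments: Optional[dict] = None) -> bytes:
    """Encode one Transmission RPC call as a JSON request body."""
    body: dict = {"method": method}
    if arguments:
        body["arguments"] = arguments
    return json.dumps(body).encode("utf-8")


def rpc_call(method: str, arguments: Optional[dict] = None, url: str = RPC_URL) -> dict:
    """POST an RPC call, retrying once after the 409 session-id handshake."""
    data = build_request(method, arguments)
    headers = {"Content-Type": "application/json"}
    for _ in range(2):
        req = urllib.request.Request(url, data=data, headers=headers, method="POST")
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # The daemon answers 409 Conflict and supplies a session id that
            # must be echoed back on every later request.
            if err.code != 409:
                raise
            headers[SESSION_HEADER] = err.headers.get(SESSION_HEADER, "")
    raise RuntimeError("RPC server kept rejecting the session id")
```

From a desktop on the same LAN, something like `rpc_call("torrent-get", {"fields": ["name", "percentDone", "uploadRatio"]}, url="http://<seedbox>:9091/transmission/rpc")` should list each torrent's progress and share ratio.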


You don't have to keep it going 24/7. If you connect every once in a while, you will assist clients that come and go for parts they need. You don't even need to have the complete torrent either -- other peers will learn which parts you have and request those from you.
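In the BitTorrent wire protocol (BEP 3), a partial seeder advertises which pieces it holds to other peers with a `bitfield` message right after the handshake, followed by `have` messages as it acquires more. A minimal sketch of packing and unpacking that message; the function names are hypothetical, while the layout (4-byte big-endian length, message id 5, piece 0 in the high bit of the first byte) follows BEP 3.

```python
import struct

BITFIELD_ID = 5  # message id for "bitfield" in BEP 3


def encode_bitfield(have: set, num_pieces: int) -> bytes:
    """Pack held piece indices: piece 0 is the high bit of the first byte."""
    out = bytearray((num_pieces + 7) // 8)  # spare trailing bits stay zero
    for index in have:
        if not 0 <= index < num_pieces:
            raise ValueError(f"piece index {index} out of range")
        out[index // 8] |= 0x80 >> (index % 8)
    return bytes(out)


def decode_bitfield(payload: bytes, num_pieces: int) -> set:
    """Return the piece indices a peer says it has."""
    return {i for i in range(num_pieces) if payload[i // 8] & (0x80 >> (i % 8))}


def bitfield_message(have: set, num_pieces: int) -> bytes:
    """Frame as <4-byte big-endian length><id=5><bitfield>."""
    bits = encode_bitfield(have, num_pieces)
    return struct.pack(">IB", 1 + len(bits), BITFIELD_ID) + bits
```

A peer holding only pieces 0 and 9 of a 10-piece torrent sends `00 00 00 03 05 80 40`, and anyone still missing those pieces can fetch them from it.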


I don’t get it. If you do not want to participate in the preservation and distribution of the archive, why don’t you just move on instead of complaining?

Besides, gluetun+chihaya+qbit containers do the job without breaking a sweat, and without ever having to remember that you run a VPN - as it’d only be tunneling the containers of your choice. gluetun is the best image ever made!


Hang on. I think the OP is trying to find a middle ground between their level of comfort and contributing to the project. I think their intentions are positive; they want to help but are constrained by their other needs or priorities.


They’re not complaining; they’re asking a question. I think you completely misread the tone.


No I definitely want to. I was stating why I haven't thus far. I support piracy.


Permanent seeding is more about the data being reliably available at some point in the future than about seeding continuously.

As for the VPN side you should be able to configure your VPN to always tunnel your torrent app but not always tunnel your entire computer. The best/easiest way to do this varies by the specific VPN application and your OS.


In qBittorrent or similar torrent clients you can explicitly set the VPN network interface they should use. Use split tunneling for your browser or other applications you don't want to use the VPN for.


What I'd do is look at getting some small used PC (the 1-liter ones, TinyMiniMicro on ServeTheHome, etc.) that doesn't draw much power. It's cheaper to leave running 24/7, and it's easier to set up to work ONLY if the VPN is up. You can likely even set up virtual machines to do it (i.e. set up an OpenWRT virtual machine that can access the rest of the network as the "wan", but the "lan" to it is all virtual, and doesn't route traffic from the lan to the wan, only from the lan to the vpn).
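A minimal start-up guard for that kind of box, assuming Linux with the VPN appearing as a network interface named `wg0` and the client being `qbittorrent-nox` (both hypothetical defaults; substitute your own). It only refuses to launch when the interface is missing; it is not a kill switch, so it complements binding the client to the VPN interface rather than replacing it.

```python
import socket
import subprocess
from typing import Iterable, Optional, Sequence

# Hypothetical defaults: use your VPN's interface name and your client's command.
VPN_INTERFACE = "wg0"
CLIENT_CMD = ("qbittorrent-nox",)


def vpn_is_up(interface: str, names: Optional[Iterable[str]] = None) -> bool:
    """True if the named interface exists (a start-up check, not a kill switch)."""
    if names is None:
        # socket.if_nameindex() returns (index, name) pairs for local interfaces.
        names = [name for _, name in socket.if_nameindex()]
    return interface in set(names)


def launch_if_vpn_up(interface: str = VPN_INTERFACE, cmd: Sequence[str] = CLIENT_CMD) -> int:
    """Start the torrent client only when the VPN interface is present."""
    if not vpn_is_up(interface):
        print(f"{interface} is missing; not starting the torrent client")
        return 1
    return subprocess.call(list(cmd))
```

Calling `launch_if_vpn_up()` from a boot script or systemd unit keeps the client from ever starting on the bare connection; an interface existing does not prove the tunnel is passing traffic, so the client's own interface binding is still worth setting.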


We made a simple way to lock qBittorrent into a VPN in a container [1]. It's probably simple enough to follow what we did in the config script to set it up for your use case (all open source [2]).

[1] https://www.youtube.com/watch?v=PrH6Ci_4eig

[2] https://github.com/ipv6rslimited/cloudseeder


Or take an old laptop and use as a seedbox


What's the problem? Seed when you want to seed.


They also use Telegram actively for latest announcements, fyi.

https://t.me/annasarchiveorg


Are any of these legal to seed in the US, i.e. not in violation of copyright?

E.g. Consider someone with a gigabit connection that their VPN can't keep up with. It would make sense to seed the legal items without the VPN, and the other items with the VPN.


I wonder if there is any benefit to using something like DwarFS (https://github.com/mhx/dwarfs) for something like this.


Is there a link to the legal considerations of contributing as a seeder?


copyright law would've already burnt the library of alexandria several times over, just something to consider about the validity of these laws written and bribed into law by publishers


It's fine that you don't agree with the laws, but the guy is asking what the actual legal impact may be if he participates. Answering with "who cares, the laws are dumb" isn't helpful to him... people get arrested and charged under dumb laws all the time, and it sounds like he'd like to avoid that.

edit: this is maybe the link he's after? looks like you need to be logged in to see it though.

https://annas-archive.org/copyright


so what if I didn't answer their question? it's called a comment for a reason. (also: it's "them", don't assume their gender like that, come on it's 2024)


How so? We literally have libraries today, and copyright law hasn't burned them down.


There are a few reasons for that. The primary one is that sharing a physical book does not count as copying, but sharing a digital book does.

The second reason is that libraries tend to operate under government control, and governments have done things to enable libraries and work around copyright law. An old one that my country (used to?) have was to require publishers to send copies to the national library. In return, national authors got a symbolic (very tiny) sum each time a copy was taken out. Being forced to send a copy to the government isn't technically against copyright law, since no unlawful copying is being made, but the result feels very similar to unlawful copying.


Sure, but if anything that just reinforces how ridiculous it is to claim that "copyright law would've already burnt the library of alexandria several times over" - the LoA was very much backed by government power - https://en.wikipedia.org/wiki/Library_of_Alexandria#Early_ex... describes not merely going out and buying anything they could get their hands on, but outright using government mandate to seize and copy new books that passed through port.


What does that mean? The library of Alexandria was happy to burn down all on its own, centuries before copyright.


He's not saying that copyright literally burned down the library of Alexandria.


Depends on where you are located; in some countries they may not give a damn if you seed some torrents, but will happily take down your site and prosecute you if you run a torrent service.


Wouldn't this be a good application for IPFS?


It's already on BitTorrent. IPFS doesn't do much BitTorrent doesn't already, most of it is a new coat of paint and making the same mistakes BitTorrent figured out years ago.


It does one thing BitTorrent doesn't — you can compose a new CAR file by combining a few new chunks with a bunch of existing chunks. So you don't get the problem where releasing a new version of an archive means nobody's seeding it; and anyone moving over to seeding the new version stops seeding the old version. Instead, the new file is already pre-seeded by all the old version's seeders on all but the new chunks (because they're seeding the chunks, not the file); and the old file stays seeded as the seeders find the new version and seed its blocks too.

Really, BitTorrent could do this by making all torrent files a small fixed size and then having "torrent files of a directory of torrent files" where the torrent client knows to queue the sub-torrents as they're discovered+downloaded in the parent torrent. But that's not how any part of the ecosystem works. IPFS is a "do over" that allowed them to fix this.
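The block-sharing point can be illustrated with content addressing: if every block is named by its hash, seeders of an old archive already hold most blocks of a new version built by appending to it. A sketch using plain SHA-256 as a stand-in for real CIDs (an assumption; real CIDs add codec and multihash prefixes); `block_ids` and `already_seeded` are hypothetical names, and the default block size mirrors the 256 KiB fixed-size chunker IPFS uses by default.

```python
import hashlib

# Default mirrors IPFS's default fixed-size chunker (256 KiB).
DEFAULT_BLOCK = 256 * 1024


def block_ids(data: bytes, size: int = DEFAULT_BLOCK) -> list[str]:
    """Fixed-size blocks, each identified by its SHA-256 (a stand-in for a CID)."""
    return [hashlib.sha256(data[i:i + size]).hexdigest() for i in range(0, len(data), size)]


def already_seeded(old: bytes, new: bytes, size: int = DEFAULT_BLOCK) -> tuple[int, int]:
    """(blocks of `new` already held by seeders of `old`, total blocks in `new`)."""
    old_ids = set(block_ids(old, size))
    new_ids = block_ids(new, size)
    return sum(cid in old_ids for cid in new_ids), len(new_ids)
```

With tiny 4-byte blocks, `already_seeded(b"AAAABBBBCCCC", b"AAAABBBBCCCCDDDD", size=4)` gives `(3, 4)`, while prepending a single byte gives `(0, 4)`: fixed-size blocks only line up when content is appended, which is why chunking that cuts on content rather than on byte offsets matters for edited archives.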


>releasing a new version of an archive means nobody's seeding it; and anyone moving over to seeding the new version stops seeding the old version

BitTorrent v2 would in theory be able to seed individual files even if they come from a different torrent. But clients have no reasonable way to look for other versions of a torrent that contain a file they already have.

The main BitTorrent clients already support creating and seeding v2 torrents, but there's just no infrastructure for seeding at the individual file level.
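BitTorrent v2 (BEP 52) hashes each file separately into its own SHA-256 Merkle tree over 16 KiB blocks, so the same file gets the same root no matter which torrent it appears in; that is what would let a client recognize it across torrents. A simplified sketch of computing such a per-file root; `file_root` is hypothetical and omits details of the real format (piece layers, exact padding rules), so it is not guaranteed to match real v2 torrents byte for byte.

```python
import hashlib

BLOCK = 16 * 1024  # BEP 52 hashes files in 16 KiB blocks


def file_root(data: bytes) -> bytes:
    """Simplified per-file SHA-256 Merkle root in the style of BEP 52.

    Leaves are hashes of 16 KiB blocks; the leaf layer is padded with
    all-zero hashes up to a power of two, then pairs are hashed upward.
    """
    if not data:
        raise ValueError("empty files have no root in this sketch")
    layer = [hashlib.sha256(data[i:i + BLOCK]).digest() for i in range(0, len(data), BLOCK)]
    width = 1
    while width < len(layer):
        width *= 2
    layer += [bytes(32)] * (width - len(layer))  # zero-hash padding
    while len(layer) > 1:
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest() for i in range(0, len(layer), 2)]
    return layer[0]
```

Two torrents that both contain a given book would yield the same `file_root` for it, so a client could in principle look up other torrents by that root; the missing piece, as the comment says, is infrastructure to do that lookup.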


One major benefit of IPFS is that people seeding individual works and people seeding the large archive groups can share data. It seems that these torrents are blocks of data that aren't of direct use.

That being said, while the IPFS protocol is decent, the implementations kind of suck. BitTorrent is well established with many high-quality implementations.


Why is everyone acting like this is some radioactive data?

If you don't distribute further, it should be pretty ok to download and play with this?


Kindly recall the predicament of Aaron Swartz that led us to this point.

https://en.wikipedia.org/wiki/United_States_v._Swartz


He broke into a network though


If you download a torrent, your client will offer to upload pieces for other clients to download.


You can choose to not seed further or set your upload levels to zero


It looks like you don't understand how torrenting works and what "helping by seeding torrents" means.


Seeding 500TB, I'm guessing, requires 500TB of disk space. I don't have anywhere close to that.

I would if I could though, so excited to see anyone answer the call!


The individual torrents are only a couple GB to a couple TB each. You can automatically generate a list of torrents totaling however many terabytes you are willing to seed.

1: https://annas-archive.org/torrents#generate_torrent_list
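The generator's actual selection logic isn't described in the thread. As a hypothetical sketch, a storage budget can be turned into a list by greedily taking the least-seeded torrents that still fit; `Torrent`, `pick_torrents`, and reading the budget as decimal terabytes are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Torrent:
    name: str
    size_bytes: int
    seeders: int


def pick_torrents(torrents: list[Torrent], budget_tb: float) -> list[Torrent]:
    """Greedy pick: least-seeded (then smallest) first, skipping any that don't fit."""
    budget = round(budget_tb * 10**12)  # assumes decimal TB
    chosen = []
    for t in sorted(torrents, key=lambda t: (t.seeders, t.size_bytes)):
        if t.size_bytes <= budget:
            chosen.append(t)
            budget -= t.size_bytes
    return chosen
```

Prioritizing by seeder count means a small budget goes to the data that is closest to disappearing, rather than to torrents that already have plenty of copies.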


It seems you can specify the storage space you want to devote, and it will suggest torrents to match. For example, DuXiu is only ~40GB. This is separate from torrent-client limiting: you can just seed specific parts of the collection if you want.


You don't have to download all files in the torrent. Most torrent clients support partial downloads, and you would only seed those parts. The main concern would be that you would undoubtedly be distributing copyrighted material.


Most torrent clients let you select subsets of files to download (and thus seed), so you can download a portion matching whatever amount of disk space you are willing to lend.


100 people sharing a shard of 5TB each is even better (more resilient).


It's possible to download/seed just a fraction of a torrent.



