Does IPFS actually solve the problem they set out here though?
IPFS is a distributed CDN, but in my experience it is not very good for storing things persistently or reliably.
At the moment the NixOS cache is stored in S3, which comes with very high durability guarantees. Why is that not good enough?
Sure, it's centralised. But IPFS doesn't offer distributed durability; it offers a CDN. To me, it doesn't seem to address the issue the authors claim it solves.
I still think it's _super cool_. And once builds are also content-addressed and not just fixed-output derivations, having trustless content delivery is a valuable addition. But IPFS doesn't seem like a robust answer to the "what if we lose access to the source code" problem, given it's not a durable storage system.
S3 durability only works as long as somebody pays for it. When the bill is not paid, durability becomes 0.0000000 and the objects are gone forever.
With distributed storage like IPFS or BitTorrent, the availability of a resource is proportional to its popularity.
So, as the article explains, there will initially be a group of seeders. I assume anyone who downloads and keeps a package on their system becomes part of the sharing swarm. This dramatically reduces the hosting burden on package creators, as the cost gets distributed among all available peers. As long as there is one peer with the complete package, it will remain available, potentially forever.
"Distributed CDN" sells ipfs a bit short. It has properties of a distributed CDN but usually people think of a CDN as not a good fit to store the primary assets, but ipfs can have pretty good persistence and reliability guarantees if setup accordingly.
Usually a project like Nixos would not use the simple standard ipfs daemon but something like cluster.ipfs.io and i think i have also seen ipfs servers on top minio or s3 for storing the data. So the actual databits durability and reliability can be the same or better than the s3 comparison with ipfs.
A discussion like this is best split into the API and storage-system parts: S3 becomes the S3 REST API plus an S3 object store, and IPFS becomes the IPFS access protocol (plus the IPFS gateway REST API) plus whatever IPFS storage backend a project chooses to use.
The cool thing about IPFS for package managers lies more in the protocol and architecture than in the low-level data storage.
- Content-addressable storage as a core design principle solves, in a standard and documented way, a lot of problems that would otherwise have to be built manually on top of S3 if you wanted to build a CDN or package manager directly on it. I would argue that even without the P2P distributed part of IPFS, it would be a great fit purely as a content-addressable storage system.
- "Discovery": finding packages across different servers is such a game changer. Traditionally you have one or two repositories configured and packages are fetched from those; if these servers do not have the right package or version, or are offline, you are stuck manually hunting for alternative mirrors or other copies of the package. With IPFS, if your peers don't have a file, your node can simply ask the network for available "mirrors", and even ordinary users who have a copy and contribute to the network can serve it to you automatically (see the sketch below).
We built an alternative to IPFS called Skynet that attempts to solve a couple of the major issues with IPFS, the biggest being data durability and uptime.
On Skynet, pinning doesn't mean hosting the file from your machine, it means paying a bunch of service providers to host the file for you. When you pin content to Skynet, you can turn off your computer 5 minutes later and the data will still be available globally. Just like IPFS, data is content-addressed and anyone can choose to re-pin the content.
Service providers are held to 95% uptime each, which means doing something like 10-of-30 erasure coding can get you 99.99% uptime on the file as a whole. The low uptime requirement for individual providers dramatically cuts costs and allows amateurs to be providers on the network. The erasure coding algorithms ensure data availability despite a relatively unreliable physical layer.
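For intuition on how 95%-uptime providers can add up to four nines for the file as a whole, here is a back-of-the-envelope check. It assumes independent provider failures, which real deployments only approximate, so take the exact number with a grain of salt.

    from math import comb

    p_up = 0.95    # assumed per-provider uptime
    n, k = 30, 10  # 30 erasure-coded pieces stored, any 10 suffice to rebuild the file

    # The file is unavailable only if fewer than k of the n providers are reachable.
    p_unavailable = sum(comb(n, i) * p_up**i * (1 - p_up)**(n - i) for i in range(k))
    print(f"unavailability ~ {p_unavailable:.1e}")  # far below the 0.01% needed for 99.99% uptime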
The other problem Skynet set out to solve is performance. In our experience with IPFS, if you aren't talking directly to a node that's pinning the content, IPFS is very slow. We've heard stories of lookups taking greater than 10 minutes for data that isn't pinned on a major gateway.
Skynet uses a point-to-point protocol rather than a DHT, which makes it not only faster but also more robust to abuse. DHTs are pretty famously fragile to things like DDoS and active subversion, and Skynet has been designed to be robust and high performance even when malicious actors are trying to disrupt the network.
Other than that, we've tried to make sure Skynet has feature parity with IPFS: content-addressed data, support for building DAGs, support for running applications and static web pages. And then we've added a couple of elements of our own, such as APIs that allow applications and web pages to upload directly to Skynet from within the application.
Say what you will about IPFS and ProtocolLabs, but I really admire that they made IPFS completely blockchain agnostic.
There is no requirement to buy or receive coins to use IPFS. And multiple different implementations of persistent stores can use the same IPFS network to ensure durability.
This means that FileCoin, other coins, static hosting providers, self-hosting can all coexist and strengthen the same network.
Every other distributed content-addressable file store seems interested in supporting only its own coin. For that reason alone, I find IPFS the most interesting.
Yeah, the layering in Protocol Labs' work is really nice.
The tendency of funded / for-profit software is to eschew layering, so it's easy to try out but ultimately ill-fitting and inflexible. This is a core problem of capitalism with IT.
IPFS, libp2p, IPLD, Filecoin etc. all resist that temptation, and I think it will help them greatly in the end.
I really appreciate hearing things like this :) We put a ton of work into making the layering of our projects just right so that they can be reused over and over even if other things we're working on fail. It often does make it slower for us to make progress (could take soooo many shortcuts if we tried to collapse the stack) but I really do believe it's worth the effort.
This looks interesting. Why is the payment method based on a crypto coin? How does using a blockchain help make Skynet and Sia work?
I am wary of crypto coins as they tend to have wild swings in value over time as they are used primarily for speculation... can I be sure that if I upload something today worth $2/month, in 2 years I will still be paying this much or less?
A common theme in distributed filesystem conversations is the idea of "socializing" the costs of intermittent loads. If I 'pay in' a little more than my mean traffic capacity, then my surplus during low traffic cancels out some of my peak traffic. Peak shaving and trough filling.
If you are doing business in India, you get paid in rupees. If the workers are in India, you just pay them in rupees. If you have to exchange currencies, you end up with several kinds of friction that just create headaches and potential losses. If you periodically cash out or inject cash, it's easier to deal with than doing so on every transaction.
Denominating file replication services in a "coin of the realm" just seems like the same sort of rationale.
One of the problems with capacity planning is that you get punished for being wrong in either direction. You bought too much hardware or not enough, too soon or too late. With an IPFS or a Skynet, putting your hardware online two months before you need the capacity at least affords you some opportunity to make use of the hardware while your Development or PR team figures out how to cross the finish line.
> Why is the payment method based on a crypto coin?
A blockchain gives a way to make a payment and provides identity/encryption functions to keep the resource private while it is active.
Yes, one could create a system that attaches other authentication (user/pass or OAuth), but then one has to create or connect a payment system which uses that login information in conjunction with credit card information. Selling a product online and taking credit cards requires in excess of 10 pieces of information that must be provided by the user.
In the case of compute resources, I may want to deal with 100s of providers to host my resources (blog, images, video, code) and I'd need to use an intermediary if I wanted to be efficient about it.
With something like Lightning payments, a system like this can provide resources without the need for a "signup" process or intermediary.
> can I be sure that if I upload something today worth $2/month, in 2 years I will still be paying this much or less?
What does the future value of a fiat currency have to do with the current rate of storage on something like AWS? Would those prices not go up if the currency was undergoing deflation? Would you not have paid fewer units before the deflation? Would you not pay more units after? Where does something like AWS provide cost protection, other than spot instances?
If you create a crypto contract and put the funds in escrow, then there is ZERO chance of the cost going up over the life of the contract, other than in a bad-actor scenario, which is why having multiple providers is the way to go.
> I may want to deal with 100s of providers to host my resources (blog, images, video, code) and I'd need to use an intermediary if I wanted to be efficient about it
Why multiple providers? With plain, no-frills data storage, S3 can easily provide that without an intermediary and in a pretty well automated way.
> Would those prices not go up if the currency was undergoing deflation?
Unless you're in a country with a very unstable currency, the exchange rate change will be very slight. Sia changed ~60x over the last 2 years, which is significant.
The price of course won't change during the contract, but what happens after is not trivial to plan for.
Sia is a decentralization-first protocol. We believe strongly that the biggest advantages are immunity to de-platforming and a commitment to an uncompromisingly open protocol.
Crypto is the only means of payment I am aware of that does not have a centralized middleman with the power to deny a transaction.
Decentralization aside, there are efficiency gains as well. Every transaction on the Sia network is point-to-point, and in some cases we've had nodes averaging more than 1 million discrete transactions per day for over a month. The total cost of doing that was something like $100 (including the cost of all the resources bought with those millions of transactions); I struggle to imagine a traditional payment system providing that kind of value.
There's also a lot more flexibility to innovate. For example, every single one of our payments is accompanied by a cryptographic proof that the accompanying storage or computation action (not many people know this, but the Sia network does support a limited form of computation) was completed correctly. The payment and computation are fundamentally/cryptographically tied together in a way that we could not reasonably achieve on a traditional payment system.
I imagine you would ensure the persistence of the data you care about either by running your own IPFS nodes that pin the data, or by using a pinning service like Pinata [1].
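For the self-hosted route, the pinning itself is a one-liner against the node's HTTP API. A minimal sketch, assuming a local go-ipfs/Kubo daemon on the default port and a placeholder CID:

    import requests

    API = "http://127.0.0.1:5001/api/v0"
    cid = "QmExampleCid"  # placeholder CID of the data you care about

    # Pinning tells *this* node to keep the blocks around; for real durability
    # you'd run a few such nodes (or an ipfs-cluster) so no single box matters.
    resp = requests.post(f"{API}/pin/add", params={"arg": cid})
    print(resp.json())  # e.g. {"Pins": ["QmExampleCid"]}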
The problem with Pinata is that they charge $150 per TB for their Individual plan. That's nearly 6.5x the cost of storing the data on S3. Sure it works, but that high barrier-to-entry pricing scares off a lot of people. Why not just use S3?
Meanwhile, we're still waiting for Filecoin to launch, and networks such as Sia have seized that opportunity and created great things like Skynet [1]. Skynet itself still has some overhead if you want to ensure data persistence and availability, but the cost is orders of magnitude lower. In addition, new layer-2 providers have emerged to address those gaps, such as Filebase [2]. They provide S3-compatible object storage that is backed by decentralized cloud storage. You get high availability (of the storage layer), geo-redundancy, and lower-than-S3 pricing out of the box.
It is this type of offering where we are going to see the most impact and adoption as the underlying technology not only makes things more efficient, but cheaper too.
I started moving the images of my WordPress blog to IPFS using the 3 most popular gateways. I'm moving slowly, image by image, but so far it has been quite a success. For when images expire from the gateways, I have a super simple and cheap IPFS node: an unused Raspberry Pi! My main IPFS node is an RPi 0W (the wireless one). Overall it lowered my main page loading times, and it cost £5 (the Pi) + £5 (32 GB SD card). The first images, the smallest and most often loaded ones, were migrated 7 months ago.
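The workflow is roughly the following (a sketch, assuming the Pi runs a stock IPFS daemon with the HTTP API on the default port; the filename is just an example):

    import requests

    API = "http://127.0.0.1:5001/api/v0"  # the daemon on the Pi

    # Add the image to the local node; the returned hash is its permanent address.
    with open("header.jpg", "rb") as f:  # example image file
        res = requests.post(f"{API}/add", files={"file": f}).json()
    cid = res["Hash"]

    # Reference it in the blog through any public gateway; if the gateway's cache
    # expires, the request falls back to whoever still has it, including the Pi.
    print(f"https://ipfs.io/ipfs/{cid}")
    print(f"https://cloudflare-ipfs.com/ipfs/{cid}")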
I hear this in every discussion, yet I'm still waiting for any of my SD cards to fail. I ran the same RPi 0 for 3 years as CCTV with motiond (a motion detector), recording all activity from my window, filling the card to 100% and removing all recorded data every 2 days: no errors, no failures. I no longer need that CCTV, so I'm using the Pi, with the same SD card, as an IPFS node.
IME, the people who complain about SD cards failing on a Raspberry Pi bought cheap cards that were intended for bulk storage, not for frequent writes or running an OS. Regular SD cards are not SSDs.
On my network of random Pis and other stuff, I use only high-endurance SD cards, which can withstand lots of writes and have durability much closer to SSDs.
Here's a data point from someone who only bought the supposedly most recommended cards and still had several failures before switching to USB. Maybe a rough average of 10-20% failures per year of uptime for something with heavy log writing?
A thing people tend to forget is temperature. Keep them cool and they have a better chance of surviving longer.
Use an external USB SSD (what I did for a mail server). One could also be very aggressive about making the kernel lazy about flushing to the SD card. (It has downsides, but if you put a UPS shield on, it's a pretty neat hack.)
I absolutely agree it's hard to go head-to-head with S3 in the short term --- this is why I am most excited about sharing sources. Once the ecosystem is bootstrapped, it will make more sense to use IPFS for binaries too (e.g. if you wanted to build some fancy multiple-build-farm and reputation system).
- IPFS as CDN means Software Heritage can be the "seeder of last resort"
- Original authors uploading to the CDN, using IPNS or similar for git tags/versions, should make it easier for Software Heritage to archive the code in the first place (see the sketch after this list).
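By "IPNS or similar" I mean something like the following: a mutable name, tied to the author's key, that always points at the CID of the latest release. A rough sketch against a local go-ipfs daemon's HTTP API (the CID is a placeholder):

    import requests

    API = "http://127.0.0.1:5001/api/v0"
    release_cid = "QmReleaseTarballCid"  # placeholder CID of the latest release

    # Publish /ipfs/<cid> under this node's IPNS name; re-running after each
    # release updates the pointer, while the old CIDs stay valid forever.
    resp = requests.post(f"{API}/name/publish", params={"arg": f"/ipfs/{release_cid}"})
    print(resp.json())  # {"Name": "<peer id>", "Value": "/ipfs/QmReleaseTarballCid"}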
> Does IPFS actually solve the problem they set out here though?
No, none of these distributed P2P networks (that I've seen) do. The problem isn't just building a DHT (Kademlia-based networks have existed for years); the problem is incentivizing people to seed - ideally people with high bandwidth and massive amounts of storage who are seeding data that people want. In other words, you need to build an economy on top of the network.
Cryptocurrency could be used for this, so long as it's a cryptocurrency that supports instant micro-transactions (i.e. you don't want to be writing to a blockchain every time you download a 10KB gif). So maybe someone will get around to building an IPFS clone on top of Bitcoin's Lightning Network or something like that. Clients would need a way to decide which peers to send requests to based on who's offering the best speed/reliability vs price. Servers would serve higher-paying requests with higher priority and would drop any requests too stingy to even cover bandwidth costs. Using the network wouldn't be free, but it would be extremely cheap, fast, and reliable.
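A toy sketch of what that client/server logic might look like; every name and number here is made up for illustration, not any real protocol's API:

    from dataclasses import dataclass

    @dataclass
    class Peer:
        addr: str
        mbps: float          # observed throughput
        reliability: float   # fraction of past requests served correctly
        price_per_mb: float  # asking price, in whatever micro-currency unit

    def rank_peers(peers: list[Peer]) -> list[Peer]:
        # Client side: best expected speed and reliability per unit of price first.
        return sorted(peers, key=lambda p: (p.mbps * p.reliability) / p.price_per_mb,
                      reverse=True)

    BANDWIDTH_COST_PER_MB = 0.0001  # server's own cost per MB (made-up number)

    def accept(offered_price_per_mb: float) -> bool:
        # Server side: drop requests that don't even cover bandwidth; among
        # accepted ones, serve higher payers first.
        return offered_price_per_mb > BANDWIDTH_COST_PER_MB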
One problem this doesn't solve is getting other people to back up/seed your data for you. An idea I think would work for that is a prediction-market-based reputation system for peers acting as storage hosts. That is, peers could advertise themselves as storage hosts, you could upload your data to them (for a fee), and they'd give you a cryptographic receipt promising that they'll still be able to deliver the data at some later date. People could then make bets on whether a host will fail to uphold any promises before a certain date, and the betting odds would be a measure of a host's reliability. Clients uploading their data to the network would take that reliability measure and the price into account when choosing hosts. At any point a client could publicly challenge a host to provide proof that they still have the data, and if the host fails to provide proof, the bets would close in favor of the punters who bet against them. Otherwise, once the bets expire, they close the other way. This would all need to be built on top of a blockchain, though; you couldn't use Bitcoin for this until/unless it adds support for covenants or sidechains.
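The challenge part can be surprisingly simple. A toy version of the interaction (deliberately naive: here the verifier still needs the data, whereas a real scheme would use precomputed challenges or Merkle proofs, plus signed receipts):

    import hashlib, os

    def receipt(data: bytes) -> str:
        # What the host commits to when accepting the upload.
        return hashlib.sha256(data).hexdigest()

    def host_respond(stored_data: bytes, nonce: bytes) -> str:
        # A host that discarded the data cannot compute this on demand.
        return hashlib.sha256(nonce + stored_data).hexdigest()

    def verify(data: bytes, nonce: bytes, response: str) -> bool:
        return response == hashlib.sha256(nonce + data).hexdigest()

    data = b"some archived source tarball"
    nonce = os.urandom(32)  # fresh challenge each time, so answers can't be replayed
    assert verify(data, nonce, host_respond(data, nonce))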
Protocol Labs (the guys behind IPFS) have been working on https://filecoin.io to address this exact concern – incentivising people via micropayments in their cryptocurrency (filecoin) to pin and seed files.
Yes it's really good to have the economics (mechanism design) and infrastructure in separate layers.
Also, I'd argue that even more important than having the hosters is having the content addresses. We need well-known immutable data for people to want in the first place, and traditional systems bury data under so much mutation/indirection that it's hard to know what that content is, or that content addressing even exists.
I highly recommend https://www.softwareheritage.org/2020/07/09/intrinsic-vs-ext..., which is about Software Heritage trying to get the word out to the larger library/archival/standardization community that content addressing and other "intrinsic" identifiers are possible and desirable.
Git and torrents are, I think, the best counterexample to the above, and there is probably more legally-kosher Git usage than BitTorrent usage, so I am especially bullish on Git hashing being the bridge to a more distributed/federated world.
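Git's object IDs are exactly the kind of intrinsic identifier that article talks about: a blob's ID is just the SHA-1 of a short header plus the file contents, so anyone, anywhere, gets the same ID for the same bytes. For instance:

    import hashlib

    def git_blob_id(content: bytes) -> str:
        # Git hashes "blob <size>\0" followed by the raw contents.
        header = f"blob {len(content)}\0".encode()
        return hashlib.sha1(header + content).hexdigest()

    # Same value `git hash-object` would print for a file with these contents.
    print(git_blob_id(b"hello world\n"))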
As an org supporting a curated content set, one could implement IPFS backed by S3 persistence (or some other cloud bucket), possibly as an "open frontend" part of one's CDN infrastructure, where others can also freely peer and contribute CDN capacity.
The benefits for package management could be pretty good, given widespread deployment. At our office or hackerspace, there are many computers which are likely to already have the packages. Though until these things are enabled by default (can we ever get there?), I suspect only large IT departments or very interested people will set it up, unfortunately.
For the "what if we lose access to the source code" problem specifically, projects like https://www.softwareheritage.org/ and https://sfconservancy.org/ seem like better bets.