
The other option is to rent a computer with lots of storage from Hetzner, IONOS, or similar.


There's a hell of a lot more involved in hosting 400+ TB of data [1] than "renting a computer".

[1]: https://discourse.nixos.org/t/nixos-foundations-financial-su...


400 TB of disk and a 100GbE NIC in a single server is easily affordable hardware for a hobbyist, let alone any kind of organization.

Yeah, there will be maintenance and setup work, but we (web software engineers) have developed a weirdly extreme fear of self-hosting, even in cases where we're being killed on transfer costs.


Errr... what do you mean by "easily affordable"? A quick look on Amazon says I'd be paying something like €18.80/TB, so buying 400 TB would be €7,520 - far out of the range of the majority of hobbyists. Plus a 100GbE NIC would be another €800 or so, plus you'll need all the other ancillary things like a case, a processor, RAM, and cabling, and whatever the monthly cost to host it somewhere, unless your ISP gives you a 100Gbps connection (which I'm sure will be extortionately expensive if they do). The end result is hardware costs alone approaching €10,000 - significantly more if you want any sort of redundancy or backups (or are you planning to run the whole thing in RAID0 YOLO mode?) - plus probably low hundreds per month, not including the time investment needed to maintain it.
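Back-of-the-envelope, using the rough figures above (retail estimates, not quotes):

    # Rough cost sketch using the estimates from this comment
    # (per-TB disk price and NIC price are ballpark retail figures).
    price_per_tb_eur = 18.80
    capacity_tb = 400
    nic_eur = 800

    disks_eur = price_per_tb_eur * capacity_tb  # 7,520 EUR
    total_eur = disks_eur + nic_eur             # 8,320 EUR, before the case,
                                                # CPU, RAM, cabling, and hosting
    print(f"disks: {disks_eur:,.0f} EUR, disks+NIC: {total_eur:,.0f} EUR")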


I mean, I spent about half that just on the computers in my apartment, and I only have a 56GbE LAN.

I shouldn't say "easily affordable" as if anyone can afford it; rather, it's something you can do with off-the-shelf retail components (within the budget of a hobbyist software engineer, anyway).

It's cheap compared to hobbies like driving a sports car, at least!


You seem to have seriously skewed notions of what’s considered a hobbyist


Why? You can fit that in a single server. None of the data is unrecoverable. Most of it is binary cache for built packages, and the ISOs can simply be rebuilt from what's in git. They're paying ridiculous amounts of money to serve easily rebuilt data.


Depending on how bad and how frequent you're willing for outages to be, and whether you want the whole thing to go down for scheduled maintenance, it can get quite a bit more complicated. A pair of servers with some level of RAID is probably the minimum to achieve availability any better than a server in someone's closet. Then you need a cold backup on top of that, no matter how many live servers you have.
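To put rough numbers on that (a toy model that assumes independent failures and instant failover, which is optimistic):

    # Toy availability model: the service is down only when all n
    # servers are down at once; assumes independent failures and
    # instant failover. 99% per-server uptime is a hypothetical figure.
    per_server_uptime = 0.99
    for n in (1, 2, 3):
        availability = 1 - (1 - per_server_uptime) ** n
        print(f"{n} server(s): {availability * 100:.4f}% available")
    # 1 server(s): 99.0000% available
    # 2 server(s): 99.9900% available
    # 3 server(s): 99.9999% available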

I've also had much better luck with connectivity, transfer speeds, and latency for clients in some regions with AWS than with some other hosts (e.g. DigitalOcean). It seems peering agreement quality really matters. I'm not sure how the big physical-server hosts fare on that front vs. AWS (which is basically the gold standard, as far as I can tell), and it's hard to find such info without trying and seeing what happens. This matters if you're trying to serve a large audience over much of the Earth. The easy single-server solution might well see some clients lose 95% of their transfer speed to your service, or fail to connect at all, compared with AWS.


The old binaries are not necessarily simple to rebuild. Projects move / go offline. Tags get moved. Binaries for the impure systems may not be public anymore. You can't easily rebuild everything from that cache.


This is an oversimplification that borders on absurdity.


Not absurdity; it was seriously considered on the Nix Discourse and Matrix. Hetzner offers very attractive options like the SX line of dedicated storage servers (linked below), three of which would be sufficient for this project. Alas, this still requires more consideration time than was available for the switchover.

https://www.hetzner.com/dedicated-rootserver/matrix-sx


S3 stores three separate copies across three AZs, so to match that redundancy (at 400 TiB), wouldn't you need more like six 14x16TB servers if you're making a direct comparison?
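Quick arithmetic behind that, counting raw capacity only (ignoring RAID and filesystem overhead):

    import math

    # Three full copies of 400 TiB vs. servers holding 14 x 16 TB raw.
    TIB = 2 ** 40   # tebibyte, in bytes
    TB = 10 ** 12   # terabyte, in bytes

    total_bytes = 3 * 400 * TIB   # three copies of the dataset
    per_server = 14 * 16 * TB     # 224 TB raw per server
    print(math.ceil(total_bytes / per_server))  # -> 6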


Yes, that redundancy could be useful, but it doesn't change the affordability materially.


There is a human cost associated with operating this that S3 and other fully managed object stores don't have.

There is just no reality where you build your own service for this with better reliability, less management, and less maintenance than S3 (and when I say S3, I'm including all managed object stores).

Pointing at Hetzner and saying "look how cheap this is" is missing the point.


_shouldn't_ it work that way?



