> An extra 131 GB of bandwidth per download would have cost Steam several million dollars over the last two years
Nah, not even close. Let's guess and say there were about 15 million copies sold. 15M * 131GB is about 2M TB (2000 PB / 2 EB). At 30% mean utilisation, a 100Gb/s port will do 10 PB in a month, and at most IXPs that costs $2000-$3000/month. That makes it about $400k in bandwidth charges (I imagine 90%+ is peered or hosted inside ISPs, not via transit), and you could quite easily build a server that would push 100Gb/s of static objects for under $10k a pop.
It would surprise me if the total additional costs were over $1M, considering they already have their own CDN setup. One of the big cloud vendors would charge $100M just for the bandwidth, let alone the infrastructure to serve it, based on some quick calculation I've done (probably incorrectly) -- though interestingly, HN's fave non-cloud vendor Hetzner would only charge $2M :P
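For anyone who wants to sanity-check the arithmetic, here is a rough back-of-envelope in Go using the same guesses as above (15M copies, 131 GB extra each, 100 Gb/s ports at 30% mean utilisation, ~$2,500/month per port). The inputs are the comment's assumptions, not Valve's actual numbers.

```go
package main

import "fmt"

func main() {
	const (
		copies        = 15e6   // assumed number of downloads
		extraGB       = 131.0  // extra GB per download
		portGbps      = 100.0  // port speed in Gb/s
		utilisation   = 0.30   // assumed mean utilisation
		portCostMonth = 2500.0 // assumed $/month for a 100G IXP port
		secsPerMonth  = 30 * 24 * 3600.0
	)

	totalPB := copies * extraGB / 1e6                                 // GB -> PB (1e6 GB per PB)
	pbPerPortMonth := portGbps * utilisation / 8 * secsPerMonth / 1e6 // Gb/s -> GB/s -> PB/month
	portMonths := totalPB / pbPerPortMonth
	cost := portMonths * portCostMonth

	fmt.Printf("total extra data:        %.0f PB\n", totalPB)        // ~1965 PB
	fmt.Printf("per 100G port-month:     %.1f PB\n", pbPerPortMonth) // ~9.7 PB
	fmt.Printf("port-months needed:      %.0f\n", portMonths)        // ~200
	fmt.Printf("rough bandwidth cost:    $%.0fk\n", cost/1000)       // ~$500k
}
```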
> Isn't it a little reductive to look at basic infrastructure costs?
I used Hetzner as a surrogate for the raw cost of bandwidth, plus overheads. If you need to serve data outside Europe, the budget tier of BunnyCDN is four times more expensive than Hetzner.
But you might be right - in a market where the price of the same good varies by two orders of magnitude, I could believe that even the nice vendors are charging a 400% markup.
Yea, I always laugh when folks talk about how expensive bandwidth supposedly is for companies. Large “internet” companies are just paying a small monthly cost for transit at an IX. They aren't paying $xx/gig ($1/gig) like the average consumer is. If you buy a 100gig port for $2k, it costs the same whether you're pushing 5 GB a day or running it flat out at around 1 PB per day.
> I imagine 90%+ is peered or hosted inside ISPs, not via transit
How does hosting inside ISPs work? Does the ISP have to MITM the traffic? I've heard similar claims for Netflix and other streaming media, i.e. that ISPs host/cache the data themselves. Do they have to have some agreement with Steam/Netflix?
Yea, Netflix will ship a server to an ISP (Cox, Comcast, Starlink, Rogers, Telus, etc) so the customers of that ISP can access that server directly. It improves performance for those users and reduces the load on the ISP's backbone/transit. I'm guessing other large companies will do this as well.
A lot of people are using large distributed DNS services like 8.8.8.8 or 1.1.1.1, and these can sometimes direct users to the wrong CDN servers, so the EDNS Client Subnet (ECS) extension was created to help with this. I always use 9.9.9.11 (Quad9's ECS-enabled endpoint) instead of 9.9.9.9 to hopefully help improve performance.
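To make the ECS part concrete, here's a small sketch using the github.com/miekg/dns library that attaches a truncated client subnet to a query and sends it to 9.9.9.11; the hostname and the /24 are made up for illustration.

```go
package main

import (
	"fmt"
	"net"

	"github.com/miekg/dns"
)

func main() {
	// A plain A query for a hypothetical CDN-hosted name.
	m := new(dns.Msg)
	m.SetQuestion("content.example.com.", dns.TypeA)

	// Attach an EDNS0 Client Subnet option so the recursive resolver can
	// pass a truncated view of our network on to the CDN's authoritative side.
	opt := new(dns.OPT)
	opt.Hdr.Name = "."
	opt.Hdr.Rrtype = dns.TypeOPT
	opt.SetUDPSize(dns.DefaultMsgSize)
	ecs := new(dns.EDNS0_SUBNET)
	ecs.Code = dns.EDNS0SUBNET
	ecs.Family = 1 // IPv4
	ecs.SourceNetmask = 24
	ecs.Address = net.ParseIP("203.0.113.0") // example client /24
	opt.Option = append(opt.Option, ecs)
	m.Extra = append(m.Extra, opt)

	// 9.9.9.11 is the ECS-enabled Quad9 endpoint mentioned above.
	c := new(dns.Client)
	in, _, err := c.Exchange(m, "9.9.9.11:53")
	if err != nil {
		panic(err)
	}
	for _, rr := range in.Answer {
		fmt.Println(rr.String())
	}
}
```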
The CDN/content provider ships servers to the ISP, which puts them into its network. The ISP is just providing connectivity and isn't involved at the content level, so no MITM etc is needed.
Honestly, I'd give FUSE a second chance, you'd be surprised at how useful it can be -- after all, it's literally running in userland so you don't need to do anything funky with privileges. However, if I were starting afresh on a similar project I'd probably be looking at using 9p2000.L instead.
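For a sense of how little a userland filesystem needs, here's a minimal read-only hello-world sketch using the bazil.org/fuse Go bindings; the library choice and the mountpoint path are mine, not anything from the project being discussed.

```go
package main

import (
	"context"
	"log"
	"os"

	"bazil.org/fuse"
	"bazil.org/fuse/fs"
)

// FS is a trivial read-only filesystem exposing a single file, "hello".
type FS struct{}

func (FS) Root() (fs.Node, error) { return Dir{}, nil }

type Dir struct{}

func (Dir) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Inode = 1
	a.Mode = os.ModeDir | 0o555
	return nil
}

func (Dir) Lookup(ctx context.Context, name string) (fs.Node, error) {
	if name == "hello" {
		return File{}, nil
	}
	return nil, fuse.ENOENT
}

func (Dir) ReadDirAll(ctx context.Context) ([]fuse.Dirent, error) {
	return []fuse.Dirent{{Inode: 2, Name: "hello", Type: fuse.DT_File}}, nil
}

type File struct{}

func (File) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Inode = 2
	a.Mode = 0o444
	a.Size = uint64(len("hello, world\n"))
	return nil
}

func (File) ReadAll(ctx context.Context) ([]byte, error) {
	return []byte("hello, world\n"), nil
}

func main() {
	// The mountpoint must already exist; /tmp/hellofs is just an example.
	c, err := fuse.Mount("/tmp/hellofs")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	if err := fs.Serve(c, FS{}); err != nil {
		log.Fatal(err)
	}
}
```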
I may be confusing two systems, but I believe that AFS system also encompassed the first iteration of “AWS Glacier” I encountered in the wild: a big storage system that required queuing a job to a tape array, or pinging an undergrad to manually load something, for retrieval.
I know a lot of people who use it, in fact I'm one of them.
I have an @gmail.com account with about 20 years of stuff associated with it, from purchases to YouTube subscriptions, from calendars to GCP accounts.
However, I use a vanity email (me@somedomain.example) that everyone I know uses to get hold of me. Until about 10 years ago I could just forward emails but that slowly became unworkable as more and more stuff just broke due to SPF etc. So, I've been using POP pickup (and accepting the 5-30 minute delay) ever since.
As I understand it, I can't move all my gmail.com data into a GWork profile easily, and POP has worked for years. This is very frustrating.
From a network point of view, BitTorrent is horrendous. It has no way of knowing network topology, which frequently means traffic flows from eyeball network to eyeball network where there is no "cheap" path available (potentially congesting transit ports and affecting everyone), and there is no reliable way of forecasting where the traffic will come from, which makes capacity planning a nightmare.
Additionally, as anyone who has tried to share an internet connection with someone heavily torrenting knows, the excessive number of connections means the overall quality of non-torrent traffic on the network goes down.
Not to mention, of course, that BitTorrent has a significant stigma attached to it.
The answer would have been a squid cache box before, but https makes that very difficult as you would have to install mitm certs on all devices.
For container images, yes, you have pull-through registries etc, but not only are these non-trivial to set up (as a service and for each client), the cloud providers also charge quite a lot for storage, making it difficult to justify when not having a cache "works just fine".
The Linux distros (and CPAN and TeX Live etc) have had mirror networks for years that partially address these problems, and there was an OpenCaching project running that could have helped, but that model isn't really sustainable for the wide variety of content that would need caching beyond video media, or for packages that only appear on the caches hours after publishing.
BitTorrent might seem seductive, but it just moves the problem, it doesn't solve it.
> From a network point of view, BitTorrent is horrendous. It has no way of knowing network topology which frequently means traffic flows from eyeball network to eyeball network for which there is no "cheap" path available...
As a consumer, I pay the same for my data transfer regardless of the location of the endpoint though, and ISPs arrange peering accordingly. If this topology is common then I expect ISPs to adjust their arrangements to cater for it, just the same as any other topology.
Two eyeball networks (consumer/business ISPs) are unlikely to have large PNIs with each other across wide geographical areas to cover sudden bursts of traffic between them. They will, however, have substantial capacity to content networks (not just CDNs, but AWS/Google etc) which is what they will have built out.
BitTorrent turns fairly predictable "North/South" traffic, where capacity can be planned in advance and handed off "hot potato" as quickly as possible, into what is essentially "East/West" traffic with no clear consistency. That would cause massive amounts of congestion and/or unused capacity, as ISPs have to carry traffic over long distances they are not used to, with no guarantee that this large flow will still exist in a few weeks' time.
If BitTorrent knew network topology, it could act smarter -- CDNs accept BGP feeds from carriers and ISPs so that they can steer the traffic, this isn't practical for BitTorrent!
> If BitTorrent knew network topology, it could act smarter -- CDNs accept BGP feeds from carriers and ISPs so that they can steer the traffic, this isn't practical for BitTorrent!
AFAIK this has been suggested a number of times, but has been refused out of fears of creating “islands” that carry distinct sets of chunks. It is, of course, a non-issue if you have a large number of fast seeds around the world (and if the tracker would give you those reliably instead of just a random set of peers!), but that really isn't what BT is optimized for in practice.
Exactly. As it happens, this is an area I'm working on right now -- instead of using a star topology (direct), or a mesh (BitTorrent), or a tree (explicitly configured CDN), to use an optimistic DAG. We'll see if it gets any traction.
BitTorrent will make the best use of whatever bandwidth is available. Better to think of it as a dynamic CDN which can seamlessly incorporate static CDN nodes (see webseed).
It could surely be made to care about topology, but IMHO handing that problem to the congestion control and routing mechanisms at lower levels works well enough and should not be a problem.
> BitTorrent will make the best use of whatever bandwidth is available.
At the expense of other traffic. Do this experiment: find something large-ish to download over HTTP, perhaps an ISO or similar from Debian or FreeBSD. See what the speed is like, and try looking at a few websites.
Now have a large torrent active at the same time, and see how slow the HTTP download drops to, and how much slower the web is. Perhaps try a Twitch stream or YouTube video, and see how the quality suffers greatly and/or starts rebuffering.
Your HTTP download uses a single TCP connection, and most websites will also just use a single connection (perhaps with a few short-lived extra connections for JS libraries on different domains etc). By comparison, BitTorrent will have dozens if not hundreds of connections open, so instead of splitting the link roughly in half it ends up monopolising 95%+ of your connection.
The other main issue I forgot to mention is that on most cloud providers, downloading from the internet is free, uploading to the internet costs a lot... So not many on public cloud are going to want to start seeding torrents!
If your torrent client is having a negative effect on other traffic then use its bandwidth limiter.
You can also lower how many connections it makes, but I don't know anyone who's had a need to change that. Could you show us which client defaults to connecting to hundreds of peers?
My example was to show locally what happens -- the ISP does not have control over how many connections you make. I'm saying that if you have X TCP connections for HTTP and 100X TCP connections for BitTorrent, the HTTP connections will be drowned out. Therefore, when the link at your ISP becomes congested, HTTP will be disproportionately affected.
For the second question, read the section on choking at https://deluge-torrent.org/userguide/bandwidthtweaking/ -- Deluge appears to set the maximum number of connections per torrent to 120, with a global max of 250 (though I've seen 500+ in my brief searching, mostly for Transmission and other clients).
I'll admit a lot of my BitTorrent knowledge is dated (having last used it ~15 years ago) but the point remains: ISPs are built for "North-South" traffic, that is: To/From the customer and the networks with the content, not between customers, and certainly not between customers of differing ISPs.
Interesting... It's been ~15 years since I last used BitTorrent personally, and I had asked a friend before replying and they swore that all their traffic was TCP -- though perhaps that may be due to CGNAT or something similar causing that fallback scenario you describe.
Thanks for the info, and sorry for jumping to a conclusion! Though my original point stands: Residential ISPs are generally not built to handle BitTorrent traffic flows (customer to customer or customer to other-ISP-customer across large geographic areas) so the bursty nature would cause congestion much easier, and BitTorrent itself isn't really made for these kinds of scenarios where content changes on a daily basis. CDNs exist for a reason, even if they're not readily available at reasonable prices for projects like OP!
The number of connections isn’t relevant. A single connection can cause the same problem with enough traffic. Your bandwidth is not allocated on a per-connection basis.
If you download 2 separate files over HTTP, you'd expect each to get roughly 1/2 of the available bandwidth at the bottleneck.
With 1 HTTP connection downloading a file and 100 BitTorrent connections trying to download a file, all trying to compete, you'll find the HTTP throughput significantly reduced. It's how congestion control algorithms are designed: rough fairness per connection. That's why the first edition of BBR that Google released was unpopular, it stomped on other traffic.
I took my test nearly 25 years ago, and this was present then -- for the avoidance of doubt, the UK test has always been very thorough, though not quite as thorough as those in places like Finland where apparently they have skid pans and similar!
Makes sense that Finland has such things though, when the roads are covered in snow and ice for a lot of the year.
Though this year we did well in our capital: "Helsinki has not recorded a single traffic fatality in the past 12 months, city and police officials confirmed this week."
Seems like we were either side of a threshold - I took mine ~35 years ago and the only "theory" test was the examiner asking me three basic questions after the practical test, like "what can lead to skidding", to which the answer was "rapid acceleration, steering or braking". The theory side of things essentially didn't exist.
Same in Norway. Skid pans and also motorway driving. The course also includes a piece where the instructor picks a place an hour's drive away and tells the student to get there and demonstrate that they can not only drive under instruction but also plan their own route and react properly to challenges along the way.
Interestingly, I saw data from a road safety programme for young people that showed skid pan training actually made young men less safe not more, because they became even more overconfident about their ability to “react quickly” if bad things happened. Turns out that a bit of humility and slowing down are the main skills needed to avoid accidents!
That's true; on the other hand, it made young women safer. This happened in Norway when the skid pan was made a compulsory part of the course a couple of decades ago: the insurance companies soon noticed an increase in reckless driving among young men but the opposite among young women.
The usual 'explanation' is that young men had fun on the skid pan and came to no harm there and wanted to continue having fun on the road while young women discovered how little control they had of the car and hence became more cautious.
Just watch as most libraries now update their go.mod to say 1.25, despite using no 1.25 features, meaning those who want to continue on 1.24 (which will still have patch releases for six months...) are forced to remain on older versions or jump through lots of hoops.
This is a common issue with Rust projects as well. At least with Rust you have the idea of "MSRV" (minimum supported rust version). I've never heard it discussed within Go's community.
There's no MSGV. Everyone pins the latest.
This also plagues dependencies. People pin to a specific version (e.g. 1.23) instead of the major version (at least 1.0, or at least 1.2, etc).
The "go x.yy" line in go.mod is supposed to be that MSGV, but `go mod init` will default it to the current version on creation. While you could have tooling like `cargo-msrv` to determine what that value would be optimal, the fact that only the latest two Go versions are supported means it's not particularly useful in most cases.
Now that I think about it more, when I've seen it happen before, it tends to be on projects that use dependabot / renovate. If any of those updates depend (directly or transitively) on a later version of Go, the go.mod would be bumped accordingly for them.
I have a vague feeling it was related to testcontainers or docker, and at the time that job's Go install was always at least 6 months behind. At least with recent Go, it'll switch to a later version that it downloads via the module proxy, that would have helped a lot back then :S
> Compared to 6+ intermediaries in a standard SWIFT payment
Huh? I think the ISO 20022 message format only allows for 3 intermediaries before having to resort to more complex mechanisms. Most SWIFT payments have 0 or 1 intermediaries.
I store my authorized_keys in DNS TXT records, that are DNSSEC signed, with a validating resolver on the box. I then just use "/usr/bin/hesinfo %u ssh" as my AuthorizedKeysCommand in OpenSSH.
I wrote a little tool that allowed you to "#include" other DNS records etc, but "hesinfo" is generally easily installable/available so it's just easier.
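This isn't the hesinfo setup described above, but as a sketch of what a dedicated AuthorizedKeysCommand helper could look like in Go, assuming keys are published one-per-TXT-record under a made-up naming scheme like `<user>._sshkeys.example.org` (DNSSEC validation is assumed to happen in the local validating resolver, as in the parent comment):

```go
// authkeys prints authorized_keys entries stored in DNS TXT records.
// Intended to be called by sshd via something like:
//   AuthorizedKeysCommand /usr/local/bin/authkeys %u
//   AuthorizedKeysCommandUser nobody
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: authkeys <user>")
		os.Exit(1)
	}
	user := os.Args[1]

	// Hypothetical naming scheme: one TXT record per key under
	// <user>._sshkeys.example.org. DNSSEC validation is left to the
	// local validating resolver this host points at.
	name := fmt.Sprintf("%s._sshkeys.example.org", user)
	records, err := net.LookupTXT(name)
	if err != nil {
		// No records (or a lookup failure) means no keys: print nothing,
		// so sshd falls back to whatever else it is configured to accept.
		os.Exit(0)
	}
	for _, txt := range records {
		fmt.Println(txt) // e.g. "ssh-ed25519 AAAA... user@host"
	}
}
```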
I know people do this, but I can't get my head around it. SSH is end-to-end secure even if the entire DNS hierarchy is corrupted. The DNSSEC PKI is controlled at its roots by governments, and one level of branches down by a set of companies not known for integrity and especially strong security practices. Why would you give any of these entities any influence over your authorized keys?
DNSSEC is a top-down securing chain, DNSCrypt a bottom-up one. Each has its pros and cons. Relying on your government to keep you secure can be a valuable factor, depending on your threat model.
Ok, these are words, but again I'm not talking about DNS security here, I'm talking about SSH key distribution. Why would you elect to have your key distribution controlled by the DNS PKI? What's the upside? The downside is, an actor with control over the DNS PKI (there are many of those; see, for instance, every DOJ seizure of a domain) gets a degree of control over your SSH authorized keys. Seems... bad?
Agreed completely. The threat model for DNSSEC is vast; why would you diminish the security of a perfectly good end-to-end model in SSH keys or certificates when you control 100% of that infrastructure?
Introducing any outside actors at all objectively diminishes the security of the whole model by adding another link in the chain, even if that link isn't necessarily the weakest (and I certainly believe it would be, because both DNSSEC and the public Certificate Authority industry are object lessons in the abject failure of highly centralized global, government-wide, or even just company-wide security). Simply increasing the surface area is enough to decrease the security of the system.
You'd have to not only compromise the root keys, but also sign all the zones from there downwards and intercept all DNS traffic leaving that server in the first place, just to get into my infrastructure. That's secure enough for me.
This method puts a lot of reliance on DNSSEC working and trusting that it is preventing spoofing. I personally wouldn't rely on this in production, there are too many stories about DNSSEC cutovers rendering the domain unresolvable for hours+. Imagine not being able to get to your servers too...!
My DNS zones are not hosted on those servers; Google Cloud DNS does the DNSSEC signing, and there is a break-glass key installed on there too that automatically sends alerts when used.
And I may well do. But it's probably not the best idea to do this on a larger scale, there are valid reasons why this is not a good thing to recommend -- if you miss one part (DNSSEC signing, or running a local validating resolver) you can end up with a vulnerable system.
There is technically one minor one, which really isn't one - but you should be aware of it.
Someone can take your authorized keys and add them to a box they control, and trick you into logging in.
However, this would trigger the "new host" warning SSH gives you, and you can reduce the risk by limiting which hosts you allow your private keys to be used on.
And if someone is so actively trying to attack you they probably have more direct methods available.
What? How? What does putting my authorized keys file on another host do in terms of tricking me to log in? Authorized keys only matters on the host you are using when you type `ssh <some.host>`. The ssh client compares the public key of the remote host to the list in your `authorized_keys` file and, only if there is a match, skips serving you TOFU.
EDIT: I mixed up authorized_keys and known_hosts. But, the remote server doesn't need your authorized_keys file to grant you access so not sure the visibility of authorized_keys matters.
Unless your keys are generated with low entropy (like the Debian CVE-2008-0166), publishing the public key file should not be an issue; that's from a cryptographic pov.
& as bombcar said, obviously if you ignore "unknown host" warnings, you can be tricked into logging into an attacker-controlled machine.
Often key files also contain "user@host" for the user & host the key was generated by & on. This identifier is then leaked, and you might want to avoid that. On my personal (and very objective! /s) paranoia scale this is an 8/10. I'd definitely point it out to a customer during a pentest, but wouldn't really care if they "fixed" it (most of the time there is a lot of stuff that's more serious than knowing that the devops person is 'bro2000@jims-laptop').
A public key can be an identifier. Depending on what other information you share online, or if a key is reused on GitHub, it can be used to identify other places you visit and other artifacts associated with you. If your goal is anonymity then keeping your public keys secret, and not attached to other personal information like a domain name, is ideal. Almost all registrars are going to have a way to reveal your true identity.
Good luck with that. DNS TXT records are used for a lot of infrastructure right now, from DMARC/SPF/DKIM to DNS-01 validation through LetsEncrypt afaik.
1) You don't have to ssh-copy-id to new boxes, which is nice.
2) You can de-auth a key for all machines by changing the DNS record. This would depend on some propagation time but perhaps you can point the resolver at your nameserver directly which would avoid that.
You could simply choose a short TTL, or your tool could check for e.g. "some-name._sshkeys.whatever.tld" as well as "_revoked.some-name._sshkeys.whatever.tld" to handle revocation instantly.
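Sticking with that (made-up) naming scheme, a revocation check along these lines could gate whether any keys get printed at all; this is only a sketch of the idea, not part of anyone's actual setup.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// revoked reports whether a revocation marker TXT record exists for the key
// name, using the hypothetical scheme from the comment above.
func revoked(keyName string) bool {
	_, err := net.LookupTXT("_revoked." + keyName + "._sshkeys.whatever.tld")
	return err == nil // any answer at all is treated as "revoked"
}

func main() {
	keyName := os.Args[1] // e.g. "some-name"
	if revoked(keyName) {
		os.Exit(0) // revoked: print no keys
	}
	keys, err := net.LookupTXT(keyName + "._sshkeys.whatever.tld")
	if err != nil {
		os.Exit(0) // no records: print nothing
	}
	for _, k := range keys {
		fmt.Println(k)
	}
}
```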
Now you've just moved your authentication to the SSL PKI.
In that case, use the SSL certs directly. You'd have to add support to OpenSSH of course, or just convert the certificates to SSH format, but it would be architecturally much simpler.
As to the original question here, the benefit compared to other PKI alternatives (including the SSH PKI in the original question) is that revocation is much easier.