
Why did sourcehut not take the offer to be added to the refresh exclusion list like the other two small hosting providers did? It seems like that would have resolved this issue last year.


For a number of reasons. For a start, what does disabling the cache refresh imply? Does it come with a degradation of service for Go users? If not, then why is it there at all? And if so, why should we accept a service degradation when the problem is clearly in the proxy's poor engineering and design?

Furthermore, we try to look past the tip of our own nose when it comes to these kinds of problems. We often reject solutions which are offered to SourceHut and SourceHut alone. This isn't the first time this principle has run into problems with the Go team; to this day pkg.go.dev does not work properly with SourceHut instances hosted elsewhere than git.sr.ht, or even GitLab instances like salsa.debian.org, because they hard-code the list of domains rather than looking for better solutions -- even though they were advised of several.

The proxy has caused problems for many service providers, and agreeing to have SourceHut removed from the refresh would not solve the problem for anyone else, and thus would not solve the problem. Some of these providers have been able to get in touch with the Go team and received this offer, but the process is not easily discovered and is poorly defined, and, again, comes with these implied service considerations. In the spirit of the Debian free software guidelines, we don't accept these kinds of solutions:

> The rights attached to the program must not depend on the program's being part of a Debian system. If the program is extracted from Debian and used or distributed without Debian but otherwise within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the Debian system.

Yes, being excluded from the refresh would reduce the traffic to our servers, likely with less impact for users. But it is clearly the wrong solution and we don't like wrong solutions. You would not be wrong to characterize this as somewhat ideologically motivated, but I think we've been very reasonable and the Go team has not -- at this point our move is, in my view, justified.


Supposedly the operator of sourcehut has been banned from posting to the Go issue tracker: https://github.com/golang/go/issues/44577#issuecomment-11378...

So, obviously he's supposed to know the even more obscure and annoying method of opting out.


Why should you have to opt in to a no-DDoS list? Why is it not the default?


You are basically arguing that sr.ht is taking a "principled stand" against Google. If that is what they are doing, they should just say that and not pretend there were no other options.

I'm ok with saying "google should do better!" But the compromise solution from the Go team seems reasonable to solve the immediate issue in a way that doesn't harm end users. The author should at least address why they have chosen the more extreme solution.


Or, we should not assume that Google's stance is correct and that sr.ht is the one expected to explain themselves. We should instead ask why Google continues down the path of DoSing upstream servers by default, refuses to use standard mechanisms for using network resources responsibly, and expects all of those servers to do all the work.

EDIT: Moreover, sr.ht doing a workaround only for sr.ht, and lubar doing a workaround for lubar, etc... is not what Free Software is about. The point is that we're supposed to act as a community, for the betterment of the collective. Individualism is not a solution.


The author addressed this by way of their byline.


IIRC, because that would only help the official SourceHut instance, not other instances.


I wondered that too, but then I wondered whether that's what sourcehut have actually done. I didn't notice any details about how the Go module mirror will be blocked.

Wouldn't the effect on sourcehut users be identical?


No. The Go team offered to add sourcehut to a list that would stop background refreshes. It would still allow fetches initiated by end users. The change sourcehut is making is to break end users unless they set some environment variables.
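For what it's worth, the usual knobs for that (assuming the post means the standard Go environment variables; I haven't checked the exact instructions) are something like:

    # fetch git.sr.ht modules directly and skip the checksum database for them
    go env -w GOPRIVATE=git.sr.ht

    # or bypass the proxy entirely, for all modules
    go env -w GOPROXY=direct

Both are per-user settings, which is presumably what "break end users unless they set some environment variables" comes down to.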

I've not seen any explanation of why the solution offered by the Go team was unacceptable. It's weird that this is completely left out of the blog post here.


They could also just add the site to that list, or better yet, make it opt-in for sites instead of slamming them with shitty workers and shitty defaults.

You know, like be good neighbors and respectful of other people's resources, maybe read robots.txt and not make excuses for why you are writing shitty stateless workers that spam the rest of the community.


I think Google should DDoS no one, not everyone until they opt out.


But it’s not Google DDoSing them, it’s every user downloading packages. Without the proxy it would just be millions of users hammering their servers.

Edit: Uh, okay, if it's not user traffic, then why wasn't the "don't background refresh" offer an option?


> Without the proxy it would just be millions of users hammering their servers.

Doing shallow clones, which are significantly cheaper.

Google is DDoSing them, by their service design. Why a full git clone, why not a shallow one? Why do they need to do a full git clone of a repository up to a hundred times an hour? It doesn't need to be refreshed that frequently.
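For reference, the difference I mean, with a placeholder repository URL:

    # full clone: every ref plus the complete history (the expensive case for the server)
    git clone https://git.sr.ht/~example/repo

    # shallow clone: only the most recent commit
    git clone --depth 1 https://git.sr.ht/~example/repo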

The likely answer is that the shared state to handle this isn't a trivial addition, it's a lot simpler to just build nodes that only maintain their own state. Instead of doing it on one node and sharing that state across the service, just have every node or small cluster of nodes do its own thing. You don't need to build shared state to run the service, so why bother? That's just needless complexity after all, and all you're costing is bandwidth, right?

That's barely okay laziness when you're interacting with your own stuff and have your own responsibility for scaling and consequences. Google notoriously doesn't let engineers know the cost of what they run, because engineers will over-optimise on the wrong things, but that also teaches them not to pay attention to things like the costs they inflict on other people.

It's unacceptable to act in this kind of fashion when you're accessing third parties. You have a responsibility as a consumer to consume in a sensible and considered fashion. Shirking that responsibility means your laziness isn't costing you money; it's costing other people who don't have stupidly deep pockets like Google.

This is just another way in which operating at big-tech-money scales blinds you to basic good practice (I say this as someone who has spent over a decade now working for big tech companies...)


> Google notoriously doesn't let engineers know the cost of what they run

Huh? I left a few months ago, but there was a widely used and well-known page for converting between various costs (compute, memory, engineer time, etc.).


Per TFA:

> More importantly for SourceHut, the proxy will regularly fetch Go packages from their source repository to check for updates – independent of any user requests, such as running go get. These requests take the form of a complete git clone of the source repository, which is the most expensive kind of request for git.sr.ht to service. Additionally, these requests originate from many servers which do not coordinate with each other to reduce their workload. The frequency of these requests can be as high as ~2,500 per hour, often batched with up to a dozen clones at once, and are generally highly redundant: a single git repository can be fetched over 100 times per hour.


The issue isn't from user-initiated requests. It's from the hundreds of automatic refreshes that the proxy then performs over the course of the day and beyond. One person who was running a git server that hosts a Go repo only they use was hit with 4 GB of traffic over the course of a few hours.

That's a DDoS.


That's not how the proxy works. The proxy automatically refreshes its cache extremely aggressively and independently of user interactions. The actual traffic volume generated by users running go get is a minute fraction of the total traffic.


They wouldn't recommend that users get the data directly from them if user traffic were the problem.


sourcehut's recommendations seem absolutely reasonable: (1) obey the robots.txt, (2) do bare clones instead of full clones, (3) maintain a local cache.

I could build a system that did this in a week, without any support from Google, using existing open source tech. It's mind-boggling that Google isn't honoring robots.txt, is requesting full clones, and isn't maintaining a local cache.
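A rough sketch of just the "maintain a local cache" part, assuming a single node that shells out to git (the cache directory and repository URL are made up):

    package main

    import (
        "log"
        "os"
        "os/exec"
        "path/filepath"
    )

    // refreshMirror keeps one bare mirror per upstream repository and does an
    // incremental fetch on refresh instead of a fresh full clone every time.
    func refreshMirror(cacheDir, repoURL string) error {
        dir := filepath.Join(cacheDir, filepath.Base(repoURL)+".git")
        if _, err := os.Stat(dir); os.IsNotExist(err) {
            // First contact with this repository: one bare mirror clone.
            return exec.Command("git", "clone", "--mirror", repoURL, dir).Run()
        }
        // Every later refresh only transfers objects that changed upstream.
        cmd := exec.Command("git", "remote", "update", "--prune")
        cmd.Dir = dir
        return cmd.Run()
    }

    func main() {
        // Hypothetical cache directory and repository URL.
        if err := refreshMirror("/var/cache/go-mirror", "https://git.sr.ht/~example/repo"); err != nil {
            log.Fatal(err)
        }
    }

After the first mirror clone, every refresh is an incremental fetch, so the upstream server only has to send objects that actually changed.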


Despite the issue, I'm not convinced that the proxy does deep clones rather than shallow fetches. Other reports (like Gentoo's issue with the proxy; I don't have a link handy, sadly) point to fetches being done normally, not full clones.


It's not about what Go allows, it's about what Google's proxy does on its own schedule. If there was a knob sr.ht could use to change this, it would've come up in the two years since this issue was raised with the Go team.


What does a local cache even mean at Google's scale, though? Some of the cache nodes are likely closer to Sourcehut's servers than to Google HQ. I guess local would mean here that Google pays for the traffic. But then it is not a technical problem, but a "financial" one.

If you disregard the question of who pays for a moment and only look at what makes sense for the "bits", the stateless architecture seems not so bad. Just a pity that in reality somebody else has to foot the bill.


Are you serious? Google Cloud Storage is a service that Google sells to folks using its cloud. If they can't use it for their own project, that would be shocking, no?


They are probably already using something like GCS to store the data at the cache nodes.

I was not talking about how the nodes store data, but about a central cache. Purely architecture wise, it doesn't make sense to introduce a central storage that just mirrors Sourcehut (and all other Git repositories). Sourcehut is already that central storage. You would just create a detour.

It's also not an easy problem. If the cache nodes try to synchronize writes to the central cache, you are effectively linearizing the problem. Then you might as well just have the one central cache access Sourcehut etc. directly. But then of course you lose total throughput.

I guess the technically "correct" solution would be to put the cache right in front of the Sourcehut server.


Go's Proxy service is already a detour for reasons of trust mentioned in the article. They are in a position to return non-authentic modules if necessary (e.g. court order). That settles all architecture arguments about sources-of-record vs sources-of-truth. The proxy service is a source of truth.

If Google is going to blindly hammer something because they must have their Google scale shared nothing architecture pointed at some unfortunate website, then they should deploy a gitea instance that mirrors sr.ht to Google Cloud Storage, and hammer that.

It's unethical to foist the egress costs onto sr.ht when the solution is so so simple.

Some intern could get this going on GCP in their 20% time and then some manager could hook the billing up to the internal account.


[flagged]


Drew has some... strong opinions on some things, but a straight reading of the issue suggests he's being perfectly reasonable here, and it's Google who can't be arsed to implement a caching service correctly - instead, they're subjecting other servers to excessive traffic.

It's about the clearest example of bad engineering justified by "developer velocity": developer time is indeed expensive relative to inefficiency you don't pay for, because you externalize it to your users. Clearest, because there are fewer parties affected, each in a larger way, so the costs are actually measurable.

I do have a dog in this, in a way, because as one of the paying users of sr.ht, I'm unhappy that Google's indifference is wasting sr.ht budget through bandwidth costs.


> I didn't notice any details about how go module mirror will be blocked.

It says in the post that they'll check the User-Agent for the Go proxy string and return a 429 status code.
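Roughly what that looks like as a net/http middleware; the User-Agent fragment below is a placeholder, since I don't know the exact string the mirror sends:

    package main

    import (
        "log"
        "net/http"
        "strings"
    )

    // proxyUAFragment is a placeholder: the real value would be whatever
    // identifying string the Go module mirror puts in its User-Agent header.
    const proxyUAFragment = "GoModuleMirror"

    // block429 answers requests from the module mirror with a cheap 429
    // before they ever reach the expensive git backend.
    func block429(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if strings.Contains(r.UserAgent(), proxyUAFragment) {
                http.Error(w, "too many requests", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        // Stand-in for the real git smart-HTTP handler.
        backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("git backend\n"))
        })
        log.Fatal(http.ListenAndServe(":8080", block429(backend)))
    }

The check happens before any git process is spawned, which is why it's so much cheaper than serving the clone.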


That's especially silly, because 429 is a retriable error.


Sure, but with it, the biggest impact will likely be... spam in the logs on Google's side. Short-circuiting a request from a specific user agent to a 429 error code is cheap compared to performing a full git clone.



