I was a sysadmin (at uni, in the early 2000s) and I am an SRE today (at Google).
The two jobs are nothing alike, at all, whatsoever.
Sysadmins are support roles. Their functional role is to provide a healthy substrate to run the application layer on top of.
SREs work at the application layer itself. If the system can't scale due to internal architecture, an SRE would be expected to propose a new, scalable design. That would be in addition to maintaining the substrate.
To be clear, there is also nothing inferior about performing a support role. No org can succeed without support.
But the two roles are not the same, and if a job's set of responsibilities doesn't include shared ownership over application-layer architecture, then it can be a great job, but it's not an SRE role.
It has been 10 years since I left Canonical (on good terms), but what popey describes (hi popey) about the intentional lack of human review in the Snap store sounds very Canonical to me.
I agree with all the recommendations - add human gates. Yes, it's expensive, but still far cheaper than the unbounded reputational damage that just occurred around the untrustworthiness of the store (hi Amazon).
Full body scans are a common preventative measure in Taiwan.
My parents (expats, living in the US for over 50 years) flew back and got routine scans (MRI, PET, CT) in February for about $1000 USD total.
Similar to this story, they found a tumor on my dad's pancreas. A biopsy confirmed it, and he had surgery in August. They caught it at stage I. We're very lucky.
The latency from February until August was entirely due to convincing the US medical system to take his Taiwanese images seriously. They finally gave up and went back to Taiwan to get the procedure done.
I'm getting older myself and will absolutely be paying for any sort of imaging available.
This should be more broadly available to everyone. I'd be happy for more of my tax dollars to go to preventative care rather than rearguard action.
In real-world usage you only get to use ~70% of the stated range on a road trip, so we're really talking about 350 miles of range, which is, as you say, what most people actually want.
Why 70%? You obviously don't run the battery to zero; leaving a 10% buffer is common. And when you DC fast charge, the rate of charging drops dramatically around 80%, so people don't charge to full.
Those figures are also for ideal conditions; add in any sort of weather and the range drops again as you run the heater, etc.
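A quick sketch of that arithmetic, assuming the illustrative 10% bottom buffer and 80% fast-charge ceiling described above (not measured values for any particular car):

    # Back-of-the-envelope usable range on a road trip.
    # The buffer percentages are assumptions for illustration only.
    def usable_road_trip_range(rated_range_miles: float,
                               bottom_buffer: float = 0.10,
                               charge_ceiling: float = 0.80) -> float:
        usable_fraction = charge_ceiling - bottom_buffer  # ~0.70
        return rated_range_miles * usable_fraction

    print(usable_road_trip_range(500))  # ~350 miles between charging stops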
Living in the Bay Area, driving to Tahoe in the winter without a mandatory recharge should be the gold standard.
It's not an unusual use case, "only" about 180 miles, and yet there aren't any EVs that can do it confidently, because going uphill in the cold with an aerodynamics-destroying ski rack is really hard.
A car with 500 miles of fair-weather range could probably do it?
You need to use the fully loaded cost of an employee when estimating opex savings, which includes health care costs, retirement funding, etc.
Rule of thumb is that fully loaded cost for US employees is approximately 2x yearly salary (although people who've actually run a company can correct my potentially stale or incorrect understanding).
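As a rough sketch, assuming the ~2x multiplier holds and using a hypothetical $150k salary:

    # Rough fully-loaded cost using the ~2x rule of thumb; the multiplier
    # and example salary are assumptions for illustration only.
    FULLY_LOADED_MULTIPLIER = 2.0  # salary + taxes, benefits, equipment, office, etc.

    def fully_loaded_cost(salary: float) -> float:
        return salary * FULLY_LOADED_MULTIPLIER

    print(fully_loaded_cost(150_000))  # ~$300k/year all-in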
Understood on the 2x; I was also looking across all the jobs at Spotify. Some go as low as $70k salary, but the majority seem to be $170k+. It was just for ease of calculation. Even if we take $500k, that's still 0.033% of revenue -_-
You aren't using percentages correctly; I think you mean 3.3% (and 1% in your original comment). Also, as pointed out elsewhere, revenue is not really relevant: the majority of that cash flows directly to artists/labels.
Again, revenue is not really all that relevant for this kind of business. They’re a low margin business because they must pay huge bills to record labels. R&D doesn’t cut record label costs.
Payroll is a cost, so it's more relevant to think about a layoff in terms of its impact on profit (your example) than on revenue. Lots of companies sell at a tight margin, so a tiny cost saving as a percent of revenue can be a big difference in profit.
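A toy example with made-up numbers (not Spotify's actual financials) shows why the denominator matters:

    # Hypothetical figures chosen only to illustrate the revenue-vs-profit
    # point; these are not Spotify's actual financials.
    revenue = 10_000_000_000        # $10B in revenue
    profit = revenue * 0.02         # a tight 2% margin -> $200M profit
    layoff_savings = 100_000_000    # $100M in annual payroll savings

    print(f"share of revenue: {layoff_savings / revenue:.1%}")  # 1.0%
    print(f"share of profit:  {layoff_savings / profit:.1%}")   # 50.0%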
2X is my understanding as well. Whatever you think an employee costs based on TC, double it to get the rough cost to the employer. Some other big employer costs related to employees you forgot include employment taxes, hardware/software expenses and licenses, and office space and related perks.
Also I suspect that $150k as the mean TC of those being let go is low. Spotify might be saving up to $500k all-in per employee let go.
I applied for an "entry-level" EM role with Spotify in 2021 and the base was 260-275, plus a generous bonus target and stock. TC would have been pushing $400k for a fully remote USCAN-based role. I say that strictly as a calibration point: it's unlikely engineers are pushing half a million (maybe at the Staff+ level), but there are also not likely any engineers below $150k TC. I'd expect even mid-levels to be in the $200-225k ballpark, but I could be wrong.
I think $150k median is probably on the lower side of correct, but not enough to meaningfully impact any of the numbers anyone is discussing here. It's close enough.
The stock price has more than halved since 2021, and based on their business model and history of profit, as an employee I would not value the stock portion of compensation much.
As far as I can tell, Apple/Google/Amazon will always provide the ceiling price for how much Spotify can charge its customers, hence capping revenue, and the 3 record labels will always extract just enough to keep Spotify operating.
In a similar situation to Netflix, Spotify’s play would have to be to create their own content to lower their costs, but that is much easier said than done.
The average software developer salary (not sure that it's pertinent to constrain it to "senior" devs) is $120k in the US. In San Francisco, the median is $161k [1].
I commented above: whether the salary is 150k, 250k, or 500k, the impact barely changes: 0.1%, 0.2%, 0.33%. Not sure it really matters. Again, I'm not defending lazy employees or saying employers are bad for doing this. I just feel there is a hidden cost to layoffs, from my own experience of being at a company doing round after round of layoffs.
Agreed it doesn't matter much in this case, but it is a common misconception that employees cost employers far less than they actually do, so I wanted to step in and correct that.
Point of clarification on "losing skills" by not being oncall enough.
Google designs its SRE teams to scale sublinearly with the services they support, which means we're often responsible for entire portfolios, not just a single service.
It's common for individuals to be SMEs on a subset of the portfolio, as the focus of their coding projects.
Obviously the rest of the portfolio is also undergoing continuous change, and so to remain a broadly effective oncaller across the entire portfolio, you need regular production exposure to the parts you interact with less frequently.
Those are the specific bits that get rusty with disuse.
I think there's a bit of a distinction between patches that add functionality not accepted by upstream (as here) vs the more typical distro patches which do things like replace bundled libs with shared ones, fix locations for things like TLS bundles, that sort of thing. These can still break things, but much less frequently.
I haven't been a distro packager in many years, but my recollection is that in other distros (debian, fedora, arch, etc) patches that add new functionality would generally not be considered okay unless accepted by upstream. I'd be interested to learn the rationale for not upstreaming this patch before including it.
Arch's policy is to minimize patches. I think the Linux kernel carries about 2-3 patches on average per release, and most other packages aren't much more either. A "minimal patches & close-to-default" policy is, in my experience, usually a great way to avoid package maintainer issues.
We don't have any policy. Sticking with upstream is a shared value among the packagers, but it's important to note that we generally don't enforce any policy. Most packages have no patches; when there is anything, it's usually regression or security fixes.
And as a packager for Arch for the past 3 years: I had no clue this page existed. Evidently we are bad at these policy things. But I'd rather call them social norms than packaging policies.
On Arch, patches are usually done either to customize the build version (the kernel is 5.9.xy-arch1, for example, and that's the only patch) or to make packages build with the newer compiler and libs present on the system, plus any patches necessary to make something work at all, though those are rare in my experience.
The semantic tagging is nice, I might start incorporating that into my notes.
On the overall topic of meeting notes, I picked this up as a new skill in the past year and it's been immensely valuable for myself and the people I meet with.
Specifically, I learned how to take realtime notes during the meeting while also listening and paying attention. It took practice but was achievable.
One key to success here is to explicitly ask for a helper to take notes while I speak. I've found this helps make the note taking seem like a whole team effort.
Colleagues have noticed and valued the notes and they do seem to lead to better meetings.
Google networking SRE here (my team runs ns[1-4].google.com among other services).
Regardless of original intent, the blog doesn't land well with me. It could have provided the background on flowspec, using their own past outage as a case study, without any of the speculation or blameyness that came across here. The #hugops at the end reads quite disingenuously.
We see other networks break all the time and we often have pretty good guesses as to why. But I personally would never sign off on a public blog speculating on a WAG of why someone else's network went down. That's uncouth.
I think you're stuck on the politics. Level3 is their competition, but initially CF was blamed. CF owes it to their customers and investors to explain why they had an outage and how they responded to it; they do not need to talk in detail about an unrelated past incident (just because it was related to flowspec does not mean it was a similar outage), and they certainly should not wait for Level3's investigation.
I would expect Google to have a similar explanation if a significant number of GCP customers faced an outage.
You should know that it wasn't just someone else's network that went down; that network brought down a big chunk of the internet with it. I think technical honesty comes before political appearances. The #hugops and the mention of their past experience with a flowspec outage are clearly there to signal that the blog post is not about blaming or making L3 look bad.
The professional way to write a blog post like this is from your own perspective. Identify the proximate cause (the peer), name names if you must, talk about how awesome your own systems are, show some of your monitoring if you like, and talk about what you'll do in the future to be even more resilient to this class of problems.
That's all to the good, and much of Cloudflare's blog was exactly that. It would've been fine if they'd left it at that.
Acknowledging there is no postmortem (yet) but then pointlessly speculating about what it might contain is what I have a problem with.
I don't speak for Google but if I found out we had written a post like this, I would speak up and advocate to change it.
There is nothing professional about avoiding a topic for the sake of appearances. Level3 put out details knowing others in the industry would discuss and speculate based on that information. They could have withheld details such as flowspec and edge routers bouncing, but they did not. It's perfectly professional to discuss speculative details of someone else's outage that affected your customers, based on details they chose to make public.
In infosec for example, it's extremely common to speculate about a vulnerability based on details in the CVE. Entire news articles are based on such speculation. Like I said, you are giving too much weight to optics and appearances. I would like to see anyone actually at Level3 complain about this post.
Honestly, I'm wondering if this blog is a response to https://web.archive.org/web/20200830171114/https://www.cnn.c... which has since been heavily modified, but was on the front page of CNN making it sound like Cloudflare was responsible if you only read the first bit.
Normally CF throws a lot of mud in these situations. Karma got them on their own recent outage.
While reading this I got the impression they were genuinely trying to tone down the mud-slinging they normally do, while also trying to make it clear the outage wasn't their fault. They just need more practice.
A provider as large as Cloudflare will always be impacted by other providers. Hopefully that point is clear to them now. The worst thing that can happen is that they get a reputation for not playing ball nicely and their peers and partners get tired of them. Service to their customers would erode over time because those peers and partners would screw with CF behind the scenes. It's better to have friends than enemies in the type of business CF is in.