I was a sysadmin (at uni, in the early 2000s) and I am an SRE today (at Google).
The two jobs are nothing alike, at all, whatsoever.
Sysadmins are support roles. Their functional role is to provide a healthy substrate to run the application layer on top of.
SREs work at the application layer itself. If the system can't scale due to internal architecture, an SRE would be expected to propose a new, scalable design. That would be in addition to maintaining the substrate.
To be clear, there is also nothing inferior about performing a support role. No org can succeed without support.
But the two roles are not the same, and if a job's set of responsibilities doesn't include shared ownership over application-layer architecture, then it can be a great job, but it's not an SRE role.
It has been 10 years since I left Canonical (on good terms), but what popey describes (hi popey) about the intentional lack of human review in the Snap store sounds very Canonical to me.
I agree with all the recommendations - add human gates. Yes, it's expensive, but still far cheaper than the unbounded reputational damage that just occurred around the untrustworthiness of the store (hi Amazon).
Full body scans are a common preventative measure in Taiwan.
My parents (expats, living in the US for over 50 years) flew back and got routine scans (MRI, PET, CT) in February for about $1000 USD total.
Similar to this story, they found a tumor on my dad's pancreas. A biopsy confirmed it, and he had surgery in August. They caught it at stage I. We're very lucky.
The latency from February until August was entirely due to convincing the US medical system to take his Taiwanese images seriously. They finally gave up and went back to Taiwan to get the procedure done.
I'm getting older myself and will absolutely be paying for any sort of imaging available.
This should be more broadly available to everyone. I'd be happy for more of my tax dollars to go to preventative care rather than rearguard action.
In real-world usage you only get to use ~70% of the stated range on a road trip, so we're really talking about 350 miles of range, which is, as you say, what most people actually want.
Why 70%? You obviously don't run the battery to zero; leaving a 10% buffer is common. And when you DC fast charge, the rate of charging drops dramatically around 80%, so people don't charge to full.
Those figures are also for ideal conditions; add in any sort of weather and the range drops again as you run the heater, etc.
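A quick sketch of that arithmetic, assuming the illustrative 10% bottom buffer and 80% fast-charge ceiling described above (not measured values for any particular car):

    # Back-of-the-envelope usable range on a road trip.
    # The buffer percentages are assumptions for illustration only.
    def usable_road_trip_range(rated_range_miles: float,
                               bottom_buffer: float = 0.10,
                               charge_ceiling: float = 0.80) -> float:
        usable_fraction = charge_ceiling - bottom_buffer  # ~0.70
        return rated_range_miles * usable_fraction

    print(usable_road_trip_range(500))  # ~350 miles between charging stops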
Living in the Bay Area, driving to Tahoe in the winter without a mandatory recharge should be the gold standard.
It's not an unusual use case, "only" about 180 miles, and yet there aren't any EVs that can do it confidently, because going uphill in the cold with an aerodynamics-destroying ski rack is really hard.
A car with 500 miles of fair-weather range could probably do it?
You need to use the fully loaded cost of an employee when estimating opex savings, which includes health care costs, retirement funding, etc.
Rule of thumb is that fully loaded cost for US employees is approximately 2x yearly salary (although people who've actually run a company can correct my potentially stale or incorrect understanding).
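As a rough sketch, assuming the ~2x multiplier holds and using a hypothetical $150k salary:

    # Rough fully-loaded cost using the ~2x rule of thumb; the multiplier
    # and example salary are assumptions for illustration only.
    FULLY_LOADED_MULTIPLIER = 2.0  # salary + taxes, benefits, equipment, office, etc.

    def fully_loaded_cost(salary: float) -> float:
        return salary * FULLY_LOADED_MULTIPLIER

    print(fully_loaded_cost(150_000))  # ~$300k/year all-in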
Understood on the 2x; I was also looking across all the jobs at Spotify. Some go as low as $70k salary, but the majority seem to be $170k+. It was just for ease of calculation. Even if we take $500k, that's still 0.033% of revenue -_-
You aren't using percentages correctly; I think you mean 3.3% (and 1% in your original comment). Also, as pointed out elsewhere, revenue is not really relevant: the majority of that cash flows directly to artists/labels.
Again, revenue is not really all that relevant for this kind of business. They’re a low margin business because they must pay huge bills to record labels. R&D doesn’t cut record label costs.
Payroll is a cost, so it's more relevant to think about a layoff in terms of its impact on profit (your example) than on revenue. Lots of companies sell at a tight margin, so a tiny cost saving as a percent of revenue can be a big difference in profit.
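A toy example with made-up numbers (not Spotify's actual financials) shows why the denominator matters:

    # Hypothetical figures chosen only to illustrate the revenue-vs-profit
    # point; these are not Spotify's actual financials.
    revenue = 10_000_000_000        # $10B in revenue
    profit = revenue * 0.02         # a tight 2% margin -> $200M profit
    layoff_savings = 100_000_000    # $100M in annual payroll savings

    print(f"share of revenue: {layoff_savings / revenue:.1%}")  # 1.0%
    print(f"share of profit:  {layoff_savings / profit:.1%}")   # 50.0%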
2X is my understanding as well. Whatever you think an employee costs based on TC, double it to get the rough cost to the employer. Some other big employer costs related to employees you forgot include employment taxes, hardware/software expenses and licenses, and office space and related perks.
Also I suspect that $150k as the mean TC of those being let go is low. Spotify might be saving up to $500k all-in per employee let go.
I applied for an "entry-level" EM role with Spotify in 2021 and the base was 260-275, plus a generous bonus target and stock. TC would have been pushing $400k for a fully remote USCAN-based role. I say that strictly as a calibration point: it's unlikely engineers are pushing half a million (maybe at the Staff+ level), but there are also not likely any engineers below $150k TC. I'd expect even mid-levels to be in the $200-225k ballpark, but I could be wrong.
I think $150k median is probably on the lower side of correct, but not enough to meaningfully impact any of the numbers anyone is discussing here. It's close enough.
The stock price has more than halved since 2021, and based on their business model and history of profit, as an employee I would not value the stock portion of compensation much.
As far as I can tell, Apple/Google/Amazon will always provide the ceiling price for how much Spotify can charge its customers, hence capping revenue, and the 3 record labels will always extract just enough to keep Spotify operating.
In a similar situation to Netflix, Spotify’s play would have to be to create their own content to lower their costs, but that is much easier said than done.
The average software developer salary (not sure that it's pertinent to constrain it to "senior" devs) is $120k in the US. In San Francisco, the median is $161k [1].
I commented above: whether the salary is 150k, 250k, or 500k, the impact barely changes: 0.1%, 0.2%, 0.33%. Not sure it really matters. Again, I'm not defending lazy employees or saying employers are bad for doing this. I just feel there is a hidden cost to layoffs, from my own experience of being at a company doing round after round of layoffs.
Agreed it doesn't matter much in this case, but it is a common misconception that employees cost employers far less than they actually do, so I wanted to step in and correct that.
Point of clarification on "losing skills" by not being oncall enough.
Google designs its SRE teams to scale sublinearly with the services they support, which means we're often responsible for entire portfolios, not just a single service.
It's common for individuals to be SMEs on a subset of the portfolio, as the focus of their coding projects.
Obviously the rest of the portfolio is also undergoing continuous change, and so to remain a broadly effective oncaller across the entire portfolio, you need regular production exposure to the parts you interact with less frequently.
Those are the specific bits that get rusty with disuse.
I think there's a bit of a distinction between patches that add functionality not accepted by upstream (as here) vs the more typical distro patches which do things like replace bundled libs with shared ones, fix locations for things like TLS bundles, that sort of thing. These can still break things, but much less frequently.
I haven't been a distro packager in many years, but my recollection is that in other distros (debian, fedora, arch, etc) patches that add new functionality would generally not be considered okay unless accepted by upstream. I'd be interested to learn the rationale for not upstreaming this patch before including it.
Arch's policy is to minimize patches. I think the Linux kernel carries about 2-3 patches on average per release, and most other packages aren't much more either. A "minimal patches & close-to-default" policy is, in my experience, usually a great way to avoid package maintainer issues.
We don't have any policy. Sticking with upstream is a shared value among the packagers, but it's important to note that we generally don't enforce any policy. Most packages have no patches; when there is anything, it's usually regression or security fixes.
And as a packager for Arch for the past 3 years: I had no clue this page existed. Evidently we are bad at these policy things. But I'd rather call them social norms than packaging policies.
On Arch, patches are usually done either to customize the build version (the kernel is 5.9.xy-arch1, for example, and that's the only patch) or to make packages build with the newer compiler and libs present on the system, plus any patches necessary to make something work at all, though those are rare in my experience.
The semantic tagging is nice, I might start incorporating that into my notes.
On the overall topic of meeting notes, I picked this up as a new skill in the past year and it's been immensely valuable for myself and the people I meet with.
Specifically, I learned how to take realtime notes during the meeting while also listening and paying attention. It took practice but was achievable.
One key to success here is to explicitly ask for a helper to take notes while I speak. I've found this helps make the note taking seem like a whole team effort.
Colleagues have noticed and valued the notes and they do seem to lead to better meetings.
Google networking SRE here (my team runs ns[1-4].google.com among other services).
Regardless of original intent, the blog doesn't land well with me. It could have provided the background on flowspec, using their own past outage as a case study, without any of the speculation or blameyness that came across here. The #hugops at the end reads quite disingenuously.
We see other networks break all the time and we often have pretty good guesses as to why. But I personally would never sign off on a public blog speculating on a WAG of why someone else's network went down. That's uncouth.
I think you're stuck on the politics. Level3 is their competition, but initially CF was blamed. CF owes it to their customers and investors to explain why they had an outage and how they responded to it; they do not need to talk in detail about an unrelated past incident (just because it was related to flowspec does not mean it was a similar outage), and they certainly should not wait for Level3's investigation.
I would expect Google to have a similar explanation if a significant number of GCP customers faced an outage.
You should know that it wasn't just someone else's network that went down; that network brought down a big chunk of the internet with it. I think technical honesty comes before political appearances. The #hugops and the mention of their past experience with a flowspec outage are clearly there to signal that the blog post is not about blaming or making L3 look bad.
The professional way to write a blog post like this is from your own perspective. Identify the proximate cause (the peer), name names if you must, talk about how awesome your own systems are, show some of your monitoring if you like, and talk about what you'll do in the future to be even more resilient to this class of problems.
That's all to the good, and much of Cloudflare's blog was exactly that. It would've been fine if they'd left it at that.
Acknowledging there is no postmortem (yet) but then pointlessly speculating about what it might contain is what I have a problem with.
I don't speak for Google but if I found out we had written a post like this, I would speak up and advocate to change it.
There is nothing professional about avoiding a topic for the sake of appearances. Level3 put out details knowing others in the industry would discuss and speculate based on that information. They could have withheld details such as flowspec and edge routers bouncing, but they did not. It's perfectly professional to discuss speculative details of someone else's outage that affected your customers, based on details they chose to make public.
In infosec for example, it's extremely common to speculate about a vulnerability based on details in the CVE. Entire news articles are based on such speculation. Like I said, you are giving too much weight to optics and appearances. I would like to see anyone actually at Level3 complain about this post.
Honestly, I'm wondering if this blog is a response to https://web.archive.org/web/20200830171114/https://www.cnn.c... which has since been heavily modified, but was on the front page of CNN making it sound like Cloudflare was responsible if you only read the first bit.
Normally CF throws a lot of mud in these situations. Karma got them on their own recent outage.
While reading this I got the impression they were genuinely trying to tone down the mud-slinging they normally do, while also trying to make it clear the outage wasn't their fault. They just need more practice.
A provider as large as Cloudflare will always be impacted by other providers. Hopefully that point is clear to them now. The worst thing that can happen is that they get a reputation for not playing ball nicely and their peers and partners get tired of them. Service to their customers would erode over time because those peers and partners would screw with CF behind the scenes. It's better to have friends than enemies in the type of business CF is in.