
> So you think it's unacceptable to have an SLA? That's a very common way of making explicit the amount of downtime the organization feels is acceptable.

Perhaps we mean different things by "acceptable". SLAs are a promise that downtime won't exceed certain levels. They are not a declaration that downtime is "acceptable", only an acknowledgment that it's inevitable and an attempt to characterize that inevitability.

What I mean is that when downtime happens, nobody at the company should be thinking "this is fine". They should be very concerned and working urgently to resolve the problem.

The idea that a service expects and accepts downtime as part of normal operation and, even worse, as part of some sort of tradeoff against developing new features is just bizarre and unacceptable to me.

It indicates a level of unconcern about customer needs and experience that renders the service untrustworthy.




But again, this just acknowledges reality. You only have a finite number of employees. If you aren't devoting all of them to reliability and stability, you're making a trade-off with feature velocity.

Being aware of that trade-off is more organizationally mature than not being aware of it.

> What I mean is that when downtime happens, nobody at the company should be thinking "this is fine". They should be very concerned and engaging in urgent and speedy resolution to the problem.

If you think this, you've entirely misunderstood. Error budgets aren't about how you respond to an outage while it's happening. Individual outages should be dealt with immediately. Error budgets matter when you're making planning decisions for the next quarter or year.
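
To put rough numbers on it, here's a minimal sketch of the arithmetic. The 99.9% target, the 30-day window, the function names, and the "shift to reliability work once the budget is spent" rule are all assumptions I picked for illustration, not anything a particular SLA or tool prescribes.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability target leaves room for.

    `slo` is a fraction such as 0.999; both parameters are hypothetical inputs.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    return window_days * 24 * 60 * (1.0 - slo)


def remaining_budget_minutes(slo: float, downtime_minutes: float,
                             window_days: int = 30) -> float:
    """Budget left after subtracting downtime already incurred in the window."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


def planning_focus(slo: float, downtime_minutes: float,
                   window_days: int = 30) -> str:
    """Illustrative planning rule: prioritize reliability once the budget is spent."""
    if remaining_budget_minutes(slo, downtime_minutes, window_days) <= 0:
        return "reliability"
    return "features"


if __name__ == "__main__":
    # 99.9% over 30 days leaves roughly 43 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    print(planning_focus(0.999, downtime_minutes=60))
```

None of this changes how an individual outage gets handled. The number only feeds the planning conversation about where the next quarter's engineering time goes.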



