I am reminded of the campaign by Paul O'Neill to improve worker safety at Alcoa. O'Neill was the first Treasury Secretary under George W. Bush; to his credit, he got fired early on. I came across this in Charles Duhigg's book, The Power of Habit. It's summarized here: http://www.post-gazette.com/business/businessnews/2012/05/13...
In my DevOps experience, I've seen a lot of room for the application of his approach. Unfortunately, what often prevails is hurried workarounds and post-facto ass-covering.
Here's how O'Neill sums up his approach in the article above:
> He saw his safety goal as part of a broader emphasis on creating "habitual excellence."
> "I believe an organization has the potential for greatness if every person can say yes to three questions without reservation," Mr. O'Neill said. "The first is, 'Can I say every day I am treated with dignity and respect by everyone I encounter without respect to my pay grade, or my title, or my race, or ethnicity or religious beliefs or gender?' And you know, there are not a lot of places like that.
> "The second question is, 'Am I given the things I need - education, training, tools, encouragement - so I can make a contribution to this organization that gives meaning to my life?'
> And the third question is, 'Am I recognized for what I do by someone I care about?'
> "Every company I know of says somewhere in its annual report, 'People are our most important resource,' but my observation from all these places I had worked was that there was no evidence it was true."
This ties into recent findings on how to motivate employees, as opposed to the traditional levers of compensation (which stays under inflation for most workers while management bonuses trend upward anyway) and stability/loyalty (which, culturally, nobody has anymore, except perhaps at small companies). The newer view is that modern workers are driven by three core desires: autonomy, mastery, and purpose. At most corporations, workers get little of any of these, so you get decreased engagement, lower productivity, etc., which translate into plainly worse outcomes, despite the old-fashioned idea that spending more money on people will correlate with better results. Start-ups are a clear antithesis to the usual corporate model, but I have almost no faith that managers at large corporations really look carefully at how small companies succeed. They tend to be beaten down by the same cultural problems that led them to give up on changing anything; even managers have relatively little autonomy, mastery, and purpose, after all.
I know we hate anecdotes here, but I left my last position despite feeling respected (my opinions were taken into account somewhere), having been given certain tools, and receiving some recognition. The problem was that I was given responsibility without the authority to enact the changes needed to improve conditions for those who did not feel the same.
I would hope this boils down to "The end of on-call, forever."
Boeing doesn't require an aeronautical engineer in the cockpit of every 787, because they engineer their product to be operable. Have smart people build your infrastructure so that junior people can operate it, or else you wind up with unstable, burned-out, formerly-smart-people.
I'd say the blurring of work/off-work hours is one of the greatest coups of industry. Having someone on-call means they don't have to hire someone else to be there during "off-hours". I understand it adds costs; yes, pass them on. If the objection is that someone else will cheap out, just use on-call, and undersell us, then pass laws to eliminate the choice.
I'm no fan of the industrial age with its "dehumanization"/abstraction of labor, but I do appreciate the hard delineation between work hours and non-work hours.
Not a coup. More like the continued ass kicking of labor by capital.
I had a leash (pager) in 1992 that was tugged often. The fur really flew if I didn't call back immediately. E.g., I almost got fired because I was in the shower at the gym (at 7:00am) and took 15 minutes to call back.
From what I can tell, since pretty much everyone I know has done a tour of duty at Amazon or equivalent, things have only gotten worse.
"means they don't have to hire someone else"
Natch. But I see it more as someone else's failure to plan creating a (recurring) emergency for me.
You're kidding yourself if you think there's no one on-call to help with aircraft issues. There are in fact legions of air traffic controllers who fulfill the role of ops.
The only difference for a website is that no one but you and your customers care if it goes down, so it's up to you to provide the manpower (unless you outsource it to IaaS/PaaS providers).
Yes, 24-hour industries need to staff 24 hours a day.
Note that air control towers are staffed with air traffic controllers, not aeronautical engineers. Also note that they're being paid to be there, and they don't ATC from their laptops in bed.
Go read "Site Reliability Engineering" by Beyer et al. That's the clearest description of how on-call should work, and of how to get from Point A to Point B without burning anyone out. On-call is almost always necessary, but it should be thought of as high-deductible insurance. If you need it, use it, but realize it's cheaper to prevent the problem than to fix it when things go wrong. And if you keep using it unnecessarily, your carrier will drop you.
The first thing that jumped out at me in "Site Reliability Engineering" is that 1-2 incidents per on-call shift is considered a reasonable amount. That's insane, unless you're only on-call twice a year.
The biggest problem I've seen is business priority in scheduling development work. The business lead often doesn't really get[1] that letting developers fix a bug that crops up in production will allow them to sleep through the night. It's worse when developers and support are separate. Lord help you if the DBA team cannot be bothered to wake up.
I swear, next time someone says "it just needs a little manual intervention, no big deal", I might fly into a violent rage. Particularly if the cause is documented and just waiting for a "sprint" to slot it into.
[1] Often because the development manager is trying to work their way up the ladder and is being a kiss-a$$ about functionality; especially fun in companies that like to move managers around to get them "broad experience" in the company.
I see several high level goals, but nothing implementable or implemented. Am I missing something? Is there a tool or procedural system as an end goal here? The goals and mantra are definitely admirable, but how?
Implementation details are going to be an exercise for the organization. But if you start from the wrong guiding principles (or no principles at all), you're going to end up with a lot of avoidable mistakes and pointless middle-of-the-night bridge calls.
Excellently stated. I had a discussion on /r/sysadmin a few weeks ago about how I take my role as a PM to an interesting place.
In addition to keeping the project and developers on task, I'm anal about making sure our agile tools are NOT a chore to use. They should feel as natural to interact with as a mouse and as unobtrusive to the workflow as adding lines to a text file.
The principle here for us is to make our entire ops wheelhouse as frictionless as possible.
Someone should write a retrospective on the Twelve-Factor App. Have these guiding operational principles changed with the technology / available network bandwidth?
12 factor is hard because it's a guiding philosophy and has shades rather than just being there or not there. Each factor needs to be looked at for each app. It takes time and experience to actually see 12 factor violations as real problems. Actually implementing 12 factor in an application requires a lot more work than just importing a library or choosing a stack. Developers get more excited about features than they do about infrastructure and so don't spend as much time as they should thinking about making refinements.
At my current job, three microservices live in one code repository. Because of this, continuous integration makes one build of all three services rather than three separate builds. The CI service doesn't support different environment variables per build, so the app has to take the config for the CI environment from the codebase. This is recognized as a problem, but not enough of one to get unanimous consensus on doing the work to change it. I want to change it now, because experience tells me it's better to rip the bandaid off architectural problems: they only get harder to fix as time goes on. Also, it affects my daily workflow in weird ways.
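For illustration, the Twelve-Factor "config" factor says config belongs in the environment, not in the codebase. Here is a minimal sketch of what that could look like for three services sharing one repo, assuming a hypothetical convention where each service reads variables prefixed with its own name (e.g. `BILLING_DATABASE_URL`). The service names, variable names, and `load_config` helper are all made up for this example; they are not from the comment above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceConfig:
    """Per-service settings resolved at startup, never committed to the repo."""
    name: str
    database_url: str
    port: int


def load_config(service: str, env: Optional[Mapping[str, str]] = None) -> ServiceConfig:
    """Read settings for one service from environment variables.

    Variables are namespaced by an upper-cased service prefix (a hypothetical
    convention, e.g. BILLING_DATABASE_URL), so services that share one
    repository and one CI build can still receive distinct config.
    """
    env = os.environ if env is None else env
    prefix = service.upper() + "_"
    try:
        database_url = env[prefix + "DATABASE_URL"]
    except KeyError:
        # Fail fast instead of silently falling back to a value baked into the codebase.
        raise RuntimeError(f"missing required variable {prefix}DATABASE_URL") from None
    # Optional setting with a default; env values are strings, so convert.
    port = int(env.get(prefix + "PORT", "8080"))
    return ServiceConfig(name=service, database_url=database_url, port=port)


if __name__ == "__main__":
    # Hypothetical values a CI job (or a developer's shell) would export.
    fake_env = {
        "BILLING_DATABASE_URL": "postgres://ci-db/billing",
        "SEARCH_DATABASE_URL": "postgres://ci-db/search",
        "SEARCH_PORT": "9090",
    }
    for svc in ("billing", "search"):
        print(load_config(svc, fake_env))
```

Whether a given CI service can inject different variables per job is up to that tool. The point of the sketch is only that the application code stops needing to know which environment it is running in, so the CI config can move out of the repo whenever the tooling allows it.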
Not sure why this needs fancy branding. This sounds like what any ops organization is supposed to be focused on. I agree that there are plenty of places out there that approach ops poorly, but to me what they outline here doesn't sound like HumanOps; it just sounds like the way you are supposed to approach ops in your engineering organization. All of the big-name tech places I've worked at have had this attitude. I have interviewed at some other large tech places that do not deal with ops this way, and I choose not to work for a company with what I view as a "broken" approach to operations.
It needs fancy branding only because the kind of people who generally prevent these basics from happening (read: management, who don't have to feel the pain of their decisions) also tend to latch on to Manifestos and Systems and Techniques and so on...that have fancy branding.
It's packaging up common-sense humane workplace policies in a way that people lacking common-sense or humane approaches can swallow.