Hacker News new | past | comments | ask | show | jobs | submit login

It always amuses me how people want reassurance that the next crisis will be a fresh, new problem, and not one the person can demonstrably solve.

A lot of 'lessons learned' analysis boils down to this: in order to prevent a recurrence of X, we introduced complex subsystem Y, the unexpected effects of which you can read about in our next post-mortem.




That's an overly cynical take, post-mortems are not for anyone's reassurance, they are a learning opportunity.

The airline industry is as safe as it is because every accident gets thoroughly investigated with detailed reports ("post-mortems") including what to do differently going forward. These are taken as gospel among all players in the industry and as a result, you very rarely see two different accidents caused by the same thing anymore.


That was entirely not what I was getting at and is a cheap shot that is well beneath you, especially because I suspect that you know that that wasn't what I was getting at.


My comment wasn't intended personally; your words about "will never recur" just reminded me of this peculiarity of software systems, where it's often error handling/monitoring/backups/etc. that cause cascading failures in the systems they're intended to safeguard.

I'm sorry if I misconstrued your meaning, but I am flattered that you think there are things beneath me!


Fair enough. I see the whole function of a postmortem in a very simple way: to avoid recurrence of the same fault. Yes, there will be plenty of new ones to make. But if you don't change your processes as the result of a failure you are almost certainly going to see a repeat because the bulk of the conditions are still the same. All it takes then is a minor regression and you're back to where you were before. This I've seen many times in practice and I suspect that Colin isn't immune to it. And yes, I look up to you, your writing is usually sharp and on point and has both amused me and educated me. So you have an image to live up to ;)


You'd love my team's recent postmortem, featuring the comment "action items haeve been copied from the previous postmortem".


Could be as simple as "test restore a new server every 1-2 years"




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: