The reason it failed in prod was entirely unrelated to it being prod. The same could have happened in staging. IIRC, the error was entirely due to an RST packet from some external system during the upgrade. It was a bug in the upgrading system that should have been accounted for, had anyone known it existed. Identifying the root cause of the failure was what took the most time. Had deployments been idempotent, it probably could have been resolved in moments ... but here we are, 15 years later with lots of lessons learned.


Sounds annoying, but seems like you found a bug in the upgrading system that could have struck anyone during any change?

The time/work to investigate and fix it probably wasn't considered (or shouldn't have been, at least) part of the work on the component you were changing - that was just delayed, same as it would be in scenarios like "Dave got hit by a bus and he's the only one with the prod password" and "Our CI service suddenly went out of business and we need to migrate everything".


My point is that you can't estimate time with any accuracy. At the end of the day, even this fix, shenanigans and all, was still "easy" once we knew what was going on. The effort never changed, and on effort alone our estimate would have been dead on. The issue is trying to say, "It will take me two weeks to do this," and having it actually take two weeks -- there are simply too many unknowns for ANY task in our industry for us to be confident in that assessment.


Not to the day, but you can estimate a range based on experience. After that deployment issue you may add "release could be delayed by up to a week" to future estimates until you're sure it's fixed.

I've written TV apps and in that world I've often given estimates that are 5 days of actual work but, because Samsung's QA process can take 6 weeks and spurious rejections are common, "deployment" will often take literally months.

Time to release and time for development can be totally different things and it's arguable whether "waiting" time should be included in any individual estimate at all. (You're adding 4 separate features and doing 2 bugfixes in one release, which one gets +2 months? In reality "submit/release" becomes a different ticket/task.)
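
To make the split concrete, here is a rough sketch of keeping the development estimate and the release wait as separate ranges and only adding them when someone asks for a ship date. The 5 days and the 6-week QA pass come from the numbers above; everything else (the spread around 5 days, the 2-week lower bound for a QA pass, the one-to-two review passes, and the names `Estimate` and `release_wait`) is assumed purely for illustration.

```python
from dataclasses import dataclass

WORKING_DAYS_PER_WEEK = 5


@dataclass(frozen=True)
class Estimate:
    """A low/high range in working days instead of a single number."""
    low_days: int
    high_days: int

    def __add__(self, other: "Estimate") -> "Estimate":
        # Ranges add end to end: best case plus best case, worst plus worst.
        return Estimate(self.low_days + other.low_days,
                        self.high_days + other.high_days)


# Development: about 5 days of actual work (the +/- 1 day spread is assumed).
development = Estimate(low_days=4, high_days=6)

# One QA review pass: up to 6 weeks (the 2-week lower bound is assumed).
qa_pass = Estimate(low_days=2 * WORKING_DAYS_PER_WEEK,
                   high_days=6 * WORKING_DAYS_PER_WEEK)


def release_wait(passes_low: int, passes_high: int) -> Estimate:
    """Waiting time if the submission needs between passes_low and
    passes_high review passes (each rejection costs another pass)."""
    return Estimate(qa_pass.low_days * passes_low,
                    qa_pass.high_days * passes_high)


# Assumed: best case accepted first time, worst case one spurious rejection.
release = release_wait(passes_low=1, passes_high=2)
total = development + release

print(f"development: {development.low_days}-{development.high_days} days")
print(f"release wait: {release.low_days}-{release.high_days} days")
print(f"ship date: {total.low_days}-{total.high_days} days")
```

The development range stays tight and the release range carries nearly all of the uncertainty, which is the argument for tracking "submit/release" as its own ticket rather than padding each feature estimate.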



