
10 hours ago we had a flood of network failures in us-central1 and saw no GCP status changes. We blindly attempted to mitigate in various ways (for example, freezing HPAs because we thought we were making excessive calls to external infra and getting throttled), and it eventually resolved itself. Maybe we were at fault the entire time, but not seeing this issue reflected on the GCP dashboard is infuriating.


At AWS, if the status changes then someone somewhere gets fired, so a lot of the time incidents happen without being recorded on the status board. Maybe it's the same issue with GCP, or maybe concern for their injured peers made everyone forget to update the status. I really hope the latter.


> At AWS, if the status changes then someone somewhere gets fired, so a lot of the time incidents happen without being recorded on the status board.

This can't be true, can it? What's the reason to lie, when the lie would be so incredibly obvious?


It's not true. The real answer is that execs don't want to pay the costs of SLO violations. If the status checkmark is green, who could say whether the service was down?


It used to be that one of the best things about working at Google was the "blameless postmortem". As long as you were able to learn from an incident and weren't attempting to look at private data, you could write up a postmortem document and actually use it as part of a promotion packet. Google would lose a key part of its soul if it were to change that.


AWS has a process like that as well; it's called a COE (Correction of Error). GP is either misinformed or making stuff up.



