Anyone who works on large complex systems will read this and go "this was all...

frankc · on March 2, 2017

I agree with that in general, but having your monitoring system be dependent on the thing it monitors is a pretty big goof. It's possible that the dependency was very non-obvious and many layers deep, which is more understandable, but still... it's pretty fundamental.

jedberg · on March 2, 2017

The monitoring system was not dependent on the thing it was monitoring.

The website that shows the public results of the monitoring, which is updated only by humans, depended on it.

paulddraper · on March 4, 2017

Hmm, I don't understand.

My us-east-1 RSS feed said S3 had no incidents.

abraves10001 · on March 2, 2017

An important distinction, but it doesn't negate the idea that their users shouldn't be in the dark because of something this stupid.

kakarot · on March 2, 2017

They have a Twitter account for such incidents and used it appropriately. They did not slack in relaying the outage to customers, and between that and the fact that no S3 services were operating, I think the message was pretty clear: "We fucked up, give us a couple hours."