
To be clear: there is no failover happening, even though we do have a failover instance. If there were, we could at least detect after the fact that something had happened!

The Cloud SQL failover only occurs in certain circumstances, and in all our time using Cloud SQL the failover has not once kicked in automatically (despite many outages).

In fact, one of our earliest support issues was that the "manual failover" button was disabled whenever any sort of operation was occurring on the database, making it almost completely useless! Luckily, this issue at least was fixed.



It sounds like the machine that the DB VM was running on was being taken offline or restarted for maintenance. The default behaviour for GCE is to live-migrate the VM to another machine, so I guess Cloud SQL uses the default here. (It may well be the best option, as failover isn't instant either.) Live migration is usually much faster than 90s as well, but if you are making heavy RAM updates, that could definitely slow it down.

Either way, I agree that some full-stack integration is needed on GCP's part to at least get that into the maintenance log. It would also be nice to make most of these happen during the maintenance window, but IIUC they don't always have 24h notice of a machine reboot.


Yikes! That is alarming. I also don’t like the part about automated failover not working.



