I built a nasty "HA" solution for Jenkins out of Ansible, JJB, and Consul/Vault once.

Basically, Consul would monitor the Jenkins master for liveness. If it discovered that the master had gone down, it would spin up the cold standby machine, first attempting to restore from a recent disk snapshot and, failing that, re-running plugin installation and JJB and copying in a secrets store file from Vault (essentially starting the server fresh again). Then all the slaves would self-configure, using Consul to figure out which "master" node was actually the master.
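For anyone curious what the "slaves figure out who is master" part can look like, here is a rough sketch, not the code we actually ran. It assumes the master is registered in Consul as a service with a health check, and it asks Consul's health endpoint for the passing instances. The service name `jenkins-master`, the local agent URL, and both function names are made up for illustration.

```python
import json
import urllib.request

CONSUL_URL = "http://127.0.0.1:8500"  # assumption: a local Consul agent on the default port
SERVICE = "jenkins-master"            # hypothetical service name, not from the original setup


def pick_master(entries):
    """Return (address, port) of the first instance whose checks all pass, else None.

    `entries` is the decoded JSON list returned by Consul's
    /v1/health/service/<name> endpoint.
    """
    for entry in entries:
        checks = entry.get("Checks", [])
        if all(check.get("Status") == "passing" for check in checks):
            service = entry["Service"]
            # Consul leaves Service.Address empty when the service did not
            # register its own address; fall back to the node's address.
            address = service.get("Address") or entry["Node"]["Address"]
            return address, service["Port"]
    return None


def fetch_entries(service=SERVICE, base=CONSUL_URL):
    """Ask the Consul agent for instances of `service` with passing health checks."""
    url = f"{base}/v1/health/service/{service}?passing=true"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

An agent would call `pick_master(fetch_entries())` at startup (and again whenever it loses its connection) and point itself at whatever address comes back.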

It was gross, and I hated it, but it gave us Jenkins failover in under a minute in most cases. We only lost all our job history once in a year and a half, and this was in a flaky-ass OpenStack environment.