ECMP is used heavily in serving Anycast DNS worldwide; it's only the application to HTTP that's somewhat new. A high-end router can do stateless ECMP at 10s or 100s of millions of packets per second; it's hard to find load balancers that can compete with this.
BGP has heartbeats (keepalives), usually at 15 to 45 second intervals depending on configuration. If BIRD stops responding, the router will withdraw its routes once the hold timer expires.
BIRD can be controlled via a Unix socket. Usually people build a health-check daemon that queries the local app and communicates its findings to BIRD. Working through all of the failure modes here is tricky, but doable.
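A minimal sketch of such a health-check daemon, to make the idea concrete. The health URL, the protocol name `upstream1`, the failure threshold and the poll interval are all assumptions made for illustration; only the `birdc enable <protocol>` / `birdc disable <protocol>` commands come from BIRD's own command-line client, which talks to the daemon over its control socket. Treat it as a starting point rather than a finished design.

```python
import http.client
import subprocess
import time
import urllib.request

HEALTH_URL = "http://127.0.0.1:8080/health"  # hypothetical local app endpoint
BGP_PROTOCOL = "upstream1"                   # hypothetical protocol name from bird.conf
FAIL_THRESHOLD = 3                           # assumed: consecutive failures before withdrawing
POLL_SECONDS = 2                             # assumed poll interval


def app_is_healthy(url=HEALTH_URL, timeout=2.0):
    """Return True if the local app answers the health URL with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


def next_state(announcing, failures, healthy, threshold=FAIL_THRESHOLD):
    """Pure decision step. Returns (announcing, failures, action),
    where action is "enable", "disable", or None."""
    if healthy:
        # Recover immediately; re-enable only if we had withdrawn.
        return True, 0, (None if announcing else "enable")
    failures += 1
    if announcing and failures >= threshold:
        return False, failures, "disable"
    return announcing, failures, None


def tell_bird(action, protocol=BGP_PROTOCOL):
    """Ask BIRD, through its control socket via birdc, to enable or disable a protocol."""
    subprocess.run(["birdc", action, protocol], check=True)


def run():
    """Poll the app forever and toggle the BGP session when health changes."""
    announcing, failures = True, 0
    while True:
        announcing, failures, action = next_state(
            announcing, failures, app_is_healthy()
        )
        if action is not None:
            tell_bird(action)
        time.sleep(POLL_SECONDS)
```

Calling `run()` starts the loop. One failure mode this sketch does not handle: if the checker itself dies while the app is down, BIRD keeps announcing, so in practice the checker needs its own supervision.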
The ECMP hash is often implemented as something like a CRC-16 of (protocol, source IP/port, dest IP/port) modulo the number of next-hops. I suspect the trick to keeping TCP happy is to try to keep the number of next-hops (shards) constant for each route.
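To illustrate why the next-hop count matters, here is a rough sketch of modulo-N selection. It assumes IPv4, a CRC-CCITT 16-bit checksum (Python's `binascii.crc_hqx`) as a stand-in for whatever CRC the hardware actually uses, and an invented set of client flows; real chips differ in polynomial, field selection and seeding.

```python
import binascii
import ipaddress
import struct


def flow_key(proto, src_ip, src_port, dst_ip, dst_port):
    """Pack the 5-tuple into bytes (IPv4 only in this sketch)."""
    return struct.pack(
        "!B4sH4sH",
        proto,
        ipaddress.IPv4Address(src_ip).packed,
        src_port,
        ipaddress.IPv4Address(dst_ip).packed,
        dst_port,
    )


def pick_next_hop(key, n_next_hops):
    """Modulo-N selection: CRC-16 of the flow key, modulo the next-hop count."""
    return binascii.crc_hqx(key, 0) % n_next_hops


def moved_fraction(keys, n_before, n_after):
    """Fraction of flows whose next-hop changes when the count changes."""
    moved = sum(
        pick_next_hop(k, n_before) != pick_next_hop(k, n_after) for k in keys
    )
    return moved / len(keys)


# Hypothetical TCP flows (protocol 6) from one client to one anycast VIP,
# differing only in source port.
flows = [
    flow_key(6, "198.51.100.7", port, "192.0.2.1", 80)
    for port in range(1024, 65536)
]

same_count = moved_fraction(flows, 4, 4)  # next-hop count unchanged
one_lost = moved_fraction(flows, 4, 3)    # one of four next-hops withdrawn
```

With the count unchanged, no flow moves. For a well-mixed hash, going from 4 to 3 next-hops leaves a flow in place only when `h % 4 == h % 3`, which holds for about a quarter of hash values, so roughly three quarters of flows land on a different server, including flows that were never on the withdrawn one.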
Most chips today, at least, use modulo-N hashing, so changing the number of next-hops can remap existing connections to a different next-hop and break them. See: http://tools.ietf.org/html/rfc2992
This isn't really a problem on the internet, because there you're typically not using ECMP, just plain anycast.