> For example, gRPC automatically does periodic health checking so you never see...

nostrademons · on Aug 19, 2016

Application-level keepalives can be handy in other regards, e.g. you can assert that your application's event loop isn't blocked on an infinite loop, deadlock, or blocking call in user-level code, and you can pass operational statistics back to your (presumably proprietary) monitoring system.

rdtsc · on Aug 19, 2016

I found in practice that it is usually not enough, and every distributed system I designed or worked on ended up with heartbeats at some point. It could be OS peculiarities, or it could be an inability to tweak the keepalive timers.

Sometimes the heartbeats are sent by the server, sometimes by the client, it depends. But they always end up in the application layer protocol.
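
To make that concrete, here is a minimal sketch of the bookkeeping side of an application-layer heartbeat: whoever receives the heartbeats records when each peer was last heard from and flags the ones that went quiet. The `HeartbeatMonitor` class, its method names, and the 5-second timeout are all hypothetical, made up for illustration, and the wire format of the heartbeat message is left out entirely.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per peer (hypothetical helper, not a library API)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout    # seconds of silence before a peer counts as dead
        self.clock = clock        # injectable so the logic can be tested without sleeping
        self.last_seen = {}       # peer id -> timestamp of most recent heartbeat

    def beat(self, peer):
        """Call this whenever a heartbeat message arrives from `peer`."""
        self.last_seen[peer] = self.clock()

    def dead_peers(self):
        """Return the peers that have been silent for longer than the timeout."""
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=5.0)
    monitor.beat("client-1")
    print(monitor.dead_peers())  # nothing has timed out yet
```

The same structure works whether the server or the client sends the heartbeats; only the side that calls `beat` changes.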

wumpus · on Aug 19, 2016

... because the TCP keepalive is a minimum of 2 hours, per RFC. Which is far too long, so everyone adds one at the application level.

dap · on Aug 19, 2016

The minimum default is 2 hours, but applications can configure this to much smaller intervals, like 10 seconds.

derefr · on Aug 20, 2016

I feel like putting heartbeats themselves into the application level is a layering violation. They go in the session or presentation layer. WebSockets does it right, with its own heartbeat frame type that gets handled in userland but outside of your app's code.

gricardo99 · on Aug 19, 2016

On Linux you can set tcp_keepalive_intvl and tcp_keepalive_probes to make this much shorter, but it's global to all sockets, so app keepalives are better for finer control, among other things mentioned.

noselasd · on Aug 19, 2016

There are 3 socket options (TCP_KEEPCNT/TCP_KEEPIDLE/TCP_KEEPINTVL) that allow you to control this per socket too, so it's not just global.
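
For reference, a sketch of setting these per socket from Python. The helper name `enable_keepalive` and the 10/5/3 values are just illustrative assumptions. The three `TCP_KEEP*` constants are exposed by Python's `socket` module on Linux; other platforms may lack some of them, so the sketch skips any that are missing.

```python
import socket


def enable_keepalive(sock, idle=10, interval=5, count=3):
    """Enable TCP keepalive on one socket and tune its timers where supported.

    idle:     seconds of inactivity before the first probe (TCP_KEEPIDLE)
    interval: seconds between unanswered probes (TCP_KEEPINTVL)
    count:    unanswered probes before the connection is dropped (TCP_KEEPCNT)
    Returns a dict of the options that were actually applied.
    """
    # Keepalive is off by default; the per-socket timers only matter once it is on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    applied = {}
    wanted = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", count))
    for name, value in wanted:
        opt = getattr(socket, name, None)  # not every platform defines all three
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    return applied


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    s.close()
```

With these example values a dead peer should be noticed after roughly idle + interval × count = 10 + 5 × 3 = 25 seconds of silence, instead of two hours.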

Matthias247 · on Aug 19, 2016

I guess the HTTP/2 PING messages.

toast0 · on Aug 19, 2016

A connection hanging for two hours is awful close to hanging forever.