Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> For example, gRPC automatically does periodic health checking so you never see a connection hang forever.

So, a simple setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE) call?



Application-level keepalives can be handy in other regards, eg. you can assert that your application's event loop isn't blocked on an infinite loop, deadlock, or blocking call in user-level code, and you can pass operational statistics back to your (presumably proprietary) monitoring system.


I found in practice that is usually not enough and every distributed system I designed or worked on ended up with heartbeats of some point. It could OS peculiarities, or could be inability to tweak the times of keepalives.

Sometimes the heartbeats are sent by the server, sometimes by the client, it depends. But they always end up in the application layer protocol.


... because the TCP keepalive is a minimum of 2 hours, per RFC. Which is far too long, so everyone adds one at the application level.


The minimum default is 2 hours, but applications can configure this to much smaller intervals, like 10 seconds.


I feel like putting heartbeats themselves into the application level is a layering violation. They go in the session or presentation layer. WebSockets does it right, with its own heartbeat frame type that gets handled in userland but outside of your app's code.


On Linux you can set tcp_keepalive_intvl and tcp_keepalive_probed to make this much shorter, but it's global to all sockets, so app keepalives are better for finer control, among other things mentioned.


There's 3 socket options (TCP_KEEPCNT/TCP_KEEPIDLE/TCP_KEEPINTVL) that allows you to control this per socket too, it's not just global.


I guess the HTTP/2 PING messages.


A connection hanging for two hours is awful close to hanging forever.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: