
Lest somebody get the wrong idea from his post, note that he's not arguing for using poll on sockets that aren't non-blocking (i.e. sockets without the O_NONBLOCK flag set on the open file table entry [1]).

When poll reports a socket as ready in Unix, it does not mean that a subsequent read will succeed. The obvious case is when another thread reads from the socket before you do. A less obvious case is that some kernels, such as Linux, implement lazy checksum verification. Linux will wake up any waiting threads when a packet comes in (including marking the open file table entry as readable), but the checksum isn't verified until an actual read is attempted. If the checksum fails, the packet is silently discarded. If the socket wasn't in non-blocking mode, your application will stall until the next packet arrives.
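
A minimal sketch of the defensive pattern, assuming a readable socket fd (the function name is illustrative): put the descriptor in non-blocking mode first, treat POLLIN as a hint, and loop back to poll on EAGAIN instead of stalling in a blocking read.

    #include <errno.h>
    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>

    static ssize_t read_when_ready(int fd, char *buf, size_t len)
    {
        /* Non-blocking mode on the open file table entry, so a read
         * after a spurious readiness report returns EAGAIN instead of
         * hanging. */
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;

        for (;;) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            if (poll(&pfd, 1, -1) == -1) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            ssize_t n = read(fd, buf, len);
            if (n >= 0)
                return n;
            /* POLLIN was only a hint (e.g. the packet was discarded
             * after a failed checksum): go back to waiting. */
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;
            return -1;
        }
    }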

The JRE had (and maybe still has) a bug like this, where it assumed poll meant that a subsequent read was guaranteed to succeed or fail immediately.

This particular issue is less common today with hardware checksum offloading, but the correctness and robustness of your software probably shouldn't depend on a particular network chipset.

Another bug I've seen several times is assuming that a write to a UDP socket won't block. You can usually get away with this on Linux because the default socket buffers are so large. As with the issue above, it really only shows up when your application (and thus the network) is under significant load.

One conclusion I draw from this is that while people go to great lengths to implement a supposedly scalable architecture, most of the time developers never see the kinds of heavy load that such architectures are designed for. If they had, they would have discovered these sorts of issues. Fortunately or unfortunately for me, I discovered both of the above issues the hard way.

[1] If you're wondering why I keep writing "open file table entry" instead of "descriptor", it's because they're not the same thing. And some day I expect a few CVEs to be issued related to overlooking such distinctions. For example, on the BSDs opening /dev/fd/N duplicates the descriptor, pointing to the same file table entry, just as dup(2) does. On Linux /dev/fd is a symlink to /proc/self/fd, and opening /proc/self/fd/N creates a new file table entry. In the former case, software setting or unsetting O_NONBLOCK affects all other references to that entry.
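
A minimal sketch of that distinction: dup(2) shares the open file table entry, so status flags like O_NONBLOCK set through one descriptor show up through the other.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int a = open("/dev/null", O_RDONLY);
        int b = dup(a);  /* b references the same file table entry */

        int fl = fcntl(a, F_GETFL, 0);
        fcntl(a, F_SETFL, fl | O_NONBLOCK);  /* set via a only */

        /* Prints "non-blocking": the flag lives in the shared entry,
         * not in the descriptor. A fresh open() of /proc/self/fd/N on
         * Linux would create a new entry and not share the flag. */
        printf("b is %s\n", (fcntl(b, F_GETFL, 0) & O_NONBLOCK)
                                ? "non-blocking" : "blocking");
        close(a);
        close(b);
        return 0;
    }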



> When a socket polls for readiness in Unix, it does not mean that a subsequent read will succeed.

Yikes, I didn't know Linux did that. That sounds like a serious spec violation to me. POSIX says:

> POLLIN

> Data other than high-priority data may be read without blocking.

http://pubs.opengroup.org/onlinepubs/009695399/functions/pol...

It's hard to interpret that as anything other than a promise not to block. Oh, and the Linux poll(2) man page doesn't even mention the caveat. The select(2) man page does (I assume the actual behavior applies to poll too), but here POSIX is even more explicit:

> A descriptor shall be considered ready for reading when a call to an input function with O_NONBLOCK clear would not block, whether or not the function would transfer data successfully. (The function might return data, an end-of-file indication, or an error other than one indicating that it is blocked, and in each of these cases the descriptor shall be considered ready for reading.)


There is more than one checksum. At layer 2, the frame checksum is its own thing. At layer 4, a partial read means the checksum hasn't necessarily arrived yet - assuming the checksum is relevant at all (UDP makes checksums optional over IPv4).

IMO, you really need to make writes to a UDP socket explicitly non-blocking and check the error codes.
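
Something like this, as a rough sketch, assuming sock is already in non-blocking mode (the function name is illustrative): check sendto's return value and decide what to do on EAGAIN rather than assuming the write can't block.

    #include <errno.h>
    #include <poll.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    static int send_datagram(int sock, const void *buf, size_t len,
                             const struct sockaddr *dst, socklen_t dstlen)
    {
        for (;;) {
            ssize_t n = sendto(sock, buf, len, 0, dst, dstlen);
            if (n >= 0)
                return 0;
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* Socket buffer is full: under load this happens even
                 * with Linux's large defaults. Wait for writability,
                 * or drop the datagram if that suits the protocol. */
                struct pollfd pfd = { .fd = sock, .events = POLLOUT };
                if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
                    return -1;
                continue;
            }
            return -1;  /* real error: report it rather than masking it */
        }
    }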



