I applaud the effort to hate on "smart" middleware proxies!
That said, the author gets no points for name-dropping random distributed-systems algorithms and using TCP keepalives (2 hours minimum!) as an argument against TLS-terminating proxies.
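For what it's worth, that two-hour figure is just the Linux default (tcp_keepalive_time); a long-lived socket can ask for far more aggressive keepalives itself. A rough sketch, using Linux-specific socket options and a placeholder host:

    import socket

    s = socket.create_connection(("example.com", 443))  # placeholder endpoint
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs; "2 hours" is only the system-wide tcp_keepalive_time default.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before the first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before the socket errors out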
Is there a reason to (as he says) "fully implement the protocol" in the proxy? I battled with WebSockets through Pound last week, and it simply doesn't work because the author took a non-Postel stand on protocol specifics.
Having a protocol-agnostic proxy like hitch (previously stud) fixed that without losing functionality, and I expect it to age better as well.
Sadly the internet isn't as nice. I've been to various hotels where the router would silently drop keep-alive packets (but god forbid it tell the packet layer it's doing this!) and mangle DNS responses ("Looking for mail.example.com? Here is the answer for example.com." "Looking for doesnotexist.com? Here is the result for internalsearchengine.com, which redirects you to a sponsored search page full of ads.")
Even encrypted DNS will suffer, because middleboxes with captive portals will attempt to tamper with it.
Unless you pipe it over TLS or HTTP, in which case you run into problems with not knowing why there is no connection when you're behind a captive portal (we obviously need to fix captive portals; they're the source of 90% of these problems).
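The mangled-DNS part is at least easy to spot-check: a well-behaved resolver should refuse to resolve a name that doesn't exist rather than hand back some search-page IP. A rough sketch (the domain below is a deliberately unresolvable placeholder):

    import socket

    try:
        addr = socket.gethostbyname("surely-does-not-exist.example")
        print("resolver is lying to you:", addr)   # NXDOMAIN hijacking
    except socket.gaierror:
        print("resolver behaves: NXDOMAIN")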
I learned recently about RFC 7710, which specifies a DHCP (v4/v6) and RA option for "you're on a captive portal, here's the website you should visit before you get access": https://tools.ietf.org/html/rfc7710
Do any of the major implementations of captive portals support it?
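If I'm reading the RFC right, the DHCPv4 side is just a string-valued option (code 160) carrying the portal URI, so client-side support amounts to a lookup. A hypothetical sketch of what a DHCP client could do once it has the options parsed:

    # Hypothetical helper: 'options' maps DHCP option codes to raw bytes,
    # as produced by whatever DHCP client library is already in use.
    def captive_portal_uri(options):
        raw = options.get(160)  # RFC 7710 assigns DHCPv4 option 160 (Captive-Portal)
        return raw.decode("ascii", "replace") if raw else None

    print(captive_portal_uri({160: b"https://portal.example.net"}))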
RFCs are never anything other than proposed standards, or informational documents. There is no point at which an RFC becomes a “recommended standard” or some such thing.
All internet standards, from IP to HTTP, are proposed standards; that doesn’t actually mean anything about whether they’re generally implemented or not.
Not that I know of. My pfSense firewall doesn't have it (IIRC), so my guess would be that the poorly maintained router boxes in a hotel basement definitely don't have it.
I'm not sure the various DHCP clients even communicate this properly to the OS or browser (I wouldn't know how to query for it on Linux).
"TCP implementations should follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others." -- Jon Postel
I don't grok this: if TCP's model has fundamental problems, how come the Internet works? :)
The fact that a protocol is technically imperfect and causes grief for ISPs does not mean the application layer has to get involved.
I've been writing TCP-based apps for years and the stream abstraction has never failed me. After reading this I don't see why I should change that assumption. I have to rebuild connections occasionally, but it's never cost my application so much that an alternative, more complicated abstraction layer made sense.
I usually write req/response over TCP, an even more inaccurate abstraction, and occasionally non-blocking code. Never have I wanted more complexity than NIO in my application layer.
Devs do know that "TCP is not a stream of bytes", but they deliberately do not want to get app code involved.
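To be concrete, "req/response over TCP" mostly means a little framing in app code so that recv() chunk boundaries never matter. A minimal length-prefixed sketch of what I mean:

    import socket
    import struct

    def send_msg(sock: socket.socket, payload: bytes) -> None:
        sock.sendall(struct.pack("!I", len(payload)) + payload)  # 4-byte length prefix

    def recv_exact(sock: socket.socket, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-message")
            buf += chunk
        return buf

    def recv_msg(sock: socket.socket) -> bytes:
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        return recv_exact(sock, length)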
> I don't grok this: if TCP's model has fundamental problems, how come the Internet works? :)
Depends on your definition of "works"! Just ask any gamer what happens when someone starts using Netflix while they're playing a game... For a lot of people the internet doesn't work very well.
TFA is describing, I think, the issue of stale / dead connections - which can be notoriously difficult to detect. Imagine you are building black boxes that sit in people's homes, with a long-running TCP connection used to push notifications to them. With the state of the internet, it's very likely some percentage of those connections will die without either side realizing it. That's where the complexity comes in.
Off-topic on my part here but I think it is so weird that people say “TFA” when referring to the article in the way you are doing. I see this a lot on HN.
IMO it would be natural to use “TFA” in the same way you would use “RTFM” — when you are scolding someone for not having read what they should — but using “TFA” as just a synonym for “the article”... that always throws me off.
You’re both right and wrong. That is, the... anger? you’re picking up isn’t incorrect, but instead lost in translation.
Back in the /. days, which, as far as I know, is where this comes from, people would use it in that way: “Read TFA.” “Have you read TFA?” “Seems like you haven’t read TFA.” Eventually that morphed into the usage you see on HN and everywhere else, as kind of a different version of “OP”.
> Just ask any gamer what happens when someone starts using Netflix while they're playing a game.
Every cable modem I have had suffered from some form of bufferbloat. In short, it's not TCP's fault that your headshot packet sat in the cable modem's buffer for five seconds before being sent to the server.
Edited to add:
> TFA is describing, I think, the issue of stale / dead connections - which can be notoriously difficult to detect.
Normally all long-lived connections have some kind of TCP keep-alive on them. When the keep-alive fails, the connection dies, and it's up to the application to restart the TCP connection. The proxy can send these out as well.
I remember that back in the 90s it was very hard for me on MacOS to debug network applications when the connection was interrupted or lost due to a crash or a test. It took minutes until the same port could be used again, even when a different local port was chosen. To detect a lost connection, people used ping-pong signals in their application protocol.
Since then network stacks seem to have improved tremendously, but I thought it was still customary to ping the other side from time to time. Is that not enough to solve the problem?
Sorry, I wasn't specific enough: I didn't mean ICMP ping, but an application-protocol-specific periodic request/acknowledgment exchange within a TCP connection, in order to check that the connection is still alive regardless of what the OS socket layer says. Many application protocols have this, at least the older ones, and I always thought the reason for it was that OS TCP stacks are sometimes unreliable or have unreasonable timeouts.
It's been a long time since I've done network programming, though...
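That's the pattern I mean, roughly: the application sends its own periodic ping and gives up if no pong arrives within some deadline, regardless of what the socket layer thinks. A rough sketch (the PING/PONG lines are a made-up wire format):

    import socket

    def connection_alive(sock: socket.socket, timeout: float = 5.0) -> bool:
        """App-level liveness check: send a ping, wait briefly for the pong."""
        try:
            sock.settimeout(timeout)
            sock.sendall(b"PING\n")                   # hypothetical protocol message
            return sock.recv(64).startswith(b"PONG")  # no/empty reply counts as dead
        except (socket.timeout, OSError):
            return False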
On IPv4, blocking ICMP has sadly become the norm since some people abused it for ICMP storms.
On IPv6 it is strongly recommended not to block ICMPv6, since it carries highly important information back to the sender or forward to the receiver (like when the path MTU is exceeded).
> On IPv6 it is strongly recommended not to block ICMPv6, since it carries highly important information back to the sender or forward to the receiver (like when the path MTU is exceeded).
The same is true for IPv4. Path MTU discovery is not new for IPv6. Most systems set the DF (don't fragment) bit by default.
On IPv4 it's a bit of a problem because of the number of legacy middleboxes that block things they don't recognize, and some older routers have problems with DF (always fun to debug).
IPv6 makes ICMP path MTU discovery part of the protocol, and disabling ICMP can (and will) wreak havoc on reliable connections (though TCP can compensate).
> On IPv4 it's a bit of a problem because of the number of legacy middleboxes that block things they don't recognize, and some older routers have problems with DF (always fun to debug).
The DF bit has been part of IPv4 since day one. There is no excuse for anything in production to not support it. That doesn't mean broken routers don't exist, but "obviously broken router is obviously broken" is properly resolved by removing it from service and possibly finding a suitable museum to put it in.
That isn't the issue. IPv6 doesn't have a DF bit (it's always implied), but sending an IPv4 packet with DF set has the same effect, and most modern systems do that. Most routers will then correctly send back the "Packet Too Big" ICMP message as appropriate. But if the firewall drops that ICMP message, it's just as problematic as dropping it for IPv6. The result is the same.
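For what it's worth, on Linux you can force DF on a socket so path MTU discovery is actually in play; if a firewall then eats the "Fragmentation Needed" / "Packet Too Big" replies, larger packets simply black-hole. A rough sketch (the constants fall back to the <linux/in.h> values in case this Python build doesn't export them):

    import socket

    # Fallbacks are the <linux/in.h> values, in case the socket module doesn't export them.
    IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
    IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Set DF on everything this socket sends and rely on ICMP for path MTU discovery.
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)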
OK, now it makes sense. If you presume everything must go over HTTPS, and you sometimes have little to no activity and want to push events back on a stream started by the client, TCP seems horrible! :)
It's hard trying to push in the wrong direction, not having UDP when it's needed, and not having stable un-NATed IPs because addresses have run out. Not sure that is TCP's fault, though; arguably that is a WWW + IP problem. More and more, HTTPS is the only Internet allowed. This is bad IMHO and leads to stupid hacks like WebSockets. Gamers used to open a port in their routers in my day, and would take a performance penalty if they did not.
Only mail servers should be talking on port 25. Clients should be using 587 to talk to servers.
I've never had an issue with port 25 at any ISP or VPS provider. You can't expect hotels and in-flight internet providers to let you run a mail server inside their perimeter, so they do block 25, but I have never had a problem with 587.
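For what it's worth, outbound submission on 587 with STARTTLS is all a mail client needs, so blocking 25 doesn't get in the way. A rough sketch with placeholder host and credentials:

    import smtplib

    # Submission on port 587 with STARTTLS; host and credentials are placeholders.
    with smtplib.SMTP("mail.example.com", 587) as smtp:
        smtp.starttls()
        smtp.login("user@example.com", "app-password")
        smtp.sendmail("user@example.com", ["friend@example.org"],
                      "Subject: test\r\n\r\nSent over 587, not 25.\r\n")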
More to the point, I've talked to second line tech support agents who have told me that they block all incoming connections on port 25 as a spam prevention measure.
> I don't grok this: if TCP's model has fundamental problems, how come the Internet works? :)
So he's arguing for using a proxy to "shield" the server from having to know about all the ins and outs of the myriad TCP clients out there.
And then he laments that an HTTP proxy might very well be the root of all evil on the internet, because the true client-to-server end-to-end connection is lost when a proxy is put in the middle, which is pretty standard practice these days.
So one thing he's incorrect about is that a proxy can't know the state of connections: it can, because that state is carried in the TCP headers, not in the data section [1]. The SYN/SYN-ACK/ACK handshake and retransmissions happen outside of the data section of the TCP packet.
I suppose you could call it a two-node consensus algorithm, the same way plugging a flash drive into your laptop is. Even after reading, I don't see the benefit of viewing TCP this way.
It's a layer under the "it's a bidirectional stream" abstraction. Having had to debug gnarly TCP problems before (generally with semi-"intelligent" load balancers), I find the two-node consensus concept good to keep in mind.
Right now the "two-node consensus algorithm" idea has got me thinking more about some of the nasty problems I've debugged and how to look at them through that lens, especially considering partial partition tolerance. "Under what circumstances can the TCP state machine on either end lose consensus, and how will/won't it recover?"
The problem is in thinking of an HTTPS request-response through proxies as a single TCP connection. It isn't.
A TLS proxy is not a normal part of a layered TCP/IP connection. It's literally in the name: "terminating" proxy. It stops the connection right there. Anything after the TLS proxy is outside the scope of the initial connection. Applications have to be engineered to pass on data from one connection to another.
An example is stateful firewalls. Almost all stateful firewalls are NAT gateways with rules. NAT gateways are designed to pass certain things from one connection to another, but they are not simply unwrapping a layer from a connection and passing it on: they maintain separate connections. Edit: Apparently I'm wrong, as Netfilter apparently only defragments and then rewrites addresses and ports, but firewall vendors basically keep independent connections (for security reasons).
TCP is specified just fine for consensus on a single TCP connection. It isn't specified for an HTTPS connection through middleware. Hence, such middleware is complicated.
You need it to get even the basic stuff right (see https://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-...), and you need it even more to implement "modern" application-layer protocols like HTTP/2 (if you don't, you get data-loss bugs like this: https://trac.nginx.org/nginx/ticket/1250#comment:4).
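The "basic stuff" in that first link boils down, roughly, to: close() alone doesn't tell you the peer got your data; either the application protocol acknowledges it, or you half-close and drain. A minimal sketch of the latter, with a placeholder endpoint:

    import socket

    s = socket.create_connection(("example.com", 8080))  # placeholder endpoint
    s.sendall(b"final request\n")
    s.shutdown(socket.SHUT_WR)   # half-close: tell the peer we're done sending
    while s.recv(4096):          # drain until the peer closes its side
        pass
    s.close()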