What happens if you write a TCP stack in Python?

tptacek · on Aug 12, 2014

The idea that Python is so slow that it's confusing TCP sounds wrong to me. I think it's more likely that your packet capture scheme is slow. It looks like you're using scapy, which I assume is in turn using libpcap... which may be buffering (in high-performance network monitoring, the packet capture interface goes out of its way to buffer). Which is something you can turn off.

About 13 years ago, I wrote my own programming language expressly for the purpose of implementing network stacks, and had a complete TCP in it; I didn't have this problem. But I do have this problem all the time when I write direct network code and forget about buffering.

Similarly: "Python is so slow that Google reset the network connection" seems a bit unlikely too. Google, and TCP in general, deals with slower, less reliable senders than you. :)

What's the time between your SYN and their SYN+ACK?

jvns · on Aug 12, 2014

my favorite thing about writing blog posts is comments like this. Thank you! I didn't consider that the packet capture interface might do buffering. That might explain a lot of the problems I was having :)

tptacek · on Aug 12, 2014

This is a great project. Keep playing with it! You might find that the serverside of TCP is more useful to have in Python than the clientside (having a userland IP/TCP serverside allows you to create fake hosts out of thin air).

danudey · on Aug 12, 2014

This sounds super handy for situations like honeypots. Take a /24, assign the hosts you actually have, and then spoof the rest to another server which fakes open SMTP, HTTPS, etc. connections, or which replies to every connection attempt (slowly!) with a successful (if slow!) open.

There's an iptables module called tarpit[1], which takes advantage of some peculiarities of the TCP protocol to essentially prevent the remote host from closing the connection, which can force (conforming?) TCP clients to take 12-24 minutes to timeout every connection. It can make portscanning unpleasantly expensive and time-consuming.

[1] http://linux.die.net/man/8/iptables

milkshakes · on Aug 12, 2014

this is in fact exactly how honeyd works: http://www.honeyd.org/

zerop · on Aug 13, 2014

Thanks for mentioning Honeyd. Are there any more software like this.

blutoot · on Aug 12, 2014

Your "userland TCP/IP" comment got me excited and I started digging around. I wonder if there's some benefit in trying to port something like TCP Daytona [1,2] in a high-level language like Python. I'm curious if that will somehow advance the adoption of SDNs.

[1] https://github.com/jamesbw/tcp-daytona [2] http://nms.lcs.mit.edu/~kandula/data/daytona.pdf

tptacek · on Aug 12, 2014

Like writing your own emulator, writing a full TCP stack is a project that is intrinsically worth doing.

You can probably also come up with real-world applications for it (as a security tester, there are lots of applications for having full control over a TPC stack, regardless of how performant it is), but just having done it offers a huge learning return on a modest investment.

You probably don't fully grok what TCP even is until you've made congestion control work.

MagerValp · on Aug 13, 2014

I just want to second this, I didn't fully understand networking until I wrote my own stack and threw packets on the wire. A simple stack with ARP/ICMP/IP/UDP can be written in a day and seeing your very own ping packets fly across the globe is pretty damn awesome.

eru · on Aug 13, 2014

> (as a security tester, there are lots of applications for having full control over a TPC stack, regardless of how performant it is)

And sometimes you even want your stack to be slow, eg in a slow loris attack.

Sami_Lehtinen · on Aug 13, 2014

Wrong. In these cases you want your stack to be as fast as possible. But you're introducing artificial delay on selected keypoints. If your attack code is slow, then you're basically attacking your self. Target is to consume (a lot if possible) more resources on the target side, than you're using on your own side.

tedchs · on Aug 12, 2014

Software Defined Networking is about moving the control plane out of switching/routing devices and into general purpose servers, so that they can make better forwarding decisions such as improved convergence time. In the sense that I understand SDN, I don't think doing TCP in userspace is part of it.

blutoot · on Aug 12, 2014

But if it can be done easily in a high-level language, then the control plane can be integrated better with a lot of applications written in the same language, no? I was thinking if that can serve as an impetus for SDN adoption.

corysama · on Aug 12, 2014

You should have a look over here :)

https://github.com/SnabbCo/snabbswitch/wiki

ay · on Aug 12, 2014

The python is quite fine at doing the TCP indeed, muXTCP is the proof.

Being implemented around Twisted, it actually allows you to fiddle with low-level TCP stuff, while e.g. offloading the SSL to the existing stack. It saved my bacon a few times when I wanted to reproduce a complicated network breakage scenarios.

https://www.youtube.com/watch?v=BEAKtqiL0nM - Video about muXTCP from 22C3

https://github.com/enki/muXTCP - github repo with it.

collyw · on Aug 12, 2014

I was surprised by this as well. The common quote is that Python is 10 - 100 times slower than C, yet computers have been doing TCP for ages, and Moores law has meant that my phone is probably more powerful / faster than my laptop from ten years ago. It didn't quite add up.

rectangletangle · on Aug 13, 2014

Python may be slow, but it's only relatively slow. It's still fast enough for the vast majority of use cases.

lutusp · on Aug 13, 2014

As with most things in software, speed can't be measured only by the running program -- the time required to write the program should be included in nearly all cases.

Python: fast to write, slow to run.

C/C++: slow to write, fast to run.

To put it succinctly, you can write fast programs, and you can write programs fast, but you can't write fast programs fast.

rectangletangle · on Aug 13, 2014

I entirely agree, for most situations the trade off is in favor of high-level languages like Python. The scale tips even further in favor of CPython when you start using C extensions like Numpy. All of the benefits of Python, with speed approaching that of pure C (there's some overhead when calling from Python). For use-cases like web applications, the language is almost never the bottleneck anyways. Network latency and database queries usually eat up far more time than the glue code holding it all together. And this isn't even touching on the security benefits of memory safe languages.

codygman · on Aug 13, 2014

> To put it succinctly, you can write fast programs, and you can write programs fast, but you can't write fast programs fast.

Haskell, Clojure, and Ocaml do pretty well at writing fast programs fast for what I consider appealing values of fast.

pm90 · on Aug 13, 2014

Now I'm genuinely interested in knowing which parts of python are 'slowing down' the application. With the help of Cython[0], one can give hints to the compiler to improve performance for most of the cases.

[0]:http://cython.org/

cbhl · on Aug 12, 2014

The Google homepage is only about 20000 bytes... if we assume an maximum segment size of ~1400 bytes, then 14 or 15 packets is about right.

I wouldn't be surprised if Google is sending the packets all at once and ignoring the ACKs altogether.

Heck, there's even a 2010 paper from Google on the subject of sending a bunch of packets at the beginning of the connection: An Argument for Increasing TCP's Initial Congestion Window[0]

[0]http://research.google.com/pubs/pub36640.html

bane · on Aug 12, 2014

If a microcontroller and a Commodore 64 can do TCP/IP, Python on a modern PC can handle it.

https://en.wikipedia.org/wiki/UIP_(micro_IP)

http://www.techrepublic.com/blog/classics-rock/surf-the-web-...

digitalsushi · on Aug 12, 2014

Our tcp stack is written in tcl. The window size is set to 1. It works fine.

qewrffewqwfqew · on Aug 13, 2014

As a heavy user of Tcl, I'm interested to know more. Is this related to hping3 or pktsrc or NS? Or are you F5?

digitalsushi · on Aug 13, 2014

It's pktsrc.

beneater · on Aug 12, 2014

This seems likely, but also easy to work around by advertising a smaller window size.

When you send ACKs, not only do you send the acknowledgement number indicating which byte you expect next, but you also send a window size indicating how many bytes you're willing to receive before the remote end has to wait for another acknowledgement. Normally you want this to be somewhat large so you don't spend lots of idle time waiting around for ACKs. (But not so large that packets get dropped). This is the key to TCP flow control, which was kinda glossed over in the blog post in the interest of keeping things simple.

But perhaps by default, you're advertising a too-large window considering the circumstances. I bet you could make this a lot more reliable just by advertising something smaller.

Good TCP implementations have overcome a lot more than some unwanted buffering.

tptacek · on Aug 12, 2014

If it's the buffering delay I'm thinking of, and your packet capture filter is tight, the delay we're talking about can be multiple seconds long. Changing window sizes won't fix it.

tjz77 · on Aug 14, 2014

I haven't read your code yet but I was curious about which packet you are ACK'ing?

Is your ACK sequence number the sum of all received data lengths? I think that is how that works?

e12e · on Aug 13, 2014

Python being too slow for tcp/ip sounds silly. Afaik the c64 can run a webserver. At what... 4mhz? Less? I think you could probably emulate the c64 tcp/ip stack in python in realtime+ (or -...) on a multighz cpu...

MagerValp · on Aug 13, 2014

1 MHz, and while I only implemented UDP in my stack (https://github.com/oliverschmidt/ip65), Jonno Downes took over and added TCP later. There's also Adam Dunkels uIP and Doc Bacardi's HTTP-Load.

So yes, if a C64 can handle it Python should have plenty of power.

FlyingAvatar · on Aug 12, 2014

I am curious if the author has tried it against other servers, such as a vanilla Apache server.

Google's webservers, including the TCP stacks themselves, may be very aggressively tuned to make sure you get the response absolutely as fast as possible, at the expense of re-sending packets more quickly than specified.

easytiger · on Aug 12, 2014

scapy is the slowest thing on earth. It shouldn't be used.

adestefan · on Aug 12, 2014

I agree that scapy is bad for just general packet capture and injection. However, it's a good tool to use for packet dissection and construction.

throwaway0010 · on Aug 12, 2014

Indeed. I can negotiate a connection to google.com:80 successfully on a 1000ms latency pipe just fine. Slowly, but fine.

sly010 · on Aug 12, 2014

If an 8MHz microcontroller is fast enough to implement TCP, then Python should be fast enough too.

Here is my two cents on the expirement:

1. You don't really have to ack every packet, you have to order them, drop duplicates and ack the last one.

2. Google ignores the TCP flow control algorithm and sends the first few hundred packets very quickly without waiting for acks. They do this to beat the latency of the flow control algorithm. That's why you end up with so many packets on your side. You could just try anything but google, and you would probably see that you have a less insane packet storm.

Dylan16807 · on Aug 12, 2014

Even 8MHz is massive overkill. It's almost fast enough to bit-bang ethernet.

jacquesm · on Aug 12, 2014

I'd like to see you bit-bang a 10MHz signal with an 8 MHz clock.

You'd need something substantially faster (Nyquist and such).

TickleSteve · on Aug 12, 2014

TCP != ethernet. A TCP stack can run over serial and othe link layers (PPP, SLIP, etc). PPP is absolutely doable on very small microcontrollers.

Sami_Lehtinen · on Aug 14, 2014

Problem is that using PPP or SLIP doesn't relieve you from having to implement at least some kind of crude TCP stack.

But if you use TCP offloading, then you can communicate using TCP as cheaply as doing any traditional serial communication. I've seen those crude TCP stack implementations and those are barely LAN capable. Using those over Internet wouldn't be a great idea. Often these TCP stacks are used just as serial - tcp converters. Using RWIN of 16 (max) bytes due UART buffering limits etc. Not sending ACK before whole 16 byte is cleared and so on.

So with really low end devices, it's better to off load TCP to custom hardware. Many embedded devices use that approach. In such cases you'll only call to IP address and port, and when it connects, you'll have your TCP/IP connection in pure serial. TCP-modems isn't any different from other good old Hayes modems. It's just dial-up over Internet and TCP.

Btw. If the system is able to run CPython, Linux, or something similar, it's already a pretty powerful system.

Dylan16807 · on Aug 14, 2014

Even without any kind of offloading, you can manage a perfectly good TCP implementation and super-minimal web browser with a few kilobytes of ram and 100KHz.

jacquesm · on Aug 18, 2014

It said 'ethernet' in the parent comment to mine, not TCP.

phaker · on Aug 12, 2014

Some guy actually did it on a 20MHz avr: http://www.cesko.host.sk/IgorPlugUDP/IgorPlug-UDP%20%28AVR%2... (hope the url doesn't get borked).

Sadly 10BASE-T ethernet uses manchester coding, so you need 20MHz :( you could pull it off with a slower processor if you had a fifo that can reach 20Mbaud on the serial end, but the usart in avr micros runs at a fraction of the system clock.

VLM · on Aug 13, 2014

"you could pull it off with a slower processor if you had a fifo that can reach 20Mbaud on the serial end"

For a surprise, google for / research how wiznet based ethernet controllers are used on Arduino shields or just wired up by hand (I did a hand wired wiznet 5100 once upon a time...). Yes yes its possible to run it in full TCP or UDP termination mode which is the way most people use it, but it can also terminate at IP level, and with some limitations, raw ethernet packet level. If you pull that data sheet and want to actually try this, look for documentation about "MACRAW" mode. This basically turns the controller into an ethernet packet FIFO as you describe with ethernet on one side and SPI bit banging on the other side.

Disclaimer: I've never done anything with MACRAW mode on wiznet controllers other than read about it in the data sheet and wonder why I'd want to do that. If I was going to blackhat a weird portable appliance or otherwise do heavy ethernet weirdness I'd use a rasp pi or similar SBC and linux not arduino compiled C, but whatever.

kashif · on Aug 12, 2014

I have actually implemented a TCP stack in python, unfortunately I can't share it publicly just yet.

The authors problems are because she is not using RAW_IP_SOCKETS.

Making TCP packets using pythons struct module is a breeze. I can post specific examples in code if anyone is interested.

Finally you can write a proper TCP stack in python, there is no reason not to. Your user-space interpreted stack won't be as fast as a compiled kernel-space one - but it won't be feature short.

PS: I guess, Google is probably sending him a SSL/TLS handshake which he isn't handling.

Edit: Corrected author's gender as mentioned by kind poster.

blutoot · on Aug 12, 2014

I believe the author is a "she". It says at the banner of the blog.

blutoot · on Aug 12, 2014

I think this downvoting thing is out of control. What was even remotely offensive about this correction?

dragonwriter · on Aug 12, 2014

"offensive" isn't the only reason for downvoting. "Not a substantive addition to the conversation" is one (of many) other bases. If you are going to complain about downvoting [1] why do so based on the unwarranted assumption that the downmod must be for offensiveness?

[1] And you shouldn't, see the Guidelines [2] under "In Comments"

[2] https://news.ycombinator.com/newsguidelines.html

vacri · on Aug 13, 2014

In my opinion, if HN doesn't want puzzled people to ask why they've been downmodded, then instead of putting a line in the Guidelines saying "don't do this", they should separate the downmod into "I disagree" and "flag this as a bad comment", with the latter item the one that reduces the text weight. It's clearly broken - people have been doing this for as long as I've been here, and it's not going to change (sadly, 'it' means both the behaviour, and HN's braindead downmod mechanism).

fivre · on Aug 12, 2014

If you have code examples already, I'd be grateful--I ran into server that had Python but not nmap or nc and while I was already familiar enough with socket to do what I needed, I wasn't so familiar with struct and wasn't able to make much headway manually building TCP/UDP packets before having to move onto other work.

kashif · on Aug 13, 2014

drop me an email, the address is in my profile. I will send you some examples.

kashif · on Aug 12, 2014

Google sends extra packets because it does not comply with TCP Slow Start(RFC 2581) - if the author implements a proper tcp state machine and buffer then she would be able to handle this.

smutticus · on Aug 13, 2014

I'm the author of hexcap(http://www.hexcap.org), an ncurses libpcap file editor and packet generator. I've also written many Scapy applications like this one(https://github.com/smutt/mcastClients). I rewrote the DHCPv4 client in Scapy since the stock one is broken. Also as part of hexcap have made numerous fixes to dpkt. Needless to say, I've done a lot with Python and packets.

If you're interested in writing a TCP/IP stack in Python I would recommend you use Python raw sockets, or possibly dnet[1] or pcapy[2]. The Scapy level of abstraction is too high for your needs.

I agree with other posters who mention buffering in libpcap. Read the man page for pcap_dispatch to get a better idea of how buffering works in libpcap. Also try capturing packets with tcpdump with and without the '-l' switch. You'll see a big difference if your pkts/sec is low.

Don't do arp spoofing. If you're writing a TCP/IP stack then you need to also code up iparp. If you don't want to do that, then use raw sockets and query the kernel's arp cache.

On second thought you really need to use raw sockets if you want this to work. Using anything pcap based will still leave the kernel's TCP/IP stack involved, which is not what you want.

[1] http://libdnet.sourceforge.net/pydoc/private/dnet-module.htm... [2] http://corelabs.coresecurity.com/index.php?module=Wiki&actio...

jnbiche · on Aug 12, 2014

This is a fun write-up. If you enjoy this kind of playing around with networking in a dynamic language, and don't want to have to worry about ARP spoofing to do these kinds of experiments, you may want to take a look a Snabb Switch. It provides userland networking in Lua, connecting to the NIC directly (only a handful of popular NICs currently supported) [0].

I've not used it yet, but I've read over the documentation and am itching for an opportunity to do so.

0. https://github.com/SnabbCo/snabbswitch

voltagex_ · on Aug 13, 2014

I can't find the list of supported NICs in the documentation, but I found https://github.com/SnabbCo/snabbswitch/blob/master/src/lib/h... which suggests that only a very small subset of Intel NICs are supported. One of those may be emulated by VirtualBox, though.

mholt · on Aug 12, 2014

ARP spoofing... clever. This was an amusing read and really informative, too. There is definitely something to be said for explaining lower-level concepts (e.g. TCP handshakes) using the common tongue. IMO, not a bad way to begin learning. Someone could perform the same experiment now and use Wireshark to see the raw packets, then draw conclusions to what is happening.

Anyone know why the Python program is so slow? I'm looking at the code and my first guess would be this part[1] but I can't explain why, overall, it would be so slow that a remote host would close the connection.

[1] https://github.com/jvns/teeceepee/blob/7e93970e16fbb0c3d4bee...

delluminatus · on Aug 12, 2014

I would be interested in seeing if using asyncio would resolve the network issues.

blutoot · on Aug 12, 2014

This wins the Internet today for me... "my kernel: lol wtf I never asked for this! RST! my Python program: ... :("

kfnic · on Aug 12, 2014

Shouldn't the TCP handshake look like this:

---- SYN ---->

<-- SYN/ACK --

---- ACK ---->

rather than having the client send two SYNs to the server?

jvns · on Aug 12, 2014

absolutely! Fixed :)

cookiemonster11 · on Aug 12, 2014

It should.

malone · on Aug 12, 2014

I like your solution to prevent the kernel from interfering with your packets.

An alternative method I've used in the past is to add an iptables rule to silently drop the incoming packets. libpcap sees the packets before they are dropped, so you'll still be able to react to them, but the kernel will ignore them (and therefore won't attempt to RST them).

kalleboo · on Aug 12, 2014

It seems odd to me that Google would time out that quickly. You could never reach Google from a GPRS connection if that was the case. I'd investigate the ACKs you're sending. Are you missing packets or sending them in the wrong order?

In Uni we had a networking course where we got to build a network web server software stack from the bottom up, starting with Ethernet/MAC, TCP/IP, and then on the top HTTP, all being flashed onto a small network device (IIRC it was originally a CD-ROM server). It was an extremely enlightening exercise. I recommend you go deeper instead of just using a premade Python library for TCP!

benjamincburns · on Aug 13, 2014

Please keep writing, Julia. The next time somebody asks me why I spend so much of my life absorbed by a screen, I'll point them at your blog and say "because discovery is exciting!"

beering · on Aug 12, 2014

I wonder if it would have a better success rate on a site other than Google's, since I've heard that Google's done extensive tuning of their TCP stack to send page data faster.

Somebody's oversubscribed $3/month shared PHP hosting might not ramp up the speed as quickly.

wmf · on Aug 12, 2014

TCP is TCP. If you advertise a certain window size you better be prepared to receive that data.

lazyant · on Aug 12, 2014

yes, they also have a bigger initial TCP window than the default

ivraatiems · on Aug 12, 2014

I think it would still be significantly too fast. That oversubscribed PHP hosting is still sitting on a dedicated server that does nothing but serve TCP/IP all day. My hunch is that the code is very heavily optimized for most production servers, because they're all using the TCP/IP stack in the kernel. It won't be as fast as Google's, but it won't be as slow as python either.

latiera · on Aug 12, 2014

It's not Python that's slow, but scapy, which is dog slow. In fact, it is so slow that it should come with big WARNINGs that it's only really meant for interactive use. Do the dissection yourself or use something built for that purpose.

It's really surprising to me that lots of ppl are using scapy for things that require performance but then again if you look at the scapy website or the docs, it's not immediately apparent that their tool is not meant for this. Which I guess says a lot about the scapy developers rather than the scapy users.

tl;dr Scapy is a joke, performance-wise.

0xbadcafebee · on Aug 12, 2014

You don't need to spoof a different MAC or IP to implement your own stack on a raw socket, Python is not too slow to handle negotiating a connection, and your interpretation of how tcp/ip works is flawed. I highly recommend you read a good book about tcp/ip and learn how the kernel works with network applications of different types.

In terms of using Scapy for your packet crafting, here are some guides with examples that may help you work around your issues. (Hint: use the Scapy-defined sending and receiving routines and don't implement your own, or stop using Scapy and implement your own raw packet sockets) http://securitynik.blogspot.com/2014/05/building-your-own-tc... https://github.com/yomimono/client-fuzzball/blob/master/fuzz... https://www.sans.org/reading-room/whitepapers/detection/ip-f... http://www.lopisec.com/2011/12/learning-scapy-syn-stealth-po...

jtakkala · on Aug 15, 2014

I think you'll find that either publishing an ARP entry or filtering incoming packets in the kernel is required for handling a TCP stream over raw sockets.

As the outbound TCP SYN is manually crafted and sent over a raw socket, without any corresponding state table entry on the sender's kernel, incoming TCP responses will be rejected by the kernel with a RST.

I suggested to Julia that she manually publish an ARP entry for another IP which she could send and receive on. The kernel not having an interface with that IP assigned to it would ignore responses while also passing them to the raw socket. An alternative would be to use an iptables rule to drop incoming packets for the relevant flow - although that may be more difficult to manage depending on what you're doing.

js2 · on Aug 12, 2014

All three volumes of TCP/IP Illustrated may be found on the Internet in pdf form, but they are well worth buying.

Tangent: One of my favorite interview questions is to ask how traceroute works. The question works best when the candidate doesn't actually know. Then we can start to discuss bits and pieces of how TCP/IP works, until they can puzzle out the answer.

fivre · on Aug 12, 2014

Do you know if there's a good IPv6 equivalent? I have IPv6 Core Protocols Implementation but like the writing style TCP/IP Illustrated much more.

wmf · on Aug 12, 2014

There is a 2nd edition.

srean · on Aug 12, 2014

Too late to join the story, but I am really curious if datacenter nodes intended for heavy mapreduce use implement this layer in user space.

The bottleneck for such processes is typically network I/O and I can imagine that taking control of the network in the user space might offer some modest to significant wins. For Hadoop in particular network packets needs to traverse quite a few layers before it is accessible to the application.

Has anyone done this sort of a thing for mapreduce. Pointers to any writeup would be awesome.

In fact TCP itself might be an overkill for mapreduce. The reduce functions used are typically associative and commutative. So as long as the keys and values are contained entirely within a datapacket, proper sequencing is not even needed. Any sequence would suffice.

donavanm · on Aug 13, 2014

There are a couple implementations of userland ip/tcp out there. As I recall at least one or two are fairly direct ports of a BSD network stack in to a library. These pair up with concepts like intels dpdk to move networking closer to the business logic.

Here are some slides for one version: http://www.bsdcan.org/2014/schedule/attachments/260_libuinet...

Edit: with regards to motivation it's always been something along the lines of a network appliance that I've seen. The mainline linux network stack is more than capable of doing many millions of packets per second over hundreds of thousands of concurrent streams. The network stack will not be the limitation in something like batch processing.

wmf · on Aug 12, 2014

There has been some work about using zero-copy RDMA and RoCE (which bypass the kernel as a side effect) for analytics. Hadoop in particular is so slow that the kernel is unlikely to be a bottleneck, but more optimized runtimes like Spark might benefit.

srean · on Aug 12, 2014

I dont understand why is Hadoop so freaking slow. I am no fan of Java (to put it mildly) but Java does fairly well to keep in the 70~80% of a C++ code at the cost of 4 to 5 times more memory. My experience is that Hadoop is 4 to 6 times slower.

Is it because of bad choice of internal algorithms, bad choice of internal data structures ? Bad I/O design ? Given the popularity it enjoys, and given its age, its a little frightening how much worse Hadoop is in comparison to some proprietary implementations. My hunch is that Hadoop's slowdown is in the shuffle phase, which is where faster network data transfer can help.

I like abstractions that spark exposes but it still needs a lot of engineering to catchup. I have anecdotes where Spark is slower than Hadoop by quite a bit.

All my experience is with Hadoop 1.x. Is Hadoop 2.x much better ?

meowface · on Aug 12, 2014

I believe Google has experimented with this sort of thing, as have a few other companies that sell a supposedly more efficient/faster userspace TCP stack as an enterprise product.

lukego · on Aug 12, 2014

Cool :).

I also learned a lot about networking by writing a TCP/IP stack in Common Lisp. http://lukego.livejournal.com/4993.html if you are interested.

lnkmails · on Aug 12, 2014

As someone who worked on writing protocol specs as code for simulation purposes, I can see how much fun this is. A pure python network simulator (ns2 is C++ and painfully hard to debug) would actually be nice and encourage a lot of theoreticians to get into real programming. I've spent a reasonable amount of time in the industry building distributed systems and I can say with confidence understanding low level protocols improves your thinking.

beagle3 · on Aug 12, 2014

While not immediately relevant, if you find this discussion interesting, have a look at sshuttle:

sshuttle[0] is a pure-python one-way TCP VPN solution that is very well behaved and quite efficient. The source is highly readable as well. +1 to everything Avery Pennarun has released (including wvdial, bup)

[0] https://github.com/apenwarr/sshuttle

nodivbyzero · on Aug 12, 2014

Check this out: http://gafferongames.com/networking-for-game-programmers/sen...

http://gafferongames.com/networking-for-game-programmers/rel...

C++, but very detailed articles.

tjz77 · on Aug 14, 2014

What if every python virtual machine had a full TCP/IP stack?

IPv6 has enough address space. Object storage takes care of disk access. Generally it might be way less efficent? What would the OS look like? It seems like a lot of OS services would disappear? You'd have a cloud of processes. Each process vm would be like a cell in a body. Maybe each process vm would load an auth module. Or not.

Sami_Lehtinen · on Aug 13, 2014

There's no reason to ACK every packet. It's very common to deal with things selectively. SACK: http://packetlife.net/blog/2010/jun/17/tcp-selective-acknowl...

norswap · on Aug 12, 2014

If you're interested in the matter, here's more background info on userland TCP stacks:

http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013...

fragmede · on Aug 12, 2014

Unfortunately, most people are going to read the headline, read the conclusion (that Python is too slow for TCP), and not realize that it's wrong.

Hopefully someone will blog a response post that gets popular on HN proving just how wrong it is.

wslh · on Aug 12, 2014

And in Squeak (Smalltalk) ? take a look at: http://www.squeaksource.com/@Hl1Cdo4NwCmLqQl0/Im9cEg0J

tonyg · on Aug 13, 2014

I'm guessing you were linking to http://www.squeaksource.com/Net.html - Seaside uses non-persistent URLs by default, leading to problems like this...

philangist · on Aug 12, 2014

Fun read. Does anyone here know how to deal with the Python being slow at sending ACK packets problem? Or is it a built-in limitation that comes with dealing with high level languages?

mbell · on Aug 12, 2014

I'd start with the question of 'why so much ACKing?'. In TCP you don't normally ACK every packet. You really only need one ACK per receive window, not one ACK per packet which is what the op's code appears to be trying to do. It also doesn't look like op's code is setting the window size anywhere so in effect she's saying to google's server 'ok got that one, send me a bunch more' and trying to do that for every packet at fast as possible is creating a runaway congestion problem.

The best solution doesn't have anything to do with python, it's to implement window size (flow control) and only ack when you need to.

ps - It's been several years since I've worked on a TCP stack so please correct anything I'm remembered incorrectly

lmm · on Aug 12, 2014

The answer to questions about performance is always "profile it". I mean, sure, you could probably get a crude speedup by just running it under pypy or some such, but the real answer is to profile it, see where the bottleneck is, and fix it.

As the other post suggests, this is not a problem in all high-level languages; it's possible to write very high-performance code in Haskell or Ocaml, for example. But python's semantics are e.g. that every object has its own methods that can be overridden arbitrarily, which means that every single method call pretty much has to involve a hash table lookup. 99% of Python code never uses this extreme dynamism (usually you define methods on classes, and if you want an instance with a different implementation for a particular method then you make a subclass), but it's there, and a Python interpreter that didn't respect that would behave incorrectly.

kyllo · on Aug 12, 2014

So you're saying that late binding is the most significant reason why Python is slow? Is slowness just an inherent tradeoff in using a language that supports this powerful feature?

milesvp · on Aug 12, 2014

There's probably something to this statement. there was a great article I saw on HN about a year ago that was talking about this (wish I'd bookmarked it). Crux of it was he looked at idiomatic python, saw all the hash lookups that entailed and said if you wrote C like that, it'd be slow as hell too. he then proceeded to speed up an algorithm to near C speeds by removing structures relying on these lookups. Was quit amazing to see.

Keep in mind that in current computer architectures memory requests are very slow. And any data structure that randomizes memory access means that you have a good chance of a cache miss and now have to hit even slower ram.

mcguire · on Aug 12, 2014

Practically every operation in Python starts with a hashtable lookup. Global and local environments are hashtables, modules are hashtables, objects are hashtables. Eventually you get down to C, which is usually very good---the hashtable implementation in particular---but for a lot of Python code the extra indirections are expensive. That's also why it is hard to optimize.

In this particular case, I don't think it's the problem, though. Just something to keep in mind.

berdario · on Aug 12, 2014

You're probably thinking of this:

https://speakerdeck.com/alex/why-python-ruby-and-javascript-...

kyllo · on Aug 12, 2014

This is also relevant, someone implemented early-binding virtual method tables in Python and benchmarked the performance speedup: http://legacy.python.org/workshops/1998-11/proceedings/paper...

lmm · on Aug 12, 2014

I'm not an expert but that's my understanding. Remember that it applies to some scopes as well as to object properties. In benchmarks the big divide I've seen is between languages that allow this and languages that don't.

You can do sophisticated things where you compile objects assuming they won't be overridden and then back out the compile if something touches an object (the JVM does similar things where it will compile a never-overridden method as non-virtual and then detect when the class hierarchy changes), but that requires a lot of complexity that goes against the goals of CPython.

Nowadays I mostly work in Scala, and anywhere where I would have used such a technique I find there's a way to do it "statically". So I'd be interested if there are good examples of what makes this a "powerful" technique, and to see if I can't replicate them "statically" with enough typeclasses etc.

marcosdumay · on Aug 12, 2014

Not only late biding, but all the Python's powerfull reflection (not only hashes, but it's string hashes all the way down) makes it hard to write a fast compiler.

Not impossible, as lmm said, just hard.

crazypyro · on Aug 12, 2014

Implement it in C.

Also its likely more a python problem than a "high level" language problem.

mkonecny · on Aug 12, 2014

This has nothing to do with Python or C. More likely he has a bug in his code.

crazypyro · on Aug 12, 2014

Python being magnitudes slower than C for networking code has a lot to do with Python, actually. Other smarter people have already explained it better than I can up above.

jnazario · on Aug 12, 2014

there's a userland stack written in C in honeyd. http://honeyd.org/ (slow to respond) and http://en.wikipedia.org/wiki/Honeyd

crazypyro · on Aug 12, 2014

Except that was written for an explicit purpose that wasn't speed. I fail to see how my suggestion of implementing low level networking in C and linking that to python code is considered a poor solution.

fastball · on Aug 12, 2014

I blame all networking issues on the GIL.

c_plus_minus · on Aug 12, 2014

Ha! Nice read, perfect after lunch material with my coffee :) Good job

kernelwaste · on Aug 12, 2014

This is not an implementation of a TCP stack.

cookiemonster11 · on Aug 12, 2014

This is not a TCP stack.

prht · on Aug 12, 2014

Indeed

cookiemonster11 · on Aug 12, 2014

It confuses me that people make such a big deal of their little 20 lines of code toy projects.

michaelhoffman · on Aug 12, 2014

It's not a little 20 line of code toy project. It's an engaging and accessible writeup of some basic parts of TCP that happens to include some easy-to-understand code.

It's pointless to people who understand how TCP works in depth, but the majority of programmers don't.

0xbadcafebee · on Aug 12, 2014

It's an engaging and accessible writeup, true. Unfortunately, there are also several glaringly incorrect/misleading points in the article. The fact that it's getting upvoted is just..... strange.

ivraatiems · on Aug 12, 2014

I don't think it's presented as a "big deal." I read it as a fun little experiment.

Someone1234 · on Aug 12, 2014

It confuses me that instead of finding these projects uninteresting (as you claim) you took the time and effort to write a reply in a thread just to belittle and dismiss the article.

I for one found it interesting/fun.

prht · on Aug 12, 2014

More confusing part for me is that I knew there is TCP/IP stack for ages. TCP stack is all new to me.

zzzeek · on Aug 12, 2014

My first thought was "it'll be slow as shit", then I clicked the article to confirm. Win!

prht · on Aug 12, 2014

I don't know what TCP stack is, but in case if you are interested in implementing TCP/IP stack http://git.savannah.gnu.org/cgit/lwip.git/tree/src

kesor · on Aug 12, 2014

And there is no mention of ScaPy? http://www.secdev.org/projects/scapy/

ayrx · on Aug 12, 2014

" I was much more comfortable with Python than C and I’d recently discovered the scapy networking library which made sending packets really easy."

Did you even read the link?

cookiemonster11 · on Aug 12, 2014

There is, in the second sentence of the article.