Looks like a good project -- new metrics like how much time a TCP session has been in different retransmit states should be useful for properly comprehending issues, and estimating speed up.
And it's a bunch of SystemTap scripts that trace send/receive/etc! Glad it isn't a kernel module, at least. :) Tracing send/receive does have a cost, which is covered in the README:
"The figure above shows Per-core CPU consumption of tcpdive is less than 10% while QPS is no significant influenced, which we believe is acceptable in most cases."
Actually, I'd say this was acceptable _because_ it is so clear in the README, and we can judge its usage beforehand. The overhead is actually pretty high, 2-10% CPU per core, but then that's what I'd expect for tracing send/receive using SystemTap. Again, by making it clear we can accept or not accept it beforehand and use accordingly.
eBPF should be making this type of tracing lower overhead...
You don't mention it, but credit where it's due: DTrace, which you had a lot to do with creating, can be used in much the same way on kernels that have added the TCP provider. I think we all owe you a big thanks.
These look like a lot of new scripts, and not quite what I was doing with DTrace, although in the same spirit that DTrace pioneered: tracing of TCP internals for custom metrics.
I didn't create DTrace, but I did create the DTrace TCP provider and many networking scripts. A nice list of them is here: http://dtracebook.com/index.php/Network_Lower_Level_Protocol... . My scripts focused on tracing events, workload characterization at different levels, and some timing: connection lifespans, and 1st byte latency. (At least in that location; I've got DTrace scripts scattered elsewhere too). The tcpdive scripts have focused so far on perturbation study: congestion, retransmissions, resets. Also useful!
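The connection-lifespan and first-byte-latency metrics mentioned above can also be approximated from plain userspace, without any tracing framework, when you only need application-level numbers. A minimal sketch (ordinary Python sockets against a throwaway local server, so the values it prints are loopback times, not production ones; the 50 ms server delay is an arbitrary stand-in for server think time):

```python
import socket
import threading
import time

def serve_once(listener):
    """Accept one connection and send a single byte after a short delay."""
    conn, _ = listener.accept()
    time.sleep(0.05)          # arbitrary stand-in for server think time
    conn.sendall(b"x")
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

t0 = time.monotonic()
client = socket.create_connection(("127.0.0.1", port))
t_connect = time.monotonic() - t0        # ~ TCP handshake time (connection setup)
client.recv(1)
t_first_byte = time.monotonic() - t0     # time to first byte
client.close()

print(f"connect: {t_connect * 1000:.1f} ms, first byte: {t_first_byte * 1000:.1f} ms")
```

The kernel-tracing approach still wins when you want these numbers for every connection on a busy box without instrumenting the application itself.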
That tool is a very thin wrapper over preexisting OS functionality, and it does little more than disguise how limited those capabilities are when it comes to making realistic simulations.
If you really want to assess how your application will perform on low-quality connections, it behooves you to understand what makes those connections suck, and what specific capabilities your OS of choice has for simulating or generating those conditions.
Statically setting latency to 500ms or packet loss to 10% is not realistic; it simultaneously exaggerates the kind of performance issues that exist in the real world and is much easier to compensate for than real network dynamics.
The wrapper makes the tool. It may not be perfect, and it would be nice to have more of a "random but stochastically representative degradation" option, but it's the tool I know and it works well. Can you suggest a better one for simulation? On OSX there is Apple's Network Link Conditioner[1], but on Linux? Characterizing a bad connection is great, but that isn't a tool. It's super hand wavy to say "what specific capabilities your OS of choice has for simulating or generating those conditions". My OS doesn't have capabilities I can access easily, and importantly, disable easily. Building on Comcast to add simulation improvements seems like a viable option.
Some tools like tcpdump or ss are mentioned, but those tools are not really comparable to what's described here. What I'd like to see would be a rough comparison with the existing web100 set of kernel patches https://web10g.org/ which has been used for many years in conjunction with the Network Diagnostic Tool (NDT) from Internet2. It provides userland visibility into some of the TCP kernel parameters of each connection via a documented interface that can be used e.g. by a special web server that does performance measurements. Also see http://www.measurementlab.net/.
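For a taste of what "userland visibility into TCP kernel parameters" looks like on stock Linux (no web100 patches), there's the TCP_INFO socket option, which is what ss itself reads. A sketch, with the caveats that this is Linux-only, assumes a little-endian layout, and that the struct offsets are hand-derived from the start of struct tcp_info in <linux/tcp.h> (the struct grows over kernel versions, but these early fields are stable):

```python
import socket
import struct

TCP_INFO = 11  # socket option number from <netinet/tcp.h> on Linux

def tcp_info_rtt(sock):
    """Read (state, rtt_us, rttvar_us) from the kernel's per-connection stats.

    Offsets follow the start of struct tcp_info in <linux/tcp.h>:
    eight one-byte fields, then 32-bit fields, with tcpi_rtt the 16th u32.
    Little-endian layout assumed.
    """
    buf = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 104)
    fields = struct.unpack("<8B24I", buf)
    state = fields[0]         # tcpi_state: 1 == ESTABLISHED
    rtt = fields[23]          # tcpi_rtt, microseconds (smoothed)
    rttvar = fields[24]       # tcpi_rttvar, microseconds
    return state, rtt, rttvar

# Demo over loopback; the handshake completes via the listen backlog.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname())
state, rtt, rttvar = tcp_info_rtt(client)
print(f"state={state} rtt={rtt}us rttvar={rttvar}us")
client.close()
server.close()
```

It's polling a snapshot rather than tracing events, which is exactly the gap tools like tcpdive (and web100's per-event instrumentation) fill.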
Similar results could theoretically be obtained with the TCP DTrace provider, which was added in Solaris 11 if memory serves. I am not aware whether FreeBSD or OS X have any similar providers, but my info could be outdated.
The idea behind all these approaches is basically to target a specific TCP connection and generate an event each time a TCP packet arrives. For each of those events, the kernel updates a rolling estimate of the RTT, which feeds into the congestion-window calculation that limits how many bytes can subsequently be sent. Various timeouts can probably trigger similar events, and so on.
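That rolling RTT estimate is standardized: RFC 6298 defines how SRTT, RTTVAR, and the retransmission timeout are updated on each RTT sample. A simplified sketch of the update rule (constants from the RFC; real stacks add clock-granularity, minimum-RTO, and Karn's-algorithm details on top):

```python
class RttEstimator:
    """Rolling RTT estimate per RFC 6298 (simplified)."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variance

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any sample, seconds

    def on_rtt_sample(self, r):
        if self.srtt is None:
            # First measurement (RFC 6298, section 2.2)
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Subsequent measurements (section 2.3);
            # RTTVAR is updated with the *old* SRTT, per the RFC.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        # RTO = SRTT + max(G, 4*RTTVAR); clock granularity G ignored here
        self.rto = self.srtt + 4 * self.rttvar

est = RttEstimator()
for sample in [0.100, 0.120, 0.080, 0.300]:   # RTT samples in seconds
    est.on_rtt_sample(sample)
print(f"srtt={est.srtt:.3f}s rto={est.rto:.3f}s")
```

Note how the one 300 ms outlier inflates the RTO well beyond the smoothed RTT: the variance term is what makes the timeout conservative, and it's exactly this kind of state that per-event tracing lets you watch evolve.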
So, I just read the readme and haven't even fully digested that, but the tool I'd have used in the past for this kind of problem space was tcpdump + tcptrace (http://tcptrace.org). To help me understand tcpdive, how would you compare the two?
I know there are 3G and 4G simulators but it occurs to me there might be a market for a proxy/VPN that really does use 3G or 4G on a connection and feed it back to you for testing in a desktop environment.
What if you have a first-world problem and your cell service is too good and reliable - even forcing the phone to 2G/3G service is not simulation enough.
Then you spend some time defining what a typical 2G and 3G connection looks like to your end users, and then you replicate that with existing tools. There are very, very many open source solutions for adding latency, jitter, throttling bandwidth, and replicating packet loss. In fact, I'd argue that coming up with a few solid definitions of what 2G and 3G connections look like is far more accurate than running tests over one dongle "IRL"
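A sketch of that "define a profile, then replicate it" step. The numeric values below are placeholders, not measured characterizations (replace them with whatever you observe from your own users), and the generated commands are standard Linux tc netem/tbf invocations, which is also the machinery Comcast drives under the hood on Linux:

```python
# Illustrative network profiles -- placeholder numbers, NOT measured data.
# Replace with values characterized from your own end users.
PROFILES = {
    "edge-ish": {"delay_ms": 300, "jitter_ms": 100, "loss_pct": 2.0, "rate_kbit": 200},
    "3g-ish":   {"delay_ms": 150, "jitter_ms": 40,  "loss_pct": 0.5, "rate_kbit": 1500},
}

def netem_commands(profile_name, dev="eth0"):
    """Build the tc invocations that replicate a profile via netem + tbf."""
    p = PROFILES[profile_name]
    return [
        # delay <mean> <jitter> gives variable latency, not a fixed offset
        f"tc qdisc add dev {dev} root handle 1: netem "
        f"delay {p['delay_ms']}ms {p['jitter_ms']}ms loss {p['loss_pct']}%",
        # token-bucket filter chained underneath for the bandwidth cap
        f"tc qdisc add dev {dev} parent 1: handle 2: tbf "
        f"rate {p['rate_kbit']}kbit burst 32kbit latency 400ms",
    ]

for cmd in netem_commands("3g-ish"):
    print(cmd)
# undo everything afterwards with: tc qdisc del dev eth0 root
```

The jitter argument to netem's delay is the part a static "500ms" setting misses; a single `tc qdisc del dev eth0 root` tears it all down, which addresses the "disable easily" concern raised upthread.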