A transport protocol's view of Starlink (potaroo.net)
214 points by fanf2 8 months ago | 45 comments



What I miss in this article are references to some of the great research on Starlink performance and characteristics that has come out in the past ~2 years; [1] and [2] come to mind. They also go into much more detail on the 15s interval. Quoting [1]:

> Interestingly, we observe that the Starlink OWD (one-way delay) often noticeably shifts at interval points that occur at 15 s increments. Further investigation reveals the cause to be the Starlink reconfiguration interval, which, as reported in FCC filings [71], is the time-step at which the satellite paths are reallocated to the users.

AFAIK it is not the dish itself that does the tracking but a central orchestrator.

[1]: https://arxiv.org/abs/2310.09242
[2]: https://arxiv.org/abs/2306.07469


The dish tracks via phased-array beam steering?


It's a phased array antenna.

Teardown videos available


One comment about the analysis of ping times in the section "Low Earth Orbit Systems", specifically the analysis of ping times "within each 15s satellite tracking interval".

Most routers do not put ping-processing in the "fast path". That is, instead of having the ping be processed by an ASIC, the ping gets processed by the router's CPU. And ping-processing is typically a lower-priority task. Because of that, you can't assume that the high variation in latency is because of Starlink.


That is true for ICMP packets directed at the router itself.

Pretty sure that is the reason why he used

> ping across a Starlink connection from the customer side terminal to the first IP end point behind the Starlink earth station

instead of pinging the terminal itself.

(ICMP packets passing _through_ a router are handled just like any other traffic and are therefore OK for an analysis like this)


Additionally, I would not expect such “baseline” changes in the minimum RTT to be due to variation in CPU processing time.

These changes are more typical of a physical path change, as suggested by the author.

CPU/software-path processing latency would look more like additive noise.

I give some examples of RTT patterns in this talk: https://ripe77.ripe.net/archives/video/2250/


It's unclear to me. `traceroute` typically just returns router IPs, so I'm assuming that "the first IP end point behind the Starlink earth station" is the IP address of a router of some sort.


Hm, I understood "IP endpoint" to mean something other than a router (because I am very sure that Geoff is aware of the "ICMP is not handled on the ASIC" issue).


I ping routers around the world all the time and rarely see any significant (>1ms) change


Most people do not enjoy that privilege


What do you mean by "privilege" in this context?


First-world infrastructure and routing; being based in a densely populated and developed locale, I'd imagine.

YMMV if you attempt the same traceroute from Bangalore or Sydney to London/Paris/NYC.


About a week ago there were significant routing issues between east Africa and Europe. Latency ballooned, and jitter was quite significant too, especially over the course of the day. I believe both EASSy and one of the SEA-ME-WE circuits were down. I had packets from the U.K. reaching Mozambique via Tokyo.

That was a temporary situation though, and I think the worst I've seen in east Africa in the last 10 years.


I just tested something to a similar effect. It's much more jumpy time-wise, yes.


Just pinged my ISP router in Kathmandu for 30 seconds, 172-174ms return. It's on AS4007.

From my router in Dhaka, pinging a router on Airtel in India: 36-38ms.

Routers in the real world give very stable ping responses, no matter where they are based.


There was a time when latency on AT&T 3G hit 9 seconds during congestion.

The last mile is usually the issue. Pretty much only Cogent and HE allow you to have packet loss


The argument was “Most routers do not put ping-processing in the "fast path". That is, instead of having the ping be processed by an ASIC, the ping gets processed by the router's CPU.”

That's meaningless. Control planes are often policed, sure, but on overload they will simply drop. In my experience they drop ICMP TTL-expired before echo responses, but most will generate TTL-expired just fine.

Any router capable of processing a full bgp table, and to be honest any router made in the last 20 years, is perfectly capable of responding to icmp echos.

There was then a second argument that "3rd world" routers aren't as good as ones in western countries. In the majority of cases they're exactly the same. That western arrogance is somewhere between insulting and amusing.

The final argument is about path loss/jitter/etc., especially loss on the first hop (your "my crappy 3G provider" argument).

That’s exactly what this test of starlink is showing.

Starlink is a great tool in specific cases, but the fanboyishness often drowns out the actual benefits.


It's not unusual to see routers occasionally missing from a traceroute.

You be capping


> That is, instead of having the ping be processed by an ASIC, the ping gets processed by the router's CPU.

I wonder how true that is anymore, with ICMPv6 processing being a mandatory part of IPv6. I could totally see ICMP processing being a low-priority task, but am far less certain that it would not be done by dedicated hardware these days.

On top of that, I've never, ever, ever noticed the behavior he's observing with either my cable ISP connection, or the terrestrial microwave link provided by my local WISP. I don't have enough data to say that I'd never see that behavior if I happened to ping some router powered by a potato... but I've pinged a whole bunch of systems over the years, and have never seen variation like what he describes.


> I wonder how true that is anymore, with ICMPv6 processing being a mandatory part of IPv6.

ICMPv6 may be mandatory according to specs, but you can still drop most of it with no ill effects. You probably shouldn't drop needs fragmentation packets, but everyone has adapted to those being dropped sometimes, so...

If you ignore neighbor discovery, you'll have trouble reaching your neighbors, though.

But neighbor discovery is mostly low volume and can be handled by the CPU, not the ASIC. Needs-frag packets likely won't be directed at the router's IP, and the ASIC can forward them just fine.

I've certainly seen much more variable ping times for routers than for the hosts behind them. If the router's CPU is less busy, ping times are usually a bit higher than for a host that's right there, and as the router's CPU gets busier, the RTT increases or pings get dropped. It's not usually a factor of how busy the links are, either; routers tend to have a pretty limited amount of CPU, and if they're getting a lot of traffic, it's easy to overload them.

I haven't seen it happen quite in the bands like in the article, but sometimes it varies quite a bit. The results tend to be much more consistent if you can get an RTT from a full host and not a router.


I’ve seen ping latencies of up to 30 seconds (yes, 30 000 milliseconds!) with a certain cable ISP, while at the same time VoIP (RDP over regular Internet IP, not “cable voice” or ISP-provided SIP, which often has its own QoS class) was borderline usable.

Could have been traffic shaping or prioritization (their network was in complete shambles after all), but ICMP was definitely taking some lower priority queue or path.


*RTP


I'd expect many fewer folks use RTP than use RDP, so I'd expect that "RDP" was absolutely not a typographical error.


I did indeed mean RTP! Almost everybody uses RTP these days, whether they realize it or not. It’s the basis for WebRTC and most proprietary VoIP solutions.


Indeed. Surprising that such a technical article from a supremely technical and knowledgeable person would not at least have a disclaimer about it. I would also be interested to know to what extent a Starlink satellite is an ASIC router vs. pure CPU.

I believe a better technique would be to ping _through_ the router to another endpoint under one's own control, on the same network as the origin client, and where processing latency can be controlled. Basically, ping yourself via Starlink and back.
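
A minimal sketch of that idea (the hostname is hypothetical; TCP-connect timing stands in for ICMP so intermediate routers forward the packets on their fast path rather than punting them to their CPUs):

    import socket, statistics, time

    HOST, PORT = "myserver.example.net", 443   # hypothetical endpoint you control
    samples = []
    for _ in range(100):
        t0 = time.monotonic()
        # A TCP handshake is forwarded on the fast path by intermediate
        # routers, avoiding the "ICMP goes to the router CPU" problem.
        s = socket.create_connection((HOST, PORT), timeout=5)
        samples.append((time.monotonic() - t0) * 1000)   # handshake RTT in ms
        s.close()
        time.sleep(0.1)
    print(f"min {min(samples):.1f} ms, median {statistics.median(samples):.1f} ms")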

That said, in practical terms he's quite right about the jitter problem. I maintain two networks myself at my remote home: my microwave WISP for tight latency and jitter, and Starlink for much better bandwidth/throughput.


He does ping through the Starlink terminal:

> ping across a Starlink connection from the customer side terminal to the first IP end point behind the Starlink earth station


I interpreted that statement incorrectly as being an endpoint on the satellite itself. I guess the "earth station" is the round trip terrestrial endpoint via the satellite. I was misreading it as being the local terminal.


BBR may not be perfect, but it sure is an improvement for Starlink over Cubic. I switched to it on a server I connect to frequently from Starlink and got a 2x performance improvement on TCP connections. Before, big file transfers would regularly bounce between 5 and 12 Mbytes/s; after, it'd hold a steady 12. Which is what you'd expect given Starlink's high packet loss rate (0.5% for me) and how TCP congestion control traditionally works.
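
For a rough sense of why, a back-of-envelope with the classic Mathis et al. bound for loss-based congestion control, throughput ≈ (MSS/RTT)·(C/√p). Cubic scales somewhat better than this Reno-style bound, so treat the numbers (all assumed, not measured) as an order-of-magnitude illustration only:

    import math

    MSS = 1460          # bytes; typical Ethernet-derived MSS
    C = math.sqrt(1.5)  # constant from the Mathis et al. model
    for p in (0.005, 0.0005):         # assumed packet loss rates
        for rtt in (0.030, 0.060):    # assumed RTTs in seconds
            bw = MSS * C / (rtt * math.sqrt(p))
            print(f"p={p:.2%}  rtt={rtt * 1000:.0f}ms  ->  ~{bw / 1e6:.2f} MB/s")

The point is the √p in the denominator: even sub-percent loss caps a loss-based sender well below the link rate, whereas BBR does not treat loss as a congestion signal.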


For those not aware of how Starlink operates: the customer antenna is called the User Terminal (U.T.), a.k.a. the "dish", although all production models are rectangular - only the pre-production beta model is round and dish-like.

The U.T. contains a phased array antenna that can electronically 'steer' the bore-sight (aim) of the transmitted (and received) signal at the current satellite in view. In ideal circumstances the U.T. antenna has approximately a 110 degree field of view (~35 degrees above each horizon).
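
For intuition on the steering: the array applies a per-element phase gradient rather than physically moving. A toy calculation (the element spacing and Ku-band frequency are illustrative assumptions):

    import math

    freq = 12e9                 # Hz; Ku-band downlink (assumed)
    lam = 3e8 / freq            # wavelength, ~25 mm
    d = lam / 2                 # half-wavelength element spacing (a common choice)
    for steer in (0, 25, 55):   # degrees off boresight; 55 deg ~ 35 deg elevation
        # phase step between adjacent elements needed to steer the beam
        phi = 2 * math.pi * (d / lam) * math.sin(math.radians(steer))
        print(f"steer {steer:2d} deg -> {math.degrees(phi):6.1f} deg phase step per element")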

The satellites pass from west to east; a U.T. is assigned to a given satellite for approximately 15 seconds at a time (the reconfiguration interval), even though a satellite takes a few minutes to cross the U.T.'s full field of view. The satellite forms a beam aimed at a fixed location on the ground - this is called a 'cell'. All U.T.s within that area share the radio link, which has a fixed bandwidth, so contention is managed by the satellite.

The path length to a satellite directly overhead would be around 550km (in most cases the satellite is slightly north or south of the U.T., but for round numbers' sake assume 550km).

The path length to a satellite appearing 35 degrees above the horizon (the slant range) is ~890km.

Satellites relay the packets from the U.T. to the (nearest) Earth ground station, so the path length and therefore travel-time will vary enormously over just 15 seconds.

The round-trip for the minimum case is 4 x 550km = 2200km, but for the maximal case it is 4 x 890km = 3560km. These equate to a travel time of between 1.8ms and 3.0ms per leg, so the hard physical minimum RTT ranges from 4 x 1.8ms ≈ 7.3ms up to 4 x 3.0ms ≈ 11.9ms depending on where the satellite is.
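
A quick sanity check of those numbers, using standard spherical-Earth slant-range geometry with the 550km altitude and 35-degree minimum elevation from above:

    import math

    R = 6371.0        # Earth radius, km
    h = 550.0         # satellite altitude, km
    c = 299792.458    # speed of light, km/s
    for elev in (90, 35):  # degrees above the horizon
        e = math.radians(elev)
        # slant range from user terminal to satellite at elevation e
        d = math.sqrt((R + h) ** 2 - (R * math.cos(e)) ** 2) - R * math.sin(e)
        print(f"elev {elev:2d} deg: slant {d:5.0f} km, "
              f"one leg {d / c * 1000:.1f} ms, four legs {4 * d / c * 1000:.1f} ms")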

As more satellites are added to the constellation, the gap between satellites decreases and the elevation angle at which a satellite is acquired can increase, thus shortening the maximum path and lowering the latency.

Starlink has a publicly stated goal of less than 20ms round trip latency and published a report in March 2024 about the engineering efforts to achieve this [0]. Much of the effort that customers see focuses on two issues:

  1. reducing latency between ground station and Internet connection point
  2. scheduling the radio links between satellite and all U.T.s in its beam area
Starlink balances contention by sometimes restricting and sometimes promoting activation of new U.T.s in each area - this is why a fully subscribed cell will occasionally impose a waiting list on new activations. At other times Starlink can, and does, dynamically change the monthly subscription cost. Recently some areas had their residential price reduced from US$120 to US$90, while others in congested areas had an increase from US$90 to US$120 (in the USA).

[0] https://api.starlink.com/public-files/StarlinkLatency.pdf


FYI, User Terminal or UT is quite an established term. The oldest random Googleable example that comes to mind is a thin-client server process named "SUNWut" (a ticker code + "ut") from the early 2000s. I'm pretty sure the usage traces back multiple decades from that point in the phone and software industries.


General usage stemmed from the Tactical User Terminals deployed in the 1980s by the U.S. Army in Germany, part of the codename TENCAP ELINT programme - and they were truck-sized!


Most of the references I can find to "SUNWut" are related to the Sun Ray thin client. See https://en.wikipedia.org/wiki/Sun_Ray


> This orbital velocity at the surface of the earth is some 40,320 km/sec.

Huh? Is this right? That's really fast, like 24,800 miles per second, which is a significant fraction of c (the speed of light, 186,000 miles/sec). >10% of c, is this a typo?


They probably meant km/h - that's in the right ballpark for the oft cited 7.5 km/sec for LEO.


No, it's not. This immediately caught my eye too. According to Google the right number is 7.8 km/s, so it was very far off the mark.


Earth's orbit has a circumference of about 940 million km. Divide that by 365.25 days x 24 hrs/day x 3600 seconds/hr and I'm getting in the arena of 29.78 km/sec.


That’s the velocity of the earth around the sun. But “orbital velocity at the surface of the earth” is supposed to be the velocity a satellite in an orbit at ground level around the earth would have.


7.8 km/s is typical in LEO. Much above 11 is no longer in earth orbit.
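
Back-of-envelope, using v = sqrt(GM/r) for a circular orbit (and sqrt(2) times that for escape velocity):

    import math

    GM = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
    R = 6.371e6           # mean Earth radius, m
    for h_km in (0, 550):
        r = R + h_km * 1000
        v = math.sqrt(GM / r)  # circular orbital velocity at radius r
        print(f"h={h_km:3d} km: orbit {v / 1000:.1f} km/s, "
              f"escape {v * math.sqrt(2) / 1000:.1f} km/s")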


> This orbital velocity at the surface of the earth is some 40,320 km/sec.

Is that a typo? Orbital velocity at the surface should be 7.9 km/s.


40,320 km is roughly a full Earth circumference, so 40,320 km/sec would be one trip around the Earth per second, a tiny bit too fast. It should be per day or so, which is the rotation speed of the surface. Also, 40,000 km/s is ~13% of c (the speed of light).


It seems like it must be - 40,320 km is comparable to the circumference of the earth!


Is this scenario not what TCP Peach was created for?


Can you say more?


> A “flooding” ping sends a new ping packet each time a packet is received from the remote end

Uhm, no:

    -f
       Flood ping. For every ECHO_REQUEST sent a period “.” is
       printed, while for every ECHO_REPLY received a backspace is
       printed. This provides a rapid display of how many packets
       are being dropped. If interval is not given, it sets interval
       to zero and outputs packets as fast as they come back or one
       hundred times per second, whichever is more. Only the
       super-user may use this option with zero interval.


> outputs packets as fast as they come back

If the latency is under 10 ms, replies come back faster than 100 per second, so ping ends up sending a new packet each time a reply arrives, and what the OP said is correct.



