
Is that really a win in terms of latency, considering that the chance of a cache hit increases with the number of users?




I used to run unbound at home as a full resolver, and ultimately this was my reason to go back to forwarding to the large public resolvers. So many domains were slow to answer the first query that I had all kinds of odd behavior from devices around the house making slow initial connections.

Changed back to just using big resolvers and all those issues disappeared.


Keep in mind that low latency is a different goal from reliability. If you want the lowest latency, the anycast address of a big company will often win out, because they've spent a couple of million to get those numbers. If you want the most reliability, then the resolver the fewest hops away from you should be the most reliable (there's no accounting for poor sysadmin'ing), which is often the ISP's, but sometimes not.

If you run your own recursive DNS server (I keep forgetting to use the right term) on a local network, you can hit the root servers directly, which makes that the most reliable DNS resolver possible. Yes, you might get more cache misses initially, but I highly doubt you'd notice. (Note: hammering the root nameservers is bad netiquette; you should cache their answers for at least 5 minutes, and always put a local caching resolver in front of them rather than querying them directly per lookup.)
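
To make the "hit the root servers directly" part concrete, here is a minimal sketch of what a recursive resolver does on a full cache miss, using Python's dnspython library. The query name is just a placeholder, and 198.41.0.4 is the published address of a.root-servers.net; this is an illustration, not the internals of any particular resolver.

    # Follow the referral chain by hand, the way a recursive resolver does.
    import dns.message
    import dns.query

    def ask(server_ip, name, rdtype="A"):
        query = dns.message.make_query(name, rdtype)
        return dns.query.udp(query, server_ip, timeout=3)

    # Step 1: a root server doesn't know example.com, but it refers us
    # to the .com TLD servers (NS records in the authority section).
    reply = ask("198.41.0.4", "example.com")
    print(reply.authority)

    # Steps 2 and 3 repeat the same pattern: ask a .com server, get
    # referred to example.com's own nameservers, then ask one of those
    # for the final answer. A caching resolver remembers every step.

A real resolver also follows CNAMEs, validates, and respects TTLs; the point is just that each step is one round trip, and everything below the root referral can be cached locally.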


> If you want the most reliability, then the resolver the fewest hops away from you should be the most reliable (there's no accounting for poor sysadmin'ing), which is often the ISP's, but sometimes not.

I'd argue that accounting for poorly managed ISP resolvers is a critical part of reasoning about reliability.


It is. If latency were that important, one could always aggregate across a LAN, with forwarding caching proxies pointing at a single resolving caching proxy, and gain economies of scale by exactly the same mechanism the big shared resolvers rely on. But latency is largely a wood-for-the-trees thing.

In my everyday usage over the past couple of decades, cache-miss delays have been largely lost in the noise of stupidly huge WWW pages, artificial service greylisting delays, CAPTCHA delays, and so forth.

Especially as the first step in any full cache miss, the back-end query to the root content DNS server, is just a round trip over the loopback interface when one mirrors the root zone locally. Sometimes the second step is, too, since some TLDs also let one mirror their data. Thank you, Estonia. https://news.ycombinator.com/item?id=44318136

And the gains in other areas are significant. Remember that privacy and security are also things that people want.

Then there's the fact that Quad9's/Google's/CloudFlare's anycasting surprisingly often results in hitting multiple independent servers on successive lookups, not yielding the cache gains that a superficial understanding would lead one to expect.

Just for fun, I did Bender's test at https://news.ycombinator.com/item?id=44534938 a couple of days ago, in a loop. I received reset-to-maximum TTLs from multiple successive cache misses, on queries spaced merely 10 seconds apart, from all three of Quad9, Google Public DNS, and CloudFlare 1.1.1.1. With some maths, I could probably make a good estimate as to how many separate anycast caches on those services are answering me from scratch, and not actually providing the cache hits that one would naïvely think would happen.
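
For anyone who wants to reproduce that kind of probe, here is a rough sketch in Python with dnspython. The resolver addresses are the public anycast ones mentioned above plus localhost, and the query name is a placeholder for whatever name the test uses; the 10-second spacing matches the loop described above.

    import time
    import dns.resolver

    RESOLVERS = ["9.9.9.9", "8.8.8.8", "1.1.1.1", "127.0.0.1"]
    NAME = "example.com"   # substitute the test name you care about

    for _ in range(6):
        for ip in RESOLVERS:
            res = dns.resolver.Resolver(configure=False)
            res.nameservers = [ip]
            answer = res.resolve(NAME, "A")
            # A TTL stuck at the record's maximum on every pass suggests
            # each query landed on a different, cold anycast cache; a TTL
            # that drops by ~10 s per pass means one warm cache answered.
            print(ip.rjust(11), "TTL =", answer.rrset.ttl)
        time.sleep(10)

On a single local resolver you would expect exactly one miss followed by a steadily decreasing TTL, which is the behaviour described next.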

I added 127.0.0.1 to Bender's list, of course. That had 1 cache miss at the beginning and then hit the cache every single time, just counting down the TTL by 10 seconds each iteration of the loop; although it did decide that 42 days was unreasonably long, and reduced it to a week. (-:



