> The proper thing a resolver should do is follow the referral and ask the child name servers.
You're right, I was clearly not paying attention when writing that part down (fixed now in the forum post).
> [...] but I also strongly suspect that QNAME minimization may rest on under-specified or un-specified ideas about what authoritative name servers should do. I'd have it follow referrals if it doesn't already.
AFAIK, that is not the case: QNAME minimisation is well-specified, and the resolver will follow referrals if it gets them. But in this case the resolver is getting an NXDOMAIN where it should be getting either a NODATA response (if there is no zone at ws.fdmg.org) or a referral (if there is a zone at ws.fdmg.org, which should then contain a delegation for prod.ws.fdmg.org).
Since it's getting neither, the resolver stops resolution when it gets the NXDOMAIN. Either this is broken empty non-terminal behaviour, or there actually is a zone at ws.fdmg.org that is missing a delegation for prod.ws.fdmg.org. It's hard to tell from the outside, but if it's the latter it should be easy to fix.
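To make the possible outcomes concrete, here is a minimal sketch of a QNAME-minimising walk against toy authoritative servers. Everything in it is a simplification and an assumption: `Response`, `classify`, `make_ask` and `resolve_minimised` are hypothetical names, responses are reduced to an RCODE, an answer flag and a referral target, and the zone layout for fdmg.org is a guess at the "missing delegation" case, not the real zone. The full procedure (RFC 9156) also deals with query types, caching and fallbacks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class Response:
    rcode: str                         # "NOERROR" or "NXDOMAIN"
    has_answer: bool = False           # answer section is non-empty
    referral_to: Optional[str] = None  # owner name of the NS set in the authority section


def classify(resp: Response) -> str:
    """Reduce a simplified response to NXDOMAIN, ANSWER, REFERRAL or NODATA."""
    if resp.rcode == "NXDOMAIN":
        return "NXDOMAIN"   # the name, and everything below it, does not exist
    if resp.has_answer:
        return "ANSWER"
    if resp.referral_to is not None:
        return "REFERRAL"   # delegation to a child zone
    return "NODATA"         # name exists (possibly as an empty non-terminal)


def minimised_names(target: str, zone: str) -> list:
    """Names to ask `zone` about, one extra label at a time, ending with `target`."""
    labels = target.split(".")
    depth = len([label for label in zone.rstrip(".").split(".") if label])
    return [".".join(labels[i:]) for i in range(len(labels) - depth - 1, -1, -1)]


def resolve_minimised(qname: str, ask: Callable[[str, str], Response]) -> Tuple[str, str]:
    """Walk down from the root; ask(zone, name) stands in for querying that zone's servers."""
    target, zone = qname.rstrip("."), "."
    while True:
        for name in minimised_names(target, zone) or [target]:
            resp = ask(zone, name)
            kind = classify(resp)
            if kind == "NXDOMAIN":
                return ("NXDOMAIN", name)      # stop: nothing can exist below `name`
            if kind == "REFERRAL" and resp.referral_to != zone:
                zone = resp.referral_to        # follow the referral into the child zone
                break
            if name == target:
                return (kind, name)            # ANSWER or NODATA for the full name
            # NODATA for an intermediate name: add the next label, same zone


Zones = Dict[str, Tuple[Set[str], Set[str]]]


def make_ask(zones: Zones) -> Callable[[str, str], Response]:
    """Toy authoritative servers: zones maps apex -> (delegated children, names with data)."""
    def ask(zone: str, name: str) -> Response:
        children, data = zones[zone]
        for child in children:
            if name == child or name.endswith("." + child):
                return Response("NOERROR", referral_to=child)
        if name in data:
            return Response("NOERROR", has_answer=True)
        if name == zone.rstrip(".") or any(n.endswith("." + name) for n in data | children):
            return Response("NOERROR")         # apex or empty non-terminal: NODATA
        return Response("NXDOMAIN")
    return ask


# Hypothetical layout: there is a zone cut at ws.fdmg.org, but that zone lacks the child delegation.
broken: Zones = {
    ".": ({"org"}, set()),
    "org": ({"fdmg.org"}, set()),
    "fdmg.org": ({"ws.fdmg.org"}, set()),
    "ws.fdmg.org": (set(), set()),             # missing: delegation for prod.ws.fdmg.org
    "prod.ws.fdmg.org": (set(), {"www.prod.ws.fdmg.org"}),
}
fixed: Zones = {**broken, "ws.fdmg.org": ({"prod.ws.fdmg.org"}, set())}

print(resolve_minimised("www.prod.ws.fdmg.org", make_ask(broken)))  # ('NXDOMAIN', 'prod.ws.fdmg.org')
print(resolve_minimised("www.prod.ws.fdmg.org", make_ask(fixed)))   # ('ANSWER', 'www.prod.ws.fdmg.org')
```

With the delegation missing, the walk stops at the NXDOMAIN for prod.ws.fdmg.org; adding it to the ws.fdmg.org zone lets the same walk reach the answer. A server that wrongly returned NXDOMAIN for an empty non-terminal would end the walk in the same way.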
There is a zone cut at ws.fdmg.org, so I think it's the latter. The zone is misconfigured; that's pretty simple for whoever owns the zone to fix. But the general question of whether Route 53 should be stricter and refuse to import such a zone is a tough call.
On the one hand, it's good to prevent these edge cases and keep things deterministic; on the other, it can cause availability issues for customers if Route 53 were to refuse to import a zone that probably loaded and "worked" just fine with the previous or secondary setup. There are all sorts of ways to write a misconfigured but "valid" zone file in which records and subtrees of records become unreachable. Most customers rely on test queries and traffic monitoring to spot problems.
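One common way records become unreachable is data placed at or below a delegation inside the parent zone: the parent answers for that subtree with a referral, so the data is never served. A minimal, warn-only sketch of such a check, assuming the zone has already been parsed into (owner, type) pairs; `occluded_records` and the example.com records are hypothetical, and a real linter would need to parse zone files and handle more record types.

```python
from typing import List, Tuple

GLUE_TYPES = {"A", "AAAA"}               # address records that may legitimately be glue
ALLOWED_AT_CUT = {"NS", "DS"} | GLUE_TYPES


def _norm(name: str) -> str:
    return name.rstrip(".").lower()


def occluded_records(apex: str, records: List[Tuple[str, str]]) -> List[str]:
    """Return warnings for records at or below a delegation inside the same zone."""
    apex_n = _norm(apex)
    # Every NS set that is not at the apex marks a zone cut.
    cuts = sorted({_norm(o) for o, t in records if t == "NS" and _norm(o) != apex_n})
    warnings = []
    for owner, rtype in records:
        name = _norm(owner)
        for cut in cuts:
            if name == cut and rtype not in ALLOWED_AT_CUT:
                warnings.append(f"{owner} {rtype}: occluded by the delegation at {cut}")
                break
            if name.endswith("." + cut) and rtype not in GLUE_TYPES:
                warnings.append(f"{owner} {rtype}: below the delegation at {cut}, unreachable")
                break
    return warnings


zone = [
    ("example.com.", "SOA"),
    ("example.com.", "NS"),
    ("sub.example.com.", "NS"),          # zone cut: sub.example.com is delegated
    ("ns1.sub.example.com.", "A"),       # glue: fine
    ("deep.sub.example.com.", "NS"),     # belongs in the sub.example.com zone instead
    ("www.sub.example.com.", "CNAME"),   # unreachable from this zone
    ("sub.example.com.", "TXT"),         # data at the cut is not authoritative here
]
for warning in occluded_records("example.com.", zone):
    print("Warning:", warning)
```

Because it only reports, a check like this would not block the import of a zone that "worked" before.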
In general, at AWS we try to avoid being paternalistic; we believe in giving customers powerful tools and then making those tools intuitive and easy, but I'll admit this can be very challenging with DNS, where the standards and obscure corner cases were never really intended to be understood by an inexpert user-base.
I resisted making this point last night because I didn't want to be seen to deflect. But on reflection, the privacy advocate in me really needs to get this out: QNAME minimization is a really, really bad idea and doesn't do anything meaningful for privacy IMO.
The most privacy-sensitive names are still leaked. DNS queries are still sent in the clear, including the final, fully specific query for the name you want to resolve, observable to anyone who can see your traffic, maybe including your Wi-Fi buddies at the local café.
Then, along the way, the sensitive names are still leaked to parties you probably shouldn't trust in the privacy-protecting model. For example, Verisign can still tell that you are querying for "oddlyspecificpornfetish.com"; dropping the "www", if there even is one, isn't much protection. In return for essentially zero meaningful privacy protection, you have to deal with all sorts of name resolution problems caused by making NS queries. That's not a smart trade-off. I'm not saying the goal isn't worthy, though; it is.
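A minimal sketch of which query name each server on the path receives, with and without minimisation. It assumes a cold cache and that each listed zone is a real zone cut; `names_seen` is a hypothetical helper. It shows both sides of the argument: the root only learns "com", the .com servers still learn the registrable name, and the final server (and anyone watching the unencrypted traffic) sees the full name either way.

```python
from typing import Dict, List


def names_seen(qname: str, zones: List[str], minimise: bool) -> Dict[str, str]:
    """Query name each zone's servers receive, assuming a cold cache and that every
    zone in `zones` is a zone cut on the path to `qname`."""
    full = qname.rstrip(".")
    labels = full.split(".")
    seen = {}
    for zone in zones:
        depth = len([label for label in zone.rstrip(".").split(".") if label])
        if minimise:
            # The zone's own name plus one label (the full name at the last step).
            seen[zone] = ".".join(labels[max(len(labels) - depth - 1, 0):])
        else:
            seen[zone] = full  # classic resolution sends the full name to every server
    return seen


path = [".", "com", "oddlyspecificpornfetish.com"]
for minimise in (False, True):
    label = "minimised" if minimise else "full qname"
    print(label, names_seen("www.oddlyspecificpornfetish.com", path, minimise))
```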
Agreed, it is a tough call whether Route 53 should try to prevent this type of misconfiguration. I would err on the side of saying it probably shouldn't, because before you know it you're going down the rabbit hole of fixing all sorts of weird misconfigurations users make in the DNS.
And yes. DNS seems so simple on the outside until you let a non-expert near it. And even experts make mistakes analysing corner cases (mea culpa).
Whether QNAME minimisation offers privacy is debatable, but here I would err on the side of saying it's a building block that at least prevents some leakage (e.g. to the root).
A more important takeaway for me is that QNAME minimisation turns misconfigurations such as missing delegations into actual resolution failures, whereas without QNAME minimisation they would have been masked. I'm planning to measure how often this occurs.
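One way such a measurement could be structured, as a sketch only: resolve each name twice, once with minimisation off and once with it on, and record where the outcomes differ. `compare_modes` and the stand-in resolvers are hypothetical; in practice the two callables might wrap two resolver instances that differ only in a setting such as Unbound's `qname-minimisation`, and transient failures would need repeated runs to filter out.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

Outcome = str  # e.g. "NOERROR", "NXDOMAIN", "SERVFAIL"


def compare_modes(
    names: Iterable[str],
    resolve_full: Callable[[str], Outcome],
    resolve_min: Callable[[str], Outcome],
) -> Tuple[Dict[str, Tuple[Outcome, Outcome]], Counter]:
    """Resolve each name both ways; return the names whose outcome differs and a tally."""
    diffs: Dict[str, Tuple[Outcome, Outcome]] = {}
    tally: Counter = Counter()
    for name in names:
        full, minimised = resolve_full(name), resolve_min(name)
        tally["total"] += 1
        if full != minimised:
            diffs[name] = (full, minimised)
            tally[f"{full}->{minimised}"] += 1
    return diffs, tally


# Stand-in resolvers for illustration only (hypothetical results, not measurements).
full_answers = {"www.prod.ws.example": "NOERROR", "www.example": "NOERROR"}
min_answers = {"www.prod.ws.example": "NXDOMAIN", "www.example": "NOERROR"}
diffs, tally = compare_modes(full_answers, full_answers.get, min_answers.get)
print(diffs)  # {'www.prod.ws.example': ('NOERROR', 'NXDOMAIN')}
print(tally)
```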
Finally, as agwa pointed out, Route 53 still returns broken responses for empty non-terminals; I've added a separate reply to the AWS forum thread with the example agwa used.
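An empty non-terminal is a name that has no records of its own but has descendants, so it exists and the correct answer is NOERROR with an empty answer section (NODATA). NXDOMAIN instead claims nothing exists at or below the name (RFC 8020), which a resolver may use to stop looking. A minimal sketch of that rule, assuming the zone's owner names are known and the queried names are inside the zone; `expected_rcode`, `ent_bugs` and the example.com data are hypothetical.

```python
from typing import Dict, Iterable, List


def expected_rcode(name: str, owners: Iterable[str]) -> str:
    """RCODE a zone's authoritative servers should return for `name` (assumed in-zone)."""
    name = name.rstrip(".").lower()
    names = {o.rstrip(".").lower() for o in owners}
    if name in names:
        return "NOERROR"
    if any(o.endswith("." + name) for o in names):
        return "NOERROR"  # empty non-terminal: exists but has no records, so NODATA
    return "NXDOMAIN"     # nothing at or below this name


def ent_bugs(owners: Iterable[str], observed: Dict[str, str]) -> List[str]:
    """Names where the server answered NXDOMAIN although the name exists."""
    owners = list(owners)
    return [name for name, rcode in observed.items()
            if rcode == "NXDOMAIN" and expected_rcode(name, owners) == "NOERROR"]


owners = ["example.com", "www.a.b.example.com"]
observed = {"b.example.com": "NXDOMAIN", "a.b.example.com": "NOERROR", "c.example.com": "NXDOMAIN"}
print(ent_bugs(owners, observed))  # ['b.example.com']
```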
> On the one hand it's good to prevent these edge cases and keep things deterministic, but on the other it can cause availability issues for customers if Route 53 were to refuse to import a zone that probably loaded and "worked" just fine with the previous or secondary set up. There's all sorts of ways it's possible to write a mis-configured but "valid" zone file, where records and subtrees of records become unreachable. Most customers rely on test queries and traffic monitoring to observe if there are problems.
Since this problem only materializes if you prime your resolver cache correctly, it's really easy for testing/monitoring to miss it.
Route 53 could issue a warning instead of rejecting the zone outright. For example, Google Cloud DNS issues these warnings if you try to add an SPF TXT record without quotes:
"Warning: A record for this domain has whitespace but is not a "quoted string" and therefore is split into separate strings at whitespace. SPF, DKIM, and DMARC join those strings without spaces, which can cause problems, especially for Sender Policy Framework records."
"Warning: A record for this domain starts with "v=spf1" but lacks a quoted space following the '1'. This may be a badly formatted Sender Policy Framework record that will be ignored by mail software."
That saved someone I know from publishing a bad SPF record.
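To make the quoted warnings concrete, a minimal sketch that reproduces the two checks. It is not Google Cloud DNS's implementation; `txt_strings` and `spf_warnings` are hypothetical names, and `_spf.example.com` is a placeholder. It relies on RFC 7208, under which an SPF record's character-strings are concatenated without adding spaces, and the "v=spf1" version section must be followed by a space or the end of the record.

```python
import re
from typing import List, Tuple

# A quoted character-string (backslash escapes matched but kept as-is) or a run of non-whitespace.
_TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|(\S+)')


def txt_strings(rdata: str) -> List[Tuple[str, bool]]:
    """Split TXT RDATA as typed into a zone editor into (text, was_quoted) pairs."""
    out = []
    for m in _TOKEN.finditer(rdata):
        if m.group(1) is not None:
            out.append((m.group(1), True))
        else:
            out.append((m.group(2), False))
    return out


def spf_warnings(rdata: str) -> Tuple[str, List[str]]:
    """Return the string an SPF checker would see, plus warnings like the quoted ones."""
    strings = txt_strings(rdata)
    joined = "".join(text for text, _ in strings)  # strings are concatenated with no spaces
    warnings = []
    if len(strings) > 1 and not all(quoted for _, quoted in strings):
        warnings.append("unquoted whitespace: record is split into separate strings")
    if joined.startswith("v=spf1") and joined[6:7] not in ("", " "):
        warnings.append("starts with v=spf1 but no space follows the '1'")
    return joined, warnings


print(spf_warnings('v=spf1 include:_spf.example.com ~all'))
print(spf_warnings('"v=spf1 include:_spf.example.com ~all"'))
```

Unquoted, the record becomes "v=spf1include:_spf.example.com~all", which no longer begins with a valid version section; quoted, it passes both checks.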