Hacker News | gmaslov's comments

Aw, don't leave us hanging like that. What problems did it cause?


Imagine migrating your website to a new host. A month later, you learn that a major ISP has decided that its customers don't need to know about the move, because it holds on to last-known-good records for as long as it likes. So half your traffic and business is gone. Or maybe you can't use anything run on Heroku, because the dynamism there doesn't play nice with your resolver's policies.

That's the kind of world we used to live in when TTLs were often treated as vague suggestions.


The scenario I was describing was one where a last-known-good resolution would be used if and only if a refresh attempt fails after the authority-provided TTL expires.

I believe the scenario you are describing is a rogue ISP ignoring that authoritative TTL wholesale, caching resolutions according to its own preferences regardless of whether the authority is able to provide a response after the authoritative TTL expires.
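To make that distinction concrete, here's a toy sketch of the policy I have in mind (Python; all names are invented, and a real resolver would also want to cap how long it serves stale data):

```python
import time

class StaleFallbackCache:
    """Toy resolver cache: honors the authoritative TTL, and falls back to
    the last-known-good answer only when a refresh attempt fails."""

    def __init__(self, lookup):
        self.lookup = lookup   # callable: name -> (ip, ttl_seconds); raises on failure
        self.cache = {}        # name -> (ip, expires_at)

    def resolve(self, name):
        now = time.time()
        entry = self.cache.get(name)
        if entry and now < entry[1]:
            return entry[0]    # within the authoritative TTL: plain caching
        try:
            # TTL expired (or nothing cached): always re-ask the authority first
            ip, ttl = self.lookup(name)
            self.cache[name] = (ip, now + ttl)
            return ip
        except Exception:
            if entry:
                return entry[0]  # authority unreachable: serve last-known-good
            raise                # nothing cached at all: fail, as today
```

Note that every post-TTL query still goes to the authority; the stale answer is used only when that attempt fails.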


The rogue ISPs thought they were helping people by serving stale data. After all, better something past its use-by date than failing, right? A low tolerance for DNS response times, and suddenly large chunks of the internet are failing a lot...

Among other problems, this enables attacks. Leak a route, DDoS a DNS provider, and watch as traffic everywhere goes to an attack server because servers everywhere "protect" people by serving known-stale data rather than failing safe.

Be very, very careful when trying to be "safer". It can unintentionally lead somewhere very different.


> A low tolerance for DNS response times, and suddenly large chunks of the internet are failing a lot...

Hang on a second. I feel that you're piling on other resolver changes in order to make a point. I'm not suggesting that the tolerance for DNS response times be reduced. Nor am I suggesting a scenario where the authority gets one shot after their TTL, after which they're considered dead forever. I would expect my caching DNS resolver to periodically re-attempt to resolve with the authority once we've entered the period after the authority's TTL.

> Leak a route, DDoS a DNS provider, and watch as traffic everywhere goes to an attack server because servers everywhere "protect" people by serving known-stale data rather than failing safe.

I think you're suggesting that someone could commandeer an IP and then prevent the rightful owner from correcting their DNS to point to a temporary new IP.

Isn't the real problem in this scenario the ability to commandeer an IP? The malicious actor would also need to be able to provide a valid certificate at the commandeered IP. And at that point, I feel we've got a problem way beyond DNS resolution caching. Besides, if what you have proposed is possible, isn't it also possible against any current domain for the duration of their authoritative TTL? That is, a domain that specifies an 8-hour TTL is vulnerable to exactly this kind of scenario for up to an 8-hour window. Has this IP commandeering and certificate counterfeiting happened before?


> Hang on a second. I feel that you're piling on other resolver changes in order to make a point.

Yes. The point I am making is about the additional failure modes that need to be considered and the pain they can cause - and historically have caused.

At no point did I ever think you were suggesting that one failure to respond renders a server dead to your resolver forever. Instead, I expect that your resolver will see a failure to respond from a resolver a high percentage of the time, leading to frequent serving of stale data.

> Isn't the real problem in this scenario the ability to commandeer an IP?

You're absolutely right! The real problem here is the ability to commandeer an IP.

However, that the real problem is in another castle does not excuse technical design decisions that compound the real problem and increase the damage potential.


> Instead, I expect that your resolver will see a failure to respond from a resolver a high percentage of the time, leading to frequent serving of stale data.

If this were true, the current failure mode would have end users receiving NXDOMAIN a "high percentage of the time," which obviously is not happening.

{edit: To be clear, I'm reading the quote as you stating that "failure to resolve" currently happens a high percentage of the time, and therefore this new logic would result in extended TTLs more often than the original post would assume they would happen}

> However, that the real problem is in another castle does not excuse technical design decisions that compound the real problem and increase the damage potential.

It's fair to point out that this change, combined with other known issues, could create a "perfect storm," but as was pointed out, this exploit is already possible within the current authoritative TTL window. Exploiting the additional caching rules would just be a method of extending that TTL window.

On the other hand, where do you draw the line here? If you had to make sure that no exploits were possible, most of the systems that exist today would never have gotten off the ground. It seems a bit like complaining that the locks to the White House can be exploited (picked), while missing the fact that they are only supposed to slow someone down before the "men with guns" can react.


Based on the highly unscientific sample of the set of questions asked by my coworkers in my office today, the failure mode of end users receiving NXDOMAIN has happened much more than on most days.

I don't need to make sure no exploits are possible. However, if at all possible, I'd like to help ensure that things aren't accidentally made more dangerous. It's one thing to consider and make a tradeoff. It's quite another to be ignorant of what the price is.


Well, it obviously happens when the resolver is down, but that's the situation this logic is being proposed to smooth over. The normal day-to-day does not see a high percentage of resolvers failing to respond, or else people would be getting NXDOMAIN for high-profile domains much more often.


I'm just trying to make sure we don't wind up making DNS poisoning nastier in an effort to be more user-friendly.


All the attacks mentioned here seem to be of the following shape:

1. Let's somehow get a record that points at a host controlled by us into many resolvers (by compromising a host or by actually inserting a record).

2. Let's prolong the time this record is visible to many people by denying access to authoritative name servers of a domain.

(1) is unrelated to caching-past-end-of-TTL, so you need to be able to do (1) already. (2) just prolongs the time (1) is effective and requires you to be able to deny access to the correct DNS server. Is it really that much easier to deny access to a DNS server than it is to redirect traffic to that DNS server and supply bogus responses?


DNS cache poisoning is currently a very common sort of attack. The UDP-y nature of DNS makes it very easy. There are typically some severe limitations placed on the effectiveness of this attack by low TTLs. It does not require you to deny access to the authoritative server. This attack is also known as DNS spoofing: https://en.wikipedia.org/wiki/DNS_spoofing

Ignoring TTLs in favor of your own policy means poisoned DNS caches can persist much longer and be much more dangerous.
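To put rough numbers on that (the figures here are purely illustrative):

```python
# Illustrative only: how long a poisoned record can keep being served
# under each caching policy.
poison_ttl = 300          # TTL on the spoofed record, in seconds
dos_duration = 8 * 3600   # how long the attacker keeps the authority unreachable

# Strict-TTL resolver: the bad record dies when its TTL runs out; the next
# lookup either gets a clean answer or fails outright.
strict_exposure = poison_ttl

# Serve-stale resolver: while the authority is unreachable, the
# last-known-good (poisoned) record keeps being served past its TTL.
serve_stale_exposure = max(poison_ttl, dos_duration)

print(strict_exposure, serve_stale_exposure)  # 300 vs 28800 seconds
```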


Right now, to keep a poisoned entry one must keep poisoning the cache.

In that world, one can still do that. One can also poison the entry once and then deny access to the real server. You seem to be arguing that this is easier than continuous poisoning. Do I understand you correctly?


You are correct in your assessment of the current dangers of DNS poisoning.

I am in no way arguing about ease of any given attack over any other. I am arguing that a proposed change results in an increased level of danger from known attacks.

I'm arguing that the proposed change at hand, keeping DNS records past their TTLs, makes DNS poisoning attacks more dangerous because access to origin servers can be denied. Right now TTLs are a real defense against DNS cache poisoning, and the idea at hand removes that in the name of user-friendliness.


The way I read your argument, it relies on denying access to be cheaper or simpler than spoofing (X == spoofing, Y == denying access to authoritative NS):

You are arguing that a class of attacks is made more dangerous because, in the world with that change, an attacker can not only (a) keep performing attack X, but can also (b) perform attack X and then keep performing Y. If Y is in no way simpler for the attacker, why would an attacker choose (b)? S/he can get the same result using (a) in that world or in ours.

Am I misreading you or missing some other important property of these two attack variants?


I believe you may have failed to consider the important role played by reliability.

X cannot always be done reliably - it usually relies on timing. Y, as we've seen, can be done with some degree of reliability. Combining them, in the wished-for world, creates a more reliable exploit environment because the spoofed records will not expire. The result is more attacks that persist longer and are more likely to reach their targets.

Such a world is certain to not be better than this one and likely to be worse.


Indeed I didn't consider that. Thanks a lot for being patient and enlightening.


[flagged]


I appreciate the support. But FWIW, I don't think Kalium was trolling. Although he (I assume, but correct me if I am wrong) and I disagree on the risk versus reward of extending the time-to-live of cached resolutions beyond the authoritative TTL, I nevertheless appreciated and enjoyed his feedback.


You assume correctly.


I'm afraid we're simply going to have to agree to disagree on this point. I do not share the opinion that this is a good idea with significant upside and virtually no downside. I also do not agree that none of the issues I have raised apply to the original suggestion - I believe they do apply, which is why I raised them.


WRT the second attack, what they're referring to is actually DNS cache poisoning - inserting a false record into the DNS pointing your name at an attacker-controlled IP address. This is a fairly common attack, but usually has an upper time limit - the TTL (which is often limited by DNS servers).

This proposal would allow an attacker to prolong the effects of cache poisoning by running a simultaneous DDoS against un-poisoned upstream DNS servers.


Not sure whether it could be used in a legitimate attack (probably), but it can definitely lead to confusing behavior in some scenarios. You switch servers, your old IP is handed to some random person, your website temporarily goes down - and now your visitors end up at some random website. Would you want that? Especially if you're a business?

Also, "commandeering" an IP from a small hosting provider might be easier than you think. It depends entirely on how they recycle addresses.


You seem to be continuing to warn against a proposal that isn't the one that was made. What specifically is dangerous about using cached records only in the case of the upstream servers failing to reply?


It doesn't take much of an imagination to attack this.

The older I get in tech the more I realize we just go in circles re-implementing every bad idea over again for the same exact reasons each "generation". Ah well.

TTL is TTL for a reason. It's simple. The publisher is in control, they set their TTL for 60 seconds so obviously they have robust DNS infrastructure they are confident in. They are also signaling with such low TTLs that they require them technically in order to do things like load balance or HA or need them for a DR plan.

Now I get a timeout. Or a negative response. What is the appropriate thing to do? Serve the last record I had? Are you sure? Maybe by doing so I'm actually redirecting traffic they are trying to drain and have now increased traffic at a specific point that is actually contributing to the problem vs. helping. How many queries do I get to serve out of my "best guess" cache before I ask again? How many minutes? Obviously a busy resolver (millions of qps at many ISPs) can't be checking every request so where do you draw the line?

It's just arrogant, I suppose. The publisher of that DNS record could set a 30-day TTL if they wanted to, and completely avoid this. But they didn't, and they usually have a reason for that which should be respected. We have standards for a reason.
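If one did want to answer those "how many minutes, how many queries" questions, the knobs would presumably look something like this (a sketch, not a recommendation; the constants are invented):

```python
import time

# Two explicit limits instead of an open-ended "best guess" cache:
MAX_STALE = 3600        # serve expired records for at most 1 hour past TTL
RETRY_INTERVAL = 30     # re-ask a non-responsive authority at most every 30s

def stale_policy(expired_at, last_retry, now=None):
    """Policy check for a toy resolver. Returns a pair:
    (may we still serve the stale record?, should we retry upstream now?)"""
    now = time.time() if now is None else now
    within_window = now - expired_at <= MAX_STALE
    should_retry = now - last_retry >= RETRY_INTERVAL
    return within_window, should_retry
```

A busy resolver would enforce something like RETRY_INTERVAL precisely so that millions of qps don't all turn into upstream probes.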


Assume we serve the last known record after TTL.

Here's the attack:

- Compromise IP (maybe facebook.com)

- DDoS nameservers

- facebook removes IP from rotation

- Users still connect to bad actor even though TTL expired

"We have standards for a reason" is absolutely correct, and we can't start ignoring the standards because someone can't imagine why we need them _at this moment_


Yes, but there's one piece missing.

> Here's the attack:

> - Compromise IP (maybe facebook.com)

- Attacker generates or acquires counterfeit facebook.com certificate.

> - DDoS nameservers

> - facebook removes IP from rotation

> - Users still connect to bad actor even though TTL expired

I understand what you are saying, but this attack scenario is extraordinarily difficult as a means to attack users who have opted to configure their local DNS resolver to retain a last-known-good IP resolution. It involves commandeering an IP and counterfeiting Facebook's SSL/TLS certificate. As I have said elsewhere in this thread, all sites are currently vulnerable to such an attack today for the duration of their TTL window. So if this is a plausible attack vector, we could plausibly see it used now.


You're right! Completely, absolutely, 100% right. If this was a plausible attack vector, we could see it used now. And you know what? We do!

This is why some people are concerned about technical decisions that make this vector more dangerous. Systems that attack by, say, injecting DNS responses already exist and are deployed in real life. The NSA has one - Quantum. Why make the cache poisoning worse?


Kalium, I really appreciate your responses.

If my adversary can steal an IP from Facebook, create a valid certificate for facebook.com, and provide bogus DNS resolution for facebook.com, I feel it's game over for me. My home network is forfeit to such an adversary.

But I get your point. It's about layering on mitigating factors. The lower the TTL, the lower the exposure. Still, my current calculus is that the risk of being attacked by such an adversary is fairly low (well, I sure hope so), and I would personally like to configure my local caching resolver to hold onto last-known-good resolutions for a while.

All that said, I have to hand it to you and others like you, those who keep the needle balanced between security and convenience.


Now that I think about it more, it's even worse than that. A bogus non-DNSSEC resolution and a forged cert, both of which are real-life attacks that have actually happened, and you're done for. Compromising an IP isn't really necessary if you're going to hang on to a bad one forever, but it's a nice add-on. It removes the need to take out the DNS provider, but we can clearly see that that is possible.

Keeping the balance between security and convenience is difficult on the best of days. Today is not one of them. :/


If you can forge certs for HTTPS-protected sites, this is not what you would use them for.


It's part of what I would use them for. A big, splashy attack distracts a bunch of people while you MITM something important with a forged cert? Great way to steal a bunch of credentials with something that leaves relatively few traces while the security people are distracted.


> - Attacker generates or acquires counterfeit facebook.com certificate.

So you enabled an attack vector that has to be nullified by a deeper layer of defense? And in some cases possibly impacted by a user having to do the right thing when presented with a security warning.

Why would you willingly do that?

Also I do find your assumption of ubiquitous TLS rather alarming - facebook is a poor example here, there are far softer and more valuable targets for such an attack vector to succeed.

Edit: Also to keep my replies down...

> I would personally like to configure my local caching resolver to hold onto last-known-good resolutions for a while.

You can! All these tools are open source, and there are a number of simple stub resolvers that run on linux (I'd imagine OSX as well) which you can configure to ignore TTL. They may not be as configurable as you like, but again they are open source and I'm sure would welcome a pull request :)
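For example, Unbound exposes this behind explicit knobs (option names are from Unbound's serve-expired feature; availability depends on your version, so check your unbound.conf man page):

```conf
server:
    serve-expired: yes          # fall back to expired records when refresh fails
    serve-expired-ttl: 86400    # but never more than a day past the original TTL
```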


The policy that caused so much pain before is to take DNS records, ignore their TTLs, and apply some other arbitrarily selected policy instead. I confess, I don't understand how the proposal at hand is different in ways that prevent the previous pains from recurring.

Maybe you can enlighten me on key differences I've overlooked? How do you define "failing to reply"? Do you ever stop serving records for being stale, or do you store them indefinitely?


So if our resolver was on our laptop and had a nice UI, that would work great. Now the question is: why is the resolver not on my laptop?


It can be if you want it to be, but it's probably much less interesting than you think.

You likely underestimate the sheer number of DNS records you look up just by surfing the web, and how little use that information would be to 99.99% of users.

Basically the tools exist for you to do this yourself if you are so inclined, but they may not be that user friendly since they aren't generally useful to most.


Seconded. This is a common idea that occurs to people who haven't dealt with DNS before, and it ends with a much better understanding of how many things HTTPS is used for, and going back to using OpenDNS as your resolver.


Proper cache invalidation is one of the 2 hard problems of computer science (the other 2 being naming things and off by 1 errors).


Yes, and there are 10 kinds of people in the world: those who understand binary and those who don't.


I've only ever seen it spelled out "theoretical CS". There is GOFAI, though.


Couldn't a hypothetical Big Alien make exactly the same argument? There are plenty of exoplanets larger than Earth.


Of course they could. And they should! Statistically, if every single sentient being makes this argument, more sentient beings will be right than wrong.

To quote:

> Given the prevalence of the four different blood groups in the cartoon above, the most profitable strategy here would be to bet on "A", as that gives you the greatest chance of winning. Another way of looking at it, is that if everyone adopted that strategy, the bookmaker would lose the most money.


The way I understand master keys work is that the lock has two "breaks" in each pin, instead of just one. So each pin has two positions that allow the lock to open, and you don't know which one belongs to the master key.

I think at least two locks would be required, and depending on how many pins and possible positions there are, more might be needed in case of a collision.
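The "two breaks per pin" model can be checked with a quick script (a toy model; pin counts and depths are invented):

```python
from itertools import product

def candidate_masters(locks):
    """Each lock is a list of per-pin (change_cut, master_cut) pairs, but an
    examiner only sees the unordered pair of working depths per pin.
    Intersecting the candidate sets across locks narrows down the master."""
    sets = []
    for lock in locks:
        # every way to pick one working depth per pin from this lock
        sets.append(set(product(*lock)))
    return set.intersection(*sets)

# Two locks keyed to the same master (master cut = 5 on every pin),
# with different change keys:
lock_a = [(1, 5), (2, 5), (3, 5), (4, 5)]
lock_b = [(2, 5), (7, 5), (1, 5), (6, 5)]
print(len(candidate_masters([lock_a])))       # 16 candidates from one lock
print(candidate_masters([lock_a, lock_b]))    # only the true master remains
```

With 4 pins, one lock leaves 2^4 = 16 candidates; a second lock whose change key shares no cuts with the first collapses that to the single true master, which matches the "at least two locks" intuition above.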


I mean, a lock comes with one key, right? If you make a second key that opens it but shares nothing with the default key, you may have a winner.


FYI, you've got a merge conflict marker hanging out in your privacy policy :)

<<<<<<< HEAD:web/src/main/resources/com/pacifica/web/views/privacy.ftl We use this information solely [...]


Ha, thanks for that. I'll get it fixed.


I went to the more-difficult-to-reach exit and was sent all the way back to the beginning of the game :-(


[Spoiler Alert] I did this on the first play through too. I played through again and it turns out that's the last level anyway. If you take the easy option you end up in a blank level where all the winners can hang out and just draw pictures. When I was there it was 50% people expressing their joy in finishing, and 50% dick pics :)


That is hilarious. This game is awesome.


How about looking at the number of people who must be removed in order to bring the bus factor down to 1? We can call it the "bus co-factor" :). With the bus factor we're picking the most critical people first, and removing the least number of them; with the bus co-factor we're picking the least critical people first, and removing the greatest number of them.

There's probably already some name and many theorems in graph theory for both of these ideas.
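One way to formalize both ideas (brute force over a toy knowledge map; the names, components, and the exact "co-factor" definition are my own invention):

```python
from itertools import combinations

def covers(knowledge, people, comps):
    """Do the given people collectively know every component?"""
    known = set().union(*(knowledge[p] for p in people)) if people else set()
    return known >= comps

def bus_factor(knowledge):
    """Fewest people whose loss leaves some component with no knower."""
    people = list(knowledge)
    comps = set().union(*knowledge.values())
    for k in range(1, len(people) + 1):
        for gone in combinations(people, k):
            if not covers(knowledge, [p for p in people if p not in gone], comps):
                return k
    return len(people)

def bus_cofactor(knowledge):
    """'Co-factor': the most people you can remove (least critical first)
    while the survivors still cover everything with bus factor exactly 1."""
    people = list(knowledge)
    comps = set().union(*knowledge.values())
    for k in range(len(people) - 1, 0, -1):
        for gone in combinations(people, k):
            survivors = {p: knowledge[p] for p in people if p not in gone}
            if covers(knowledge, list(survivors), comps) and bus_factor(survivors) == 1:
                return k
    return 0

team = {"ann": {"db", "api"}, "bob": {"api"}, "cy": {"db"}, "dee": {"api"}}
print(bus_factor(team), bus_cofactor(team))  # 2 3
```

Here losing ann and cy stalls the project (bus factor 2), while bob, cy, and dee can all be removed and ann alone still covers everything (co-factor 3).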


Don't worry -- it just means your brain's eviction policy is well-tuned for immortality. ;)


I hope my brain is putting all those reclaimed neurons to good use, then, but it's probably just storing PINs and passwords that I remember for decades after not needing them any more. Wish I could talk to the bit of the brain that decides what should be kept.


Text messages are only the relatively boring first step in this kind of project. If these guys go on to implement IP/Vodka they can proudly follow in the footsteps of such hacks as RFC 1149 :-) "The network smells a little slow today"


I presume that the way the lucid dream induction works is by blinking an LED when the onset of REM sleep is detected. There are other products out there already that do this; the idea is that you'll train yourself to notice any blinking red lights or objects in your dreams and associate that with realizing that you're dreaming.

Whether it actually works or not I can't say.

