When I was solving this problem about a year ago at a previous company, I got around it with a simple health checker that killed any node not responding properly to DNS, Docker, etc. queries and automatically replaced it with a new node.
Granted, we were using Mesos, not k8s, but I suspect a similar approach could work here too.
Are you saying this doesn't work?
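
To make the idea concrete, here is a minimal sketch of that kind of health checker. It is for illustration only and is not the original implementation: the `HealthChecker` class, the `replace_node` callback, and the consecutive-failure threshold are all assumptions, and the actual kill-and-replace step would depend on whatever your cloud or orchestrator exposes.

```python
import socket
import subprocess
from typing import Callable, Dict, List

Probe = Callable[[], bool]


def dns_ok(hostname: str = "example.com") -> bool:
    """Return True if the node's resolver can resolve a name."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def docker_ok(timeout: float = 5.0) -> bool:
    """Return True if the local Docker daemon answers `docker info`."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


class HealthChecker:
    """Run probes and trigger a replacement after repeated failures.

    `replace_node` is a hypothetical hook: it receives the names of the
    failed probes and is expected to terminate and replace the node.
    """

    def __init__(
        self,
        probes: Dict[str, Probe],
        replace_node: Callable[[List[str]], None],
        max_failures: int = 3,  # assumed threshold, tune to taste
    ) -> None:
        self.probes = probes
        self.replace_node = replace_node
        self.max_failures = max_failures
        self.failures = 0

    def tick(self) -> List[str]:
        """Run every probe once and return the names of those that failed."""
        failed = [name for name, probe in self.probes.items() if not probe()]
        # Only consecutive failures count; one healthy round resets the counter.
        self.failures = self.failures + 1 if failed else 0
        if self.failures >= self.max_failures:
            self.replace_node(failed)
            self.failures = 0
        return failed
```

In practice you would call `tick()` on a timer with something like `HealthChecker({"dns": dns_ok, "docker": docker_ok}, replace_node)`, where `replace_node` drains the node and asks the provider for a fresh one. Requiring several consecutive failures is one way to avoid killing a node over a single transient blip.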