I work in a big corp myself. If you worked on cloud foundry you probably worked ...

gtirloni · on Dec 9, 2018

> And that's the current experience.

Maybe that's your experience? I haven't had any of those problems in moderately sized on-prem k8s clusters.

yebyen · on Dec 9, 2018

I think I've had that experience. I wouldn't generalize and say it's "the current experience," and certainly not 50% of the time, but I can say I have had this experience at least once on every cloud provider that I've used Kubernetes with to any degree of depth. Something goes wrong, it's outside of your control, and you just have to wait for them to fix it (or sometimes, I guess, you can just pick a different AZ and try again).

It's almost enough to make you wish for a hybrid cloud. If you don't have Kubernetes experts on staff, you shouldn't be trying to manage your own Kubernetes on-prem. Without them, it won't be surprising if this is the experience 50% of the time; honestly, that has to be why managed services for Kubernetes are evidently becoming so popular.

Managed services have these issues some of the time too, and if your managed service has those issues often enough that you'd want to talk about them, you'd better find a different managed service. I think AKS and GKE have had those issues, but they are rare. I don't know personally about EKS.

I think some people interpret the promise of managed K8s services to be that you no longer need someone on staff who is the expert in how things might go sideways on Kubernetes.

On the contrary, you still need that person to be able to take advantage of Kubernetes with confidence, but maybe now, thanks to whatever vendor you chose for Kubernetes, you simply don't need an actual department of 6-12 of those experts focused only on managing Kubernetes.

A result of using a real, stable managed K8s service should be that on a day-to-day basis, those people won't actually need to run around with their hair on fire doing ops things just to keep the business going. Automation.

With on-prem, maybe just don't expect it to come like that out of the box; if your whole team is still new at this, they're going to need training, planning, and ramp-up to get it to that stage. This is exactly how you get "it might be better for us to start with a managed k8s service."

gtirloni · on Dec 9, 2018

I was replying to a specific list of problems:

> usually people spend their time with having DNS not working, having servers hanging, having etcd nodes not talking to each other without explaining why, having deployments say "SUCCESS" while actually not running

I'm not saying k8s is free of problems.

yebyen · on Dec 9, 2018

Not disagreeing with you. Those are all "noob problems" from my perspective, as I've done all of those things wrong before myself, and I myself am a noob. But they are actually some very complex problems (and you can even have them on your managed platforms once in a while, too).

Self-healing is great when it works, and those items you listed are all real problems. But they are not problems that you should expect to encounter on a managed service, at least hopefully not more than once. (YMMV, amiright?)

yebyen · on Dec 9, 2018

> I haven't had any of those problems in moderately sized on-prem k8s clusters.

Just out of curiosity, how are you managing your on-prem k8s clusters? (Is there a toolkit you'd recommend using? Kubespray?)