I can see the point you're trying to make, but you're choosing an interesting way of conveying it, and I disagree.
Restic is a backup tool. Velero is a backup tool for Kubernetes. Vultr is a low-cost but still decent "cloud" provider. GitOps is a philosophy which makes sense even on small projects.
None of those are "wrong" or "overcomplicated" options.
The elephant in the room is Kubernetes, which is indeed quite complex, and often gets used as the go-to even where it doesn't make sense [1], either because it's popular or because that's what people know, or because of the ready-made tools from others (e.g. if all you want to deploy exists in the form of Helm charts, it can save you lots of time). But it has its place and brings a lot to the table. You just have to be aware of the risks the complexity brings.
Disclaimer: I work at HashiCorp, I'm a massive fan of Nomad and think it's a better fit than Kubernetes in many cases, but dismissing Kubernetes outright is wrong.
> GitOps is a philosophy which makes sense even on small projects.
Er, does it? The root cause of this catastrophic data-loss incident is that in "GitOps" none of the traditional safety checks can be implemented. In normal sysadmin workflows, attempting to delete all your data will yield an "Are you sure?!" type message and you'll probably have to take explicit steps to confirm that this is really what you intended. There will also be dry-run modes and other helpers.
Because git is intended for source code and not as a way to make stateful changes to servers, there are no features for that. If you push a commit that didn't do what you mean, it will just blindly do it.
It seems like this is a pretty major flaw in the whole "philosophy". The whole point of hacking a VCS into a server admin UI is that people think git will let you roll back infrastructure changes easily. But it cannot, because infrastructure isn't a stateless function of your git repository.
> In normal sysadmin workflows, attempting to delete all your data will yield an "Are you sure?!" type message and you'll probably have to take explicit steps to confirm that this is really what you intended
It depends on how you go about it: an mv with the wrong path could erase your data without any warning.
> Because git is intended for source code and not as a way to make stateful changes to servers, there are no features for that. If you push a commit that didn't do what you mean, it will just blindly do it.
There are no features in git itself for that, but in the way I usually implement GitOps, with a CI/CD system, you can easily have a manual "check this terraform plan's output to make sure you aren't doing anything crazy, and manually click this button to approve" step.
GitOps workflows can have safety gates that require a manual double-check/approval step, for instance if the deployment plan is substantially different or if deletes and loss of data would occur. They weren't implemented in this case, but that doesn't mean they can't be implemented. They probably should be implemented as a takeaway from this article.
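A minimal sketch of such a gate in Python, assuming the CI job has already run terraform plan -out=plan.tfplan followed by terraform show -json plan.tfplan > plan.json. Terraform's JSON plan lists each resource under resource_changes with a change.actions array; this script fails the job whenever that array contains "delete" (which also covers replacements), so a human has to look at the plan and approve it. The file names and the fail-the-job policy are illustrative assumptions, not a standard tool.

```python
import json
import sys

# Terraform's JSON plan marks destruction with "delete"; a replacement shows
# up as ["delete", "create"] or ["create", "delete"], so it is caught too.
DESTRUCTIVE = "delete"


def destructive_changes(plan: dict) -> list[str]:
    """Return the addresses of resources the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if DESTRUCTIVE in actions:
            flagged.append(rc["address"])
    return flagged


def gate(path: str) -> int:
    # Expects the output of: terraform show -json plan.tfplan > plan.json
    with open(path) as f:
        plan = json.load(f)
    flagged = destructive_changes(plan)
    if flagged:
        print("Plan deletes or replaces resources; manual approval required:")
        for address in flagged:
            print(f"  - {address}")
        return 1  # non-zero exit fails the CI job, so a human must step in
    print("No destructive changes; safe to auto-apply.")
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage (hypothetical script name): python plan_gate.py plan.json
    sys.exit(gate(sys.argv[1]))
```

Turning that non-zero exit into a "click to approve" step, rather than a plain failure, depends on the CI/CD system in use.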
Source control is great at auditing why a change occurred. (Keep in mind some of those changes include things like bad merge accidents and sloppy refactors. Source control never guarantees a perfect state of the code at any point in time, only a saved state.) Source control can also give you an estimate of how big a change was (diff size, number of files changed/moved/deleted). You can use those same tools in a GitOps workflow to set smart manual gates, not just in post-mortem root-cause analysis when things go wrong.
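The same diff statistics can drive a gate before anything is applied. A rough Python sketch, assuming git is on the PATH: it summarises git diff --numstat output and asks for manual approval when a change touches too many files or deletes too many lines. The thresholds (10 files, 50 deleted lines) and the function names are hypothetical; you would tune them per repository.

```python
import subprocess


def diff_stats(numstat: str) -> tuple[int, int, int]:
    """Summarise `git diff --numstat` output as (files, added, deleted)."""
    files = added = deleted = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        add, delete, _path = line.split("\t", 2)
        files += 1
        # Binary files are reported as "-\t-\tpath": count the file, not lines.
        if add != "-":
            added += int(add)
        if delete != "-":
            deleted += int(delete)
    return files, added, deleted


def needs_manual_approval(numstat: str, max_files: int = 10,
                          max_deleted: int = 50) -> bool:
    # Large or deletion-heavy changes go to a human; small ones flow through.
    files, _added, deleted = diff_stats(numstat)
    return files > max_files or deleted > max_deleted


def numstat_between(base: str, head: str) -> str:
    # Runs git in the current repository; base/head are any commit-ish values.
    return subprocess.run(
        ["git", "diff", "--numstat", base, head],
        check=True, capture_output=True, text=True,
    ).stdout
```

In a pipeline this would be called as needs_manual_approval(numstat_between("origin/main", "HEAD")), with the branch names depending on your setup.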
> Because git is intended for source code and not as a way to make stateful changes to servers, there are no features for that. If you push a commit that didn't do what you mean, it will just blindly do it.
git doesn't do anything (except keep versioned source of declarative state).
Instead, have a look at your state engine, and make it as safe as you care to.
> because infrastructure isn't a stateless function of your git repository
So we agree: there's your problem. Not git, but the state engine function.
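To illustrate putting the safety into the state engine rather than into git, here is a toy reconciler in Python. Everything in it (the Resource type, the stateful flag, plan and guard) is hypothetical: it diffs the declared state against the live state and refuses to delete anything marked stateful unless the caller explicitly opts in, much like an "Are you sure?" prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    stateful: bool = False  # True for things like database volumes


class UnsafePlanError(Exception):
    """Raised when a plan would destroy stateful resources without sign-off."""


def plan(desired: dict[str, Resource],
         actual: dict[str, Resource]) -> dict[str, list[str]]:
    # Declarative diff: `desired` comes from git, `actual` from the live system.
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "delete": sorted(actual.keys() - desired.keys()),
    }


def guard(p: dict[str, list[str]], actual: dict[str, Resource],
          allow_data_loss: bool = False) -> None:
    # The safety check lives in the state engine, not in git: a commit that
    # drops a stateful resource is refused unless someone explicitly opts in.
    doomed = [name for name in p["delete"] if actual[name].stateful]
    if doomed and not allow_data_loss:
        raise UnsafePlanError(f"refusing to delete stateful resources: {doomed}")
```

A caller would run guard(plan(desired, actual), actual) before applying anything, and only pass allow_data_loss=True after a deliberate, separate confirmation.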
While you don't need K8s, for GitOps to work you do need a structured approach for operating your infrastructure. These concepts can help:
“It is relatively easy to manage and scale web apps, mobile backends, and API services right out of the box. Why? Because these applications are generally stateless, so scripts can scale and recover infrastructure from failures without additional knowledge.”
“A larger challenge is managing stateful applications, like databases, caches, and monitoring systems. These systems require application domain knowledge to correctly scale, upgrade, and reconfigure while protecting against data loss or unavailability. We want this application-specific operational knowledge encoded into software … to run and manage the application correctly.”
Start by looking at the Operator Capability Level diagram here:
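As a rough illustration of the "application-specific operational knowledge" in the quote above, here is one step of a hypothetical operator reconcile loop for a replicated database, in Python. The Replica type, the lag field and the specific rules (never remove the primary, remove one replica per step, wait until the survivors have caught up) are assumptions chosen for the example, not taken from any real operator.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    lag_bytes: int        # how far this replica is behind the primary
    primary: bool = False


def next_action(replicas: list[Replica], desired: int, max_lag: int = 0) -> str:
    """Decide one reconcile step for a hypothetical replicated database.

    A generic script would delete instances until the count matches. This
    version encodes domain rules instead: never remove the primary, remove
    at most one replica per step, and only once the survivors are caught up.
    """
    if len(replicas) < desired:
        return "add-replica"
    if len(replicas) == desired:
        return "noop"
    candidates = [r for r in replicas if not r.primary]
    if not candidates:
        return "refuse"  # only the primary is left; never remove it
    # Drop the replica that is furthest behind: it holds the least fresh data.
    victim = max(candidates, key=lambda r: r.lag_bytes)
    survivors = [r for r in replicas if r is not victim]
    if any(r.lag_bytes > max_lag for r in survivors if not r.primary):
        return "wait"  # let replication catch up before shrinking further
    return f"remove:{victim.name}"
```

The loop would call next_action repeatedly, acting on one result at a time, until it returns "noop".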
> If the application is containerised, like most are these days, how do you propose running these on said VM?
It really depends on how you implement it. In the simplest setups, yes, you can destroy the infrastructure along with its data, and when you recreate the infra the data is gone. But the implementations I worked on were specifically designed to withstand this kind of problem.
> You just have to be aware of the risks the complexity brings.
I'd say most are not aware. It is not enough to say that people should "just" be aware. That is something said from a position of knowledge and awareness, which helps nobody.
Completely agree with the parent comment: simplicity is the first thing that should be reached for. K8s ought to be dismissed by default, because then you have to justify its inclusion. That's probably a way to increase awareness before plunging in.
Nomad is honestly really good. For a long time I've wished that it had a wider reach because I just outright don't want to go back to Kubernetes after using them both.
I'm really fearful that the recent events have harmed the chances of that happening, though. It would be a shame, so I hope that isn't how it plays out.
> The elephant in the room is Kubernetes, which [...] often gets used as the go-to even where it doesn't make sense either because it's popular or because that's what people know
Or because folks want to pad their resume with k8s, which is what I'm seeing at work most of the time.
[1] https://atodorov.me/img/nomad/kubernetes.jpg#center