Ex-Amazon here. Most grumpy system engineers did not disappear: we got hired by Google/Amazon/etc. to build large-scale infrastructure... and sometimes sell it back to you as a service.
Believe it or not, most of the underlying infra does not run on the popular technology of the year. Far, far from it. That's why it works.
Modern devops, with its million tools that break backward compatibility every month, sometimes becomes the running joke at lunch.
Grumpykins here. I think the term "modern devops" sort of nails it, but not quite how you used it. Most departmental/enterprise sysadmins/engineers of lore who had even the slightest need for a life outside the box pushed anything resembling automation to its breaking point. Combine that with knowing and serving the reasons for their existence (developers, users, etc.), and devops is nothing new; it is simply the necessary manifestation of progress at scale (albeit positively devoured by managers speaking business, not entirely unlike "agile").
"Use what works" definitely presents a lot more choices these days and likely will forever more.
"Use what works well" is something different where "well" implies helpful, dependable, predictable, manageable and so on that will continue to scale with your needs. Only breaking things down the "old-school" way will lead towards success, stability, security and life outside the box.
Good devops is still, primarily, good engineers engineering good things, for themselves and others.
Granted, the article is from 2015, but my impression is that the author is not just cranky, but scared.
Agreed. DevOps has been a thing for a long time. The funny thing is that the core of the DevOps philosophy (uniting development and ops through code) is still a rarity. SO MANY big companies have entire departments for DevOps that are basically either developers doing release management or sysadmins writing pseudo-code for infrastructure while making it somehow less accessible to development teams.
DevOps is uniting dev and ops through code?
I believe the first and most important thing in DevOps is to unite them through good (early, often, honest, etc.) communication and collaboration, instead of “you broke x”, “you have to fix y”, and “this is the other department’s (dev or ops, depending on who says it) fault”. The code is just a tool to make that collaboration easier / automate it, once those other things have already happened and everyone understands they are working toward the same goal and not as enemies.
A good sysadmin would not look like they are doing much work (everything is humming along and can self-heal minus physical problems), but a good devops person is constantly busy.
I sort of agree but a good sysadmin was never idle on the inside. I'm seeing good devops people getting worked well beyond what I'd consider reasonable expectations i.e. "Oh, look, you can do everything! Here's everything!".
They're being perverted into a role having a full load of pure operations with shit for processes (and, often, systems) and an expectation that you have time to automate and shore up all the shit and technical debt accumulated since.
Can most good or even extraordinary developers simultaneously be elbow deep on a dozen unrelated products and actually get reasonable traction? I can barely keep one glass castle together, myself.
This is exactly my sentiment and why I moved away from SRE and back to SWE. I felt busy all the time developing tools and infrastructure while simultaneously absorbing the role of operations.
I never had the time to properly finish a project I was proud of delivering. Turning those into services so we could offer self-service was a dream that most of the time never happened; we were left with half-done systems requiring tons of manual intervention (lots of toil) while having to move fast to the next thing...
I think a lot of data engineers and transcoding folks would have similar reports. But you’re right; the problem with DevOps is the reach of their usefulness. If your whole company is built on code, your DevOps team will always be overworked and underappreciated.
Logging is your friend here. You can spend days scrolling through logs, doing the occasional grep and making disapproving noises. Bonus points for developing some graphs for the next meeting.
What fascinates me about this is, and sorry for being morbid, but what happens when y'all die? Does knowledge of the lower levels of the stack go away with your generation, or will there be enough of us young ones picking the important stuff up?
There's rarely anything old school sysadmins have learned that hasn't come from experience.
Been there, done that, fought that shit the first time. And the second. And the third. (it's amazing how often I find myself solving what are essentially the same problems over and over.) It's one reason why you'll find we'll push back on the "ohh shiny". There are many wonderful and fascinating things coming out. Tech is an amazing field to be working in. But it's also ridiculously frustrating because no one pays any attention to _why_ things are done the way they are, or _why_ approaches haven't worked in the past (I'm all for re-introducing past failed approaches, as long as there's evidence those reasons have been investigated)
You'll find a common trend amongst us in that most of us sort of ended up in the role accidentally. Schools and college teach you to become developers. Few people tend to head to college with the view of specialising in the ops side of things.
Even speaking as a comparatively old-school sysadmin, my strengths come from being flexible and adaptable. What I do today is nothing like what I was doing 5 years ago, and what I did then is nothing like what I was doing 5 years before that, and so on down the line. The field is constantly in flux.
I just have the best part of two decades of experience to both anticipate the problems, and be able to get to diagnosis quicker when things do go wrong.
Even as the older sysadmins die off I'm fairly confident there will be newer ones to replace them, because people are going to continue to learn from the problems they run in to.
Ansible is one "ohh shiny" thing that has greatly increased my productivity as a sysadmin. Before that I would automate what I could with ssh and pdsh and scripts, but it was never as well polished as Ansible.
I'm even using Ansible for ad-hoc stuff (tweaking a config, restarting a service) because it's easier to do that from a management server than to log in to some remote host, get oriented as to the OS distribution and version, and run commands in the shell there.
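To give a flavour of what that ad-hoc use looks like, here's a rough sketch (the inventory group name "web" is made up):

```sh
# poke at a service across a whole inventory group
ansible web -m command -a "systemctl status nginx"

# restart it with privilege escalation (-b = become root)
ansible web -b -m service -a "name=nginx state=restarted"

# or just gather facts to see what distro/version each host is running
ansible web -m setup -a "filter=ansible_distribution*"
```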
I like Ansible with Vagrant as well - it makes for a nice, clean way of deploying to development environments while also being nicely 'self'-documenting and not limiting (you can drop back to shell); it's a lovely tool for the most part.
Edit: The thing I really like about Ansible is how unsexy it is; it's just a nice, sane way of doing largely what you could do yourself with ssh and bash, but in a language that doesn't make you want to cry.
I've been around linux since the 90's and Ansible feels comfortable, predictable and stable - what you would want in a piece of software that can be mission critical in the most fundamental sense.
I think I achieved my perfect balance of tooling for current systems with Ansible and Docker.
Ansible even automates provisioning in AWS. I never really liked CloudFormation's way of creating stacks, so I began using Ansible to document the application's stack, and have used it to deploy systems running in EC2 with RDS, ElastiCache, SQS, SNS, DynamoDB, etc.
After provisioning/configuring I'd end up with an instance in EC2 with Docker installed and from there our CI/CD would just trigger the deployment playbook that simply would do a `docker pull` of the version tagged for release and start the container.
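Roughly speaking (this isn't the actual playbook, and the registry/app names are placeholders), that deploy step boils down to something like:

```sh
# RELEASE_TAG is assumed to be injected by the CI/CD system
docker pull registry.example.internal/myapp:"$RELEASE_TAG"

# swap the running container for the newly tagged release
docker stop myapp 2>/dev/null || true
docker rm myapp 2>/dev/null || true
docker run -d --name myapp --restart unless-stopped \
    registry.example.internal/myapp:"$RELEASE_TAG"
```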
Ansible also helped install our Splunk forwarders, as running them from Docker was still a hassle not so long ago, so we had the best of both worlds: full configurability of the host machine with Ansible, and packaging and predictability of deployment through Docker.
I advocate this stack as simple enough to learn and use, built on widely used tools without their fancy (and often broken) features. Even though they can still be a bit immature, they are production-ready enough.
Ansible? I could see using it to build a container that then got "orchestrated", I guess, but ... hmmm. I've never really looked at them as doing the same thing. (Nor have I looked at either Chef/Puppet for CM. Maybe I'm just stupid... My ex- certainly thinks that I am...)
I used puppet -- unhappily -- for a while before discovering SaltStack.
Salt, like Ansible, is a response to the Puppet/Chef hegemony -- Ruby and DSLs and tacked-together bits that are a nightmare to install and upgrade in themselves.
I'd suggest Salt does some things much more cleanly than Ansible, and while it can be an orchestrator and a deployment system, it also excels as a configuration management system ... but I think typically people are talking about configuration management systems that are tightly coupled with more formal change management systems (of which I've found pretty much none that work well).
It's a legit concern. There was a NANOG panel about this exact thing. I believe the quote was, "Take a look around. We're all old and greying. We have a severe pipeline problem." And then, much to the AWS dude's dismay, the topic shifted toward blaming cloud services because no one takes the time to learn how any of this works anymore.
Want to guarantee your child's future employment? Don't just teach them to code (the machines will do that). Teach them how to build networks and truly understand network protocols.
I’m going to teach my children how to navigate the world of insane Harry-Potter-esque rules which all IaaS/PaaS platforms enforce upon you. They will become software language lawyers and be masters of the electric Disney dollar.
You know like “ahh don’t call the messaging endpoint more than 800 mega-milli-times per mega-nano-second or it will cost you three bazillion CPU credits, but only on three and a half cores which will starve all your instances, issue an invoice and proceed to melt your credit card.”
> Teach them how to build networks and truly understand network protocols.
I don't know how the situation is in the US, but in my country network engineering is actually quite a popular field of study. (we have college level education in network engineering).
The one thing that stands out, though, is that it's mostly done by youngsters who already have sysadmin experience or worked in IT before. Almost everyone who comes straight from high school goes into Software Engineering.
I think this is mainly because networking is quite an invisible field so to speak. Many people don't even know your job exists, and many young people only see the shiny hip side of it. (being Software Engineering).
Being good at network engineering is hard, especially once you get past entry-level work and actually start being responsible for designing large-scale networks. Mainly because building a network is a major financial investment where guaranteeing performance is hard without either a ton of experience or a shitton of lab time.
> Don't just teach them to code (the machines will do that). Teach them how to build networks and truly understand network protocols.
Building networks and understanding protocols is something machines can do TODAY. The entire internet was built to survive a nuclear war. It can reshape itself, and it most definitely understands network protocols. By definition, protocols are the language of machines over the network. And it's been like this for a few decades.
The only reason for knowing low-level network protocols, in a world where machines can code anything (which makes them better analytical thinkers than humans), is to beg the machines for mercy in their ancient tongue.
> It can reshape itself, and it most definitely understands network protocols
It really, really can't! That quote about the Internet interpreting censorship as damage and routing around it? Or the one about information "wanting" to be free? Taken wildly out of context.
Routing protocols convey state information between routers, but really that is just table stakes.
So, what do network engineers do?
Classically: set up and troubleshoot those systems.
Currently: transitioning from manual work to building systems that deploy, monitor, and remediate routers and such.
In other words, the same stuff sysadmins->(SRE|PE) folks do and undergoing a similar transition.
Then what about routing protocols such as RIP, OSPF, etc?
Don't forget BGP. I don't think they do what you think they do, at least, not to the extent you think they do them. There is a hell of a lot of manual work in running any sizeable network even within a single organisation.
And what exactly do they do that I don't understand?
> There is a hell of a lot of manual work in running any sizeable network even within a single organisation.
I'm not trying to say they do everything and no manual work is required. I'm trying to say machines are already doing part of it. OP believes that, given a world where machines can code, they can't design or maintain networks, which I find truly ridiculous, since machines are pretty far from doing any kind of "coding" today, but they do networking and network protocols pretty well.
> since machines are pretty far from doing any kind of "coding" today
The first attempt at a system to turn plain English that even managers could write into executable code was 1959 - COBOL. So you're right in a sense, even 59 years later - but also wrong if you think networks are any more advanced than this. The Internet really cannot "reshape itself" and probably never will be able to.
RIP and OSPF are interior routing protocols---that is, they're used for routing within an organization (or autonomous system in Internet lingo) and deal with technical routing issues (fastest link, most bandwidth, etc). BGP is for routing between organizations and deals more with political issues than technical ones (we need to send all traffic here due to contracts, unless it goes down, then shift traffic over there; and refuse routing information from such-and-such organization because they don't have their act together).
OP was talking about a future where developers become obsolete because machines take over the development sector. Do you think writing protocols will be something humans will do better than machines?
You lose and gain, and you should always be mindful of what is coming. Humility is good. I love you young guys; your ideas keep coming and they are mostly good.
Same problem as making sure your system doesn't lose data when a server dies. Make sure you have enough copies of the knowledge by propagating it between people. Try to have some kind of offline recording (books?) for recovery from a disaster where you lose everyone. Have an idea of how to recover at a business level if you do lose the data forever.
The trouble is making sure these plans actually work. This is why Netflix randomly execute some of their employees every month.
That's part of the reason why my last two hires have been at the beginning of their career. For both of them, it was their first major sysadmin responsibility after having jobs involving tech support and occasional Linux experience.
The key is to pick smart people who are good at learning and find complex systems interesting. Then, of course, you need to have interesting projects for them to work on.
It's a very real concern. We have a thing called a "bus plan" for all of our tech employees (6 of us - small non-tech company). It basically attempts to cover everything that we would need to know if one of us gets hit by a bus.
What makes you think there aren’t young people doing systems administration? Our last two hires in my most recent job were 23 and 27 respectively. Sure, they’re getting trained in the new hot cloud stuff... as the grumpy seniors figure it out first and set patterns... but they are still doing daily work with some rather ancient stuff.
I'm not saying there are no young people doing sysadmin. What I'm trying to say is that if the new 'infrastructure' that all sysadmins learn is not an open UNIXy system where you can grok all the internals if you want to, but closed systems owned by 2-3 major cloud players, then we kinda maybe have a problem in 20 years?
Of course, one can argue that that will just cause a new wave of openness and the cycle continues.
As a "young" (30) sysadmin/devops dude I think that open, Unixy system is Kubernetes. I can take an application, dockerize it, write a helm chart and run it anywhere.
The risk is in treating anything as a black box, whether it's a managed service or a container you pull from Dockerhub. It's something you'll get burned by eventually and need to learn from experience.
Yeah, and my point was that we start the young people that we hire on the open stuff. Then we move them up and on to the other open stuff, which runs on top of the cloud vendors.
(As almost everyone else points out, the closed cloud vendor stuff is nowhere near flexible enough for most moderately complicated use cases unless you’re running at serious scale.)
My company seems to split responsibilities based on age. 40/50-year-olds deal with Oracle Linux, AIX and Solaris. Under-30 hires are more focused on cloud and mobile. We're all expected to have a footing in Windows and Oracle DB.
CFEngine is basic text manipulation, it's not comparable to the rest.
Puppet and Chef were the first generation. I wouldn't recommend them. All the companies and people I know using Chef migrated away from it after many disasters. Nowadays, it's only mentioned in interviews to find out if candidates have real-world firefighting experience.
Ansible is good. Used that for managing hundreds of machines at multiple jobs (some who migrated from Chef). It's been bought by RedHat, it's well maintained and I think it will have the brightest long term future.
Not sure about SaltStack. Never had the opportunity to try. I'd be a bit worried though on the long term prospect because I don't think they have much backing or user base.
> Not sure about SaltStack. Never had the opportunity to try. I'd be a bit worried though on the long term prospect because I don't think they have much backing or user base.
SaltStack is a well-thought-out solution in my opinion. It makes more logical sense and is less of a muddled mess than either Chef or Puppet, and has miles better performance than Ansible.
I know quite a few shops who use it. It's definitely smaller than Ansible, though.
+1 for Salt -- I wish it had better docs or examples of how to build out a larger system; it's hard to get started with, IMHO, even if you know Ansible well. The existing docs read like man pages, without even the helpful examples.
At the last gig, I wrapped salt deploys with a small Slack bot, so users would fire deploys from Slack; you could see what was going out and who was pushing. It was a very very nice, simple, fast solution that should scale to hundreds of machines easily.
SaltStack is around. Lots of big orgs take the time to understand it. Ansible is more popular because you can get going with just one playbook. SaltStack requires you to think about your environment and design your configuration management properly.
I use Salt on multiple thousands of machines. I feel like I've barely scratched the surface of what it can do. I wrote some custom utilities for it and added some functionality to handle physical deployments of an OS with Redfish (the new iLO/iDRAC API).
Salt is not without warts, but it's definitely worth checking out.
CFengine, at least version 3, was probably the furthest away from string manipulation (and I was given the impression that text file content manipulation was considered a bad idea with it). What killed it was promise theory, which is actually a great theory and works quite well but made writing the bundles painfully hard and also hard to maintain. Also during the early days of v3 it was probably lacking a ton of essential functions so even if you were trying to do things the right way you would bump into feature limitations. I think this put a lot of people off adopting it widely and why Chef and Puppet did so well.
Puppet and Chef are actually quite good and I still prefer them to Ansible for a number of reasons. I've certainly run them fine in environments of many thousands of servers, though I can understand that they can implode for some people at scale, depending on how they design their deployments or structure their manifests/cookbooks. That said, I've certainly seen Ansible fold on much smaller infrastructure, but that is also down to a number of factors that can be avoided or mitigated. Idempotency with Puppet is really strong, which is something you want if not every single system in your environment is ephemeral; with Chef it's almost as good, though not always on the first run; with Ansible you have to specifically consider and aim for it when writing your playbooks.
Getting used to having Chef or Puppet run, say, every half hour is a good thing, whereas Ansible runs are more ad hoc. This leads me to another thing that bothers me: people treat Puppet and Ansible as if they're conflicting choices for the same tasks. They have a lot in common, but Puppet is more for managing and ensuring changes in an idempotent, non-conflicting way, while Ansible is better suited to ad-hoc or orchestration tasks. I think it's good to use both, but also to be clear about what you use each for, since one can do a bit of what the other is good at but doesn't do it as well.
For example, I would consider using Ansible to do deployments and releases, rotate SSH keys, execute failovers, or even to install the Puppet agent for the first time. I would use Puppet to deploy and update monitoring agents and configuration, user access, ensure directory permissions, configure system things like rsyslog, logrotate, Postfix, ntp, etc.
> This leads me to another thing that bothers me and that is where people think it's a situation of having to use e.g. Puppet or Ansible as if they're conflicting choices for the same tasks.
That's mainly because Ansible folks advertise it as a configuration management tool, while in fact it's a deployment tool. The former needs asynchronous operations, especially because a node that is supposed to be reconfigured can be temporarily down. The latter needs to be executed synchronously, with reports being read as they come by an operator.
There are several other operation modes that are useful for a sysadmin, like running a predefined procedure with parameters supplied from the client, or running a one-off command everywhere (even on the servers that are currently down, as soon as they are up), but we don't have many tools to cover those cases.
I make my living as a CFEngine consultant. CFEngine runs every 5 minutes (it's lightweight enough to do that). The evolution was: CFEngine 1 ran once a day; CFEngine 2 ran once an hour; CFEngine 3 runs every 5 minutes. Self-healing infrastructure.
The concept of self-healing is a bit weird for me. Surely you want to investigate the cause before it heals?
Funny that we have tools like tripwire which have the opposite idea of the world.
My dream would be to have both functionalities in a single tool.
Bidirectionality! If you solve a problem on one machine you could pull that fix then push the same fix out to other machines as a preventative measure.
> Ansible is great. Used that for managing hundreds of machines at multiple jobs. It's been bought by RedHat, it's well maintained and I think it will have the brightest long term future.
A lot of folks I know have been bitten by Ansible's performance (Ansible has a central master that runs recipes on each node, rather than having nodes "pull" from a central master).
Ansible has a pull mode that can be turned on. There are some trade-offs with it from the normal operating model, but it's there when you get large enough to need it.
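A minimal sketch of that pull mode, assuming the playbooks live in a git repo with a local.yml at its root (one of the default playbook names ansible-pull looks for):

```sh
# typically run from cron on each node: the node clones/updates the repo
# and applies the playbook to itself, no central push needed
ansible-pull -U https://git.example.com/infra/playbooks.git
```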
Ansible has a very, very low barrier to entry. You go from 0 to 100 in a very short time. It makes a lot of sense to use it when you just begin building your infrastructure.
Later on you can run Ansible Tower, deploy Ansible agents everywhere, and basically use Ansible under the same client/server model like all the other tools.
Salt is eerily similar to Ansible, it's just geared towards client/server. Being experienced with Ansible, it was weird at first to use Salt because everything looked familiar, yet slightly different.
Yes. The host will run 100% CPU to handle the hundreds of SSH connections.
I've been reconfiguring 300 to 800 hosts many times a day and never had a problem. I think it would take a few thousand hosts for the performance to be noticeably slow, and I am really not sure that other tools or systems would handle it much better.
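For what it's worth, the usual knob here is the fork count, i.e. how many hosts the control machine talks to in parallel (the default is only 5). A sketch, with a hypothetical site.yml:

```sh
# push to 50 hosts at a time; trades control-host CPU for wall-clock time
ansible-playbook -f 50 site.yml

# or set it permanently:  forks = 50  under [defaults] in ansible.cfg
```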
I know our SREs once screwed up the config for sshd, and considered themselves very lucky that they had Puppet on the machines and could push a fixed configuration (if they had used exclusively Ansible, that'd be the end of it - no way to connect or to deploy a new configuration).
[edit] To clarify - ansible is great, and we use it. Just saying that, as everything, it still has (sometimes subtle) downsides in various scenarios. If it works well for you - great, but maybe others really were bitten by it.
There's nothing stopping you from having an sshd instance dedicated for use just by Ansible, on a different port/different network, on every node. Now, whether that's simpler or more complex, I don't know.
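A sketch of that idea, with made-up file names, assuming a management-only network and a stripped-down, key-only copy of the ssh config:

```sh
# second sshd, listening on an alternate port with its own config file
/usr/sbin/sshd -f /etc/ssh/sshd_config.mgmt -p 2222

# then point Ansible at that port, e.g. in the inventory:
#   [all:vars]
#   ansible_port=2222
```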
But "have two ways in" is a basic principle of sys admin (typically via traditional network and some out of band console access).
When I worked with physical machines, they had embedded management systems, which were on a physically separate network from the machines' main interfaces, ran a little embedded SSH server, and would (amongst other things) give you a console on the machine.
Simpler machines should still have serial consoles, and you can get those on the network via a terminal concentrator or a serial-to-ethernet adaptor.
I would love it if Ansible could control machines over an interface like that, rather than via SSH. Then you wouldn't even need to run SSH on machines which don't need it, which is most of them.
> Well, teach your sysadmin to use the system configuration tester when they edit a system configuration file.
Wrong. Teach your sysadmin not to overload a single service with different functions (debugging channel, user-facing shell service, running remote commands, file upload, and config distribution channel), especially not the one that should not be used in batch mode, without human supervision.
When you write an application, you don't put an HTTP server in database connection handling code, but when it comes to server management, suddenly the very same approach is deemed brilliant, because you don't run an agent (which is false, because you do, it's just not a dedicated agent).
Good heavens, no! You'd only have two different instances of the same service that is difficult to work correctly with.
For serving as a debugging channel and user-facing shell access, SSH is fine (though I've never seen it managed properly in the presence of nodes being installed and reinstalled all the time). But for everything else (unattended):
* you don't want commands execution, port forwarding, or VPN in your file server
* you don't want remote shell in your daemon that runs parametrized procedures -- but you do want it not to break on quoting the arguments and call results (try passing shell wildcards through SSH)
* you don't want port forwarding and remote shell in config distribution channel; in fact, you want config distribution channel itself to be reconfigured as little as possible, so it should be a totally separate thing that has no other purpose whatsoever
* you don't want to maintain a human-user-like account ($HOME, shell, etc.) for any of the above, since they likely will never see a proper account on the server side; you want each of the services to have a dedicated UID in /etc/passwd, own configuration in /etc/$service, own data directory, and that's it
Each of the tasks above has a daemon that is much better at them than SSH. The only redeeming quality of SSH is that it's there already, but it becomes irrelevant when the server's expected lifetime gets longer than a few days.
Yes, because everybody knows that testing eliminates all bugs.
(it's not that testing is useless - far from it; but I thought the HN crowd knows better than to respond to issues with "that's because you didn't do enough testing!")
I'd venture to say you're wrong about Salt. It's being used at some large enterprises. I use it (in one of the large tech companies) on thousands of servers, with plans to up that an order of magnitude or more. Of all of the solutions mentioned, it has been the most powerful, while also being the most scalable.
Other than that, my experiences line up with yours almost exactly.
I love SaltStack; it's more of a Python framework for managing systems over ZeroMQ than it is pure configuration management. Compared to Ansible it's more complex but faster, reactive, and significantly more flexible. I'd highly recommend it over Ansible for larger environments. For smaller ones, it depends on whether the steeper learning curve is worth it.
Of all the tools, I heard of Puppet first, so I'm assuming it was first on the scene? From my limited experience, it seems Puppet is the most widely used tool for that reason. Not necessarily the best of the bunch, but first on the scene. Considering the effort required to roll it out, I assume whatever is deployed first will stay as the tool of choice.
I've tried out Puppet, SaltStack, and Ansible, in that order.
What I didn't like about Puppet is that once you deploy a change, the actual change can happen on the "client servers" at any point within the next 20 minutes. I may be off on the exact duration, but I remember that changes were deployed at any point within that window. To me that doesn't sound like a great idea. What if you want to switch over your web servers at a specific moment? And Puppet requires a dedicated command/control server.
Next I tried SaltStack. I liked it enough. Now that I think about it and hear someone else mention it, yeah, SaltStack is similar to Ansible. What drove me away from SaltStack was that you essentially need a dedicated command/control server from which all SaltStack commands are sent out to the SaltStack "client servers". I did not want to dedicate resources (and money) to a server that is rarely used. And the personal web/lab servers I manage can shrink or grow from 2 servers to 10.
Next I tried Ansible. I think Ansible is the perfect choice for me. I only needed to 'devop' a handful of servers and also wanted to learn a tool that many businesses seemed to want on a resume. So I picked Ansible and it's been great. Some operations are not as flexible as doing them with a shell script (and I assume the same issue exists for the other tools), but I've had good luck combining Ansible with little bits of shell script to get the result I need.
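For example (hypothetical script and group names), the "little bits of shell" can just be shipped and run by Ansible's script module:

```sh
# copy a local shell script to every host in the "lab" group and run it there
ansible lab -b -m script -a "scripts/fix_permissions.sh"
```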
The best part of Ansible is that any Mac or Linux machine can be used as the "command server", provided that you have the SSH key pair on your Mac or Linux machine.
Lastly, some may not like the ad-hoc way of doing things on Ansible, but I prefer it that way.
> I heard of Puppet first, so I'm assuming it was first on the scene?
CFEngine was first. It's based on a kind of maths called "promise theory", and it solved the problem where you had many different kinds of Unix owned by many different groups and needed a consistent way of saying "all machines belonging to group X need to have user Y and package Z"; it would abstract away the slightly differing syntax between Solaris, SunOS, IRIX, OSF/1, Ultrix, yadda yadda. This is a problem that doesn't really exist anymore.
Chef I think came next, it was written by people who knew Ruby but didn't know maths so they used CFEngine terminology like "converging" but Chef doesn't really do that, it just runs Ruby scripts. If CFEngine was a scalpel, Chef is a mallet. Chef and Puppet are related somehow, same group of devs had a falling out and went their own ways, something like that. They are much of a muchness.
Ansible is cool because it recognises the reality of why CFEngine isn't relevant nowadays: most organisations are running just one particular Linux distro so you can do away with the abstraction and get all the benefits without the complexity.
> it's based on a kind of maths called "promise theory"
Promise theory is not math, despite its name. It doesn't predict anything, it doesn't explain any phenomena. It's an architectural approach. Brilliant, and it led to really great software (CFEngine), but it's not "maths".
It's not "maths" like arithmentic but it's "maths" like graph theory:
Promise Theory, in the context of information science, is a model of voluntary cooperation between individual, autonomous actors or agents who publish their intentions to one another in the form of promises. It is a form of labelled graph theory, describing discrete networks of agents joined by the unilateral promises they make.
> It's not "maths" like arithmentic but it's "maths" like graph theory
It's less like graph theory and more like inversion of control: an architecture, not a set of theorems and their proofs. Even Burgess' own book you mentioned is nothing like a mathematical handbook.
I'm a great fan of Mark Burgess and his promise theory, but calling it a mathematical theory or a mathematical domain is simply incorrect.
> [...] the actual change can happen on the "client servers" at any point within next 20 minutes. [...] What if you want to switch over your web servers at a specific moment?
You don't. Configuration management is a wrong operation mode for a synchronous change. Still, you could order all your Puppet agents to run their scheduled operation now instead of leaving it waiting for its time.
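i.e., for a change that really has to land at a specific moment, trigger the agent run yourself instead of waiting out the interval; a sketch:

```sh
# run the agent once, right now, in the foreground with verbose output
puppet agent --test
# note: with --test the agent exits 2 when it applied changes, so don't treat
# a non-zero exit as a failure if you wrap this in other tooling
```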
Ansible all of the way. Chef and Puppet have too much overhead in comparison. Ansible is agentless. You can either use a centralized server for deployments or you can have every instance configure itself. Also, Ansible is YAML based, which is a strength and a weakness.
Chef is a runner up. Love their community and Chef is pretty straightforward once you learn the lingo.
Puppet doesn’t really work for modern Git development workflows (Hiera and r10k are duct tape) and testing Puppet is kludgey. Also, most of the docs you’ll find for it stopped getting updated in 2015 or so.
I've used chef, ansible and saltstack in small startup and large scale enterprise environments.
Ansible is just about the easiest and most flexible thing going, but once you hit "very large scale" you're going to get bit by its performance and start worrying about when you actually update things. Ansible Tower starts to look good then, but it's not the well-walked path and brings you all sorts of other issues about how you distribute secrets to bootstrap things.
Chef is kind of nice when you don't have a lot of environments that you need to manage and about as flexible as you need it to be in those situations.
SaltStack shines when you really have a lot of heavy lifting to do and the Event System, Beacons and Reactors will honestly blow your mind with the complex things you can achieve in a way that's simple to reason about and maintain.
That said, there's really like 3-4 majorly different ways you can (or would want to) use Salt and understanding it and its documentation is a large cognitive investment. You will likely run into major pain at some point down the road if you choose to use it. I would only use it again if I had a really good reason to -- pretty much if there's no other alternative. I would not at all bother using it to try and do typical sysadmin automation tasks.
Strange side-note: The best managed Salt environments I've worked in or looked at were all masterless, whether at small or massive scale. It's my probably-wrong opinion that traditional master/minion SaltStack is always going to cause you enormous problems eventually when you need to either scale out or pivot on something.
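For reference, "masterless" here means each node applies its own states locally with salt-call; a minimal sketch, assuming the state tree is checked out under /srv/salt on the node itself:

```sh
# apply the full highstate locally, no master involved
salt-call --local state.apply

# or apply a single state, e.g. a hypothetical "nginx" state
salt-call --local state.apply nginx
```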
Bash is everywhere but in small quantities. If it's more than a page of bash, it's probably time to rewrite in something with stricter rules and fewer surprises, or better libraries, or both. Perl or Python is quite common at that level.
> Modern devops, with its million tools that break backward compatibility every month, sometimes becomes the running joke at lunch.
Ironically, modern devops and its million broken tools are a primary source of revenue for cloud providers, helping pay for your lunch in the first place.
You might be onto something: efficient designs are bad for cloud providers. On the other hand, shitty designs that get hacked are bad PR for them.
Thing is, none of it matters. Bottom line matters.
They care about attracting company decision makers. Decision makers are engineers in small to medium businesses, and managers in big ones. Sadly, it's the big ones that matter for the bottom line. So the target is mid-management: flashy PowerPoint presentations and 'conferences' that allow for justified travel and stay.
Good mid-management, with a tech background, exists, but is a minority.
Not all is lost, truth is out there (c) Mulder :))
The clearest explanation of why this happens is at the end:
Before, admins would try hard to prevent security holes, now they call themselves “devops” and happily introduce them to the network themselves!
1) The merging of devs into the sysadmin role was a product of the work of sysadmins (particularly systems change control and security compliance) not being valued in our culture.
2) Devs were delighted to be free of the shackles placed upon them by sysadmins who were encumbered by the concerns expressed in this article.
If you were a devop who resolved to fix the problems bemoaned in this article, my guess is you would turn around in 60 days to discover you'd become a sysadmin.
The stated goal of putting both systems administrators and software engineers on the same team is to reduce friction and increase communication. One of the worst, productivity-killing situations you can find yourself in when developing network software and services is caused by the traditional "old school" mentality of separating the two camps. When your software developers operate independently of your systems engineers and administrators they're forced to make assumptions about infrastructure, operations, and compliance goals. Both teams have the same goals so why are they not on the same team? I think some "old school" system administrators don't realize how costly such communication mistakes are. Getting 6 months into a development project to be told you cannot have a critical piece of infrastructure _for reasons_ is a costly, costly mistake.
Containers are a smart solution to the build problem. Don't build your containers from public, un-trusted images! Build your own images. Run your own, protected, registry. You still have all of the compliance and validation necessary and you don't end up debugging failed builds because one machine out of a thousand is running on some minor shared library version not supported by your software.
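The workflow being described is roughly: build from a base image you control, push it to your own registry, and have production hosts pull only from there. A sketch with made-up registry and image names:

```sh
# build the application image from an internally maintained base image
docker build -t registry.internal.example.com/myteam/myapp:1.4.2 .

# publish to the private registry that production is allowed to pull from
docker push registry.internal.example.com/myteam/myapp:1.4.2
```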
>Don't build your containers from public, un-trusted images!
The author is complaining that you can't build these private trusted images. Software developers have got it in their head that containers are a way to package & distribute software. They're not, that's what the OS's package managers are for. If your software requires Docker as a build dependency, you have failed to properly package your software.
As a concrete example look at Ubiquiti's UNMS.[1] Their package consists of downloading & installing Docker binaries on your system, not tracked by the OS package manager, and then running a bunch of containers built from these public un-trusted images you just told me not to use.
They also conveniently ignore the fact that I already have a Redis server, I already have a PostgreSQL server, I already have an NGinx proxy. (Plus I guarantee my database servers are better tuned for my hardware than some random image from Docker's library.) It is not up to some random software developer where I should be drawing the isolation boundaries on my infrastructure. They also make the big assumption I want to use Docker to manage my containers in the first place. Perhaps my company already uses Solaris LX-branded zones, or LXC, etc.
Now imagine if instead of spinning up a PostgreSQL database container, it used MS SQL as its database of choice. You think I'm going to let some random developer dictate whether or not I should spin up another SQL Server instance and pay MS for another round of cores / CALs?
Yes - you can build your own containers, and they're fantastic - if software developers properly package that software for ease of installation & configuration. Software developers should not be dictating what container/virtualization framework I use, what configuration management I use, etc.
There are public trusted images, like the so-called official repositories on Docker Hub [1]. As long as you build your images based on official repo images, you're probably fine. Just don't depend on untrusted images; instead get their dockerfile/config files, and build the images yourself.
To me, a Docker image seems like an ideal way to distribute some proprietary device management web software like Ubiquiti UNMS, rather than requiring their clients to install and maintain some obscure version of some database or whatever other dependency. You can spin that image up on a server or group of servers, or on Amazon ECS or a bunch of other providers, in a matter of minutes. With enough motivation, you could even export the image and manage the environment manually.
This comment makes way more sense to me than the original blog post. Yes, nobody should be relying upon Docker as their distribution platform. That's pretty terrible. Ubiquiti, I've observed, seems pretty uncomfortable just supporting the major distros; I actually wrote some docker stuff to pull down their .debs, crack them open and install the binaries inside on a fedora/centos system. That's closed source for ya.
> I actually wrote some docker stuff to pull down their .debs, crack them open and install the binaries inside on a fedora/centos system.
Why would you want to do that though? Treat the whole thing as a black box running inside docker and be done with it. The second you crack it open, you get to support it. Let Ubiquiti support it, after all that's what you are paying them the big bucks for.
because... they only offered .deb files and I wasn't running Debian or Ubuntu (nor do I like to bother with it in containers I'm building myself, because I have no clue about Debian).
The package in question has since finally offered .rpms, but I haven't had time / interest in updating it. This is wifi software I'm running personally; Ubiquiti only supports the Windows/Mac versions of it in any case.
Ubiquiti has always done this, even before containers were "hot."
If you install their rpms or debs for any of their properties, you're almost always getting a copy of Mongo or some other dependent service... and it is probably going to be incompatible with whatever version your package manager has or you're already running (version-constraints-wise, not actual compatibility-wise).
This is an indictment of Ubiquiti, not containers in general. If their software were properly built, they'd be shipping you a docker compose setup or something with N different containers that you could substitute out (at a network level) for your own.
I once worked at a company which separated IT into 3 teams: developers, DB-sysadmin (ops), and QA (who also managed deployments). Releases were supposed to go in a waterfall model from the Dev group -> QA group -> Ops. QA wanted Dev to submit Word documents for each release with blanks to be filled in with server names. However Ops was so distrustful of Dev that it was not enough for them to lock us out of Prod using regular security tools, we were also not allowed to know the NAMES of servers in Prod or how currently deployed systems were grouped.
Every release was an Abbott-and-Costello "Who's on first?" routine. Do you have any idea how hard it is (especially in computing) to ask for something without being able to utter its name?
QA: "You left servername blank on this deployment document."
Dev: "I know; Ops won't tell me. Just ask them for where the service is currently."
QA: "Ops says there's 5 unrelated legacy services with that same project name, on different servers."
Dev: "5? I only knew about 3. You know, if I could query the schemas of the Prod DB, I could tell you in a jiffy which one it is."
Ops: "Pound sand. If you want look at databases that's what the dev DB server is for."
Dev: "Erm, OK well can I give you a listing of the Dev DB schema and you tell me if it looks like the one the Prod service is talking to?"
Ops: "Oh I see you want us to do your job for you now? You can compare the schemas."
Dev: "OK..."
Ops: "Just tell us which DB server you want the schema pulled for."
Dev: "But you won't tell me the server names."
Ops: "No."
My point is this is how bad communication can be when ops and dev are not on the same team.
Devs hardcoding things in their software in a rush makes the software tougher to deploy and operate, causing greater incident rates and therefore page-outs. Devs interested in greater resilience and stability in their software should be opting for dependency injection of pretty much every damn thing in the world around them, whether it's a network service or a file system location. Otherwise, presume that it can go away at any time. A common pattern among developers trying to save time, which costs more in the long run, is to hardcode a path to an executable. A simple /usr/local/bin/ path, buried in an infrequently run job, that exists on developer machines but never in prod is all it takes to cause an incident in prod that costs the company millions. I say this both as someone who has written this kind of bug and as someone who has had to fix others committing the same error in their code, with QA passing it along.
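A tiny illustration of the hardcoded-path point, in shell (the tool name and paths are made up): let the environment supply the location and fail loudly if it isn't there, instead of baking in a path that only exists on developer machines.

```sh
# hypothetical job script: resolve the tool from the environment, with a fallback
REPORT_BIN="${REPORT_BIN:-/usr/local/bin/report-gen}"

if ! command -v "$REPORT_BIN" >/dev/null 2>&1; then
    echo "report-gen not found at $REPORT_BIN -- is it installed here?" >&2
    exit 1
fi

"$REPORT_BIN" --output /var/tmp/nightly-report.txt
```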
Ops tends to be where the brunt of technical debt is truly buried. Bad code is one thing but seeing the code in action with real world data is a different beast altogether.
The thing is that any separation of the roles is ineffective. Things shift around some if you embed an ops guy into the dev team directly, but it doesn't resolve the core problem. This applies to DBAs as well as ops, or any other software-side segmentation.
The core problem is that there are "ops guys" and "dev guys". That creates conflicting incentives, even within the same team. It creates tension and a dynamic centered around bandying work around so that it's "the other guy's problem" in some situations, and hoarding logic onto the one segment so that there isn't an "obstruction" in getting things done in others. Moving the "segmented" guys directly into your team just makes these politics closer to the heart, which is not always an improvement.
Teams should be comprised of whole-platform "generalists" (in quotes because they really should be good at stuff, whereas "generalist" implies they aren't; here I just mean a competent non-specialist), where any single individual would be comfortable/capable performing any particular task that may come up. Of course, each member will have preferences and habits, little "skews", but it is important that these skews are controlled and used for mutual education, and not allowed to "flandersize" someone from "the guy who knows SQL better than the rest of us" to "full-fledged DBA who hasn't committed any C# for 3 years".
The right axis for separation is hardware v. software. If it's software-related, your dudes should be equally yoked, such that any SQL ticket would be assigned to any member of the team, or any "devops"/sysadmin/deployment ticket assigned to any member of the team.
These systems, from the OS up, are all part of the same thing, and they're all tightly integrated. Making the workload of the individual people on the team also tightly integrated is the only way to make sure that incentives align properly and that the most effective technical decisions are made, instead of decisions motivated, consciously or not, by offloading blame or other political/effort/convenience considerations that cause the overall system to suffer.
If you get into a sticky situation that requires specialized help from someone who has lived and breathed MySQL Server night and day, well, that's what consultants are for. Consultants would also be useful for inspections/sign-offs. But your core team can't tolerate being segmented out by component/implementation detail.
> Containers are a smart solution to the build problem.
Linux "containers" are a variety of things. True OS "containers" don't exist on Linux, but there are some rudimentary approximations. A Docker image is essentially a zip file, and sure, zip file-ish things may work fine for uploading artifacts to systems. Dockerfiles are unequivocally terrible, however.
I agree with parent, but I think you're taking it too far. I don't think there are enough skilled generalists to pull off your ideal, and I think software/infrastructure is too complex to allow for generalists in the breadth you describe.
I'm a security person who knows pretty good Python and simple database stuff (SQLite). I think I'm in the top 50% (humbly) of my field, probably higher.
But I don't know front-end, containers/CI/CD, or distributed systems worth a damn.
I do believe the parent's idea that teams should have embedded resources. A "VM security team" operating firewalls and infrastructure and policy auditing should not only have security experts, but also its own devops group that automates the crap out of everything, using 2018 best practices. Currently, my team's "dev" group is a separate team in another area whose work queue is fed by multiple, distinct teams. It makes learning and understanding our requirements really tough for them.
Phew, this has been a good exercise. Let me clarify the thesis.
The thesis is NOT that a crew of superhumans can supersede all DBAs, security engineers, and infra people in the world.
It is rather that you can be a great software-side engineer, and that you can skew/focus on a few primary concerns, and develop and maintain a working knowledge in the others, sufficient to service your core project's needs.
Specialists can be called in as spot checkers, auditors, or short-term implementers, but they shouldn't be needed for the day-to-day of building, maintaining, and deploying your software.
In software, everything goes down to the same place: the system hardware. And these days at least, this is pretty much homogeneous between software segments. If you know how this functions, the differences are in the modes of expression and the conventions, not really the principles. We can learn the varying conventions well enough to be serviceable in all the elements that we send down to hardware -- not necessarily expert, but good enough for day-to-day work.
I'm not saying that everyone on the team should be better than the best DBA guy you've ever met. I'm saying that everyone on your team should be reasonably confident with SQL. Specialists have a place in your friendly local <security/database/whatever> consultancy.
> In software, everything goes down to the same place: the system hardware. And these days at least, this is pretty much homogeneous between software segments. If you know how this functions, the differences are in the modes of expression and the conventions, not really the principles.
Interesting that you mention this, since I think it's become something of a self-fulfilling prophecy, especially with giant cloud IAAS providers making one-size-fits-all choices of hardware to sell.
I certainly agree with you that the basic principles are the same, but that ignores the performance (and, arguably, reliability) possibilities that open up when you're not limited by the hardware (including network) choices of others.
> But your core team can't tolerate being segmented out by component/implementation detail.
And yet tolerate it will, because it is somewhat impossible to hire a team composed entirely of people who are each experienced and competent in writing and designing frontends, writing and architecting backends, deploying and maintaining whatever backing services you're using, build and release engineering, Linux, networking, etc. And what are junior developers supposed to do?
> And yet tolerate it will, because it is somewhat impossible to hire a team composed entirely of people who are each experienced and competent in writing and designing frontends, writing and architecting backends, deploying and maintaining whatever backing services you're using, build and release engineering, Linux, networking, etc.
You're right that everyone is not going to start out knowing everything. No matter how senior you get, there will always be areas you know better or areas that you prefer, which are the "skews" I referred to in my original comment. When a new framework or technology or whatever is introduced, only one or two people will know it. That's all fine.
Docker is the epitome of the broken segmented model. Devs hate and resent ops telling them they can't do things. Docker promised devs that if you spend a half-hour writing instructions to build an archive that contains your app's file tree and to pull in a completely untrusted OS userland `nice-mans-alpine:4.x.malware-free`, those annoying ops people will get out of your hair, and you can go ahead pulling `bad-actors-handy-4line-totally-safe-lib` from npm to your heart's content. No more complaints about that package not being approved, or the dependencies not installed, or the runtime too slow, ha!
The whole comment thread on the original article is a case in point. Someone who is responsible for the whole software side of real systems will be horrified at the suggestion of such recklessness. However, developers who're only accountable for pushing "at least one commit per day!", and consider security and performance someone else's problem, will be thrilled at the prospect of "tearing it up with some 10x coding" while they silence "the Luddites". (who, sidebar, were too dumb to see the beauty in JavaScript back in the 90s! Pshaw!)
Which dynamic do you want to encourage?
> And what are junior developers supposed to do?
The same thing that everyone else is supposed to do: learn it, gradually, as needed. Read the docs. Seek mentorship from team members who have that "skew" (formalize this process if necessary). Read the changelogs. Read the code. Figure it out!
Many will protest and say it's outside of their comfort zone. Some will protest and say this is inefficient. That may be true in the short-term, but the system will invariably suffer if you do hard segmentation on the software work, because the falsely-separated concerns won't understand each other and end up setting up territories.
People will hate the DBA because they won't understand why he cares about "boring crap" like "normal form". People will hate the sysadmin because they won't understand why he cares about "boring crap" like "not being woken up at 3am". Your front-enders will be more gregarious and have better haircuts, leading to prioritization of front-end concerns.
Essentially, the project becomes driven by blame-shifting, protectionism, and which software-side segment has the more attractive people, because the concerns are fungible enough that any side could potentially handle them. That makes it a political competition. The project is no longer driven by technical prudence or efficiency. It's no longer about the tradeoffs involved in solving the problem at layer X instead of layer Y.
The dividing lines from OS up are arbitrary. We can't all be experts in all of it, but we can all have the expectation that we need a basic grasp over the whole system, by which I mean the WHOLE SYSTEM, and that we should become competent in the major elements used to build it, and patiently nurse this competence over time.
One team member should be able to handle 90% of the tickets that come in independently, whatever elements of the stack are affected (sysadmin, application code, database, frontend, etc.), and when they hit one of the 10% they can't do independently, they should consider it their responsibility to seek mentorship and learn the skills so that after several such rounds, they can do it independently.
The only real question is whether there are enough people capable of this out there. I think there would be if we set it up as a general expectation. I'm not sure if there are when we've already accepted the segmentation as a fact of life.
>The only real question is whether there are enough people capable of this out there. I think there would be if we set it up as a general expectation.
That strikes me as merely wishful thinking. It's not as if there isn't already research on human cognitive abilities in general.
Do you have any scientific basis for thinking engineers are merely being held back by our acceptance of specialization, rather than by inherent cognitive limitations?
Once the downvotes start coming in, people read comments uncharitably, and the thread gets lost, but to be clear, I'm not advocating for anything that is beyond the cognitive capacity of typical software developers.
One and two-man startups provide ample evidence that working knowledge of the whole platform is not beyond human cognitive scope, even if getting this to be accepted at large requires some extra cultural encouragement and support, and some professional management of individual "skewing".
Once more, it's not that everyone has to be a hardcore expert in everything all at once. You don't want them to be.
You just want your main people to know each platform component well enough to be able to make a reasoned decision about the trade-offs involved in using one or the other for a specific task, and then to be able to own that decision as a group.
If they can't or won't do that, the platform decisions become political instead of technical. I've seen this over and over again: massive technical problems get routed around because the Java developers have been told they can't touch Ruby, or the C# developers have been told they can't touch SQL, and the real problem never gets fixed. We only recognize naive, scared "specialists" who insist they can't look at the piece of Python that's holding everything up because they're "just a PHP developer", instead of rounded, capable "generalists" who can be trusted to call in help when they're getting in over their heads, and who may need an occasional "inspection" or two to make sure they're aligned with best practices.
General contractors are not electricians, but they can do a lot of routine work that involves electrical fixtures, sockets, and outlets. You call the electrician for the face-melting stuff.
General practitioner MDs are not dermatologists, but they can do a lot of work that involves routine skin disorders. They'll prescribe creams for fungal infections, rashes, acne, etc. They'll let you know you need to call in a dermatologist for the "skin-melting" stuff.
In software, we don't say "call the DBA for the database-melting stuff." We say "the DBA will write all of the SQL for you." It just doesn't seem right to me.
I apologize if I seemed particularly uncharitable, and I think you may well be right that I thought you were advocating for greater depth of knowledge than you were.
However, I still disagree with your premise that it's merely our attitude at large somehow holding people back. Startup founders don't refute my suggestion that there's a cognitive limitation involved, since they're relatively rare and may well have greater capacity to be the generalists that you're proposing. I'm also not convinced that, even among founders, they're as broad generalists as you're suggesting.
You go on to give non-computer examples of generalists and specialists, yet you don't address how it is that specialists are (admittedly only implicitly) ok there but not in computer tech.
To reiterate my point about cognitive capacity: if true specialists are desirable, then I'd argue that asking them to be more of a generalist makes them a less competent specialist and therefore less valuable on the market. That's an alternative explanation for the extreme degree of specialization we see, one that doesn't rest on preconceived notions.
Now, personally, I share your desire for greater breadth of knowledge among all technical professionals, if for no other reason than they might have a greater appreciation for my own specialization. I just don't think it's realistic.
> I apologize if I seemed particularly uncharitable, and I think you may well be right that I thought you were advocating for greater depth of knowledge than you were.
No need, it wasn't really meant to be directed toward your comment specifically. I just pointed out that comments tend to get read uncharitably once they're greyed out, as a reminder that it's not likely anyone would actually advocate such a caricature.
> Startup founders don't refute my suggestion that there's a cognitive limitation involved, since they're relatively rare and may well have greater capacity to be the generalists that you're proposing.
You're right, and I thought of this when I used that example. But by the same token, we can take it out a level further: professional software developers have already shown themselves as having higher-than-average cognitive abilities, because the truth is that the average human doesn't have the cognitive capacity to become a professional software developer. If they did, we'd all be paid much worse.
How far off are founders from professional software engineers? How far off are professional software engineers from the median of adults? How much additional cognitive load is required to be operational in a handful of extra platform components, especially if all those components run on the same type of hardware? All good questions that I don't think either of us have ready answers for.
The other thing is that even if this is out of reach for the "average developer", it wouldn't mean that it's not an ideal to strive toward, or necessarily even unrealistic in all cases.
> You go on to give non-computer examples of generalists and specialists, yet you don't address how it is that specialists are (admittedly only implicitly) ok there but not in computer tech.
Specialists should exist -- as external reference points in consulting groups.
If you want your life's mission to be building SQL queries, join a database consultancy and deal only with the SQL problems that your clients couldn't figure out on their own and decided they needed to pay $$$ to solve. If SQL and database design is truly your passion, you'll be much happier this way than you would be as a staff DBA redesigning the same rote EAV schema for Generic Business App #29, working slavishly to finish the code for that report that Important Boss #14 needs on his desk ASAP.
Creating a referral-style economy creates a lot more room in the marketplace for specialist consulting groups and gives more specialists greater reward (monetary and emotional). It simultaneously allows "generalists" to stay focused on the big picture of building and maintaining a robust and prudent system overall.
I think it's worthwhile to consider how generalist v. specialist operates in other knowledge fields, and what lessons we can take from that.
I am confident that a generalist ethos is for the best, but I'm not sure we'll get there without better cultural underpinnings, so I'm not making these statements purely out of self-righteousness (maybe only like 80% ;) ).
This dialogue has already been informative and has helped me refine my ideas and hopefully learn to present them somewhat better. Thanks! :)
The thing is that any separation in the roles is ineffective. [...] The right axis for separation is hardware v. software.
Any separation is ineffective except along this one particular completely arbitrary dividing line? If that were true we'd still be hunting and gathering and nothing else.
> Any separation is ineffective except along this one particular completely arbitrary dividing line? If that were true we'd still be hunting and gathering and nothing else.
Hardly arbitrary -- hardware is fixed at the time of manufacture. Hardware engineers should be well-acquainted with software concerns and needs, but the years-long feedback cycle and real expenses associated with hardware development creates a natural barrier for work separation, requires different work cadence and much more stringent processes, etc.
This is not to say that a good hardware engineer shouldn't contribute to software and vice-versa, but it is to say that the roles are sufficiently divergent that it makes sense to place them in different segments. That is not the case with anything this side of the operating system, as far as I'm concerned.
It's arbitrary when you claim there are no sensible divisions in software. I think your entire lengthy argument is a sort of elaborate fantasy about how much better the world would be if everyone was just like you or at least, just like you imagine yourself to be. It's fun but not a particularly realistic or constructive way to look at the world.
> It's arbitrary when you claim there are no sensible divisions in software.
It's about the fungibility of the problem space. I don't know how you expect your core team to make reasonable decisions about the tradeoffs if they a) don't understand more than one of the platform elements; and/or b) don't have any responsibility or accountability for the tradeoffs that get made, because now it's another segment's problem. Indeed, when I've been on teams primarily comprised of non-generalists, these decisions were almost always a matter of bureaucracy and politics.
> I think your entire lengthy argument is a sort of elaborate, lengthy fantasy about how much better the world would be if everyone was just like you or at least, just like you imagine yourself to be.
I've worked on teams that were mostly "generalist" and teams where the "generalist" type was either absent or artificially constrained. My perspectives are drawn from those experiences, and have developed based on a hard-earned worldview that says people reliably act in favor of their own expedience. Doesn't seem very fantastic to me. ¯\_(ツ)_/¯
I don't know how you expect your core team to make reasonable decisions [...]
That's how most everything is made, not just software. In the case of software, Fred Brooks added an essay titled "Parnas was right, and I was wrong" in the 20th anniversary edition of The Mythical Man Month about this topic. Itself published over 20 years ago.
> Don't build your containers from public, un-trusted images! Build your own images. Run your own, protected, registry. You still have all of the compliance and validation necessary and you don't end up debugging failed builds because one machine out of a thousand is running on some minor shared library version not supported by your software.
You have just lost all the speed to production advantages of containers.
"speed to production" is not meant to be the primary advantage of containers.
"knowing exactly what you're running and being able to reproduce it" is meant to be the primary advantage of containers.
What you're basically saying is "if your container system admins do their job properly rather than throwing security and reliability out of the window, it can take a bit longer than not bothering". This is trivially true, but not really the point agentultra was making.
That's how I always did it (building containers ourselves), and once the pipeline is in place, it's barely more work than pulling public images.
Speed of production advantages are absolutely not due to pulling untrusted containers. If anything, it makes your life harder.
Hard to imagine any serious production setup not doing this... In most cases, you need to modify the containers anyway to suit your needs, and how else are you going to rebuild them all when the next OpenSSL update comes out?
Have you really? Building a base container to base all further images off of takes about a half hour with our build system. Further app builds are down to 10 minutes at most and can honestly still be optimized. How exactly are you losing all the speed advantages?
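For anyone who hasn't set this up, a rough sketch of what the "build your own base" pipeline can look like; registry.internal.example and the tag are placeholders, and debian:stretch-slim is just one reasonable vetted starting point:

    # build an internal base image and push it to a private registry;
    # the Dockerfile is fed in on stdin via a heredoc
    docker build -t registry.internal.example/base/debian:2018-05 - <<'EOF'
    FROM debian:stretch-slim
    RUN apt-get update && apt-get -y upgrade && rm -rf /var/lib/apt/lists/*
    EOF
    docker push registry.internal.example/base/debian:2018-05

    # application Dockerfiles then start from the vetted base:
    #   FROM registry.internal.example/base/debian:2018-05

Compliance scanning and rebuilds then happen against a handful of images you control, instead of dozens of random pulls.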
Well, potentially unpopular opinion here, but an awful lot of sysadmins brought their looming obsolescence on themselves. I'm an app (as in "a program that runs on a computer", not an iOS add-on) developer, always have been. I get requirements from the business types, code it up in vi or Eclipse or whatever, get it working, and then they (the business) want to deploy the working app out to production so people can use it and the business can make money off of it. And, for decades, sysadmins have been a brick wall of pure hostility. They're not all like this, but a lot more are than aren't. Like, I get it - you're overworked and the demands on you are unreasonable. Yeah, me too. But I just work here, man. You're right, I don't know how to do your job, that's why I sent you an e-mail asking you what steps are needed to deploy an app into production since it's not documented anywhere. But rather than just tell me what you need so I can go gather that up, you're going to unload on me because you feel overworked and unappreciated, but you're sure as hell not going to unload on a manager or somebody with actual power, you're going to take it out on the developers who have no pull or voice.
Actually, as a sysadmin, I sympathize with you, since I consider that kind of situation to be a sign of, essentially, bad system administration. It also sounds like it might be at a larger company.
Personally, I've always considered it a significant part of my job to make developers' jobs easier, especially with something like deployments and dependencies.
As such, I disagree that we've brought our own "obsolescence" on ourselves, but I do agree that those of us who have perhaps forgotten that ours is a service profession have hastened its demise.
I feel like there has always been a contingent of sysadmin / ops folks who preferred the "better to ask for forgiveness than permission" model. They still hate when things break (so they're not quite fans of developers with a "move fast and break things" philosophy), but they care more about big-picture improvements and ease of upkeep than about enforcing any particular process. Detecting problems and being able to roll back is often more valuable than preventing mistakes outright. It may be somewhat driven by laziness, but it actually works out pretty well for collaborating with the fast-moving developer types. It also depends on being in an environment that is tolerant of occasional mistakes or outages.
It makes sense that these types naturally gravitated towards the devops models. I'm really not sure where this leaves the more compliance-minded systems folks though.
> It makes sense that these types naturally gravitated towards the devops models. I'm really not sure where this leaves the more compliance-minded systems folks though.
Working for profitable businesses where stability is valued over velocity.
It's rough because, like many backend type jobs, the best thing that can happen is nothing breaks. Incremental improvements in stability or scalability will not be noticed, but every single change you make is a massive risk of a page at 2am, all-nighters trying to fix things, outage reports, incident reports, root cause analysis reports, etc. You're stuck between process and outcome.
You have to constantly fight the urge to just never touch anything.
This is definitely a rant that obscures the underlying point: the introduction of _untrusted_ or _unreliable_ network resources, frequently hidden in a string of dependencies.
I'm baffled how often I see someone throw this sort of craziness - "go fetch this thing from some random third party" - into very important places, such as the startup procedures of a container. It's something I see in the culture of the two-person startup just trying to get something out the door. It's definitely "technical debt", and frequently, it won't get removed. Thus, you try to scale up to meet load, and all these new instances time out on the same external resource that's randomly having problems... boom! At the worst possible time. Never mind the potential huge security gaps.
But the specific _tools_ aren't the issue here. It's the culture of "ship something now, we'll deal with the fallout later". A lot of people start using Docker and won't ever look at the Dockerfile, or will add a Maven dependency and won't even check licenses or security updates for _any_ of the transitive dependencies.
Cloud technologies and containerization make everyone just think "we can do things so fast now" and never, ever pay attention to details that can come back to bite you.
On the flip side, it's a good time to be in cybersecurity; because this cultural problem will never, ever, get solved. :)
At the end of the day it comes down to the fact that businesses just simply don't care (Equifax etc).
They like the idea of security and that's where it ends.
In many places if you try to "do things right" you will get fired in two months for being too slow/strict and they will happily replace you with a clueless easily trusting person who "goes and fetches things from random third parties".
Many times they get lucky enough to survive and they don't appreciate the risks that they took. That pace becomes the expected norm and sets the theme in the industry.
And when shit hits the fan the PR person writes a "we are oh so very sorry .. security is totally our number one priority" blog post. They blame and fire the poor bastard and replace him with another warm body.
When it comes to these "hidden" things like security, companies don't reward "doing things right" (and often punish it), so on average, and over the long term, we end up where we are today.
When the culture sufficiently shifts towards being sloppy you will get hammered down quick if you try to be the voice of reason because it ends up being you vs everyone else (the norm).
And, honestly, why should they? Security breaches have yet to hurt an actual company (they hurt users plenty, but not the organization that's actually responsible).
(I've seen similar claims in different ranges. Costs of breaches in the US are pretty high - over $7 million.)
Even Equifax probably wants the 3-4 billion in valuation it's lost since the breach.
The solution appears to be buying tools to avoid and respond to breaches quickly, instead of engaging and building in security awareness. (Microsoft's security development lifecycle comes to mind.)
IMO, both approaches are likely cost effective, though I have no numbers or research to back that up.
You know, I used to agree with you. But the reality is you have to weigh the massive productivity boosts that things like docker bring to the table vs. the potential issues it can bring. To a large degree, good perimeter security mitigates a lot of the concerns of containers themselves running slightly out of date software.
> To a large degree, good perimeter security mitigates a lot of the concerns of containers themselves running slightly out of date software.
This is a very naive way of setting up a secure production environment.
Your perimeter security is worthless if you are loading unvetted images which have malware or, even worse, unknown malicious code in them.
Having a data breach or hack on your hands is something which could kill the company. That risk is not worth a slight productivity boost just because you or your ops team isn't able or willing to build a proper private registry setup.
Yes, it's definitely not something related only to Docker; it's today's culture of trusting any code at all just because someone placed it on GitHub.
> Consider for example Hadoop. Nobody seems to know how to build Hadoop from scratch. It’s an incredible mess of dependencies, version requirements and build tools.
And as the major introduction to the blog post:
> I’m not complaining about old-school sysadmins. They know how to keep systems running, manage update and upgrade paths.
Huh? Old-school sysadmins know how to keep systems running, manage updates and upgrades. At the same time nobody knows how to build Hadoop from scratch. At the same time, Hadoop build instructions themselves have curl|sh scripts or mirrors and the wiki page is outdated. And it uses Java (and thus maven/ivy). And that downloads the internet.
According to the blog, Hadoop, maven/ivy/sbt/any dependency manager, package managers, and everything is broken. But the tagline is:
> This rant is about containers, prebuilt VMs
What does any of this have to do with the "Age of containers" and pre-built VMs? Is the author just talking about Gentoo/LFS-style "compile the whole system from scratch"?
This feels like an incredibly rushed rant. I can only imagine the author had to set up Hadoop for the first time, banged their head against it for a few days (it happens), and took it out on everything.
There is SSL/TLS; unless it's done wrong (invalid certificates ignored by the dependency manager), it's safer than the old "md5 of the file" systems.
Now, some dependencies are fraudulent (especially true in the JavaScript world because it eventually targets a lot of user browsers), but nobody ever checked the sources anyway...
TLS only verifies that you have connected to the correct server. It can't verify whether the package on the server has been replaced by a malicious one. For that, you need an "md5 of the file" (these days, a sha256, because md5 has long been broken).
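Roughly, with the expected hash pinned somewhere you trust rather than fetched from the same server as the download (filenames here are only illustrative):

    # record the checksum once, from a source you trust, and commit it
    sha256sum hadoop-x.y.z.tar.gz > hadoop-x.y.z.tar.gz.sha256

    # verify every later download against the pinned value
    sha256sum -c hadoop-x.y.z.tar.gz.sha256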
Which isn't really true; as a sysadmin (I'd say "former", but once you're a sysadmin, you're always a sysadmin), I've seen lots of things with horrible build and dependency nightmares, and that was before package managers, containers, and virtual machine images became de rigueur.
Think of a self-hosting programming language: you can't build it without a running installation of a previous version. (Anyone remember "Reflections on Trusting Trust" at this point?) Or any application in an image-based language like Smalltalk. Development becomes path-dependent. It's inevitable to get into a situation where A and B cannot be made to work together, except in a derivative of a version that someone, somewhere made while holding their mouth the right way.
Pre-built containers and VMs are an admission that path-dependence is the way stuff is supposed to be.
If I need to use Hadoop, I'll download one of the pre-built binaries that they offer on their site.
You'll notice that the Debian Wiki users have given up on building it since 2010. That was three years before Docker even appeared. Almost nobody was using containers back then.
That's like saying, "If we didn't invent the internet, we would have never had privacy issues". OK, so if we didn't rely on containers - would hadoop have had a perfect set of packages for every distribution? Let's say that the packages for Arch linux were broken. What next?
That's the whole problem with the article. It takes a problem (building Hadoop was bad), correlates it to a completely different tool (because we have docker, hadoop build scripts are bad), and goes on to rant about everything else.
I've seen this brewing for a while, and getting worse and worse. Back in the 80's and 90's, there were developers who would code their own sorting or hashing routines rather than linking in some external library to handle this "solved" problem. The pejorative term "Not Invented Here" (NIH) grew to describe those developers, and they were shamed into reusing code whenever there was code to reuse. And in some cases (like sort routines), it makes perfect sense. However, NIH accusations have grown into "if there's something vaguely similar to what you're writing, you must use it, even if artificially bending it to the case at hand takes more custom coding than you would have written in the first place". That culminates in things like the completely empty, useless (but enormous) Spring "framework" or, to a lesser extent, things like Angular, which sort of does some things but creates far more problems than it solves (and definitely adds more development overhead than it removes).
Interesting take on the NIH term. I thought this was more an ego thing for the big tech companies. They love to reinvent existing things to look like geniuses
I think you're correct on the origin of the term, and I should have mentioned that - but in the past few decades, I've been accused of "NIH"-ing whenever I've "rolled something" of my own from authentication to IoC. Just because there's a library that has a particular description attached to it doesn't mean that it should be used as often as it can be.
The author has added an update to the bottom of the post which I think makes his main intended message clearer:
Update: it was pointed out that this started way before Docker: »Docker is the new ‘curl | sudo bash‘«. That’s right, but it’s now pretty much mainstream to download and run untrusted software in your “datacenter”. That is bad, really bad. Before, admins would try hard to prevent security holes, now they call themselves “devops” and happily introduce them to the network themselves!
The author should have grabbed the .sdeb or the debian build scripts and tore them apart if they really wanted to make a point (if, upon examining the build, there was one to make).
I mean there is a lot of cognitive load/disconnect we're talking about. As an ops guy, I can't look into every package. That's why I trust the package manager (apt-get, yum, whatever) and all the build maintainers who either volunteer or work for Red Hat/Canonical/SuSE/IBM/whoever.
Things get through. That's why we have all those security people out there who are digging around for bug bounties and find crap like the recent Ubuntu Snap package craziness.
Docker containers can be good. You can use an official Ubuntu or Alpine image, build your base, and create scripts to make sure your base containers don't go out of date. Most people don't do that. The official Docker containers are kind of a mess, but at least they're maintained. Grabbing some random container off Docker Hub? Yeah, that's not going to end well, unless you just use their source to build your own, or it's a container continually maintained by the person/company who wrote the service.
Docker containers do need better security introspection and that's going to be a big deal going forward. But this article is all rant and some, but not enough, substance.
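As for the "scripts to make sure your base containers don't go out of date" part, a sketch of what that can look like as a nightly cron job (the registry name and tag scheme are made up):

    #!/bin/sh
    set -eu
    TAG="registry.internal.example/base/alpine:$(date +%Y-%m-%d)"

    # --pull re-fetches the upstream base, --no-cache forces fresh package updates
    docker build --pull --no-cache -t "$TAG" -f Dockerfile.base .
    docker tag "$TAG" registry.internal.example/base/alpine:latest
    docker push "$TAG"
    docker push registry.internal.example/base/alpine:latest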
Yes, but shouldn't you have separate "build" and "deploy" container images? You should "build" a particular version once, "deploy" the result into a test environment, test it thoroughly, and then "deploy" to production, right?
This is not my job (yet). Please tell me if I'm wrong, because I'll need to do it in the next few months.
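One common way to get that split is a Docker multi-stage build (available since Docker 17.05); a sketch, with made-up image names and paths:

    # build stage: compilers and build dependencies live only here
    FROM golang:1.10 AS build
    WORKDIR /go/src/example.com/app
    COPY . .
    RUN CGO_ENABLED=0 go build -o /app .   # assumes vendored deps or stdlib only

    # deploy stage: only the artifact and its runtime needs
    FROM alpine:3.7
    RUN apk add --no-cache ca-certificates
    COPY --from=build /app /usr/local/bin/app
    ENTRYPOINT ["/usr/local/bin/app"]

The idea is to build the image once in CI, run your tests against that exact image, and promote the same tag/digest to production instead of rebuilding it.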
But I just don't understand why we have to have 47 half-built over-complicated build systems or job runners or whatever the new fad term is for every language, when there's something that does what they all do, is battle-tested, and has been around for decades.
Everyone repeat after me. Makefiles are not scary. I can write a shell script. Do I really need to learn grunt/gulp/webpack/npm/rake/fake/maven/gradle/ant and on and on and on?
Probably somebody has released another one in the time it's taken me to write this comment.
Makefiles aren't scary. But they're also not particularly good.
I use Rake (or Gulp, or whatever) because then I can use Ruby (or JavaScript, or whatever). Shell plumbing is fine for informal and small-scale stuff, and I make my code conform if somebody down the line (who may be me) wants to get out their duct tape, but the world is more complex than what /bin/sh can see. Shell is the lowest common denominator. Expecting everything to at all times be written in and for that lowest common denominator is not reasonable. We're a tool-using species and we refine tools over time to make them better. The profusion of tools happens because they iterate on each other to be better. If old tools were sufficient, people would use them because learning new ones is hard.
So, yes, you do need to learn those tools. Or invent a shell that isn't tooth-pullingly difficult to use with a JSON file (and do not say `jq`, I love `jq` as an inspector but it does not step to `JSON.parse` and a working subscript operator). Or change `make` so that a git checkout won't trigger a full rebuild. Lots of baseline, stump-simple things that `make` is just not going to do for you because it's built for a frankly outmoded method of development.
Ha, that's a neat trick! Trouble is, for either Python or Ruby it becomes tricky due to stuff like dependency management. You'll have to `bundle exec make` to get sane library paths for Ruby or `pipenv run` for Python, etcetera etcetera.
At that point I think you might as well just use a language-native one.
The GP does this via a neat hack, but you can also do this in a much more understandable fashion by simply having the body of every Makefile rule start out by shelling out to some script in your favorite language.
I think you're (understandably) misinformed about what Makefiles do because you've run into some bad ones. The thing they're doing is managing an N-level-deep dependency tree in a declarative way. So if A->B->C you can run something to generate C, then B can run, and finally A, and this can all be done in parallel for hundreds of files.
On the individual rule level this is really simple, e.g. just turning a .c file into a .o file, then finally some rule that depends on all *.o being generated creates a program out of them.
The language-native ones are usually much worse. They're easier to use at the outset because they don't make you create this dependency graph, but that also means that they can't run in parallel on your N cores, and they usually suck at incrementally updating the build.
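In miniature, the kind of Makefile being described (recipe lines must start with a real tab):

    OBJS := main.o util.o

    prog: $(OBJS)        # the final link depends on every object file
        cc -o $@ $(OBJS)

    %.o: %.c             # each object depends only on its own source
        cc -c $< -o $@

`make -j8` then recompiles only the .o files whose .c changed, in parallel, and relinks at the end.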
> The GP does this via a neat hack, but you can also do this in a much more understandable fashion by simply having the body of every Makefile rule start out by shelling out to some script in your favorite language.
I'm not sure what you mean? How would this allow you to use, say, Python as the language for recipes? Just having Make drop straight into Python kind of defeats the purpose of Make.
You'd use Python as the language for the recipe that turns (in this example) a given .c file into a .o file, while leaving the Makefile to do what it's good at, declaring the DAG dependency tree needed to incrementally build it.
The point is that people conflate these two things. They open some random Makefile, see that it's mostly doing stuff with shell scripts, think "oh, I should do this all in Python", and then write some monstrosity that doesn't build a dependency DAG and thus can't run in parallel or be easily understood.
Instead, they should have split the actual logic they found in the shell scripts into Python scripts that the Makefile invokes.
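Sketched out, with scripts/render.py as a made-up stand-in for "the actual logic" (recipes need tab indentation):

    REPORTS := $(patsubst data/%.csv,reports/%.html,$(wildcard data/*.csv))

    all: $(REPORTS)

    reports/%.html: data/%.csv scripts/render.py
        mkdir -p reports
        python3 scripts/render.py $< $@

Make still knows which reports are stale and can rebuild them in parallel; the Python script never has to care.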
Nevermind, I misread you. I missed "rule" where you said "beginning of every Makefile rule". (I thought you were suggesting just having the default rule run some enormous Python script, which I've unfortunately seen before.)
> Shell is the lowest common denominator. Expecting everything to at all times be written in and for that lowest common denominator is not reasonable
Is it that hard to learn shell? Why is it so painful? What makes it the "lowest common denominator"? I use it all the time, but I admit at work I am one of the few.
> Expecting everything to at all times be written in and for that lowest common denominator is not reasonable. We're a tool-using species and we refine tools over time to make them better.
This is too vague. What makes a Ruby based build or JS based build a more "refined" tool? It sounds like familiarity is the real issue here.
> If old tools were sufficient, people would use them because learning new ones is hard.
How many people even know Makefiles these days anyway? The "modern" approach seems to be, learn a programming language and then try to do everything inside of it. Some languages are more interested in this cloistered philosophy than others (like JS).
If anything, I think the reason these build tools keep being proliferated is because nobody wants to learn anything more than the bare minimum to "be productive" (which, depending on what you're working on, can be anything from pushing out customer demos for a company that will never sell, to microservices operating at scale). Learning a language and never leaving its paradigms/comforts is easy.
Your second paragraph got to the heart of it. If we want to use some standard build toolchain, it needs to use a nice language and not feel obscure. I was explaining to someone a bash script I wrote, and he said "why not use Python". There were reasons but... he was right, Python would be much easier to use and maintain, and we have a lot more developers who know it.
Eh. It's not my favorite thing out there, but Maven's fine for what it is. It's designed explicitly for well-behaved Java artifacts. If your Java artifacts are not well-behaved, you're going to have a bad time--in my experience, most of those cases are doing things you probably shouldn't be doing.
(You may be a wizard and have a reason to do them, for sure--but that's what writing Maven plugins is for. Or not using Maven. You've got choices.)
Given the limitations of the platform, there really isn't a such a thing as a well-behaved JVM library that depends on other libraries, unfortunately. Oracle really dropped the ball by only serving their own needs with the module system.
Can you expand on this? Having done a pretty decent bit of JVM development, I've never really run into issues even doing some not-out-of-the-box stuff.
Blatantly false? OK, put up. Forget `sh`'s JSON parsing, I'll make it easy on you--show me its arrays. Arrays, plural, you can't use $@ as a bail-out; I need more than one. Show me hashes. Show me sets. Show me the basic building-block primitives of software, because build pipelines are software. It's 2018. If your language can't do this stuff without babysitting, it effectively can't do it, because nobody's got the time to babysit your easy-25%-of-Perl language.
And yeah, I did say `sh`, because that is what you can practically be expected to have kicking around alongside `make` on a system where I can't just install something worthwhile. If I can install better tools, then there's no reason to write shell scripts that are much harder to troubleshoot (and, thank you quotebarf, more likely to be incorrect) than just opening Pry.
Clunky is subjective -- I think we can probably agree that there are clunkier, more opaque, and harder to work with constructs in programming than a slightly different syntax for array declaration.
What are these build systems where you need to install Bash? Bash 4 was released in 2009: it's been in every major Linux distribution for at least two major versions, ditto for FreeBSD... heck, even Solaris ships with it now.
You're right, there are clunkier, more opaque, and harder to work with constructs in programming. Like `sh` arguments. And like `sh` quotebarf.
And Busybox is commonplace. Systems that include Busybox usually do not include `bash`. And so, for my purposes, if I can specify `bash`, I can also specify, say, Ruby, which--while by no means perfect--makes life much, much easier.
"All of which are clunky, opaque, and harder to work with than any other language I have used in the last decade."
Wrong question. A better one: Is it more clunky, opaque, and harder to work with than every other language that's appeared in the last decade? Because no one seems to agree on what is specifically better.
I disagree that it is the wrong question. "Every other language" doesn't matter because I don't value homogeneity and I think that homogeneity of programming language is a fool's errand. I am comfortable shipping production code in most of the languages in current use; in my estimation, none of the major build systems out there are as opaque or difficult to use correctly as make/shell.
(And I am what a current sysadmin would be if we did not call ourselves "devops engineers" now.)
And you are definitely not a sysad or any sort of sysadmin. The core mission of a sysad is to build the best environment possible while restricting that environment.
You seem to be the sort of developer that developers love and the sort of sysadmin that gets fired in the first week.
Just my .02 after two and a half decades.
Not to say that you are entirely wrong, or that your approach doesn't have merit in the new world (especially SV). But it doesn't work for sysadmins and production environments anywhere but your bubble. Not yet.
Well, you're right, I'm not a sysadmin. I pervasively automate, which often sidelines sysadmins, when it doesn't make them redundant. I write code and I don't touch production machines except in extremity, neither of which apply to most (though by no means all) of the people I know who want to call themselves a sysadmin.
Anyway, the core mission of anybody touching the stack is to enable the business to achieve its goals. Nothing, and I mean nothing, more. "Restricting that environment" is appropriate in some environments, and a number of my clients bring me in to help with that. Facilitating developer velocity--and, yes, developers do tend to like me, because I'm good at this while achieving goals around security and uptime--is appropriate in, probably, more. Pays better, too, even if it shouldn't.
It's not that sysadmins cannot do the work you are rightfully proud of. If there are two basic things that differentiate your statements from those of a traditional sysadmin, they are these:
1. Design.
2. Discipline.
Where these two values are dispensable in the long term, devops and the new world shine through. I've worked in both worlds and the only mistake is assuming one size fits all.
In general you seem like an absurd sort of creature. Neither here nor there. Bragging about your facility and business velocity. Everything you claim to do, sysadmins were doing in '98, and with equal velocity and adequate coverage.
At the risk of being too "meta", although I agree with what I believe is your point about good sysadmins having been advancing automation (and otherwise keeping business needs in mind), I worry that you're distracting a reader from that point by what reads as an ad-hominem attack in your first sentence.
I'm still not certain what point you were trying to make with "neither here nor there", however.
Have you SEEN how Bash array indexing works? It’s not exactly user friendly. Same with Bash hashtables (which I don’t think are really hashtables).
I love Bash and use it way more often than I should, but Bash is not a friendly language. Mocking and stubbing functions is really hard to do, which makes tests awkward, even with BATS. And you need to watch the file descriptor to which text is sent to avoid output getting gobbled up unexpectedly. And you need to properly scope your vars to avoid unexpected collisions. And everything regarding sourcing and importing dependencies. Etc.
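For anyone who hasn't had the pleasure, this is roughly what the Bash 4 array and associative-array syntax looks like:

    arr=(one two three)
    echo "${arr[1]}"                        # prints "two" (zero-indexed)
    arr+=(four)                             # append

    declare -A ages=([alice]=30 [bob]=41)   # associative array ("hashtable")
    echo "${ages[bob]}"                     # prints "41"
    for k in "${!ages[@]}"; do              # iterate over the keys
        echo "$k=${ages[$k]}"
    done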
My challenge to you: I want a makefile that has 20 third party dependencies and can be built on osx, linux, and windows.
I can do this within an hour with gradle, ant, or maven. The ecosystem doesn't exist for this in make, and anything I could come up with to make it possible would end up being a tool that would look like automake and the monstrosity that it entails.
That's a bit unfair, because make relies on the underlying system capabilities much more than Java does, and Windows' just isn't up to snuff. But for the other platforms, autotools definitely can do what you ask.
> But for the other platforms, autotools definitely can do what you ask
Ugh... autotools. Yeah, let's use what feels like build tools created way back in the '60s or something. Sure, they "work" for some value of "work", but oh boy are they ugly and nasty. Good luck hiring anybody under the age of 40 who is gonna be willing to work on such a clunky old tool.
There is a reason why the world is moving to build systems that replace the Make toolchain.
Come on...bundle your JRE. I don't know why anyone wouldn't in the age of terabyte hard drives and ubiquitous "small" apps with sizes that make a bundled JRE with all the trimmings look lightweight.
And while you’re at it can you make it idempotent when building sub artifacts which inputs haven’t changed.
And can it do it all incrementally in parallel too please because enterprise shops tend to have a ton of code
Oh and it would be really nice if you could make it so if someone else somewhere in the org has compiled that thing then could we just use their computed binary to save the time compiling locally.
> But I just don't understand why we have to have 47 half-built over-complicated build systems
> Everyone repeat after me.
I don't mean to pick on you specifically here because this attitude comes up a lot. In short, a lot of people are doing a thing, a thing that you aren't familiar with, and your gut reaction is to say "everyone: stop doing that, and do what I say!".
Wait a second and consider that maybe there is a reason why this incredibly large number of developers are using these tools. That perhaps they evaluated various different options and decided that what they are using is more suitable than make. Maybe you could find out.
> Wait a second and consider that maybe there is a reason why this incredibly large number of developers are using these tools. That perhaps they evaluated various different options and decided that what they are using is more suitable than make. Maybe you could find out.
Here we have another fundamental "problem" between dev and ops.
The inherent friction because of different areas of concern.
Devs want to build fast and create new features, but sadly (even with the whole devops notion) are usually not thinking about viability in production.
Ops people need to keep things stable, something which is sadly undervalued a lot of the time in companies.
Not really? A lot of build tools are chosen specifically to make the build process more reliable and understandable. For example, build tools like Maven handle dependency management, which is hugely beneficial in ensuring your builds are consistent and work the same in different environments. Makefiles are shell scripts.
Makefiles are definitely not shell scripts. Individual recipe lines are executed with /bin/sh by default, but you can change this. (Heck, you probably could even get away with Python if you really wanted.)
The rest of Make (which is 95% of what's in a Makefile) is its own language, which is actually not too bad for what it's intended for (munging around filenames), and has the flavor of a functional language.
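For instance, GNU Make runs each recipe line as $(SHELL) $(.SHELLFLAGS) 'line', and both variables are overridable, so a toy like this is possible (a sketch; whether you'd want it is another matter, and it assumes python3 is on PATH):

    SHELL := python3
    .SHELLFLAGS := -c

    hello.txt:
        open("hello.txt", "w").write("generated from a Python recipe\n")

In practice most people leave SHELL alone and, as suggested elsewhere in the thread, call out to a script instead.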
Not taking sides here but you picked a bad example. Makefiles specifically handle dependency management, but designed from a compiled language perspective. Make sure you build this .so before you build this bin, or that this directory exists, and so forth.
That's fair, but it still reinforces the point: Makefiles are great from a compiled language perspective. Other build tools are better from other perspectives. It isn't wrong to choose a tool depending on your needs!
"Wait a second and consider that maybe there is a reason why this incredibly large number of developers are using these tools. That perhaps they evaluated various different options and decided that what they are using is more suitable than make. Maybe you could find out."
Have you considered that perhaps the previous commenter is familiar with the other tools? Or perhaps that the large number of developers have streamlined their particular workflows for their particular use case and have not considered the flexibility needed for other cases? Or perhaps that there is a cost associated simply with having 47 different build systems?
Even if the previous commenter is familiar with 47 build tools and has discounted a lot of them as extraneous, is he advocating building an npm module with make? What about a jar? A NuGet package? A Cargo crate?
It seems more likely that they're frustrated about the sheer amount there is to learn, and they probably know make quite well.
But there are real advantages to these tools. There’s a lot of vanity libraries out there but not many vanity build tools.
My real issue is the constant wheel-making impulse that leaves us with a shattered landscape of people tripping and falling over busted-ass and abandoned wheels. I have a JavaScript front-end project that uses three different package managers and four different make-analog tools, plus some batch files sprinkled on top to orchestrate the common use cases of this monstrosity. Nothing we are doing here is that complicated, except for this sea of shitty half-baked ecosystems around these tools, each of which was considered best-of-breed at one point.
What we want to do is slurp some text files up, apply some transformations to them, glob them together, run them through the minifier, and dump them into a final output. This is exactly what a decent makefile would fit well for. Instead, I got this, because apparently nobody wants to use anything that isn't the hot new way to do things, and so very few people have even had enough exposure to know that tested tools for these kinds of things already existed. The last time I used make beyond trivial uses was a decade ago in college (incidentally, I actually did build jars with make, easier than fighting Eclipse...). But just knowing that something exists is more than half the battle, although the problem then is that it's frustrating as all hell watching the same ideas cycle round and round every couple of years.
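For the record, the pipeline described above is roughly this much Makefile (uglifyjs is just a stand-in for whatever minifier you use; the paths are made up, and recipes need tab indentation):

    SRC := $(wildcard src/*.js)
    MIN := $(patsubst src/%.js,build/%.min.js,$(SRC))

    dist/bundle.js: $(MIN)                 # concatenate the minified pieces
        mkdir -p dist
        cat $^ > $@

    build/%.min.js: src/%.js               # minify each source file
        mkdir -p build
        npx uglifyjs $< --compress --mangle -o $@

    clean:
        rm -rf build dist
    .PHONY: clean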
> I’ve wasted enough time tinkering with gulp, grunt and webpack to sympathize
And yet, those tools fill a need that would be very hard to replicate with the toolchains that came before it. Good luck doing half of what gulp / webpack do from, say, a Makefile.
I'm not familiar with either of those particular two tools, and what you say may well be absolutely true for both of them.
It's just that this argument keeps getting used for every single new "reinvented wheel" (to borrow from the GP). Sometimes the argument is as strong as "it couldn't be done any other way" and sometimes it's as weak as "this one is just incrementally better," but it feels a little like crying wolf.
Was it really "very hard" to make the old wheel do what you needed, or perhaps somehow extend it or add a library, or was it just far more fun and exciting to build something from scratch?
I generally don't mind a proliferation of tools, except when they start to break or conflict with each other, which is, I believe the GP's main concern, and at least tangentially related to the article.
Well you use the right tool for the job. I hope to god you're not using a Makefile for a Java or Scala project. You better be using Gradle, SBT or Maven.
If you're building in Elixir, I hope you're using Mix and not a Makefile. If you're building in Rust, I hope you're using Cargo, or some other Rust specific build tool.
And Makefiles do get stupid complicated when you need things portable and to work on Linux, Mac and FreeBSD; or allow them to have optional dependencies. That's why we have autoconf and all that ./configure && make && make install craziness.
I wish there was something like an updated Make, a tool that works for everything but updated to 2018.
For instance Make works based on timestamps and therefore works very poorly together with git. Switch to another branch and you can get weird effects based on what files were updated and not and often trigger needless rebuilds. And everyone uses git these days.
GNU Make, just using hashes instead of timestamps, would be a huge step forward.
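You can approximate it in stock GNU Make with content-hash "stamp" files, an admittedly clunky workaround (a sketch; pattern names are made up and recipes need tabs):

    .SECONDARY:                 # don't let make delete the stamp files

    # the stamp is only rewritten when the file's bytes actually change
    %.c.sha: %.c
        @sha256sum $< | cmp -s - $@ 2>/dev/null || sha256sum $< > $@

    # objects depend on the stamp, not the source, so a branch switch that
    # only touches timestamps re-runs the cheap hash check and nothing else
    %.o: %.c.sha
        cc -c $*.c -o $@

Newer tools like Bazel do this kind of content-based check natively, which is a big part of their appeal.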
> GNU Make, just using hashes instead of timestamps
Sounds like you're describing make on a system with ccache installed. Hashing incurs a significant performance hit. The first build with ccache is 20% slower than building without it[1]. Your modern make would likely be slower for people who just build something from source once and aren't doing incremental development.
Did you consider that it could be other things with ccache that make it slow? E.g., the need to fully expand C headers and turn them into a text stream (NOT needed for a build tool).
Everything with git is lightning fast including operations that require a full directory tree diff / hash.
That'd require hashing every file on every run to figure out what changed. Not a good idea.
There are modern tools that work for many things. Gradle supports native compilation these days as well as any JVM language. Bazel supports compiling many kinds of languages, and there's still scons.
We have 47 half-built over-complicated build systems, because writing build systems is hard.
Writing correct Makefiles is also really hard, and good luck debugging them. Make is simple and beautiful for small, self-contained projects without many dependencies, but it does not scale well (and even then it's tricky, see [0][1]).
Having 47 different build tools with their own flaws is certainly bad, but they exist because of Make's shortcomings. Just saying "make is fine, use it" won't fix anything.
For many of those systems, the biggest advantage is often that tasks are written in the language of the application.
A lot of the Rake tasks I've encountered in my career would have been easier to write in bash. Not a majority, but a sizable minority. I suspect that in many cases, the gain was that the authors were more comfortable in Ruby than in bash.
I'm pretty comfortable in both, but I'll pretty regularly use Ruby via Pry for stuff I know I can do in bash. It's easier to write, much much much easier to write correctly, and it presents a unified interface to other developers.
Language specific build tools will usually behave predictably across all development platforms supported by the language. Once you start building on a third party, developer experience will be bounded by the quality of platform support of that third party. You don't want to send everybody off on a hunt for the right version of Python unless you are Python.
(Just picking Python as an example because of the recent xkcd. The same is true for everything else, e.g. a Windows computer will rarely contain exactly one make.exe, it's usually either none or a whole bunch of them)
I've used my fair share of build tools and I think makefiles are horrible. Stringly typed, ad-hoc features, and a really bad language from a PL perspective.
Writing good build systems is genuinely hard, and I think make is not a good build system.
Many of these are tools which download code dependencies from package repositories and integrate them into a codebase or build. Make doesn’t do that, even configure doesn’t do that.
Days like today? As someone who is often working on machines with no network access, I vastly prefer build and deployment processes that don't need to go download crap.
There are much better tools. Sadly, nothing LCD (least common denominator) enough to gain wide traction.
That said, for anyone distributing software, shame on them for not packaging their custom build so as to be runnable via ‘make all’ (just using make to drive everything else).
Make is only bad under the Autotools mess. Every build-related struggle in an open-source project that uses Autotools can be traced to Autotools, not to make.
Autotools wasn't invented to overcome deficiencies in make, but deficiencies in C portability across Unix flavors.
Those deficiencies are greatly diminished today, both by POSIX standardization, and there being fewer viable surviving Unix variants that anyone cares to build for.
I've used make for 15 years and never once needed automake. The company I'm currently at uses straight make without problems for a 500 kLOC code base of multiple languages, 3rd-party code, code generators, and unit tests. Our make code totals 1000 lines.
I've seen plenty of messes using scons and ant, more so than I've seen with make. Make is a solid tool.
Across multiple operating systems (unix and windows)? Does it fetch and install 3rd party dependencies? Can a noob maintain the makefile without pulling their hair out?
Old way: we have one tool, it's got a few quirks but it does most of what we want and then gets out of the way. No one is impressed by this; tools are supposed to just work, aren't they?
New way: we have a dozen tools, they collectively do less than the old one and integrating them is a full time job for someone since any upgrade breaks something. But we all get to put the names of all of these things on our CVs! And that’s what’s important.
In reality it's all peachy until stuff doesn't work and no one knows why, or how to investigate an issue, or remotely where to begin to troubleshoot it. Change and evolution are good, but I think there is still a lot to be said for knowing the basics of anything. Levels of abstraction eventually hurt more than they help.
I think that may be behind some of these tales of 'luddite sysadmins'. Sysadmins need to keep things running, and complexity and dependencies, even when they bring convenience, are something that makes them nervous. It's not about being a luddite, it's about being able to hold a mental map of how it all works, so that when it stops working you can dive in.
And not to mention getting called at 2 am because something didn't build or the release bombed and having to examine, for the first time, some over-complicated mechanism to build things that goes through some pipeline where you're eyeballing large log files full of long exception chains.
Devs see the world as one where velocity and progress is among the most important, while 'sysadmins' (a term no longer used by companies and recruiters, unfortunately) have to worry about keeping the applications actually up so they can be used.
DevOps just seems to have swept the sysadmin under the rug with some pretty words about breaking down barriers. Sometimes it feels more like the devs broke down the wall. The amount of infrastructure chaos, mess, and confusion I see today (as a contractor bouncing around different places) seems higher than in the traditional infrastructure shops of 10+ years ago.
This is very true. Having a mental map of how things work is vital for correct (and good) troubleshooting.
Also, a ton of people use abstraction as an excuse to not learn a body of knowledge that contains fundamentals (especially on the systems side).
In comparison to network engineering for instance, where having (quite) low level knowledge of how protocols work and interact with each other is vital.
Isn't that reductio ad absurdum, though, especially considering the original comment is saying that the levels of abstraction eventually hurt more than they help?
Pretty much. I know several admins who appear to be joining a growing pool of luddites who rail against anything new. They're particularly butt-mad about anyone drawing more salary than them. "DevOps" is their favored totem to direct their ire at.
I used to try and convince them otherwise, but it turned out to be a completely futile waste of time. At the end of the day, persistent FUD just means a more lucrative job market for the engineers who are pragmatic and fearless enough to give an honest try at making the newer paradigms work for their employers.
You know all those people who say, "Never rewrite a project! It'll always fail and take forever and cost too much and..."? I've spent much of the last decade getting paid reasonably well rewriting their projects. (Anyone remember mod_perl 1.x?)
Every build system is like Make, but more friendly for its own language (IIRC Make was originally for compiling C and C++). Make just so happened to become generic enough to build damn nearly anything and also get bundled into most Linux distros.
I think the author is arguing that having to install a shit ton of dependencies to use some other Make-like build system is garbage. That’s true in some cases. But I wouldn’t want to use a Makefile for packaging Node; npm is great for that and understands how Node works.
If you thought having to deal with old COBOL programs was a problem, it gets much worse. When there are 10 year old containers in production, and the parts to rebuild them are long gone, then you have a real problem.
There's a reason that Google has internal systems which can and do rebuild everything from source.
The right answer is to use your own repository and pull those images in-house where they can be scanned and verified, i.e., run a container from that image, scan it using security tools, and take action from there.
This is why we have a local registry, and any container that goes to testing or production, or is needed for builds, should be buildable from source.
We still need to figure out how to best tackle the issue of online repositories being taken down or vanishing. For now we run apt-mirror.
The same thing you do with any 3rd party dependency -- save a copy of the source. Pull from DockerHub all you want but keep the images and their sources in a private registry and deploy your actual services from that.
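Concretely, that's roughly this (the registry host and image names here are made up):

    # mirror the upstream image into a registry you control, and deploy only from there
    docker pull nginx:1.25
    docker tag nginx:1.25 registry.internal.example/mirror/nginx:1.25
    docker push registry.internal.example/mirror/nginx:1.25
    # keep the Dockerfile and build context for your own images in version control alongside it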
Hadoop is a rather extreme example... It's bad, but not everything nowadays is. Many newer pieces of software install entirely from source with one command.
Also, this is not at all endemic to containers; there's simply zero connection. Dockerfiles tend to be very simple and easy to reason about. The most complex application I have is around 50 lines of Dockerfile, and that's mostly made more complicated to arrange things for the best layer caching.
I suppose we're supposed to believe that this is somehow worse than the days of debugging m4 macros and autotools just to get a build that doesn't work.
Does that Dockerfile build a container that only has that app and its required dependencies in it? Almost all of the ones I've seen given as examples seem to have an entire copy of the OS in them.
A container is basically a glorified chroot, so there are a few things included that aren't strictly needed, but I typically use Alpine as a base system, which has a shell, some core utils, and musl libc in just ~5 MB. Since it's on its own layer, it gets deduplicated both at build time and at runtime with other Alpine containers (and many Docker Hub images have an Alpine option).
That being said, since Go binaries have no inherent dependencies, I have indeed made Docker images containing exactly one file: the Go binary. These containers are basically the same as fat binaries, with the benefits of Docker scheduling and networking at runtime.
> I have indeed made Docker images containing exactly one file
To anyone reading this: you need some magic compiler flags in both Rust and Go to make sure you get a statically compiled binary (one that doesn't dynamically link against glibc).
But yes, this is super neat. I also like how it reads in the Dockerfile:
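Something along these lines gives the flavor (a minimal multi-stage sketch, not the actual file being referenced; the Go version, module path, and binary name are all made up):

    # build stage: compile a fully static binary
    FROM golang:1.21 AS build
    WORKDIR /src
    COPY . .
    RUN CGO_ENABLED=0 go build -o /app ./cmd/app   # no cgo, so no libc gets linked in

    # final stage: nothing in the image but the binary
    FROM scratch
    COPY --from=build /app /app
    ENTRYPOINT ["/app"]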
Ah yeah, without CGO_ENABLED=0, you'll get a very cryptic error when the ELF binfmt can't find the linker binary...
Never tried it with Rust, but I'm looking to use Rust in the future, so I guess I'd better find out what the flags are for Rust.
Sidenote: It's often useful to have ca certs and timezone info. At that point it's probably not a bad idea to just use Alpine and apk add those things.
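E.g., roughly this (the Alpine tag is arbitrary):

    FROM alpine:3.19
    RUN apk add --no-cache ca-certificates tzdata   # TLS roots and zoneinfo, a few MB total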
By default, Rust compiles all Rust code statically, but the standard library depends on a libc. If you want to use MUSL, you can. If you bind to C libraries, whether you need extra configuration depends on how the wrapper is written.
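If anyone needs it, the usual recipe looks like this (a sketch, assuming a standard rustup/cargo setup with no C dependencies):

    # one-time target install, then a fully static MUSL build
    rustup target add x86_64-unknown-linux-musl
    cargo build --release --target x86_64-unknown-linux-musl
    # the binary ends up under target/x86_64-unknown-linux-musl/release/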
> And since nobody is still able to compile things from scratch, everybody just downloads precompiled binaries from random websites. Often without any authentication or signature.
Apache has official mirrors that host repo files for various package managers, so you can install using apt-get or whatever it is that replaced yum (dnf?).
There are over 2.9 million lines of code in Apache Hadoop alone, not counting dependencies. If you can't trust Apache, you can't trust Hadoop, regardless of whether or not you can compile it yourself.
EXACTLY. It's just software. There are no easy answers here. There are vulns in everything from hypervisors to node modules. Building from scratch isn't going to help.
Pragmatic solutions where possible, like scanning containers, using OWASP tools on your repos, etc.
Did you read all those lines yourself? Did you even confirm checksums matched before running them?
I think that's the parent's point. You can build from source, but how do you trust the source? Is it any more egregious to trust a prebuilt binary from a specific website than it is the raw source? If you can't trust the binary being hosted by the author/caretaker, can you really trust the source being hosted or maintained by the author/caretaker?
I don't think his point is so much about the source as it is about updating N containers. For instance, say there's a known libssl bug. Can you tell how many of your containers are running that version of libssl? And how do they get updated?
1) List the containers that are running images built on pre-fix versions of libssl. 2) Bump the base image of your server images to a post-fix version, rebuild, and push.
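For step 1, something like this works (a sketch; the image name is made up, and the libssl package name depends on the distro release):

    # which running containers came from which image?
    docker ps --format 'table {{.Names}}\t{{.Image}}'
    # does a given Debian-based image still ship the pre-fix libssl?
    docker run --rm --entrypoint dpkg myorg/api:1.4 -s libssl1.1 | grep Version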
I think the point isn't that we can build from source, but why. If it's a huge codebase, you can't independently audit that source code. So ultimately, whether you compile it or the organization making it does doesn't matter for the purpose of trusting that code not to be malicious.
But this isn't a website hosting a binary. These are binary repos hosted by Apache, who self-hosts their VCS repos as well. The idea that Apache can be trusted to host one safely but not the other is absurd, and the idea that you are more likely to notice malicious tampering via MitM attack on 2.9 million lines of code than you are a binary is laughable.
I believe this is true and if so, then the argument should be against package managers, not Docker specifically. Most (if not all) of the official Docker images are built either by compiling the binaries from source, properly installing the binaries from the base distro's package manager, or pulling and verifying a pre-built binary from the vendor's website. For most cases, I don't see anything wrong with any of these.
Personally, what I like is no longer having to setup arch specific build machines containing all of the build tools and dependencies for all binaries that I wish to self-compile. Instead, I either use the vendor's Dockerfile which already contains everything it needs to build from source or I simply write my own if there is not one available. Building and distributing these binaries in the form of Docker images is a breeze using Gitlab CI and container registry and is just as easy with a small VPS and Docker Hub.
Actually, I think there's a subtle distinction everyone's missing (which the original article may or may not have been making):
Unless one can compile it oneself, how can one trust that a particular version of a binary release corresponds to a particular version of a source release?
If the build is reproduced by another trusted-enough source and the result is identical to the official release, then I'd say one can go ahead and trust the binary release from either one.
Sadly, I don't think this is generally done, though perhaps one's own spot-checking of the official release is enough.
That's supposed to be the basis of modern science, too, though, of course, it's not generally done there, either.
No, the point was "when you routinely use binaries from a bajillion different sources of varying degrees of trustworthiness, bad stuff is bound to happen".
I think part of the problem is that Docker is the now-popular answer to the problem of 'it works on my machine'. Unfortunately, 'works on my machine' also involved going to the website of a particular tool or library and following the steps recommended for quickly trying it out, which gives you the classic curl http://somewebsite.com | sudo bash situation, or following the steps in a blog post where someone quickly compiled a bleeding-edge version of it, with all the dependencies needed to build it, on their Ubuntu laptop.
I work for a hosting company and we host our own Openstack based public cloud. We have a hard demand that for production systems we build all binaries we use from source. We actually build these in docker and use that to deploy to production.
What I'm trying to say is that the one doesn't exclude the other.
And we actually use make quite extensively.
I do, however, see the OP's point: building from source hasn't gotten easier.
I think the only reasonable answer is "it depends".
After the new updated base OS images for our systems after Meltdown and Spectre were out, it literally took us ten minutes of human and about an hour of machine time to recompile all of our containers, run the tests and deploy them to production on our Kubernetes cluster, replacing the old insecure ones.
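For reference, the flow is roughly this kind of thing (a sketch; the registry, image, and deployment names are made up):

    # rebuild on top of the freshly patched base image and roll it out
    docker build --pull -t registry.example.com/team/api:patched .
    docker push registry.example.com/team/api:patched
    kubectl set image deployment/api api=registry.example.com/team/api:patched
    kubectl rollout status deployment/api   # wait for the rolling update to finish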
At the scale of our systems, any grumpy sysadmin would have spent at least several days untangling the dependencies and carefully restarting all servers in the correct order after some manually `sudo apt-get`ing (and probably forgetting a few of the lesser used systems).
Sure, typing "FROM ubuntu" (which by the way is a trusted and cryptographically image, contrary to the OPs concerns) leaves me at the mercy of whoever I trusted compiled that image.
Then again, what difference does it make to trust whoever compiled that ubuntu-17.04.iso image I put on my CD?
Or, as Brian Tracy taught me to say in these situations: "You may be right."
> At the scale of our systems, any grumpy sysadmin would have spent at least several days untangling the dependencies and carefully restarting all servers in the correct order after some manually `sudo apt-get`ing (and probably forgetting a few of the lesser used systems).
If you are operating at such a large scale, your sysadmin should be automating things (not necessarily with Docker, mind you).
This automation has been possible on *nix for the better part of a decade by now.
Agreed. As a (potentially) "grumpy" sysadmin, I wouldn't be manually apt-getting or forgetting anything. In fact, one of the lessons I've learned is that, at large enough scale, automated mass upgrades can do the forgetting for me, so verifying (also automated) is important.
As for any "tangle" of dependencies that may exist, I've never seen that be caused by any choice made by "Ops" but, rather, solely by those writing the appllication code being deployed. As such, it would apply just as much to a Docker image as to (what I view as) a traditional deployment.
I do often think about how, by using large JS packages, it's very feasible that in the daisy chain of NPM dependencies, somebody's managed to slip in malware.
I'm not sure what to do about it other than just not using NPM at all!
Check out retire.js and auditjs from npm (there are also more non-free options). You'll likely get some false positives (such as: who would host jQuery on a public/untrusted CDN? or why would you even pass user input into that?), but if real malware actually showed up, you'd find out as part of your CI process. Using the lockfile provided by yarn/npm is also a good way to reduce accidental, unnecessary package updates.
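As a CI step it's roughly this (a sketch; exact flags vary by tool version):

    npm ci                            # install exactly what the lockfile pins
    npm audit --audit-level=high      # fail on known high-severity advisories
    npx retire                        # retire.js flags known-vulnerable JS libraries
    npx auditjs ossi                  # check dependencies against the Sonatype OSS Index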
This has been my complaint from day one. Instead of Docker, have something like ports or pkgsrc and simply create tools that simplify sandboxing (cgexec, a Google kafel -> eBPF filter), and then all the package manager has to do is, well, package. Docker IMO is a mudball of concerns that need to be separated.
jails are actually very mature if you compare them to docker. They have a well working security system and sane networking. (docker just does endless NAT abstraction, which is terrible for certain use cases. Not to mention it breaks a ton of useful networking features)
I was asking a coworker the other day about using containers with a "shared" mapped filesystem subdirectory for UNIX file socket communication versus encapsulating everything in a TCP/IP network stack, but my coworker was concerned about the security risks of a mapped filesystem on the containers.
I don't know enough about BSD to really comment, but do any of them have resource- and syscall-limiting functionality (cgroups, seccomp)? IMO it should be mandatory for distributed application bundles.
not to be pedantic but why should containers be patched for security issues?
the entire point of containers (and orchestration systems) is the ability to push updates without downtime.
Just update with a newer container....am I missing something?
Also about container security - a strict process internally can easily help counter that (I believe Shopify had a nice talk about it at the Google Cloud Platform event in Toronto - everything from using trusted images, running only signed images and going through a security check for each layer).
EDIT: To add to it, please don't patch containers - the entire point of an "image" is that if I run it locally and on my datacenter - it should behave the same - live patching them just voids this concept.
How many people do you see setting up a deploy pipeline that includes pulling security updates into the base image and redeploying as needed?
In my experience, it's much more common to see docker images that have been untouched for months with zero accountability of what exactly is running there.
That's my real concern: old, out of date images. How will we handle another OpenSSL-level vulnerability in 7 years, with bad code buried in containers that haven't been updated in 4, and for which the build infrastructure is no longer functional?
This really isn't that different from having some pre-built, statically linked app still kicking around on your system with the source and/or build tooling long gone.
There aren't really easy answers here. You can't fix bad software with more tooling.
I blame that on Docker Hub. It's the fault of Docker, the company. They have security scanning software that they decided was an enterprise feature. This sort of issue is to be expected if you treat security as an enterprise feature.
Perhaps interestingly, I've had some similar complaints about package managers like homebrew.
I've noticed over the last decade that a certain level of knowledge about building from source seems harder to find via search or engaging in a community forum/chat. The assumption is often that everyone will be using the package manager (I'd say that's especially true on macOS, but it might be an artifact of me spending more time there since it's been my most used dev machine), so if you have trouble building something, the answer will frequently be "just use homebrew -- whoever maintains the formula will already have solved your problems for you."
There are two problems with this: first, that's presumably true for the source build, and if you can hit a case that the source build doesn't work for, chances are pretty good you can hit a case that brew doesn't work for (my experience is that more often than not they go together). Second... I'm happier to use package managers for applications meant to live on my machine and never go live elsewhere (heaven knows there's lots of stuff I just want to install and get on with my life rather than fiddle with), but for applications I'm deploying, it seems to me it's generally wise if someone involved in the project has a picture of the build details and dependency graphs in their head.
Some of the automation we're throwing at operations is really convenient. It's not simple, though. And depending on how much forethought is put into it, I'm starting to think of it as... maybe not technical debt, but a technical credit card: super convenient at times, but it can easily become debt if you're not careful with it.
I never install homebrew on any Mac I develop on. When you see all the possibilities to customize projects, it's just ridiculous to want to build something with a one-liner and expect it to take thousands of decisions for you.
There are plenty of mechanisms in the container ecosystem to address each of the problems the author states. Building software isn't all that it is cracked up to be, and it isn't very fun. Often the build steps for open source projects are not well documented, or the build process is made inherently difficult in order to push a paid product. Example: nginx.
I think the entire stratosphere of DevOps is just about dead on the whole in 2018. In retrospect, after working with things like Docker and more specific industry variations beyond the Amazon tech, it makes no sense to dwell on the security/control of a dedicated systems admin professional, since the tools are all outside the local domain anyway. The rest, from VoIP to IoT to container services, is managed wholesale... SysAdmin is a dino in the age of distributed tech and outsourced IT resources.
I'm a programmer, so I'll take heat for it .. but I don't see a need for them anymore.
The entire purpose of DevOps IMO was to close the gap between sysadmins and devs through code. Devs doing everything, including infrastructure, was and is the entire plan! Public cloud made this super duper easy.
The problem is devs don’t want to manage core infrastructure (VPCs, networking, modules for deploying lambdas and database clusters and container orchestration clusters, etc) and somebody has to do that stuff
Ideally, those would just be features like any other software team, as it's all API calls at the end of the day. But lots of companies have issues with structuring their platform teams like software teams because it's "not software" even though it is
This problem is more deeply entrenched at large companies that own hundreds of millions of dollars of compute controlled by an old-school IT function that can't fathom the idea of either giving it up or making it accessible like cloud, and would rather pay VMware tons for tools that make teams even slower than have their sysadmins become developers.
Then there’s the whole protectionist “You’re taking my job” and “devs can’t possibly know this much about $infra” that isn’t dying off anytime soon
> Ideally, those would just be features like any other software team, as it's all API calls at the end of the day. But lots of companies have issues with structuring their platform teams like software teams because it's "not software" even though it is
Just because it's implemented as API calls at the end of the day doesn't necessarily make it not "not software" (if you'll pardon the double negative), at least in the sense that I believe you mean.
To wit, I believe you're suggesting that if something can be expressed as code, it's all "software" and can therefore be designed, written, and maintained by the same kinds of experts: software developers.
I disagree, because the nature of the infrastructure-as-code code is too different from the application software code.
One could, similarly, express an FPGA configuration in code, but a software developer would not automatically be good at programming one. This is even likely to be true for less extreme examples, such as programming expertise not automatically transferring from general software (for lack of a better term) to code that works well on, say, GPUs.
In the case of IAC "software", a more mature design is more likely to resemble traditional sysadmin/network/security best practices than application software features. It could also have significant financial side effects if there's an error, assuming public cloud, which could require more stringent standards of control, review, and quality, especially if a company ends up in SOX territory.
>Then there’s the whole protectionist “You’re taking my job” and “devs can’t possibly know this much about $infra” that isn’t dying off anytime soon
I'm sure some of this exists, but my own experience is an attitude not that devs can't know a certain amount about infrastructure but that they simply don't, often because they actually don't want to.
Perhaps they fear that if they do end up knowing that much, they'll end up being the ones to manage that core infrastructure, which you identified that they don't want to do!
I'm not convinced there was ever a need for sysadmins, especially from the point of view of programmers. Programmers have always been able to do that work themselves.
It's just that they don't want to. They think it's boring, or perhaps even beneath them. It's a bit like tax preparation.
I don't think that's really changed all that much, even today, even if it's now programming against the AWS API, judging by the number of job postings for such a role.
Also, like with tax preparation, using a good expert instead of doing it oneself could save a lot of money and/or future headaches, but it can come with "grumpy", old-fashioned-seeming nagging about procedures akin to annotating and saving receipts. Whether it's worth it depends on the situation and, far more importantly, temperament.
> I'm not convinced there was ever a need for sysadmins, especially from the point of view of programmers. Programmers have always been able to do that work themselves.
> It's just that they don't want to. They think it's boring, or perhaps even beneath them. It's a bit like tax preparation.
I'll half agree here, and half not. I'm coming at this as a programmer who has historically done ops as well, currently doing more ops than dev.
There's parts of it that programmers don't want to manage, like Apache configuration, provisioning and sizing VMs appropriately, etc. That's the boring stuff, and I totally get it. I don't like managing most of that stuff either, and have got everything set up with service discovery to handle a good chunk of the configuration, and Terraform for managing resources instead of having to click through the Azure Portal, and Nomad for scheduling tasks in the clusters. Groovy.
There's also the parts that the developers seem to have no clue about. They're very happy with all of these abstractions, but there's a lot of layers underneath. They think in terms of making HTTP requests; when those don't work, it's up to me to figure out that their library is keeping a pool of HTTP connections open, and the Azure load balancer silently drops those connections out of the NAT table after 4 minutes of idle time (without sending FIN or RST packets to either side, natch). When the app gets terminated because it's exceeded its cgroup memory limit, I'm the one helping them adjust their JVM heap size. And when their requests take too long, I'm there helping them look through their query logs to figure out what's happening.
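For the cgroup/heap case specifically, a reasonably recent JDK can size the heap off the container's memory limit instead of host RAM. A sketch, with the percentage being an arbitrary choice:

    # cap the heap at ~75% of the cgroup memory limit rather than a fraction of host RAM
    java -XX:MaxRAMPercentage=75.0 -jar app.jar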
All of these cases have happened in recent memory, and it's never been a matter of "this is boring and/or beneath me", it's been a matter of "HALP! I HAVE NO IDEA WTF IS HAPPENING"
Edit: so yes, I agree with you in the sense that I'm a developer who generally speaking doesn't need a sysadmin. I installed Linux for the first time when I was 12 years old, 22 years ago :), and I didn't have a sysadmin then. In those 22 years, I've seen a lot of weird shit happen, and most of the people I've worked with don't seem to be well equipped to dive 3 abstraction layers deeper than they're used to to figure out what's wrong.
I'm not sure you're even half disagreeing with me here, especially as you just admitted you may have the same bias as I do :)
> All of these cases have happened in recent memory, and it's never been a matter of "this is boring and/or beneath me", it's been a matter of "HALP! I HAVE NO IDEA WTF IS HAPPENING"
These two cases strike me as actually one case, with the former being the cause and the latter being the effect. At least that's my contention: they had no idea wtf is happening in those lower layers because learning them was beneath them (no pun intended). Nothing was really stopping them from learning about heap size versus process/cgroup limits.
On the other hand, you do bring up an important point about (mis)behavior of network infrastructure. That's not something the average programmer would necessarily encounter or have access to, especially at scale, and therefore wouldn't be expected to know. Still, my point still stands: there's nothing stopping programmers from gaining that knowledge if/when it becomes necessary, should they want to.
Now, I'm not trying to play dumb. I mostly don't want to read too much between the lines. Are you, in essence, saying that your half-disagreement is that sysadmins are needed, not because (average) programmers don't want to learn all those layers, but, instead, are incapable of learning them?
Very very excellent points. I think that everyone on my team would be capable of learning them, but at least in this specific case it's not so much that they feel it's beneath them, but more of a fear. They talk about C as if it's something that only dark wizards understand, likely as a result of poorly taught early CS.
I think what I'm saying is that I feel like sysadmins are frequently needed for sake of expediency, especially when hiring younger developers. And maybe, due to that same bias, I actually mean "senior developers who understand abstractions several layers deep" :)
> I think what I'm saying is that I feel like sysadmins are frequently needed for sake of expediency, especially when hiring younger developers.
Thanks. That's a point I hadn't considered.
I'm not sure that expediency translates to a need, as such, but that situation is certainly different from the one I envisioned (where a sysadmin is merely a luxury or an optimization to the programmer-DIY-ops scenario).
> I actually mean "senior developers who understand abstractions several layers deep" :)
I'm not sure you do, since you admitted to really being at least part sysadmin, earlier :)
> I'm not sure you do, since you admitted to really being at least part sysadmin, earlier :)
Hah!
I wish I could remember which talk it was. Bryan Cantrill had a good line about DevOps in one of his talks (I think it was this one: https://www.youtube.com/watch?v=30jNsCVLpAE). The gist was "you can say you're DevOps, but when the shit hits the fan, you're either going to be Dev or Ops. If you're a Dev, you're going to want to debug the problem before rebooting the failed machine. If you're Ops, you're going to want to reboot the machine as fast as you can to get it back up." Through that lens, I'm definitely pretty far over on the Dev spectrum; when something goes catastrophically wrong and someone reboots a box to "solve the problem", my first reaction is "YOU FUCKER YOU BURNED THE CORPSE"
It's such a tricky thing all around. Looking back at what I wrote earlier, I also realize that in some ways I'm facilitating the ignorance. I've got a really nice Consul and Nomad setup for the team, so they can pretty much just toss .wars and Docker containers at the cluster and they'll automatically get scheduled somewhere with spare capacity. The load balancer, the database cluster, all of the service discovery and job scheduling stuff... they've never had to get in and set any of that up. Maybe it's time to do more mentoring...
Anyway, thanks for getting me thinking. I've been a little bit grouchy lately about all of the recent experiences of people not knowing how the stuff they build actually runs. You've been a great mirror for some self-reflection :)
Run your own artifactory, have that as your docker repository too, only allow the use of jars that have been vetted ... it's what we do where I work, it's not hard, and it solves 99% of these gripes.
He's right that security right now is a bit unreliable but random sysadmins writing scripts and manually configuring things is never a good guarantee either.
As a dev, I'd like nothing more than for k8s or something similar to be the standard platform to run applications, so that everything is standardized and I don't ever have to require a sysadmin. I think this can already be more secure than the "good old days", and in the future I expect it to be more so.
Regardless of your take on the article, could we, at least, agree that all build tools suck in their own special way after a certain level of complexity is reached?
Thank heavens people are starting to point this out.
The last devops team I worked on had an obsession with shiny. Never mind that they couldn't bootstrap a new base database for their application anymore or automate the entire application deployment (even with a phased approach) on either VMware or AWS. They wanted to keep piling new tools (often with sub-1.0 version numbers) on top of an unstable foundation, and would just shrug when it fell apart in production (which it commonly did).
I tried pointing out to them that by giving commit access to their internal puppet git repository to every developer in the building, they had effectively given root access to them as well. All I received were shrugs and blank looks all around.
One thing from the article that doesn't seem to be discussed enough here in the comments is the trend of pulling random Docker images from the internet and deploying them on your infrastructure simply because it's easier to integrate random versions with feature X than to maintain your own builds, or to work with the vendor's provided packages to achieve the same result. The security implications of this in particular have been bugging me for years.
It is a mess, but one valuable part of the ever-transitioning "sysadmin role" is being able to code and understand it. That being said, good sysadmins (rare as they are) can and do.
Isn't most of this being driven by society in general though? Everyone wants everything now... we're generally feeding this trend to do whatever it takes in the shortest amount of time.
I think a lot of the stuff I used to need to know (OS kernel internals, hardware specifics, relatively deep network knowledge) is less useful in the age of public cloud, containers, and immutable infrastructure. There isn't really a need to tune a kernel for performance or do deep troubleshooting to root-cause OS issues and ensure uptime; if something seems off, kill the machine or container and let auto scaling or the container orchestrator take care of it. If something's wrong with cloud networking, call AWS, as the most we can do is prove that it's on them. Etc.
While I still know these things, I haven’t had to employ them in a while. I will probably use those even less as I start getting into management. That’s a bit saddening.
But the new stuff I’ve learned over the years, namely, designing systems like I’d design software, is amazing. Applying TDD to infrastructure isn’t something I thought I’d be doing, but here we are, and we have “modern DevOps” tooling (and lots of other things) to thank for that.
I was all ready to rant and then read the first part: "I’m not complaining about old-school sysadmins. They know how to keep systems running, manage update and upgrade paths.
This rant is about containers, prebuilt VMs, and the incredible mess they cause because their concept lacks notions of “trust” and “upgrades”."
nixpkgs isn't rigorous about reproducible builds. Hadoop is actually a great example of this. They do not build it from source, a prerequisite for calling a build reproducible. Instead, they download the binaries that the Apache project has already built and run patchelf on them to make them work.
That is true, but in my experience in small scale personal desktop and cloud computing, NixOS is in practice reasonably reproducible on the system level.
For me, in terms of interface, the strength of the Nix ecosystem is declarative system, ops, and service configs in the same language used for package and build specification. The technical strength is striving towards reproducible builds by hashing the dependency tree to build an immutable store. Yes, sometimes this means getting binaries, but you can always pin the package version or even have multiple versions in tandem. The practical upshot of this immutability is system-level rollbacks, which are generally reliable, although there are ways to break it. Yes, there is garbage collection.
Nixpkgs is quite an achievement, and yes it has its warts, but we are working hard to make it better. If we manage to shape up the data science side, I will try it out at work too. I'm very curious how it might scale.
> Essentially, the Docker approach boils down to downloading an unsigned binary, running it, and hoping it doesn’t contain any backdoor into your companies network.
I don't get why people still claim that this is "the Docker approach". This is not the docker approach, everyone hopefully knows this is an anti-pattern by now.
I'm out! I'm done. I started as a developer. I then migrated to sysadmin, then systems engineer, then devops, and back to developer. I am done with playing the platform game. None of it matters. What matters is writing code that does work leading to profits. Always be coding. CAPEX over OPEX.
> None of these "fancy" tools still builds by a traditional make command. Every tool has to come up with their own, incompatible, and non-portable "method of the day" of building.
100% this. Learning Make a few years ago was one of the best decisions I’ve ever made. It’s simple (until it’s not), straightforward and available on just about any Linux and UNIX installation (version compatibility aside).
Trying to make the case for POSIX-compatible Bash scripts has been tough, though.
I also agree re: Docker. While you can secure a container image with USER statements in the Dockerfile and knowing how to give containers just the capabilities that they need to do their job, it is WAY too easy to run them as root and give them privileged access to things. It should’ve been the other way around. Also, every container orchestration platform seems like a really elaborate hack.
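For what it's worth, the non-root pattern is only a few lines (a sketch; the base image, user name, and paths are all made up):

    FROM alpine:3.19
    RUN addgroup -S app && adduser -S -G app app    # create an unprivileged user
    COPY --chown=app:app server /usr/local/bin/server
    USER app                                        # drop root before the process starts
    ENTRYPOINT ["/usr/local/bin/server"]

Pairing that with docker run --cap-drop=ALL --read-only takes away most of what a compromised container could abuse, but none of this happens by default, which is the problem.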
I feel like this has a lot to do with the "cattle, not pets" mindset and the disposability of modern infrastructure. For instance, the author talks about patching a container. But that's not idiomatic; instead, you would include the patch in your build pipeline and deploy the new image.
Depends on the approach, you can use mutable or immutable containers. In fact, the OpenVZ VPSs that were at one time reasonably popular were just containers.
However, I do think the OP made a mistake by using Hadoop as the example. Hadoop has much more in common with an OS at this point than with a single application. The ability to download a single component of the ecosystem is mainly there to support small-scale testing and development.
You wouldn't build Debian by going and getting each piece of the Linux kernel from source, building those, then building all the pieces of middleware on top, and so forth... that is the whole point of having Debian in the first place.
Do certain complex software ecosystems need better support for fingerprinting their builds? Yes. Does all this mean the sky is falling... probably not.
There's always been proprietary code installed on systems; who knows what that big Oracle database is actually doing?
Are these sysadmins actually looking through the open source code that they're compiling to make sure that there isn't a security flaw in plain sight?
With Docker you can get containers maintained by a trusted source, as a sysadmin you don't have to deal with all the hassle of upgrading things, you can just replace the container with the latest version. With Docker content trust the containers are signed.
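For example, with content trust turned on, the client simply refuses unsigned tags (the image name here is made up):

    export DOCKER_CONTENT_TRUST=1
    docker pull myorg/service:1.2.3    # fails unless the tag carries valid signatures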
There have always been cowboys out there and that has nothing to do with the tech stack.
If a company has a threat model and a list of business goals then they would have at least a risk matrix and they might decide what to do: either go the slow way and build software they can trust or accept the risks, backdoors and all the rest. Most companies skip all of that and hope for the best. Not always a conscious decision.
Sometimes they get their unpatched servers encrypted by some ransomware, remember they don't have any backup, close shop and move on to the next business idea. I've seen that happen.
I see this as more of an opportunity than a problem. The fact that Hadoop, Kubernetes, and other platform-like systems are complex to manage properly with good attention to security implies they should be delivered as cloud services rather than having everyone run their own. This enables K8s users to focus on apps while offloading management to specialists who can focus on running the services well.
If you're operating at a large enough scale, you can bring the "cloud service" in-house.
One of the gating factors here is both the speed of Kubernetes development (move fast and break all the things) and the terrible state of the accompanying documentation.
If "they should be delivered as cloud services" is some sort of k8s apologist stance for its sorry state of maturity, then we got issues. OTOH, if it's "You shouldn't run it in house unless you have an army of people to read every new commit", that's wrong, too.
First of all, let me be clear: I'm not an expert in K8s and certainly not an apologist for bad software.
On the other hand there's a level of complexity in distributed systems that is impossible to avoid even in stable infrastructure. You have a design choice of trying to make the system as easy as possible to operate (at the cost of other features) vs. finding operating models that make it less of an issue.
Personally I would rather spend time futzing around with my applications that run on kubernetes vs. trying to run kubernetes itself. It would be sufficient if Kubernetes services were portable across a marketplace of providers so I could pick a place to run my applications.
Also as far as security is concerned it appears to me that a lot of people are deploying technology that they simply don't understand. This is not just a problem with Docker but with apps from ecosystems running on npm and pip.
You can build images securely with Docker but it requires building them yourself, using private registries, checking carefully for vulnerabilities, and testing. If you don't want to do this, pay somebody else to do it right. There's no free lunch.
> The first internet worm spreading via flawed docker images?
Good question, why don't we see exploits of all that implicit trust to the degree that, eg, the DOS shareware scene gave your PC visible virus infection, or the early internet gave us worms that would bog down the whole net?
My attempt at an answer: Because the black hats aren't hobbyists anymore. Visibility is for amateurs.
Are you saying that every new black hat is immediately a professional? Or that there aren't any new black hats? In every other activity of human life, new amateurs appear as the older ones become professionals. Where are the visible exploits from the new amateurs?
> Are you saying that every new black hat is immediately a professional
To a degree that's what I'm saying. Sentencing for "computer crimes" when perpetrated by non-corporate entities has reached epic levels. I'm sure qualified talent takes that into account.
Except.. isn't that based on the misconception that all of these "cloud" tools that "come from Google" are actually used there?
Google isn't using Kubernetes with Docker on AWS (or even GCP). I'm quite confident that they're running their own software on custom bare metal, with custom virtualization/container abstraction layers and custom management software.
Seems like the security issue is easy to fix? Only installing binaries you trust and comparing them against a hash is basic computer admin 101; it's not specific to Docker. Just make sure you trust whoever made the docker image, and that they give you a hash to compare it against.
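E.g. (a sketch; the digest and file name are placeholders):

    # pin an image by digest so you always get exactly the bits you verified
    docker pull nginx@sha256:<digest you recorded when vetting the image>
    # or, for a plain tarball/binary, compare against the checksum the vendor publishes
    sha256sum some-tool-1.2.3-linux-amd64.tar.gz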
Isn't this just an example of automation? Other than sunk cost, by what argument is it reasonable that the sysadmin job should in any way deserve to be protected or ought to continue to be a thing? To my ears, containers sound like they've kind of solved the problem.
How have containers solved maintaining production systems?
It's just moving the abstraction layer higher up the stack.
Also, who will design and build security for these systems? Or care about low-level performance?
A ton of devs don't care about prod in my experience, and just want to ship shiny features.
Containers and VMs are not very different from a deployment or maintenance standpoint. The deployment strategies used with containers could be done with (lightweight) VMs a decade ago. (Heck, jails have been used for nearly 25 years.)
Also, sysadmins have been automating stuff since forever. Without automation one inherently has a very unstable system.
No one is arguing that a sysadmin job should be protected. The article is more of a reminder that the sysadmin responsibilities don't disappear because we paper over them. Containers are a great tool, but they can trick people into thinking that certain things aren't a problem merely because they have been swept under the rug operationally.
With that in mind, I don't see sysadmins going anywhere soon. Every layer of abstraction is supposed to 'eliminate sysadmins', but inevitably it becomes yet another system that someone needs to audit and maintain.
The fact is, many open source software projects provide their own Dockerfile (and in many cases, images). Using these is akin to downloading and deploying a release tarball.
It’s ironic. I love python but the syntax for make files was a pain. They’re rather cryptic. I guess they’re a staple time to conquer them once and for all.
It's trivial to call bash syntax from within a Makefile. The Makefile just breaks these things up into rules which define steps and dependencies for these rules to occur.
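A minimal sketch, assuming a Node project for concreteness (the target names and build commands are made up, and recipe lines must start with a tab):

    # rebuild the bundle only when the sources or the manifest change
    dist/app.js: $(wildcard src/*.js) package.json
            npm ci && npm run build    # any shell command works in a recipe

    .PHONY: test clean
    test: dist/app.js
            npm test

    clean:
            rm -rf dist node_modules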
> Back then, years ago, Linux distributions were trying to provide you with a safe operating system. With signed packages, built from a web of trust. Some even work on reproducible builds.
> But then, everything got Windows-ized. “Apps” were the rage, which you download and run, without being concerned about security, or the ability to upgrade the application to the next version. Because “you only live once”.
These two a both very valid and great approaches for solving different problems. Sometimes you're just a regular user without any valuable data that just wants to do things in a quick and convenient way. And sometimes, you're a system administrator that needs to evaluate the whole build pipeline and plug all the holes for production deployment.
Both alternatives should exist, and one doesn't cancel the other.