Ex-Amazon here. Most grumpy system engineers did not disappear: we got hired by Google/Amazon/etc. to build large-scale infrastructure... and sometimes sell it back to you as a service.
Believe it or not, most of the underlying infra does not run on the popular technology of the year. Far, far from it. That's why it works.
Modern devops, with its million tools that break backward compatibility every month, sometimes becomes the running joke at lunch.
Grumpykins here. I think the term "modern devops" sort of nails it, but not quite how you used it. Most departmental/enterprise sysadmins/engineers of lore who had even the slightest need for a life outside the box pushed anything resembling automation to its breaking point. Combine that with knowing and serving the reasons for their existence (developers, users, etc.), and devops is nothing new; it is simply the necessary manifestation of progress at scale (albeit positively devoured by managers speaking business, not entirely unlike "agile").
"Use what works" definitely presents a lot more choices these days and likely will forever more.
"Use what works well" is something different where "well" implies helpful, dependable, predictable, manageable and so on that will continue to scale with your needs. Only breaking things down the "old-school" way will lead towards success, stability, security and life outside the box.
Good devops is still, primarily, good engineers engineering good things, for themselves and others.
Granted, the article is from 2015, but my impression is that the author is not just cranky, but scared.
Agreed. DevOps has been a thing for a long time. The funny thing is that the core of the DevOps philosophy (uniting development and ops through code) is still a rarity. SO MANY big companies have entire departments for DevOps that are basically either developers doing release management or sysadmins writing pseudo-code for infrastructure while making it somehow less accessible to development teams.
DevOps is uniting dev and ops through code?
I believe the first and most important thing in DevOps is to unite them through good (early, often, honest, etc.) communication and collaboration, instead of “you broke x”, “you have to fix y”, and “this is the other department’s (dev or ops, depending on who says it) fault”. The code is just a tool to make that collaboration easier / automate it, once those other things have already happened and everyone understands they are working toward the same goal and not as enemies.
A good sysadmin would not look like they are doing much work (everything is humming along and can self-heal minus physical problems), but a good devops person is constantly busy.
I sort of agree but a good sysadmin was never idle on the inside. I'm seeing good devops people getting worked well beyond what I'd consider reasonable expectations i.e. "Oh, look, you can do everything! Here's everything!".
They're being perverted into a role having a full load of pure operations with shit for processes (and, often, systems) and an expectation that you have time to automate and shore up all the shit and technical debt accumulated since.
Can most good or even extraordinary developers simultaneously be elbow deep on a dozen unrelated products and actually get reasonable traction? I can barely keep one glass castle together, myself.
This is exactly my sentiment and why I moved away from SRE and back to SWE. I felt busy all the time developing tools and infrastructure while simultaneously absorbing the role of operations.
I never had the time to properly finish a project I was proud of delivering. Turning those into services so we could offer self-service was a dream that most of the time never happened; we were left with half-done systems requiring tons of manual intervention (lots of toil) while having to move fast to the next thing...
I think a lot of data engineers and transcoding folks would have similar reports. But you’re right; the problem with DevOps is the reach of their usefulness. If your whole company is built on code, your DevOps team will always be overworked and underappreciated.
Logging is your friend here. You can spend days scrolling through logs, doing the occasional grep and making disapproving noises. Bonus points for developing some graphs for the next meeting.
What fascinates me about this is, and sorry for being morbid, but what happens when y'all die? Does knowledge of the lower levels of the stack go away with your generation, or will there be enough of us young ones picking the important stuff up?
There's rarely anything old school sysadmins have learned that hasn't come from experience.
Been there, done that, fought that shit the first time. And the second. And the third. (it's amazing how often I find myself solving what are essentially the same problems over and over.) It's one reason why you'll find we'll push back on the "ohh shiny". There are many wonderful and fascinating things coming out. Tech is an amazing field to be working in. But it's also ridiculously frustrating because no one pays any attention to _why_ things are done the way they are, or _why_ approaches haven't worked in the past (I'm all for re-introducing past failed approaches, as long as there's evidence those reasons have been investigated)
You'll find a common trend amongst us in that most of us sort of ended up in the role accidentally. Schools and college teach you to become developers. Few people tend to head to college with the view of specialising in the ops side of things.
Even speaking as a comparatively old-school sysadmin, my strengths come from being flexible and adaptable. What I do today is nothing like what I was doing 5 years ago, and what I did then is nothing like what I was doing 5 years before that, and so on down the line. The field is constantly in flux.
I just have the best part of two decades of experience to both anticipate the problems, and be able to get to diagnosis quicker when things do go wrong.
Even as the older sysadmins die off I'm fairly confident there will be newer ones to replace them, because people are going to continue to learn from the problems they run in to.
Ansible is one "ohh shiny" thing that has greatly increased my productivity as a sysadmin. Before that I would automate what I could with ssh and pdsh and scripts, but it was never as well polished as Ansible.
I'm even using Ansible for ad-hoc stuff (tweaking a config, restarting a service) because it's easier to do that from a management server than to log in to some remote host, get oriented as to the OS distribution and version, and run commands in the shell there.
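To give a flavour of what that ad-hoc use looks like, here's a rough sketch (the inventory group name "web" is made up):

```sh
# poke at a service across a whole inventory group
ansible web -m command -a "systemctl status nginx"

# restart it with privilege escalation (-b = become root)
ansible web -b -m service -a "name=nginx state=restarted"

# or just gather facts to see what distro/version each host is running
ansible web -m setup -a "filter=ansible_distribution*"
```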
I like Ansible with Vagrant as well - it makes for a nice, clean way of deploying to development environments while also being nicely 'self'-documenting and not limiting (you can drop back to shell); it's a lovely tool for the most part.
Edit: The thing I really like about Ansible is how unsexy it is; it's just a nice, sane way of doing largely what you could do yourself with ssh and bash, but in a language that doesn't make you want to cry.
I've been around linux since the 90's and Ansible feels comfortable, predictable and stable - what you would want in a piece of software that can be mission critical in the most fundamental sense.
I think I achieved my perfect balance of tooling for current systems with Ansible and Docker.
Ansible even automates provisioning in AWS. I never really liked CloudFormation's way of creating stacks, so I began using Ansible to document the application's stack, and have used it to deploy systems running in EC2 with RDS, ElastiCache, SQS, SNS, DynamoDB, etc.
After provisioning/configuring I'd end up with an instance in EC2 with Docker installed and from there our CI/CD would just trigger the deployment playbook that simply would do a `docker pull` of the version tagged for release and start the container.
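Roughly speaking (this isn't the actual playbook, and the registry/app names are placeholders), that deploy step boils down to something like:

```sh
# RELEASE_TAG is assumed to be injected by the CI/CD system
docker pull registry.example.internal/myapp:"$RELEASE_TAG"

# swap the running container for the newly tagged release
docker stop myapp 2>/dev/null || true
docker rm myapp 2>/dev/null || true
docker run -d --name myapp --restart unless-stopped \
    registry.example.internal/myapp:"$RELEASE_TAG"
```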
Ansible also helped install our Splunk forwarders, as running them from Docker was still a hassle not so long ago, so we had the best of both worlds: full configurability of the host machine with Ansible, and packaging and predictability of deployment through Docker.
I advocate this stack as simple enough to learn and use, built on widely used tools without their fancy (and often broken) features. Even though they can still be a bit immature, they are production-ready enough.
Ansible? I could see using it to build a container that then got "orchestrated", I guess, but ... hmmm. I've never really looked at them as doing the same thing. (Nor have I looked at either Chef/Puppet for CM. Maybe I'm just stupid... My ex- certainly thinks that I am...)
I used puppet -- unhappily -- for a while before discovering SaltStack.
Salt, like Ansible, is a response to the Puppet/Chef hegemony -- Ruby and DSLs and tacked-together bits that are a nightmare to install and upgrade in themselves.
I'd suggest Salt does some things much more cleanly than Ansible, and while it can be an orchestrator and a deployment system, it also excels as a configuration management system ... but I think typically people are talking about configuration management systems that are tightly coupled with more formal change management systems (of which I've found pretty much none that work well).
It's a legit concern. There was a NANOG panel about this exact thing. I believe the quote was, "Take a look around. We're all old and greying. We have a severe pipeline problem." And then, much to the AWS dude's dismay, the topic shifted toward blaming cloud services because no one takes the time to learn how any of this works anymore.
Want to guarantee your child's future employment? Don't just teach them to code (the machines will do that). Teach them how to build networks and truly understand network protocols.
I’m going to teach my children how to navigate the world of insane Harry-Potter-esque rules which all IaaS/PaaS platforms enforce upon you. They will become software language lawyers and be masters of the electric Disney dollar.
You know like “ahh don’t call the messaging endpoint more than 800 mega-milli-times per mega-nano-second or it will cost you three bazillion CPU credits, but only on three and a half cores which will starve all your instances, issue an invoice and proceed to melt your credit card.”
> Teach them how to build networks and truly understand network protocols.
I don't know how the situation is in the US, but in my country network engineering is actually quite a popular field of study. (we have college level education in network engineering).
The one thing that stands out, though, is that it's mostly done by youngsters who already have sysadmin experience or worked in IT before. Almost everyone who comes straight from high school goes into Software Engineering.
I think this is mainly because networking is quite an invisible field so to speak. Many people don't even know your job exists, and many young people only see the shiny hip side of it. (being Software Engineering).
Being good at network engineering is hard, especially once you get past entry-level work and actually start being responsible for designing large-scale networks. Mainly because building a network is a major financial investment where guaranteeing performance is hard without either a ton of experience or a shitton of lab time.
> Don't just teach them to code (the machines will do that). Teach them how to build networks and truly understand network protocols.
Building networks and understanding protocols is something machines can do TODAY. The entire internet was built to survive a nuclear war. It can reshape itself, and it most definitely understands network protocols. By definition, protocols are the language of machines over the network. And it's been like this for a few decades.
The only reason for knowing low-level network protocols, in a world where machines can code anything (which makes them better analytical thinkers than humans), is to beg the machines for mercy in their ancient tongue.
> It can reshape itself, and it most definitely understands network protocols
It really, really can't! That quote about the Internet interpreting censorship as damage and routing around it? Or the one about information "wanting" to be free? Taken wildly out of context.
Routing protocols convey state information between routers, but really that is just table stakes.
So, what do network engineers do?
Classically: set up and troubleshoot those systems.
Currently: transitioning from manual work to building systems that deploy, monitor, and remediate routers and such.
In other words, the same stuff sysadmins->(SRE|PE) folks do and undergoing a similar transition.
Then what about routing protocols such as RIP, OSPF, etc?
Don't forget BGP. I don't think they do what you think they do, at least, not to the extent you think they do them. There is a hell of a lot of manual work in running any sizeable network even within a single organisation.
And what exactly do they do that I don't understand?
> There is a hell of a lot of manual work in running any sizeable network even within a single organisation.
I'm not trying to say they do everything and no manual work is required. I'm trying to say machines are already doing part of it. OP believes that, given a world where machines can code, they can't design or maintain networks, which I find truly ridiculous, since machines are pretty far from doing any kind of "coding" today, but they do networking and network protocols pretty well.
> since machines are pretty far from doing any kind of "coding" today
The first attempt at a system to turn plain English that even managers could write into executable code was 1959 - COBOL. So you're right in a sense, even 59 years later - but also wrong if you think networks are any more advanced than this. The Internet really cannot "reshape itself" and probably never will be able to.
RIP and OSPF are interior routing protocols---that is, they're used for routing within an organization (or autonomous system in Internet lingo) and deal with technical routing issues (fastest link, most bandwidth, etc). BGP is for routing between organizations and deals more with political issues than technical ones (we need to send all traffic here due to contracts, unless it goes down, then shift traffic over there; and refuse routing information from such-and-such organization because they don't have their act together).
OP was talking about a future where developers become obsolete because machines take over the development sector. Do you think writing protocols will be something humans will do better than machines?
You lose and gain, and you should always be mindful of what is coming. Humility is good. I love you young guys; your ideas keep coming and they are mostly good.
Same problem as making sure your system doesn't lose data when a server dies. Make sure you have enough copies of the knowledge by propagating it between people. Try to have some kind of offline recording (books?) for recovery from a disaster where you lose everyone. Have an idea of how to recover at a business level if you do lose the data forever.
The trouble is making sure these plans actually work. This is why Netflix randomly execute some of their employees every month.
That's part of the reason why my last two hires have been at the beginning of their career. For both of them, it was their first major sysadmin responsibility after having jobs involving tech support and occasional Linux experience.
The key is to pick smart people who are good at learning and find complex systems interesting. Then, of course, you need to have interesting projects for them to work on.
It's a very real concern. We have a thing called a "bus plan" for all of our tech employees (6 of us - small non-tech company). It basically attempts to cover everything that we would need to know if one of us gets hit by a bus.
What makes you think there aren’t young people doing systems administration? Our last two hires in my most recent job were 23 and 27 respectively. Sure, they’re getting trained in the new hot cloud stuff... as the grumpy seniors figure it out first and set patterns... but they are still doing daily work with some rather ancient stuff.
I'm not saying there are no young people doing sysadmin. What I'm trying to say is that if the new 'infrastructure' that all sysadmins learn is not an open UNIXy system where you can grok all the internals if you want to, but closed systems owned by 2-3 major cloud players, then we kinda maybe have a problem in 20 years?
Of course, one can argue that that will just cause a new wave of openness and the cycle continues.
As a "young" (30) sysadmin/devops dude I think that open, Unixy system is Kubernetes. I can take an application, dockerize it, write a helm chart and run it anywhere.
The risk is in treating anything as a black box, whether it's a managed service or a container you pull from Dockerhub. It's something you'll get burned by eventually and need to learn from experience.
Yeah, and my point was that we start the young people that we hire on the open stuff. Then we move them up and on to the other open stuff, which runs on top of the cloud vendors.
(As almost everyone else points out, the closed cloud vendor stuff is nowhere near flexible enough for most moderately complicated use cases unless you’re running at serious scale.)
My company seems to split responsibilities based on age. 40/50-year-olds deal with Oracle Linux, AIX and Solaris. Under-30 hires are more focused on cloud and mobile. We're all expected to have a footing in Windows and Oracle DB.
CFEngine is basic text manipulation, it's not comparable to the rest.
Puppet and Chef were the first generation. I wouldn't recommend them. All the companies and people I know using Chef migrated away from it after many disasters. Nowadays, it's only mentioned in interviews to find out if candidates have real-world firefighting experience.
Ansible is good. Used that for managing hundreds of machines at multiple jobs (some who migrated from Chef). It's been bought by RedHat, it's well maintained and I think it will have the brightest long term future.
Not sure about SaltStack. Never had the opportunity to try. I'd be a bit worried though on the long term prospect because I don't think they have much backing or user base.
> Not sure about SaltStack. Never had the opportunity to try. I'd be a bit worried though on the long term prospect because I don't think they have much backing or user base.
SaltStack is a well-thought-out solution in my opinion. It makes more logical sense and is less of a muddled mess than either Chef or Puppet, and has miles better performance than Ansible.
I know quite a few shops who use it. It's definitely smaller than Ansible, though.
+1 for Salt -- I wish it had better docs or examples of how to build out a larger system; it's hard to get started with, IMHO, even if you know Ansible well. The existing docs read like man pages, without even the helpful examples.
At the last gig, I wrapped salt deploys with a small Slack bot, so users would fire deploys from Slack; you could see what was going out and who was pushing. It was a very very nice, simple, fast solution that should scale to hundreds of machines easily.
SaltStack is around. Lots of big orgs take the time to understand it. Ansible is more popular because you can get going with just one playbook. SaltStack requires you to think about your environment and design your configuration management properly.
I use Salt on multiple thousands of machines. I feel like I've barely scratched the surface of what it can do. I wrote some custom utilities for it and added some functionality to handle physical deployments of an OS with Redfish (the new iLO/iDRAC API).
Salt is not without warts, but it's definitely worth checking out.
CFengine, at least version 3, was probably the furthest away from string manipulation (and I was given the impression that text file content manipulation was considered a bad idea with it). What killed it was promise theory, which is actually a great theory and works quite well but made writing the bundles painfully hard and also hard to maintain. Also during the early days of v3 it was probably lacking a ton of essential functions so even if you were trying to do things the right way you would bump into feature limitations. I think this put a lot of people off adopting it widely and why Chef and Puppet did so well.
Puppet and Chef are actually quite good and I still prefer them to Ansible for a number of reasons. I've certainly run them fine in environments of many thousands of servers, though I can understand that they can implode for some people at scale, depending on how they design their deployments or structure their manifests/cookbooks. That said, I've certainly seen Ansible fold on much smaller infrastructure, but that is also down to a number of factors that can be avoided or mitigated. Idempotency with Puppet is really strong, which is something you want if not every single system in your environment is ephemeral; with Chef it's almost as good, though not always on the first run; with Ansible you have to specifically consider and aim for it when writing your playbooks.
Getting used to having Chef or Puppet run, say, every half hour is a good thing, whereas Ansible runs are more ad hoc. This leads me to another thing that bothers me: people treat Puppet and Ansible as if they're conflicting choices for the same tasks. They have a lot in common, but Puppet is more for managing and ensuring changes in an idempotent, non-conflicting way, while Ansible is better suited to ad-hoc or orchestration tasks. I think it's good to use both, but also to be clear about what you use each for, since one can do a bit of what the other is good at but doesn't do it as well.
For example, I would consider using Ansible to do deployments and releases, rotate SSH keys, execute failovers, or even to install the Puppet agent for the first time. I would use Puppet to deploy and update monitoring agents and configuration, user access, ensure directory permissions, configure system things like rsyslog, logrotate, Postfix, ntp, etc.
> This leads me to another thing that bothers me and that is where people think it's a situation of having to use e.g. Puppet or Ansible as if they're conflicting choices for the same tasks.
That's mainly because Ansible folks advertise it as a configuration management tool, while in fact it's a deployment tool. The former needs asynchronous operations, especially because a node that is supposed to be reconfigured can be temporarily down. The latter needs to be executed synchronously, with reports being read as they come by an operator.
There are several other operation modes that are useful for a sysadmin, like running a predefined procedure with parameters supplied from the client, or running a one-off command everywhere (even on the servers that are currently down, as soon as they are up), but we don't have many tools to cover those cases.
I make my living as a CFEngine consultant. CFEngine runs every 5 minutes (it's lightweight enough to do that). The evolution was: CFEngine 1 ran once a day; CFEngine 2 ran once an hour; CFEngine 3 runs every 5 minutes. Self-healing infrastructure.
The concept of self-healing is a bit weird for me. Surely you want to investigate the cause before it heals?
Funny that we have tools like tripwire which have the opposite idea of the world.
My dream would be to have both functionalities in a single tool.
Bidirectionality! If you solve a problem on one machine you could pull that fix then push the same fix out to other machines as a preventative measure.
> Ansible is great. Used that for managing hundreds of machines at multiple jobs. It's been bought by RedHat, it's well maintained and I think it will have the brightest long term future.
A lot of folks I know have been bitten by Ansible's performance (Ansible has a central master that runs recipes on each node, rather than having nodes "pull" from a central master).
Ansible has a pull mode that can be turned on. There are some trade-offs with it from the normal operating model, but it's there when you get large enough to need it.
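A minimal sketch of that pull mode, assuming the playbooks live in a git repo with a local.yml at its root (one of the default playbook names ansible-pull looks for):

```sh
# typically run from cron on each node: the node clones/updates the repo
# and applies the playbook to itself, no central push needed
ansible-pull -U https://git.example.com/infra/playbooks.git
```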
Ansible has a very, very low barrier to entry. You go from 0 to 100 in a very short time. It makes a lot of sense to use it when you just begin building your infrastructure.
Later on you can run Ansible Tower, deploy Ansible agents everywhere, and basically use Ansible under the same client/server model like all the other tools.
Salt is eerily similar to Ansible, it's just geared towards client/server. Being experienced with Ansible, it was weird at first to use Salt because everything looked familiar, yet slightly different.
Yes. The host will run 100% CPU to handle the hundreds of SSH connections.
I've been reconfiguring 300 to 800 hosts many times a day and never had a problem. I think it would take a few thousand hosts for the performance to be noticeably slow, and I am really not sure that other tools or systems would handle it much better.
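For what it's worth, the usual knob here is the fork count, i.e. how many hosts the control machine talks to in parallel (the default is only 5). A sketch, with a hypothetical site.yml:

```sh
# push to 50 hosts at a time; trades control-host CPU for wall-clock time
ansible-playbook -f 50 site.yml

# or set it permanently:  forks = 50  under [defaults] in ansible.cfg
```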
I know our SREs once screwed up the config for sshd, and considered themselves very lucky that they had Puppet on the machines and could push a fixed configuration (if they had used exclusively Ansible, that'd be the end of it - no way to connect or to deploy a new configuration).
[edit] To clarify - ansible is great, and we use it. Just saying that, as everything, it still has (sometimes subtle) downsides in various scenarios. If it works well for you - great, but maybe others really were bitten by it.
There's nothing stopping you from having an sshd instance dedicated for use just by Ansible, on a different port/different network, on every node. Now, whether that's simpler or more complex, I don't know.
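A sketch of that idea, with made-up file names, assuming a management-only network and a stripped-down, key-only copy of the ssh config:

```sh
# second sshd, listening on an alternate port with its own config file
/usr/sbin/sshd -f /etc/ssh/sshd_config.mgmt -p 2222

# then point Ansible at that port, e.g. in the inventory:
#   [all:vars]
#   ansible_port=2222
```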
But "have two ways in" is a basic principle of sys admin (typically via traditional network and some out of band console access).
When I worked with physical machines, they had embedded management systems, which were on a physically separate network from the machines' main interfaces, ran a little embedded SSH server, and would (amongst other things) give you a console on the machine.
Simpler machines should still have serial consoles, and you can get those on the network via a terminal concentrator or a serial-to-ethernet adaptor.
I would love it if Ansible could control machines over an interface like that, rather than via SSH. Then you wouldn't even need to run SSH on machines which don't need it, which is most of them.
> Well, teach your sysadmin to use the system configuration tester when they edit a system configuration file.
Wrong. Teach your sysadmin not to overload a single service with different functions (debugging channel, user-facing shell service, running remote commands, file upload, and config distribution channel), especially not the one that should not be used in batch mode, without human supervision.
When you write an application, you don't put an HTTP server in database connection handling code, but when it comes to server management, suddenly the very same approach is deemed brilliant, because you don't run an agent (which is false, because you do, it's just not a dedicated agent).
Good heavens, no! You'd only have two different instances of the same service that is difficult to work correctly with.
For serving as a debugging channel and user-facing shell access, SSH is fine (though I've never seen it managed properly in the presence of nodes being installed and reinstalled all the time). But for everything else (unattended):
* you don't want commands execution, port forwarding, or VPN in your file server
* you don't want remote shell in your daemon that runs parametrized procedures -- but you do want it not to break on quoting the arguments and call results (try passing shell wildcards through SSH)
* you don't want port forwarding and remote shell in config distribution channel; in fact, you want config distribution channel itself to be reconfigured as little as possible, so it should be a totally separate thing that has no other purpose whatsoever
* you don't want to maintain a human-user-like account ($HOME, shell, etc.) for any of the above, since they likely will never see a proper account on the server side; you want each of the services to have a dedicated UID in /etc/passwd, own configuration in /etc/$service, own data directory, and that's it
Each of the tasks above has a daemon that is much better at them than SSH. The only redeeming quality of SSH is that it's there already, but it becomes irrelevant when the server's expected lifetime gets longer than a few days.
Yes, because everybody knows that testing eliminates all bugs.
(it's not that testing is useless - far from it; but I thought the HN crowd knows better than to respond to issues with "that's because you didn't do enough testing!")
I'd venture to say you're wrong about Salt. It's being used at some large enterprises. I use it (in one of the large tech companies) on thousands of servers, with plans to up that an order of magnitude or more. Of all of the solutions mentioned, it has been the most powerful, while also being the most scalable.
Other than that, my experiences line up with yours almost exactly.
I love SaltStack; it's more of a Python framework for managing systems over ZeroMQ than it is pure configuration management. Compared to Ansible it's more complex but faster, reactive, and significantly more flexible. I'd highly recommend it over Ansible for larger environments. For smaller ones, it depends on whether the steeper learning curve is worth it.
Of all the tools, I heard of Puppet first, so I'm assuming it was first on the scene? From my limited experience, it seems Puppet is the most widely used tool for that reason. Not necessarily the best of the bunch, but first on the scene. Considering the effort required to roll it out, I assume whatever is deployed first will stay as the tool of choice.
I've tried out Puppet, SaltStack, and Ansible, in that order.
What I didn't like about Puppet is that once you deploy a change, the actual change can happen on the "client servers" at any point within the next 20 minutes. I may be off on the exact duration, but I remember that changes were deployed at any point within that window. To me that doesn't sound like a great idea. What if you want to switch over your web servers at a specific moment? And Puppet requires a dedicated command/control server.
Next I tried SaltStack. I liked it enough. Now that I think about it and hear someone else mention it, yeah, SaltStack is similar to Ansible. What drove me away from SaltStack was that you essentially need a dedicated command/control server from which all SaltStack commands are sent out to the SaltStack "client servers". I did not want to dedicate resources (and money) to a server that is rarely used. And the personal web/lab servers I manage can shrink or grow from 2 servers to 10.
Next I tried Ansible. I think Ansible is the perfect choice for me. I only needed to 'devop' a handful of servers and also wanted to learn a tool that many businesses seemed to want on a resume. So I picked Ansible and it's been great. Some operations are not as flexible as doing them with a shell script (and I assume the same issue exists for the other tools), but I've had good luck combining Ansible with little bits of shell script to get the result I need.
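For example (hypothetical script and group names), the "little bits of shell" can just be shipped and run by Ansible's script module:

```sh
# copy a local shell script to every host in the "lab" group and run it there
ansible lab -b -m script -a "scripts/fix_permissions.sh"
```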
The best part of Ansible is that any Mac or Linux machine can be used as the "command server", provided that you have the SSH key pair on your Mac or Linux machine.
Lastly, some may not like the ad-hoc way of doing things on Ansible, but I prefer it that way.
> I heard of Puppet first, so I'm assuming it was first on the scene?
CFEngine was first. It's based on a kind of maths called "promise theory", and it solved the problem where you had many different kinds of Unix owned by many different groups and needed a consistent way of saying "all machines belonging to group X need to have user Y and package Z"; it would abstract away the slightly differing syntax between Solaris, SunOS, IRIX, OSF/1, Ultrix, yadda yadda. This is a problem that doesn't really exist anymore.
Chef I think came next, it was written by people who knew Ruby but didn't know maths so they used CFEngine terminology like "converging" but Chef doesn't really do that, it just runs Ruby scripts. If CFEngine was a scalpel, Chef is a mallet. Chef and Puppet are related somehow, same group of devs had a falling out and went their own ways, something like that. They are much of a muchness.
Ansible is cool because it recognises the reality of why CFEngine isn't relevant nowadays: most organisations are running just one particular Linux distro so you can do away with the abstraction and get all the benefits without the complexity.
> it's based on a kind of maths called "promise theory"
Promise theory is not math, despite its name. It doesn't predict anything, it doesn't explain any phenomena. It's an architectural approach. Brilliant, and it led to really great software (CFEngine), but it's not "maths".
It's not "maths" like arithmentic but it's "maths" like graph theory:
Promise Theory, in the context of information science, is a model of voluntary cooperation between individual, autonomous actors or agents who publish their intentions to one another in the form of promises. It is a form of labelled graph theory, describing discrete networks of agents joined by the unilateral promises they make.
> It's not "maths" like arithmentic but it's "maths" like graph theory
It's less like graph theory and more like inversion of control: an architecture, not a set of theorems and their proofs. Even Burgess' own book you mentioned is nothing like a mathematical handbook.
I'm a great fan of Mark Burgess and his promise theory, but calling it a mathematical theory or a mathematical domain is simply incorrect.
> [...] the actual change can happen on the "client servers" at any point within next 20 minutes. [...] What if you want to switch over your web servers at a specific moment?
You don't. Configuration management is a wrong operation mode for a synchronous change. Still, you could order all your Puppet agents to run their scheduled operation now instead of leaving it waiting for its time.
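i.e., for a change that really has to land at a specific moment, trigger the agent run yourself instead of waiting out the interval; a sketch:

```sh
# run the agent once, right now, in the foreground with verbose output
puppet agent --test
# note: with --test the agent exits 2 when it applied changes, so don't treat
# a non-zero exit as a failure if you wrap this in other tooling
```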
Ansible all of the way. Chef and Puppet have too much overhead in comparison. Ansible is agentless. You can either use a centralized server for deployments or you can have every instance configure itself. Also, Ansible is YAML based, which is a strength and a weakness.
Chef is a runner up. Love their community and Chef is pretty straightforward once you learn the lingo.
Puppet doesn’t really work for modern Git development workflows (Hiera and r10k are duct tape) and testing Puppet is kludgey. Also, most of the docs you’ll find for it stopped getting updated in 2015 or so.
I've used chef, ansible and saltstack in small startup and large scale enterprise environments.
Ansible is just about the easiest and most flexible thing going, but once you hit "very large scale" you're going to get bit by its performance and start worrying about when you actually update things. Ansible Tower starts to look good then, but it's not the well-walked path and brings you all sorts of other issues about how you distribute secrets to bootstrap things.
Chef is kind of nice when you don't have a lot of environments that you need to manage and about as flexible as you need it to be in those situations.
SaltStack shines when you really have a lot of heavy lifting to do and the Event System, Beacons and Reactors will honestly blow your mind with the complex things you can achieve in a way that's simple to reason about and maintain.
That said, there's really like 3-4 majorly different ways you can (or would want to) use Salt and understanding it and its documentation is a large cognitive investment. You will likely run into major pain at some point down the road if you choose to use it. I would only use it again if I had a really good reason to -- pretty much if there's no other alternative. I would not at all bother using it to try and do typical sysadmin automation tasks.
Strange side-note: The best managed Salt environments I've worked in or looked at were all masterless, whether at small or massive scale. It's my probably-wrong opinion that traditional master/minion SaltStack is always going to cause you enormous problems eventually when you need to either scale out or pivot on something.
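For reference, "masterless" here means each node applies its own states locally with salt-call; a minimal sketch, assuming the state tree is checked out under /srv/salt on the node itself:

```sh
# apply the full highstate locally, no master involved
salt-call --local state.apply

# or apply a single state, e.g. a hypothetical "nginx" state
salt-call --local state.apply nginx
```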
Bash is everywhere but in small quantities. If it's more than a page of bash, it's probably time to rewrite in something with stricter rules and fewer surprises, or better libraries, or both. Perl or Python is quite common at that level.
> Modern devops, with its million tools that break backward compatibility every month, sometimes becomes the running joke at lunch.
Ironically, modern devops and its million broken tools are a primary source of revenue for cloud providers, helping pay for your lunch in the first place.
You might be onto something: efficient designs are bad for cloud providers. On the other hand, shitty designs that get hacked are bad PR for them.
Thing is, none of it matters. Bottom line matters.
They care about attracting company decision makers. Decision makers are engineers in small to medium businesses, and managers in big ones. Sadly, it's the big ones that matter for the bottom line. So the target is mid-management: flashy PowerPoint presentations and 'conferences' that allow for justified travel and stay.
Good mid-management, with a tech background, exists, but is a minority.
Not all is lost, truth is out there (c) Mulder :))
The clearest explanation of why this happens is at the end:
Before, admins would try hard to prevent security holes, now they call themselves “devops” and happily introduce them to the network themselves!
1) The merging of devs into the sysadmin role was a product of the work of sysadmins (particularly systems change control and security compliance) not being valued in our culture.
2) Devs were delighted to be free of the shackles placed upon them by sysadmins who were encumbered by the concerns expressed in this article.
If you were a devop who resolved to fix the problems bemoaned in this article, my guess is you would turn around in 60 days to discover you'd become a sysadmin.
The stated goal of putting both systems administrators and software engineers on the same team is to reduce friction and increase communication. One of the worst, productivity-killing situations you can find yourself in when developing network software and services is caused by the traditional "old school" mentality of separating the two camps. When your software developers operate independently of your systems engineers and administrators they're forced to make assumptions about infrastructure, operations, and compliance goals. Both teams have the same goals so why are they not on the same team? I think some "old school" system administrators don't realize how costly such communication mistakes are. Getting 6 months into a development project to be told you cannot have a critical piece of infrastructure _for reasons_ is a costly, costly mistake.
Containers are a smart solution to the build problem. Don't build your containers from public, un-trusted images! Build your own images. Run your own, protected, registry. You still have all of the compliance and validation necessary and you don't end up debugging failed builds because one machine out of a thousand is running on some minor shared library version not supported by your software.
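The workflow being described is roughly: build from a base image you control, push it to your own registry, and have production hosts pull only from there. A sketch with made-up registry and image names:

```sh
# build the application image from an internally maintained base image
docker build -t registry.internal.example.com/myteam/myapp:1.4.2 .

# publish to the private registry that production is allowed to pull from
docker push registry.internal.example.com/myteam/myapp:1.4.2
```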
>Don't build your containers from public, un-trusted images!
The author is complaining that you can't build these private trusted images. Software developers have got it in their head that containers are a way to package & distribute software. They're not, that's what the OS's package managers are for. If your software requires Docker as a build dependency, you have failed to properly package your software.
As a concrete example look at Ubiquiti's UNMS.[1] Their package consists of downloading & installing Docker binaries on your system, not tracked by the OS package manager, and then running a bunch of containers built from these public un-trusted images you just told me not to use.
They also conveniently ignore the fact that I already have a Redis server, I already have a PostgreSQL server, I already have an NGinx proxy. (Plus I guarantee my database servers are better tuned for my hardware than some random image from Docker's library.) It is not up to some random software developer where I should be drawing the isolation boundaries on my infrastructure. They also make the big assumption I want to use Docker to manage my containers in the first place. Perhaps my company already uses Solaris LX-branded zones, or LXC, etc.
Now imagine if instead of spinning up a PostgreSQL database container, it used MS SQL as its database of choice. You think I'm going to let some random developer dictate whether or not I should spin up another SQL Server instance and pay MS for another round of cores / CALs?
Yes - you can build your own containers, and they're fantastic - if software developers properly package that software for ease of installation & configuration. Software developers should not be dictating what container/virtualization framework I use, what configuration management I use, etc.
There are public trusted images, like the so-called official repositories on Docker Hub [1]. As long as you build your images based on official repo images, you're probably fine. Just don't depend on untrusted images; instead get their dockerfile/config files, and build the images yourself.
To me, a Docker image seems like an ideal way to distribute some proprietary device management web software like Ubiquiti UNMS, rather than requiring their clients to install and maintain some obscure version of some database or whatever other dependency. You can spin that image up on a server or group of servers, or on Amazon ECS or a bunch of other providers, in a matter of minutes. With enough motivation, you could even export the image and manage the environment manually.
This comment makes way more sense to me than the original blog post. Yes, nobody should be relying upon Docker as their distribution platform. That's pretty terrible. Ubiquiti, I've observed, seems pretty uncomfortable just supporting the major distros; I actually wrote some docker stuff to pull down their .debs, crack them open and install the binaries inside on a fedora/centos system. That's closed source for ya.
> I actually wrote some docker stuff to pull down their .debs, crack them open and install the binaries inside on a fedora/centos system.
Why would you want to do that though? Treat the whole thing as a black box running inside docker and be done with it. The second you crack it open, you get to support it. Let Ubiquiti support it, after all that's what you are paying them the big bucks for.
because... they only offered .deb files and I wasn't running Debian or Ubuntu (nor do I like to bother with it in containers I'm building myself, because I have no clue about Debian).
The package in question has since finally offered .rpms, but I haven't had time / interest in updating it. This is wifi software I'm running personally; Ubiquiti only supports the Windows/Mac versions of it in any case.
Ubiquiti has always done this, even before containers were "hot."
If you install their rpms or debs for any of their properties, you're almost always getting a copy of Mongo or some other dependent service... and it is probably going to be incompatible with whatever version your package manager has or you're already running (version-constraints-wise, not actual compatibility-wise).
This is an indictment of Ubiquiti, not containers in general. If their software were properly built, they'd be shipping you a docker compose setup or something with N different containers that you could substitute out (at a network level) for your own.
I once worked at a company which separated IT into 3 teams: developers, DB-sysadmin (ops), and QA (who also managed deployments). Releases were supposed to go in a waterfall model from the Dev group -> QA group -> Ops. QA wanted Dev to submit Word documents for each release with blanks to be filled in with server names. However Ops was so distrustful of Dev that it was not enough for them to lock us out of Prod using regular security tools, we were also not allowed to know the NAMES of servers in Prod or how currently deployed systems were grouped.
Every release was an Abbott-and-Costello "Who's on first?" routine. Do you have any idea how hard it is (especially in computing) to ask for something without being able to utter its name?
QA: "You left servername blank on this deployment document."
Dev: "I know; Ops won't tell me. Just ask them for where the service is currently."
QA: "Ops says there's 5 unrelated legacy services with that same project name, on different servers."
Dev: "5? I only knew about 3. You know, if I could query the schemas of the Prod DB, I could tell you in a jiffy which one it is."
Ops: "Pound sand. If you want look at databases that's what the dev DB server is for."
Dev: "Erm, OK well can I give you a listing of the Dev DB schema and you tell me if it looks like the one the Prod service is talking to?"
Ops: "Oh I see you want us to do your job for you now? You can compare the schemas."
Dev: "OK..."
Ops: "Just tell us which DB server you want the schema pulled for."
Dev: "But you won't tell me the server names."
Ops: "No."
My point is this is how bad communication can be when ops and dev are not on the same team.
Devs hardcoding things in their software in a rush makes the software tougher to deploy and operate, causing greater incident rates and therefore page-outs. Devs interested in greater resilience and stability in their software should be opting for dependency injection of pretty much every damn thing in the world around them, whether it's a network service or a file system location. Otherwise, presume that it can go away at any time. A common pattern among developers trying to save time, which costs more in the long run, is to hardcode a path to an executable. A simple /usr/local/bin/ path, buried in an infrequently run job, that exists on developer machines but never in prod is all it takes to cause an incident in prod that costs the company millions. I say this both as someone who has written this kind of bug and as someone who has had to fix others committing the same error in their code, with QA passing it along.
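A tiny illustration of the hardcoded-path point, in shell (the tool name and paths are made up): let the environment supply the location and fail loudly if it isn't there, instead of baking in a path that only exists on developer machines.

```sh
# hypothetical job script: resolve the tool from the environment, with a fallback
REPORT_BIN="${REPORT_BIN:-/usr/local/bin/report-gen}"

if ! command -v "$REPORT_BIN" >/dev/null 2>&1; then
    echo "report-gen not found at $REPORT_BIN -- is it installed here?" >&2
    exit 1
fi

"$REPORT_BIN" --output /var/tmp/nightly-report.txt
```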
Ops tends to be where the brunt of technical debt is truly buried. Bad code is one thing but seeing the code in action with real world data is a different beast altogether.
The thing is that any separation of the roles is ineffective. Things shift around some if you embed an ops guy into the dev team directly, but it doesn't resolve the core problem. This applies to DBAs as well as ops, or any other software-side segmentation.
The core problem is that there are "ops guys" and "dev guys". That creates conflicting incentives, even within the same team. It creates tension and a dynamic centered around bandying work around so that it's "the other guy's problem" in some situations, and hoarding logic onto the one segment so that there isn't an "obstruction" in getting things done in others. Moving the "segmented" guys directly into your team just makes these politics closer to the heart, which is not always an improvement.
Teams should be comprised of whole-platform "generalists" (in quotes because they really should be good at stuff, whereas "generalist" implies they aren't; here I just mean a competent non-specialist), where any single individual would be comfortable/capable performing any particular task that may come up. Of course, each member will have preferences and habits, little "skews", but it is important that these skews are controlled and used for mutual education, and not allowed to "flandersize" someone from "the guy who knows SQL better than the rest of us" to "full-fledged DBA who hasn't committed any C# for 3 years".
The right axis for separation is hardware v. software. If it's software-related, your dudes should be equally yoked, such that any SQL ticket would be assigned to any member of the team, or any "devops"/sysadmin/deployment ticket assigned to any member of the team.
These systems, from the OS up, are all part of the same thing, and they're all tightly integrated. Making the workload of the individual people on the team also tightly integrated is the only way to make sure that incentives align properly and that the most effective technical decisions are made, instead of decisions motivated, consciously or not, by offloading blame or other political/effort/convenience considerations that cause the overall system to suffer.
If you get into a sticky situation that requires specialized help from someone who has lived and breathed MySQL Server night and day, well, that's what consultants are for. Consultants would also be useful for inspections/sign-offs. But your core team can't tolerate being segmented out by component/implementation detail.
> Containers are a smart solution to the build problem.
Linux "containers" are a variety of things. True OS "containers" don't exist on Linux, but there are some rudimentary approximations. A Docker image is essentially a zip file, and sure, zip file-ish things may work fine for uploading artifacts to systems. Dockerfiles are unequivocally terrible, however.
I agree with parent, but I think you're taking it too far. I don't think there are enough skilled generalists to pull off your ideal, and I think software/infrastructure is too complex to allow for generalists in the breadth you describe.
I'm a security person who knows pretty good Python and simple database stuff (SQLite). I think I'm in the top 50% (humbly) of my field, probably higher.
But I don't know front-end, containers/CI/CD, or distributed systems worth a damn.
I do believe the parent's idea that teams should have embedded resources. A "VM security team" operating firewalls and infrastructure and policy auditing should not only have security experts, but also its own devops group that automates the crap out of everything, using 2018 best practices. Currently, my team's "dev" group is a separate team in another area whose work queue is fed by multiple, distinct teams. It makes learning and understanding our requirements really tough for them.
Phew, this has been a good exercise. Let me clarify the thesis.
The thesis is NOT that a crew of superhumans can supersede all DBAs, security engineers, and infra people in the world.
It is rather that you can be a great software-side engineer, and that you can skew/focus on a few primary concerns, and develop and maintain a working knowledge in the others, sufficient to service your core project's needs.
Specialists can be called in as spot checkers, auditors, or short-term implementers, but they shouldn't be needed for the day-to-day of building, maintaining, and deploying your software.
In software, everything goes down to the same place: the system hardware. And these days at least, this is pretty much homogeneous between software segments. If you know how this functions, the differences are in the modes of expression and the conventions, not really the principles. We can learn the varying conventions well enough to be serviceable in all the elements that we send down to hardware -- not necessarily expert, but good enough for day-to-day work.
I'm not saying that everyone on the team should be better than the best DBA guy you've ever met. I'm saying that everyone on your team should be reasonably confident with SQL. Specialists have a place in your friendly local <security/database/whatever> consultancy.
> In software, everything goes down to the same place: the system hardware. And these days at least, this is pretty much homogeneous between software segments. If you know how this functions, the differences are in the modes of expression and the conventions, not really the principles.
Interesting that you mention this, since I think it's become something of a self-fulfilling prophecy, especially with giant cloud IAAS providers making one-size-fits-all choices of hardware to sell.
I certainly agree with you that the basic principles are the same, but that ignores the performance (and, arguably, reliability) possibilities that open up when you're not limited by the hardware (including network) choices of others.
> But your core team can't tolerate being segmented out by component/implementation detail.
And yet tolerate it will, because it is somewhat impossible to hire a team composed entirely of people who are each experienced and competent in writing and designing frontends, writing and architecting backends, deploying and maintaining whatever backing services you're using, build and release engineering, Linux, networking, etc. And what are junior developers supposed to do?
> And yet tolerate it will, because it is somewhat impossible to hire a team composed entirely of people who are each experienced and competent in writing and designing frontends, writing and architecting backends, deploying and maintaining whatever backing services you're using, build and release engineering, Linux, networking, etc.
You're right that everyone is not going to start out knowing everything. No matter how senior you get, there will always be areas you know better or areas that you prefer, which are the "skews" I referred to in my original comment. When a new framework or technology or whatever is introduced, only one or two people will know it. That's all fine.
Docker is the epitome of the broken segmented model. Devs hate and resent ops telling them they can't do things. Docker promised devs that if you spend a half-hour writing instructions to build an archive that contains your app's file tree and to pull in a completely untrusted OS userland `nice-mans-alpine:4.x.malware-free`, those annoying ops people will get out of your hair, and you can go ahead pulling `bad-actors-handy-4line-totally-safe-lib` from npm to your heart's content. No more complaints about that package not being approved, or the dependencies not installed, or the runtime too slow, ha!
The whole comment thread on the original article is a case in point. Someone who is responsible for the whole software side of real systems will be horrified at the suggestion of such recklessness. However, developers who're only accountable for pushing "at least one commit per day!", and consider security and performance someone else's problem, will be thrilled at the prospect of "tearing it up with some 10x coding" while they silence "the Luddites". (who, sidebar, were too dumb to see the beauty in JavaScript back in the 90s! Pshaw!)
Which dynamic do you want to encourage?
> And what are junior developers supposed to do?
The same thing that everyone else is supposed to do: learn it, gradually, as needed. Read the docs. Seek mentorship from team members who have that "skew" (formalize this process if necessary). Read the changelogs. Read the code. Figure it out!
Many will protest and say it's outside of their comfort zone. Some will protest and say this is inefficient. That may be true in the short-term, but the system will invariably suffer if you do hard segmentation on the software work, because the falsely-separated concerns won't understand each other and end up setting up territories.
People will hate the DBA because they won't understand why he cares about "boring crap" like "normal form". People will hate the sysadmin because they won't understand why he cares about "boring crap" like "not being woken up at 3am". Your front-enders will be more gregarious and have better haircuts, leading to prioritization of front-end concerns.
Essentially, the project becomes driven by blame-shifting, protectionism, and which software-side segment has the more attractive people, because the concerns are fungible enough that any side could potentially handle them. That makes it a political competition. The project is no longer driven by technical prudence or efficiency. It's no longer about the tradeoffs involved in solving the problem at layer X instead of layer Y.
The dividing lines from OS up are arbitrary. We can't all be experts in all of it, but we can all have the expectation that we need a basic grasp over the whole system, by which I mean the WHOLE SYSTEM, and that we should become competent in the major elements used to build it, and patiently nurse this competence over time.
One team member should be able to handle 90% of the tickets that come in independently, whatever elements of the stack are affected (sysadmin, application code, database, frontend, etc.), and when they hit one of the 10% they can't do independently, they should consider it their responsibility to seek mentorship and learn the skills so that after several such rounds, they can do it independently.
The only real question is whether there are enough people capable of this out there. I think there would be if we set it up as a general expectation. I'm not sure if there are when we've already accepted the segmentation as a fact of life.
>The only real question is whether there are enough people capable of this out there. I think there would be if we set it up as a general expectation.
That strikes me as merely wishful thinking. It's not as if there isn't already research on human cognitive abilities in general.
Do you have any scientific basis for thinking engineers are merely being held back by our acceptance of specialization, rather than by inherent cognitive limitations?
Once the downvotes start coming in, people read comments uncharitably, and the thread gets lost, but to be clear, I'm not advocating for anything that is beyond the cognitive capacity of typical software developers.
One and two-man startups provide ample evidence that working knowledge of the whole platform is not beyond human cognitive scope, even if getting this to be accepted at large requires some extra cultural encouragement and support, and some professional management of individual "skewing".
Once more, it's not that everyone has to be a hardcore expert in everything all at once. You don't want them to be.
You just want your main people to know each platform component well enough to be able to make a reasoned decision about the trade-offs involved in using one or the other for a specific task, and then to be able to own that decision as a group.
If they can't or won't do that, the platform decisions become political instead of technical. I've seen this over and over again: massive technical problems get routed around because the Java developers have been told they can't touch Ruby, or the C# developers have been told they can't touch SQL, and the real problem never gets fixed. We only recognize naive, scared "specialists" who insist they can't look at the piece of Python that's holding everything up because they're "just a PHP developer", instead of rounded, capable "generalists" who can be trusted to call in help when they're getting in over their heads, and who may need an occasional "inspection" or two to make sure they're aligned with best practices.
General contractors are not electricians, but they can do a lot of routine work that involves electrical fixtures, sockets, and outlets. You call the electrician for the face-melting stuff.
General practitioner MDs are not dermatologists, but they can do a lot of work that involves routine skin disorders. They'll prescribe creams for fungal infections, rashes, acne, etc. They'll let you know you need to call in a dermatologist for the "skin-melting" stuff.
In software, we don't say "call the DBA for the database-melting stuff." We say "the DBA will write all of the SQL for you." It just doesn't seem right to me.
I apologize if I seemed particularly uncharitable, and I think you may well be right that I thought you were advocating for greater depth of knowledge than you were.
However, I still disagree with your premise that it's merely our attitude at large somehow holding people back. Startup founders don't refute my suggestion that there's a cognitive limitation involved, since they're relatively rare and may well have greater capacity to be the generalists that you're proposing. I'm also not convinced that, even among founders, they're as broad generalists as you're suggesting.
You go on to give non-computer examples of generalists and specialists, yet you don't address how it is that specialists are (admittedly only implicitly) ok there but not in computer tech.
To reiterate my point about cognitive capacity: if true specialists are desirable, then I'd argue that asking them to be more of a generalist makes them a less competent specialist and therefore less valuable on the market. That's an alternative explanation for the extreme degree of specialization we see, one that doesn't rest on preconceived notions.
Now, personally, I share your desire for greater breadth of knowledge among all technical professionals, if for no other reason than they might have a greater appreciation for my own specialization. I just don't think it's realistic.
> I apologize if I seemed particularly uncharitable, and I think you may well be right that I thought you were advocating for greater depth of knowledge than you were.
No need, it wasn't really meant to be directed toward your comment specifically. I just pointed out that comments tend to get read uncharitably once they're greyed out, as a reminder that it's not likely anyone would actually advocate such a caricature.
> Startup founders don't refute my suggestion that there's a cognitive limitation involved, since they're relatively rare and may well have greater capacity to be the generalists that you're proposing.
You're right, and I thought of this when I used that example. But by the same token, we can take it out a level further: professional software developers have already shown themselves as having higher-than-average cognitive abilities, because the truth is that the average human doesn't have the cognitive capacity to become a professional software developer. If they did, we'd all be paid much worse.
How far off are founders from professional software engineers? How far off are professional software engineers from the median of adults? How much additional cognitive load is required to be operational in a handful of extra platform components, especially if all those components run on the same type of hardware? All good questions that I don't think either of us have ready answers for.
The other thing is that even if this is out of reach for the "average developer", it wouldn't mean that it's not an ideal to strive toward, or necessarily even unrealistic in all cases.
> You go on to give non-computer examples of generalists and specialists, yet you don't address how it is that specialists are (admittedly only implicitly) ok there but not in computer tech.
Specialists should exist -- as external reference points in consulting groups.
If you want your life's mission to be building SQL queries, join a database consultancy and deal only with the SQL problems that your clients couldn't figure out on their own and decided they needed to pay $$$ to solve. If SQL and database design is truly your passion, you'll be much happier this way than you would be as a staff DBA redesigning the same rote EAV schema for Generic Business App #29, working slavishly to finish the code for that report that Important Boss #14 needs on his desk ASAP.
Creating a referral-style economy creates a lot more room in the marketplace for specialist consulting groups and gives more specialists greater reward (monetary and emotional). It simultaneously allows "generalists" to stay focused on the big picture of building and maintaining a robust and prudent system overall.
I think it's worthwhile to consider how generalist v. specialist operates in other knowledge fields, and what lessons we can take from that.
I am confident that a generalist ethos is for the best, but I'm not sure we'll get there without better cultural underpinnings, so I'm not making these statements purely out of self-righteousness (maybe only like 80% ;) ).
This dialogue has already been informative and has helped me refine my ideas and hopefully learn to present them somewhat better. Thanks! :)
The thing is that any separation in the roles is ineffective. [...] The right axis for separation is hardware v. software.
Any separation is ineffective except along this one particular completely arbitrary dividing line? If that were true we'd still be hunting and gathering and nothing else.
> Any separation is ineffective except along this one particular completely arbitrary dividing line? If that were true we'd still be hunting and gathering and nothing else.
Hardly arbitrary -- hardware is fixed at the time of manufacture. Hardware engineers should be well-acquainted with software concerns and needs, but the years-long feedback cycle and real expenses associated with hardware development creates a natural barrier for work separation, requires different work cadence and much more stringent processes, etc.
This is not to say that a good hardware engineer shouldn't contribute to software and vice-versa, but it is to say that the roles are sufficiently divergent that it makes sense to place them in different segments. That is not the case with anything this side of the operating system, as far as I'm concerned.
It's arbitrary when you claim there are no sensible divisions in software. I think your entire lengthy argument is a sort of elaborate fantasy about how much better the world would be if everyone was just like you or at least, just like you imagine yourself to be. It's fun but not a particularly realistic or constructive way to look at the world.
> It's arbitrary when you claim there are no sensible divisions in software.
It's about the fungibility of the problem space. I don't know how you expect your core team to make reasonable decisions about the tradeoffs if they a) don't understand more than one of the platform elements; and/or b) don't have any responsibility or accountability for the tradeoffs that get made, because now it's another segment's problem. Indeed, when I've been on teams primarily comprised of non-generalists, these decisions were almost always a matter of bureaucracy and politics.
> I think your entire lengthy argument is a sort of elaborate, lengthy fantasy about how much better the world would be if everyone was just like you or at least, just like you imagine yourself to be.
I've worked on teams that were mostly "generalist" and teams where the "generalist" type was either absent or artificially constrained. My perspectives are drawn from those experiences, and have developed based on a hard-earned worldview that says people reliably act in favor of their own expedience. Doesn't seem very fantastic to me. ¯\_(ツ)_/¯
I don't know how you expect your core team to make reasonable decisions [...]
That's how most everything is made, not just software. In the case of software, Fred Brooks added an essay titled "Parnas was right, and I was wrong" in the 20th anniversary edition of The Mythical Man Month about this topic. Itself published over 20 years ago.
> Don't build your containers from public, un-trusted images! Build your own images. Run your own, protected, registry. You still have all of the compliance and validation necessary and you don't end up debugging failed builds because one machine out of a thousand is running on some minor shared library version not supported by your software.
You have just lost all the speed to production advantages of containers.
"speed to production" is not meant to be the primary advantage of containers.
"knowing exactly what you're running and being able to reproduce it" is meant to be the primary advantage of containers.
What you're basically saying is "if your container system admins do their job properly rather than throwing security and reliability out of the window, it can take a bit longer than not bothering". This is trivially true, but not really the point agentultra was making.
That's how I always did it (building containers ourselves), and once the pipeline is in place, it's barely more work than pulling public images.
Speed of production advantages are absolutely not due to pulling untrusted containers. If anything, it makes your life harder.
Hard to imagine any serious production setup not doing this... In most cases, you need to modify the containers anyway to suit your needs, and how else are you going to rebuild them all when the next OpenSSL update comes out?
Have you really? Building a base container to base all further images off of takes about a half hour with our build system. Further app builds are down to 10 minutes at most and can honestly still be optimized. How exactly are you losing all the speed advantages?
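For anyone who hasn't set this up, a rough sketch of what the "build your own base" pipeline can look like; registry.internal.example and the tag are placeholders, and debian:stretch-slim is just one reasonable vetted starting point:

    # build an internal base image and push it to a private registry;
    # the Dockerfile is fed in on stdin via a heredoc
    docker build -t registry.internal.example/base/debian:2018-05 - <<'EOF'
    FROM debian:stretch-slim
    RUN apt-get update && apt-get -y upgrade && rm -rf /var/lib/apt/lists/*
    EOF
    docker push registry.internal.example/base/debian:2018-05

    # application Dockerfiles then start from the vetted base:
    #   FROM registry.internal.example/base/debian:2018-05

Compliance scanning and rebuilds then happen against a handful of images you control, instead of dozens of random pulls.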
Well, potentially unpopular opinion here, but an awful lot of sysadmins brought their looming obsolescence on themselves. I'm an app (as in "a program that runs on a computer", not an iOS add-on) developer, always have been. I get requirements from the business types, code it up in vi or Eclipse or whatever, get it working, and then they (the business) want to deploy the working app out to production so people can use it and the business can make money off of it. And, for decades, sysadmins have been a brick wall of pure hostility. They're not all like this, but a lot more are than aren't. Like, I get it - you're overworked and the demands on you are unreasonable. Yeah, me too. But I just work here, man. You're right, I don't know how to do your job, that's why I sent you an e-mail asking you what steps are needed to deploy an app into production since it's not documented anywhere. But rather than just tell me what you need so I can go gather that up, you're going to unload on me because you feel overworked and unappreciated, but you're sure as hell not going to unload on a manager or somebody with actual power, you're going to take it out on the developers who have no pull or voice.
Actually, as a sysadmin, I sympathize with you, since I consider that kind of situation to be a sign of, essentially, bad system administration. It also sounds like it might be at a larger company.
Personally, I've always considered it a significant part of my job to make developers' jobs easier, especially with something like deployments and dependencies.
As such, I disagree that we've brought our own "obsolescence" on ourselves, but I do agree that those of us who have perhaps forgotten that ours is a service profession have hastened its demise.
I feel like there has always been a contingent of sysadmin / ops folks who preferred the "better to ask for forgiveness than permission" model. They still hate when things break (so they're not quite fans of developers with a "move fast and break things" philosophy), but they care more about big-picture improvements and ease of upkeep than about enforcing any particular process. Detecting problems and being able to roll back is often more valuable than preventing mistakes outright. It may be somewhat driven by laziness, but it actually works out pretty well for collaborating with the fast-moving developer types. It also depends on being in an environment that is tolerant of occasional mistakes or outages.
It makes sense that these types naturally gravitated towards the devops models. I'm really not sure where this leaves the more compliance-minded systems folks though.
> It makes sense that these types naturally gravitated towards the devops models. I'm really not sure where this leaves the more compliance-minded systems folks though.
Working for profitable businesses where stability is valued over velocity.
It's rough because, like many backend type jobs, the best thing that can happen is nothing breaks. Incremental improvements in stability or scalability will not be noticed, but every single change you make is a massive risk of a page at 2am, all-nighters trying to fix things, outage reports, incident reports, root cause analysis reports, etc. You're stuck between process and outcome.
You have to constantly fight the urge to just never touch anything.
This is definitely a rant that obscures the underlying point: the introduction of _untrusted_ or _unreliable_ network resources, frequently hidden in a string of dependencies.
I'm baffled how often I see someone throw this sort of craziness - "go fetch this thing from some random third party" - into very important places, such as the startup procedures of a container. It's something I see in the culture of the two-person startup just trying to get something out the door. It's definitely "technical debt", and frequently, it won't get removed. Thus, you try to scale up to meet load, and all these new instances time out on the same external resource that's randomly having problems... boom! At the worst possible time. Never mind the potential huge security gaps.
But the specific _tools_ aren't the issue here. It's the culture of "ship something now, we'll deal with the fallout later". A lot of people start using Docker and won't ever look at the Dockerfile, or will add a Maven dependency and won't even check licenses or security updates for _any_ of the transitive dependencies.
Cloud technologies and containerization make everyone just think "we can do things so fast now" and never, ever pay attention to details that can come back to bite you.
On the flip side, it's a good time to be in cybersecurity; because this cultural problem will never, ever, get solved. :)
At the end of the day it comes down to the fact that businesses just simply don't care (Equifax etc).
They like the idea of security and that's where it ends.
In many places if you try to "do things right" you will get fired in two months for being too slow/strict and they will happily replace you with a clueless easily trusting person who "goes and fetches things from random third parties".
Many times they get lucky enough to survive and they don't appreciate the risks that they took. That pace becomes the expected norm and sets the theme in the industry.
And when shit hits the fan the PR person writes a "we are oh so very sorry .. security is totally our number one priority" blog post. They blame and fire the poor bastard and replace him with another warm body.
When it comes to these "hidden" things like security, companies don't reward "doing things right" (and often punish it), so on average, and over the long term, we end up where we are today.
When the culture sufficiently shifts towards being sloppy you will get hammered down quick if you try to be the voice of reason because it ends up being you vs everyone else (the norm).
And, honestly, why should they? Security breaches have yet to hurt an actual company (they hurt users plenty, but not the organization that's actually responsible).
(I've seen similar claims in different ranges. Costs of breaches in the US are pretty high - over $7 million.)
Even Equifax probably wants the 3-4 billion in valuation it's lost since the breach.
The solution appears to be buying tools to avoid and respond to breaches quickly, instead of engaging and building in security awareness. (Microsoft's security development lifecycle comes to mind.)
IMO, both approaches are likely cost effective, though I have no numbers or research to back that up.
You know, I used to agree with you. But the reality is you have to weigh the massive productivity boosts that things like docker bring to the table vs. the potential issues it can bring. To a large degree, good perimeter security mitigates a lot of the concerns of containers themselves running slightly out of date software.
> To a large degree, good perimeter security mitigates a lot of the concerns of containers themselves running slightly out of date software.
This is a very naive way of setting up a secure production environment.
Your perimeter security is worthless if you are loading unvetted images which have malware or, even worse, unknown malicious code in them.
Having a data breach or hack on your hands is something which could kill the company. That risk is not worth a slight productivity boost just because you or your ops team isn't able or willing to build a proper private registry setup.
Yes, it's definitely not something related only to Docker; it's today's culture of trusting any code at all just because someone placed it on GitHub.
> Consider for example Hadoop. Nobody seems to know how to build Hadoop from scratch. It’s an incredible mess of dependencies, version requirements and build tools.
And as the major introduction to the blog post:
> I’m not complaining about old-school sysadmins. They know how to keep systems running, manage update and upgrade paths.
Huh? Old-school sysadmins know how to keep systems running, manage updates and upgrades. At the same time nobody knows how to build Hadoop from scratch. At the same time, Hadoop build instructions themselves have curl|sh scripts or mirrors and the wiki page is outdated. And it uses Java (and thus maven/ivy). And that downloads the internet.
According to the blog, Hadoop, maven/ivy/sbt/any dependency manager, package managers, and everything is broken. But the tagline is:
> This rant is about containers, prebuilt VMs
What does any of this have to do with the "Age of containers" and pre-built VMs? Is the author just talking about Gentoo/LFS-style "compile the whole system from scratch"?
This feels like an incredibly rushed rant. I can only imagine the author had to set up Hadoop for the first time, banged their head against it for a few days (it happens), and took it out on everything.
There is SSL/TLS; unless it's done wrong (invalid certificates ignored by the dependency manager), it's safer than the old "md5 of the file" systems.
Now, some dependencies are fraudulent (especially true in the JavaScript world because it eventually targets a lot of user browsers), but nobody ever checked the sources anyway...
TLS only verifies that you have connected to the correct server. It can't verify whether the package on the server has been replaced by a malicious one. For that, you need an "md5 of the file" (these days, a sha256, because md5 has long been broken).
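Roughly, with the expected hash pinned somewhere you trust rather than fetched from the same server as the download (filenames here are only illustrative):

    # record the checksum once, from a source you trust, and commit it
    sha256sum hadoop-x.y.z.tar.gz > hadoop-x.y.z.tar.gz.sha256

    # verify every later download against the pinned value
    sha256sum -c hadoop-x.y.z.tar.gz.sha256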
Which isn't really true; as a sysadmin (I'd say "former", but once you're a sysadmin, you're always a sysadmin), I've seen lots of things with horrible build and dependency nightmares, and that was before package managers, containers, and virtual machine images became de rigueur.
Think of a self-hosting programming language: you can't build it without a running installation of a previous version. (Anyone remember "Reflections on Trusting Trust" at this point?) Or any application in an image-based language like Smalltalk. Development becomes path-dependent. It's inevitable to get into a situation where A and B cannot be made to work together, except in a derivative of a version that someone, somewhere made while holding their mouth the right way.
Pre-built containers and VMs are an admission that path-dependence is the way stuff is supposed to be.
If I need to use Hadoop, I'll download one of the pre-built binaries that they offer on their site.
You'll notice that the Debian Wiki users have given up on building it since 2010. That was three years before Docker even appeared. Almost nobody was using containers back then.
That's like saying, "If we didn't invent the internet, we would have never had privacy issues". OK, so if we didn't rely on containers - would hadoop have had a perfect set of packages for every distribution? Let's say that the packages for Arch linux were broken. What next?
That's the whole problem with the article. It takes a problem (building Hadoop was bad), correlates it to a completely different tool (because we have docker, hadoop build scripts are bad), and goes on to rant about everything else.
I've seen this brewing for a while, and getting worse and worse. Back in the 80's and 90's, there were developers who would code their own sorting or hashing routines rather than linking in some external library to handle this "solved" problem. The pejorative term "Not Invented Here" (NIH) grew to describe those developers, and they were shamed into reusing code whenever there was code to reuse. And in some cases (like sort routines), it makes perfect sense. However, NIH accusations have grown into "if there's something vaguely similar to what you're writing, you must use it, even if artificially bending it to the case at hand takes more custom coding than you would have written in the first place". That culminates in things like the completely empty, useless (but enormous) Spring "framework" or, to a lesser extent, things like Angular, which sort of does some things but creates far more problems than it solves (and definitely adds more development overhead than it removes).
Interesting take on the NIH term. I thought this was more an ego thing for the big tech companies. They love to reinvent existing things to look like geniuses
I think you're correct on the origin of the term, and I should have mentioned that - but in the past few decades, I've been accused of "NIH"-ing whenever I've "rolled something" of my own from authentication to IoC. Just because there's a library that has a particular description attached to it doesn't mean that it should be used as often as it can be.
The author has added an update to the bottom of the post which I think makes his main intended message clearer:
Update: it was pointed out that this started way before Docker: »Docker is the new ‘curl | sudo bash‘«. That’s right, but it’s now pretty much mainstream to download and run untrusted software in your “datacenter”. That is bad, really bad. Before, admins would try hard to prevent security holes, now they call themselves “devops” and happily introduce them to the network themselves!
The author should have grabbed the .sdeb or the debian build scripts and tore them apart if they really wanted to make a point (if, upon examining the build, there was one to make).
I mean there is a lot of cognitive load/disconnect we're talking about. As an ops guy, I can't look into every package. That's why I trust the package manager (apt-get, yum, whatever) and all the build maintainers who either volunteer or work for Red Hat/Canonical/SuSE/IBM/whoever.
Things get through. That's why we have all those security people out there who are digging around for bug bounties and find crap like the recent Ubuntu Snap package craziness.
Docker containers can be good. You can use an official Ubuntu or Alpine image, build your base, and create scripts to make sure your base containers don't go out of date. Most people don't do that. The official Docker containers are kind of a mess, but at least they're maintained. Grabbing some random container off Docker Hub? Yeah, that's not going to end well, unless you just use their source to build your own, or it's a container continually maintained by the person/company who wrote the service.
Docker containers do need better security introspection and that's going to be a big deal going forward. But this article is all rant and some, but not enough, substance.
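As for the "scripts to make sure your base containers don't go out of date" part, a sketch of what that can look like as a nightly cron job (the registry name and tag scheme are made up):

    #!/bin/sh
    set -eu
    TAG="registry.internal.example/base/alpine:$(date +%Y-%m-%d)"

    # --pull re-fetches the upstream base, --no-cache forces fresh package updates
    docker build --pull --no-cache -t "$TAG" -f Dockerfile.base .
    docker tag "$TAG" registry.internal.example/base/alpine:latest
    docker push "$TAG"
    docker push registry.internal.example/base/alpine:latest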
Yes, but shouldn't you have separate "build" and "deploy" container images? You should "build" a particular version once, "deploy" the result into a test environment, test it thoroughly, and then "deploy" to production, right?
This is not my job (yet). Please tell me if I'm wrong, because I'll need to do it in the next few months.
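One common way to get that split is a Docker multi-stage build (available since Docker 17.05); a sketch, with made-up image names and paths:

    # build stage: compilers and build dependencies live only here
    FROM golang:1.10 AS build
    WORKDIR /go/src/example.com/app
    COPY . .
    RUN CGO_ENABLED=0 go build -o /app .   # assumes vendored deps or stdlib only

    # deploy stage: only the artifact and its runtime needs
    FROM alpine:3.7
    RUN apk add --no-cache ca-certificates
    COPY --from=build /app /usr/local/bin/app
    ENTRYPOINT ["/usr/local/bin/app"]

The idea is to build the image once in CI, run your tests against that exact image, and promote the same tag/digest to production instead of rebuilding it.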
But I just don't understand why we have to have 47 half-built over-complicated build systems or job runners or whatever the new fad term is for every language, when there's something that does what they all do, is battle-tested, and has been around for decades.
Everyone repeat after me. Makefiles are not scary. I can write a shell script. Do I really need to learn grunt/gulp/webpack/npm/rake/fake/maven/gradle/ant and on and on and on?
Probably somebody has released another one in the time it's taken me to write this comment.
Makefiles aren't scary. But they're also not particularly good.
I use Rake (or Gulp, or whatever) because then I can use Ruby (or JavaScript, or whatever). Shell plumbing is fine for informal and small-scale stuff, and I make my code conform if somebody down the line (who may be me) wants to get out their duct tape, but the world is more complex than what /bin/sh can see. Shell is the lowest common denominator. Expecting everything to at all times be written in and for that lowest common denominator is not reasonable. We're a tool-using species and we refine tools over time to make them better. The profusion of tools happens because they iterate on each other to be better. If old tools were sufficient, people would use them because learning new ones is hard.
So, yes, you do need to learn those tools. Or invent a shell that isn't tooth-pullingly difficult to use with a JSON file (and do not say `jq`, I love `jq` as an inspector but it does not step to `JSON.parse` and a working subscript operator). Or change `make` so that a git checkout won't trigger a full rebuild. Lots of baseline, stump-simple things that `make` is just not going to do for you because it's built for a frankly outmoded method of development.
Ha, that's a neat trick! Trouble is, for either Python or Ruby it becomes tricky due to stuff like dependency management. You'll have to `bundle exec make` to get sane library paths for Ruby or `pipenv run` for Python, etcetera etcetera.
At that point I think you might as well just use a language-native one.
The GP does this via a neat hack, but you can also do this in a much more understandable fashion by simply having the body of every Makefile rule start out by shelling out to some script in your favorite language.
I think you're (understandably) misinformed about what Makefiles do because you've run into some bad ones. The thing they're doing is managing an N-level-deep dependency tree in a declarative way. So if A->B->C you can run something to generate C, then B can run, and finally A, and this can all be done in parallel for hundreds of files.
On the individual rule level this is really simple, e.g. just turning a .c file into a .o file, then finally some rule that depends on all *.o being generated creates a program out of them.
The language-native ones are usually much worse. They're easier to use at the outset because they don't make you create this dependency graph, but that also means that they can't run in parallel on your N cores, and they usually suck at incrementally updating the build.
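In miniature, the kind of Makefile being described (recipe lines must start with a real tab):

    OBJS := main.o util.o

    prog: $(OBJS)        # the final link depends on every object file
        cc -o $@ $(OBJS)

    %.o: %.c             # each object depends only on its own source
        cc -c $< -o $@

`make -j8` then recompiles only the .o files whose .c changed, in parallel, and relinks at the end.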
> The GP does this via a neat hack, but you can also do this in a much more understandable fashion by simply having the body of every Makefile rule start out by shelling out to some script in your favorite language.
I'm not sure what you mean? How would this allow you to use, say, Python as the language for recipes? Just having Make drop straight into Python kind of defeats the purpose of Make.
You'd use Python as the language for the recipe that turns (in this example) a given .c file into a .o file, while leaving the Makefile to do what it's good at, declaring the DAG dependency tree needed to incrementally build it.
The point is that people conflate these two things. They open some random Makefile, see that it's mostly doing stuff with shell scripts, think "oh, I should do this all in Python", and then write some monstrosity that doesn't build a dependency DAG and thus can't run in parallel or be easily understood.
Instead, they should have split the actual logic they found in the shell scripts into Python scripts that the Makefile invokes.
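Sketched out, with scripts/render.py as a made-up stand-in for "the actual logic" (recipes need tab indentation):

    REPORTS := $(patsubst data/%.csv,reports/%.html,$(wildcard data/*.csv))

    all: $(REPORTS)

    reports/%.html: data/%.csv scripts/render.py
        mkdir -p reports
        python3 scripts/render.py $< $@

Make still knows which reports are stale and can rebuild them in parallel; the Python script never has to care.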
Nevermind, I misread you. I missed "rule" where you said "beginning of every Makefile rule". (I thought you were suggesting just having the default rule run some enormous Python script, which I've unfortunately seen before.)
> Shell is the lowest common denominator. Expecting everything to at all times be written in and for that lowest common denominator is not reasonable
Is it that hard to learn shell? Why is it so painful? What makes it the "lowest common denominator"? I use it all the time, but I admit at work I am one of the few.
> Expecting everything to at all times be written in and for that lowest common denominator is not reasonable. We're a tool-using species and we refine tools over time to make them better.
This is too vague. What makes a Ruby based build or JS based build a more "refined" tool? It sounds like familiarity is the real issue here.
> If old tools were sufficient, people would use them because learning new ones is hard.
How many people even know Makefiles these days anyway? The "modern" approach seems to be, learn a programming language and then try to do everything inside of it. Some languages are more interested in this cloistered philosophy than others (like JS).
If anything, I think the reason these build tools keep being proliferated is because nobody wants to learn anything more than the bare minimum to "be productive" (which, depending on what you're working on, can be anything from pushing out customer demos for a company that will never sell, to microservices operating at scale). Learning a language and never leaving its paradigms/comforts is easy.
Your second paragraph got to the heart of it. If we want to use some standard build toolchain, it needs to use a nice language and not feel obscure. I was explaining to someone a bash script I wrote, and he said "why not use Python". There were reasons but... he was right, Python would be much easier to use and maintain, and we have a lot more developers who know it.
Eh. It's not my favorite thing out there, but Maven's fine for what it is. It's designed explicitly for well-behaved Java artifacts. If your Java artifacts are not well-behaved, you're going to have a bad time--in my experience, most of those cases are doing things you probably shouldn't be doing.
(You may be a wizard and have a reason to do them, for sure--but that's what writing Maven plugins is for. Or not using Maven. You've got choices.)
Given the limitations of the platform, there really isn't a such a thing as a well-behaved JVM library that depends on other libraries, unfortunately. Oracle really dropped the ball by only serving their own needs with the module system.
Can you expand on this? Having done a pretty decent bit of JVM development, I've never really run into issues even doing some not-out-of-the-box stuff.
Blatantly false? OK, put up. Forget `sh`'s JSON parsing, I'll make it easy on you--show me its arrays. Arrays, plural, you can't use $@ as a bail-out; I need more than one. Show me hashes. Show me sets. Show me the basic building-block primitives of software, because build pipelines are software. It's 2018. If your language can't do this stuff without babysitting, it effectively can't do it, because nobody's got the time to babysit your easy-25%-of-Perl language.
And yeah, I did say `sh`, because that is what you can practically be expected to have kicking around alongside `make` on a system where I can't just install something worthwhile. If I can install better tools, then there's no reason to write shell scripts that are much harder to troubleshoot (and, thank you quotebarf, more likely to be incorrect) than just opening Pry.
Clunky is subjective -- I think we can probably agree that there are clunkier, more opaque, and harder to work with constructs in programming than a slightly different syntax for array declaration.
What are these build systems where you need to install Bash? Bash 4 was released in 2009: it's been in every major Linux distribution for at least two major versions, ditto for FreeBSD... heck, even Solaris ships with it now.
You're right, there are clunkier, more opaque, and harder to work with constructs in programming. Like `sh` arguments. And like `sh` quotebarf.
And Busybox is commonplace. Systems that include Busybox usually do not include `bash`. And so, for my purposes, if I can specify `bash`, I can also specify, say, Ruby, which--while by no means perfect--makes life much, much easier.
"All of which are clunky, opaque, and harder to work with than any other language I have used in the last decade."
Wrong question. A better one: Is it more clunky, opaque, and harder to work with than every other language that's appeared in the last decade? Because no one seems to agree on what is specifically better.
I disagree that it is the wrong question. "Every other language" doesn't matter because I don't value homogeneity and I think that homogeneity of programming language is a fool's errand. I am comfortable shipping production code in most of the languages in current use; in my estimation, none of the major build systems out there are as opaque or difficult to use correctly as make/shell.
(And I am what a current sysadmin would be if we did not call ourselves "devops engineers" now.)
And you are definitely not a sysad or any sort of sysadmin. The core mission of a sysad is to build the best environment possible while restricting that environment.
You seem to be the sort of developer that developers love and the sort of sysadmin that gets fired in the first week.
Just my .02 after two and a half decades.
Not to say that you are entirely wrong, or that your approach doesn't have merit in the new world (especially SV). But it doesn't work for sysadmins and production environments anywhere but your bubble. Not yet.
Well, you're right, I'm not a sysadmin. I pervasively automate, which often sidelines sysadmins, when it doesn't make them redundant. I write code and I don't touch production machines except in extremity, neither of which apply to most (though by no means all) of the people I know who want to call themselves a sysadmin.
Anyway, the core mission of anybody touching the stack is to enable the business to achieve its goals. Nothing, and I mean nothing, more. "Restricting that environment" is appropriate in some environments, and a number of my clients bring me in to help with that. Facilitating developer velocity--and, yes, developers do tend to like me, because I'm good at this while achieving goals around security and uptime--is appropriate in, probably, more. Pays better, too, even if it shouldn't.
It's not that sysadmins cannot do the work you are rightfully proud of. If there are two basic things that differentiate your statements from those of a traditional sysadmin, they are these:
1. Design.
2. Discipline.
Where these two values are dispensable in the long term, devops and the new world shine through. I've worked in both worlds and the only mistake is assuming one size fits all.
In general you seem like an absurd sort of creature. Neither here nor there. Bragging about your facility and business velocity. Everything you claim to do, sysadmins were doing in '98, and with equal velocity and adequate coverage.
At the risk of being too "meta", although I agree with what I believe is your point about good sysadmins having been advancing automation (and otherwise keeping business needs in mind), I worry that you're distracting a reader from that point by what reads as an ad-hominem attack in your first sentence.
I'm still not certain what point you were trying to make with "neither here nor there", however.
Have you SEEN how Bash array indexing works? It’s not exactly user friendly. Same with Bash hashtables (which I don’t think are really hashtables).
I love Bash and use it way more often than I should, but Bash is not a friendly language. Mocking and stubbing functions is really hard to do, which makes tests awkward, even with BATS. And you need to watch the file descriptor to which text is sent to avoid output getting gobbled up unexpectedly. And you need to properly scope your vars to avoid unexpected collisions. And everything regarding sourcing and importing dependencies. Etc.
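For anyone who hasn't had the pleasure, this is roughly what the Bash 4 array and associative-array syntax looks like:

    arr=(one two three)
    echo "${arr[1]}"                        # prints "two" (zero-indexed)
    arr+=(four)                             # append

    declare -A ages=([alice]=30 [bob]=41)   # associative array ("hashtable")
    echo "${ages[bob]}"                     # prints "41"
    for k in "${!ages[@]}"; do              # iterate over the keys
        echo "$k=${ages[$k]}"
    done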
My challenge to you: I want a makefile that has 20 third party dependencies and can be built on osx, linux, and windows.
I can do this within an hour with gradle, ant, or maven. The ecosystem doesn't exist for this in make, and anything I could come up with to make it possible would end up being a tool that would look like automake and the monstrosity that it entails.
That's a bit unfair, because make relies on the underlying system capabilities much more than Java does, and Windows' just isn't up to snuff. But for the other platforms, autotools definitely can do what you ask.
> But for the other platforms, autotools definitely can do what you ask
Ugh... autotools. Yeah, let's use what feels like build tools created way back in the '60s or something. Sure, they "work" for some value of "work", but oh boy are they ugly and nasty. Good luck hiring anybody under the age of 40 who is gonna be willing to work on such a clunky old tool.
There is a reason why the world is moving to build systems that replace the Make toolchain.
Come on...bundle your JRE. I don't know why anyone wouldn't in the age of terabyte hard drives and ubiquitous "small" apps with sizes that make a bundled JRE with all the trimmings look lightweight.
And while you’re at it can you make it idempotent when building sub artifacts which inputs haven’t changed.
And can it do it all incrementally in parallel too please because enterprise shops tend to have a ton of code
Oh and it would be really nice if you could make it so if someone else somewhere in the org has compiled that thing then could we just use their computed binary to save the time compiling locally.
> But I just don't understand why we have to have 47 half-built over-complicated build systems
> Everyone repeat after me.
I don't mean to pick on you specifically here because this attitude comes up a lot. In short, a lot of people are doing a thing, a thing that you aren't familiar with, and your gut reaction is to say "everyone: stop doing that, and do what I say!".
Wait a second and consider that maybe there is a reason why this incredibly large number of developers are using these tools. That perhaps they evaluated various different options and decided that what they are using is more suitable than make. Maybe you could find out.
> Wait a second and consider that maybe there is a reason why this incredibly large number of developers are using these tools. That perhaps they evaluated various different options and decided that what they are using is more suitable than make. Maybe you could find out.
Here we have another fundamental "problem" between dev and ops.
The inherent friction because of different areas of concern.
Devs want to build fast and create new features, but sadly (even with the whole devops notion) are usually not thinking about viability in production.
Ops people need to keep things stable, something which is sadly undervalued a lot of the time in companies.
Not really? A lot of build tools are chosen specifically to make the build process more reliable and understandable. For example, build tools like Maven handle dependency management, which is hugely beneficial in ensuring your builds are consistent and work the same in different environments. Makefiles are shell scripts.
Makefiles are definitely not shell scripts. Individual recipe lines are executed with /bin/sh by default, but you can change this. (Heck, you probably could even get away with Python if you really wanted.)
The rest of Make (which is 95% of what's in a Makefile) is its own language, which is actually not too bad for what it's intended for (munging around filenames), and has the flavor of a functional language.
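For instance, GNU Make runs each recipe line as $(SHELL) $(.SHELLFLAGS) 'line', and both variables are overridable, so a toy like this is possible (a sketch; whether you'd want it is another matter, and it assumes python3 is on PATH):

    SHELL := python3
    .SHELLFLAGS := -c

    hello.txt:
        open("hello.txt", "w").write("generated from a Python recipe\n")

In practice most people leave SHELL alone and, as suggested elsewhere in the thread, call out to a script instead.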
Not taking sides here but you picked a bad example. Makefiles specifically handle dependency management, but designed from a compiled language perspective. Make sure you build this .so before you build this bin, or that this directory exists, and so forth.
That's fair, but it still reinforces the point: Makefiles are great from a compiled language perspective. Other build tools are better from other perspectives. It isn't wrong to choose a tool depending on your needs!
"Wait a second and consider that maybe there is a reason why this incredibly large number of developers are using these tools. That perhaps they evaluated various different options and decided that what they are using is more suitable than make. Maybe you could find out."
Have you considered that perhaps the previous commenter is familiar with the other tools? Or perhaps that the large number of developers have streamlined their particular workflows for their particular use case and have not considered the flexibility needed for other cases? Or perhaps that there is a cost associated simply with having 47 different build systems?
Even if the previous commenter is familiar with 47 build tools and has discounted a lot of them as extraneous, is he advocating building an npm module with make? What about a jar? A NuGet package? A Cargo crate?
It seems more likely that they're frustrated about the sheer amount there is to learn, and they probably know make quite well.
But there are real advantages to these tools. There’s a lot of vanity libraries out there but not many vanity build tools.
My real issue is the constant wheel-making impulse that leaves us with a shattered landscape of people tripping and falling over busted-ass and abandoned wheels. I have a JavaScript front-end project that uses three different package managers and four different make-analog tools, plus some batch files sprinkled on top to orchestrate the common use cases of this monstrosity. Nothing we are doing here is that complicated, except for this sea of shitty half-baked ecosystems around these tools, each of which was considered best-of-breed at one point.
What we want to do is slurp some text files up, apply some transformations to them, glob them together, run them through the minifier, and dump them into a final output. This is exactly what a decent makefile would fit well for. Instead, I got this, because apparently nobody wants to use anything that isn't the hot new way to do things, and so very few people have even had enough exposure to know that tested tools for these kinds of things already existed. The last time I used make beyond trivial uses was a decade ago in college (incidentally, I actually did build jars with make, easier than fighting Eclipse...). But just knowing that something exists is more than half the battle, although the problem then is that it's frustrating as all hell watching the same ideas cycle round and round every couple of years.
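For the record, the pipeline described above is roughly this much Makefile (uglifyjs is just a stand-in for whatever minifier you use; the paths are made up, and recipes need tab indentation):

    SRC := $(wildcard src/*.js)
    MIN := $(patsubst src/%.js,build/%.min.js,$(SRC))

    dist/bundle.js: $(MIN)                 # concatenate the minified pieces
        mkdir -p dist
        cat $^ > $@

    build/%.min.js: src/%.js               # minify each source file
        mkdir -p build
        npx uglifyjs $< --compress --mangle -o $@

    clean:
        rm -rf build dist
    .PHONY: clean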
> I’ve wasted enough time tinkering with gulp, grunt and webpack to sympathize
And yet, those tools fill a need that would be very hard to replicate with the toolchains that came before it. Good luck doing half of what gulp / webpack do from, say, a Makefile.
I'm not familiar with either of those particular two tools, and what you say may well be absolutely true for both of them.
It's just that this argument keeps getting used for every single new "reinvented wheel" (to borrow from the GP). Sometimes the argument is as strong as "it couldn't be done any other way" and sometimes it's as weak as "this one is just incrementally better," but it feels a little like crying wolf.
Was it really "very hard" to make the old wheel do what you needed, or perhaps somehow extend it or add a library, or was it just far more fun and exciting to build something from scratch?
I generally don't mind a proliferation of tools, except when they start to break or conflict with each other, which is, I believe the GP's main concern, and at least tangentially related to the article.
Well you use the right tool for the job. I hope to god you're not using a Makefile for a Java or Scala project. You better be using Gradle, SBT or Maven.
If you're building in Elixir, I hope you're using Mix and not a Makefile. If you're building in Rust, I hope you're using Cargo, or some other Rust specific build tool.
And Makefiles do get stupid complicated when you need things portable and to work on Linux, Mac and FreeBSD; or allow them to have optional dependencies. That's why we have autoconf and all that ./configure && make && make install craziness.
I wish there was something like an updated Make, a tool that works for everything but updated to 2018.
For instance Make works based on timestamps and therefore works very poorly together with git. Switch to another branch and you can get weird effects based on what files were updated and not and often trigger needless rebuilds. And everyone uses git these days.
GNU Make, just using hashes instead of timestamps, would be a huge step forward.
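You can approximate it in stock GNU Make with content-hash "stamp" files, an admittedly clunky workaround (a sketch; pattern names are made up and recipes need tabs):

    .SECONDARY:                 # don't let make delete the stamp files

    # the stamp is only rewritten when the file's bytes actually change
    %.c.sha: %.c
        @sha256sum $< | cmp -s - $@ 2>/dev/null || sha256sum $< > $@

    # objects depend on the stamp, not the source, so a branch switch that
    # only touches timestamps re-runs the cheap hash check and nothing else
    %.o: %.c.sha
        cc -c $*.c -o $@

Newer tools like Bazel do this kind of content-based check natively, which is a big part of their appeal.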
> GNU Make, just using hashes instead of timestamps
Sounds like you're describing make on a system with ccache installed. Hashing incurs a significant performance hit. The first build with ccache is 20% slower than building without it[1]. Your modern make would likely be slower for people who just build something from source once and aren't doing incremental development.
Did you consider that it could be other things with ccache that make it slow? E.g., the need to fully expand C headers and turn them into a text stream (NOT needed for a build tool).
Everything with git is lightning fast including operations that require a full directory tree diff / hash.
That'd require hashing every file on every run to figure out what changed. Not a good idea.
There are modern tools that work for many things. Gradle supports native compilation these days as well as any JVM language. Bazel supports compiling many kinds of languages, and there's still scons.
We have 47 half-built over-complicated build systems, because writing build systems is hard.
Writing correct Makefiles is also really hard, and good luck debugging them. Make is simple and beautiful for small, self-contained projects without many dependencies, but it does not scale well (and even then it's tricky, see [0][1]).
Having 47 different build tools with their own flaws is certainly bad, but they exist because of Make's shortcomings. Just saying "make is fine, use it" won't fix anything.
For many of those systems, the biggest advantage is often that tasks are written in the language of the application.
A lot of the Rake tasks I've encountered in my career would have been easier to write in bash. Not a majority, but a sizable minority. I suspect that in many cases, the gain was that the authors were more comfortable in Ruby than in bash.
I'm pretty comfortable in both, but I'll pretty regularly use Ruby via Pry for stuff I know I can do in bash. It's easier to write, much much much easier to write correctly, and it presents a unified interface to other developers.
Language specific build tools will usually behave predictably across all development platforms supported by the language. Once you start building on a third party, developer experience will be bounded by the quality of platform support of that third party. You don't want to send everybody off on a hunt for the right version of Python unless you are Python.
(Just picking Python as an example because of the recent xkcd. The same is true for everything else, e.g. a Windows computer will rarely contain exactly one make.exe, it's usually either none or a whole bunch of them)
I've used my fair share of build tools and I think makefiles are horrible. Stringly typed, ad-hoc features, and a really bad language from a PL perspective.
Writing good build systems is genuinely hard, and I think make is not a good build system.
Many of these are tools which download code dependencies from package repositories and integrate them into a codebase or build. Make doesn’t do that, even configure doesn’t do that.
Days like today? As someone who is often working on machines with no network access, I vastly prefer build and deployment processes that don't need to go download crap.
There are much better tools. Sadly, nothing LCD (least common denominator) enough to gain wide traction.
That said, for anyone distributing software, shame on them for not packaging their custom build so as to be runnable via ‘make all’ (just using make to drive everything else).
Make is only bad under the Autotools mess. Every build-related struggle in an open-source project that uses Autotools can be traced to Autotools, not to make.
Autotools wasn't invented to overcome deficiencies in make, but deficiencies in C portability across Unix flavors.
Those deficiencies are greatly diminished today, both by POSIX standardization, and there being fewer viable surviving Unix variants that anyone cares to build for.
I've used make for 15 years and never once needed automake. The company I'm currently at uses straight make without problems for a 500 kLOC code base of multiple languages, 3rd-party code, code generators, and unit tests. Our make code totals 1000 lines.
I've seen plenty of messes using scons and ant, more so than I've seen with make. Make is a solid tool.
Across multiple operating systems (unix and windows)? Does it fetch and install 3rd party dependencies? Can a noob maintain the makefile without pulling their hair out?
Old way: we have one tool, it's got a few quirks but it does most of what we want and then gets out of the way. No one is impressed by this; tools are supposed to just work, aren't they?
New way: we have a dozen tools, they collectively do less than the old one and integrating them is a full time job for someone since any upgrade breaks something. But we all get to put the names of all of these things on our CVs! And that’s what’s important.
In reality it's all peachy until stuff doesn't work and no one knows why, or how to investigate an issue, or remotely where to begin to troubleshoot it. Change and evolution are good, but I think there is still a lot to be said for knowing the basics of anything. Levels of abstraction eventually hurt more than they help.
I think that may be behind some of these tales of 'luddite sysadmins'. Sysadmins need to keep things running, and complexity and dependencies, even when they bring convenience, are something that makes them nervous. It's not about being a luddite, it's about being able to hold a mental map of how it all works, so that when it stops working you can dive in.
And not to mention getting called at 2 am because something didn't build or the release bombed and having to examine, for the first time, some over-complicated mechanism to build things that goes through some pipeline where you're eyeballing large log files full of long exception chains.
Devs see the world as one where velocity and progress is among the most important, while 'sysadmins' (a term no longer used by companies and recruiters, unfortunately) have to worry about keeping the applications actually up so they can be used.
DevOps just seems to have swept the sysadmin under the rug with some pretty words about breaking down barriers. Sometimes it feels more like the devs broke down the wall. The amount of infrastructure chaos, mess, and confusion I see today (as a contractor bouncing around different places) seems higher than in the traditional infrastructure shops of 10+ years ago.
This is very true. Having a mental map of how things work is vital for correct (and good) troubleshooting.
Also, a ton of people use abstraction as an excuse to not learn a body of knowledge that contains fundamentals (especially on the systems side).
In comparison to network engineering for instance, where having (quite) low level knowledge of how protocols work and interact with each other is vital.
Isn't that reductio ad absurdum, though, especially considering the original comment is saying that the levels of abstraction eventually hurt more than they help?
Pretty much. I know several admins who appear to be joining a growing pool of luddites who rail against anything new. They're particularly butt-mad about anyone drawing more salary than them. "DevOps" is their favored totem to direct their ire at.
I used to try and convince them otherwise, but it turned out to be a completely futile waste of time. At the end of the day, persistent FUD just means a more lucrative job market for the engineers who are pragmatic and fearless enough to give an honest try at making the newer paradigms work for their employers.
You know all those people who say, "Never rewrite a project! It'll always fail and take forever and cost too much and..."? I've spent much of the last decade getting paid reasonably well rewriting their projects. (Anyone remember mod_perl 1.x?)
Every build system is like Make, but more friendly for its own language (IIRC Make was originally for compiling C and C++). Make just so happened to become generic enough to build damn nearly anything and also get bundled into most Linux distros.
I think the author is arguing that having to install a shit ton of dependencies to use some other Make-like build system is garbage. That’s true in some cases. But I wouldn’t want to use a Makefile for packaging Node; npm is great for that and understands how Node works.
If you thought having to deal with old COBOL programs was a problem, it gets much worse. When there are 10 year old containers in production, and the parts to rebuild them are long gone, then you have a real problem.
There's a reason that Google has internal systems which can and do rebuild everything from source.
The right answer is to use your own repository and pull those images in-house where they can be scanned and verified, i.e., run a container from that image, scan it using security tools, and take action from there.
This is why we have a local registry, and any container that goes to testing or production, or is needed for builds, should be buildable from source.
We still need to figure out how to best tackle the issue of online repositories being taken down or vanishing. For now we run apt-mirror.
The same thing you do with any 3rd party dependency -- save a copy of the source. Pull from DockerHub all you want but keep the images and their sources in a private registry and deploy your actual services from that.
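Concretely, that's roughly this (the registry host and image names here are made up):

    # mirror the upstream image into a registry you control, and deploy only from there
    docker pull nginx:1.25
    docker tag nginx:1.25 registry.internal.example/mirror/nginx:1.25
    docker push registry.internal.example/mirror/nginx:1.25
    # keep the Dockerfile and build context for your own images in version control alongside it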
Hadoop is a rather extreme example... It's bad, but not everything nowadays is. Many newer pieces of software install entirely from source with one command.
Also, this is not at all endemic to containers; there's simply zero connection. Dockerfiles tend to be very simple and easy to reason about. The most complex application I have is around 50 lines of Dockerfile, and that's mostly made more complicated to arrange things for the best layer caching.
I suppose we're supposed to believe that this is somehow worse than the days of debugging m4 macros and autotools just to get a build that doesn't work.
Does that Dockerfile build a container that only has that app and its required dependencies in it? Almost all of the ones I've seen given as examples seem to have an entire copy of the OS in them.
A container is basically a glorified chroot, so there are a few things included that aren't strictly needed, but I typically use Alpine as a base system, which has a shell, some core utils, and musl libc in just ~5 MB. Since it's on its own layer, it gets deduplicated both at build time and at runtime with other Alpine containers (and many Docker Hub images have an Alpine option).
That being said, since Go binaries have no inherent dependencies, I have indeed made Docker images containing exactly one file: the Go binary. These containers are basically the same as fat binaries, with the benefits of Docker scheduling and networking at runtime.
> I have indeed made Docker images containing exactly one file
To anyone reading this: you need some magic compiler flags in both Rust and Go to make sure you get a statically compiled binary (one that doesn't dynamically link against glibc).
But yes, this is super neat. I also like how it reads in the Dockerfile:
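Something along these lines gives the flavor (a minimal multi-stage sketch, not the actual file being referenced; the Go version, module path, and binary name are all made up):

    # build stage: compile a fully static binary
    FROM golang:1.21 AS build
    WORKDIR /src
    COPY . .
    RUN CGO_ENABLED=0 go build -o /app ./cmd/app   # no cgo, so no libc gets linked in

    # final stage: nothing in the image but the binary
    FROM scratch
    COPY --from=build /app /app
    ENTRYPOINT ["/app"]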
Ah yeah, without CGO_ENABLED=0, you'll get a very cryptic error when the ELF binfmt can't find the linker binary...
Never tried it with Rust, but I'm looking to use Rust in the future, so I guess I'd better find out what the flags are for Rust.
Sidenote: It's often useful to have ca certs and timezone info. At that point it's probably not a bad idea to just use Alpine and apk add those things.
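E.g., roughly this (the Alpine tag is arbitrary):

    FROM alpine:3.19
    RUN apk add --no-cache ca-certificates tzdata   # TLS roots and zoneinfo, a few MB total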
By default, Rust compiles all Rust code statically, but the standard library depends on a libc. If you want to use MUSL, you can. If you bind to C libraries, whether you need extra configuration depends on how the wrapper is written.
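If anyone needs it, the usual recipe looks like this (a sketch, assuming a standard rustup/cargo setup with no C dependencies):

    # one-time target install, then a fully static MUSL build
    rustup target add x86_64-unknown-linux-musl
    cargo build --release --target x86_64-unknown-linux-musl
    # the binary ends up under target/x86_64-unknown-linux-musl/release/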
> And since nobody is still able to compile things from scratch, everybody just downloads precompiled binaries from random websites. Often without any authentication or signature.
Apache has official mirrors that host repo files for various package managers, so you can install using apt-get or whatever it is that replaced yum (dnf?).
There are over 2.9 million lines of code in Apache Hadoop alone, not counting dependencies. If you can't trust Apache, you can't trust Hadoop, regardless of whether or not you can compile it yourself.
EXACTLY. It's just software. There are no easy answers here. There are vulns in everything from hypervisors to node modules. Building from scratch isn't going to help.
Pragmatic solutions where possible, like scanning containers, using OWASP tools on your repos, etc.
Did you read all those lines yourself? Did you even confirm checksums matched before running them?
I think that's the parent's point. You can build from source, but how do you trust the source? Is it any more egregious to trust a prebuilt binary from a specific website than it is the raw source? If you can't trust the binary being hosted by the author/caretaker, can you really trust the source being hosted or maintained by the author/caretaker?
I don't think his point is so much about the source as it is about updating N containers. For instance, say there's a known libssl bug. Can you tell how many of your containers are running that version of libssl? And how do they get updated?
1) List the containers that are running images built on pre-fix versions of libssl. 2) Bump the base image of your server images to a post-fix version, rebuild, and push.
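For step 1, something like this works (a sketch; the image name is made up, and the libssl package name depends on the distro release):

    # which running containers came from which image?
    docker ps --format 'table {{.Names}}\t{{.Image}}'
    # does a given Debian-based image still ship the pre-fix libssl?
    docker run --rm --entrypoint dpkg myorg/api:1.4 -s libssl1.1 | grep Version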
I think the point isn't that we can build from source, but why. If it's a huge codebase, you can't independently audit that source code. So ultimately, whether you compile it or the organization making it does doesn't matter for the purpose of trusting that code not to be malicious.
But this isn't a website hosting a binary. These are binary repos hosted by Apache, who self-hosts their VCS repos as well. The idea that Apache can be trusted to host one safely but not the other is absurd, and the idea that you are more likely to notice malicious tampering via MitM attack on 2.9 million lines of code than you are a binary is laughable.
I believe this is true and if so, then the argument should be against package managers, not Docker specifically. Most (if not all) of the official Docker images are built either by compiling the binaries from source, properly installing the binaries from the base distro's package manager, or pulling and verifying a pre-built binary from the vendor's website. For most cases, I don't see anything wrong with any of these.
Personally, what I like is no longer having to setup arch specific build machines containing all of the build tools and dependencies for all binaries that I wish to self-compile. Instead, I either use the vendor's Dockerfile which already contains everything it needs to build from source or I simply write my own if there is not one available. Building and distributing these binaries in the form of Docker images is a breeze using Gitlab CI and container registry and is just as easy with a small VPS and Docker Hub.
Actually, I think there's a subtle distinction everyone's missing (which the original article may or may not have been making):
Unless one can compile it oneself, how can one trust that a particular version of a binary release corresponds to a particular version of a source release?
If the build is reproduced by another trusted-enough source and the result is identical to the official release, then I'd say one can go ahead and trust the binary release from either one.
Sadly, I don't think this is generally done, though perhaps one's own spot-checking of the official release is enough.
That's supposed to be the basis of modern science, too, though, of course, it's not generally done there, either.
No, the point was "when you routinely use binaries from a bajillion different sources of varying degrees of trustworthiness, bad stuff is bound to happen".
I think part of the problem is that Docker is the now-popular answer to the problem of 'it works on my machine'. Unfortunately, 'works on my machine' also involved going to the website of a particular tool or library and following the steps recommended for quickly trying it out, which gives you the classic curl http://somewebsite.com | sudo bash situation, or following the steps in a blog post where someone quickly compiled a bleeding-edge version of it, with all the dependencies needed to build it, on their Ubuntu laptop.
I work for a hosting company and we host our own Openstack based public cloud. We have a hard demand that for production systems we build all binaries we use from source. We actually build these in docker and use that to deploy to production.
What I'm trying to say is that the one doesn't exclude the other.
And we actually use make quite extensively.
I do, however, see the OP's point: building from source hasn't gotten easier.
I think the only reasonable answer is "it depends".
After the new updated base OS images for our systems after Meltdown and Spectre were out, it literally took us ten minutes of human and about an hour of machine time to recompile all of our containers, run the tests and deploy them to production on our Kubernetes cluster, replacing the old insecure ones.
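For reference, the flow is roughly this kind of thing (a sketch; the registry, image, and deployment names are made up):

    # rebuild on top of the freshly patched base image and roll it out
    docker build --pull -t registry.example.com/team/api:patched .
    docker push registry.example.com/team/api:patched
    kubectl set image deployment/api api=registry.example.com/team/api:patched
    kubectl rollout status deployment/api   # wait for the rolling update to finish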
At the scale of our systems, any grumpy sysadmin would have spent at least several days untangling the dependencies and carefully restarting all servers in the correct order after some manually `sudo apt-get`ing (and probably forgetting a few of the lesser used systems).
Sure, typing "FROM ubuntu" (which by the way is a trusted and cryptographically image, contrary to the OPs concerns) leaves me at the mercy of whoever I trusted compiled that image.
Then again, what difference does it make to trust whoever compiled that ubuntu-17.04.iso image I put on my CD?
Or, as Brian Tracy taught me to say in these situations: "You may be right."
> At the scale of our systems, any grumpy sysadmin would have spent at least several days untangling the dependencies and carefully restarting all servers in the correct order after some manually `sudo apt-get`ing (and probably forgetting a few of the lesser used systems).
If you are operating at such a large scale, your sysadmin should be automating things (not necessarily with Docker, mind you).
This automation has been possible on *nix for the better part of a decade by now.
Agreed. As a (potentially) "grumpy" sysadmin, I wouldn't be manually apt-getting or forgetting anything. In fact, one of the lessons I've learned is that, at large enough scale, automated mass upgrades can do the forgetting for me, so verifying (also automated) is important.
As for any "tangle" of dependencies that may exist, I've never seen that be caused by any choice made by "Ops" but, rather, solely by those writing the appllication code being deployed. As such, it would apply just as much to a Docker image as to (what I view as) a traditional deployment.
I do often think about how, by using large JS packages, it's very feasible that in the daisy chain of NPM dependencies, somebody's managed to slip in malware.
I'm not sure what to do about it other than just not using NPM at all!
Check out retire.js and auditjs from npm (there are also more non-free options). You'll likely get some false positives (such as: who would host jQuery on a public/untrusted CDN? or why would you even pass user input into that?), but if real malware actually showed up, you'd find out as part of your CI process. Using the lockfile provided by yarn/npm is also a good way to reduce accidental, unnecessary package updates.
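As a CI step it's roughly this (a sketch; exact flags vary by tool version):

    npm ci                            # install exactly what the lockfile pins
    npm audit --audit-level=high      # fail on known high-severity advisories
    npx retire                        # retire.js flags known-vulnerable JS libraries
    npx auditjs ossi                  # check dependencies against the Sonatype OSS Index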
This has been my complaint from day one. Instead of Docker, have something like ports or pkgsrc and simply create tools that simplify sandboxing (cgexec, a Google kafel -> eBPF filter), and then all the package manager has to do is, well, package. Docker IMO is a mudball of concerns that need to be separated.
jails are actually very mature if you compare them to docker. They have a well working security system and sane networking. (docker just does endless NAT abstraction, which is terrible for certain use cases. Not to mention it breaks a ton of useful networking features)
I was asking a coworker the other day about using containers with a "shared" mapped filesystem subdirectory for UNIX file socket communication versus encapsulating everything in a TCP/IP network stack, but my coworker was concerned about the security risks of a mapped filesystem on the containers.
I don't know enough about BSD to really comment, but do any of them have resource- and syscall-limiting functionality (cgroups, seccomp)? IMO it should be mandatory for distributed application bundles.
not to be pedantic but why should containers be patched for security issues?
the entire point of containers (and orchestration systems) is the ability to push updates without downtime.
Just update with a newer container....am I missing something?
Also about container security - a strict process internally can easily help counter that (I believe Shopify had a nice talk about it at the Google Cloud Platform event in Toronto - everything from using trusted images, running only signed images and going through a security check for each layer).
EDIT: To add to it, please don't patch containers - the entire point of an "image" is that if I run it locally and on my datacenter - it should behave the same - live patching them just voids this concept.
How many people do you see setting up a deploy pipeline that includes pulling security updates into the base image and redeploying as needed?
In my experience, it's much more common to see docker images that have been untouched for months with zero accountability of what exactly is running there.
That's my real concern: old, out of date images. How will we handle another OpenSSL-level vulnerability in 7 years, with bad code buried in containers that haven't been updated in 4, and for which the build infrastructure is no longer functional?
This really isn't that different from having some pre-built, statically linked app still kicking around on your system with the source and/or build tooling long gone.
There aren't really easy answers here. You can't fix bad software with more tooling.
I blame that on Docker Hub. It's the fault of Docker, the company. They have security scanning software that they decided was an enterprise feature. This sort of issue is to be expected if you treat security as an enterprise feature.
Perhaps interestingly, I've had some similar complaints about package managers like homebrew.
I've noticed over the last decade that a certain level of knowledge about building from source seems harder to find via search or engaging in a community forum/chat. The assumption is often that everyone will be using the package manager (I'd say that's especially true on macOS, but it might be an artifact of me spending more time there since it's been my most used dev machine), so if you have trouble building something, the answer will frequently be "just use homebrew -- whoever maintains the formula will already have solved your problems for you."
There are two problems with this: first, that's presumably true for the source build, and if you can hit a case that the source build doesn't work for, chances are pretty good you can hit a case that brew doesn't work for (my experience is that more often than not they go together). Second... I'm happier to use package managers for applications meant to live on my machine and never go live elsewhere (heaven knows there's lots of stuff I just want to install and get on with my life rather than fiddle with), but for applications I'm deploying, it seems to me it's generally wise if someone involved in the project has a picture of the build details and dependency graphs in their head.
Some of the automation we're throwing at operations is really convenient. It's not simple, though. And depending on how much forethought is put into it, I'm starting to think of it as... maybe not technical debt, but a technical credit card: super convenient at times, but it can easily become debt if you're not careful with it.
I never install homebrew on any Mac I develop on. When you see all the possibilities to customize projects, it's just ridiculous to want to build something with a one-liner and expect it to take thousands of decisions for you.
There are plenty of mechanisms in the container ecosystem to address each of the problems the author states. Building software isn't all that it is cracked up to be, and it isn't very fun. Often the build steps for open source projects are not well documented, or the build process is made inherently difficult in order to push a paid product. Example: nginx.
I think the entire stratosphere of DevOps is just about dead on the whole in 2018. In retrospect, after working with things like Docker and more specific industry variations beyond the Amazon tech, it makes no sense to dwell on the security/control of a dedicated systems admin professional, since the tools are all outside the local domain anyway. The rest, from VoIP to IoT to container services, is managed wholesale... SysAdmin is a dino in the age of distributed tech and outsourced IT resources.
I'm a programmer, so I'll take heat for it .. but I don't see a need for them anymore.
The entire purpose of DevOps IMO was to close the gap between sysadmins and devs through code. Devs doing everything, including infrastructure, was and is the entire plan! Public cloud made this super duper easy.
The problem is devs don’t want to manage core infrastructure (VPCs, networking, modules for deploying lambdas and database clusters and container orchestration clusters, etc) and somebody has to do that stuff
Ideally, those would just be features like any other software team, as it's all API calls at the end of the day. But lots of companies have issues with structuring their platform teams like software teams because it's "not software" even though it is
This problem is more deeply entrenched at large companies that own hundreds of millions of dollars of compute controlled by an old-school IT function that can't fathom the idea of either giving it up or making it accessible like cloud, and would rather pay VMware tons for tools that make teams even slower than have their sysadmins become developers.
Then there’s the whole protectionist “You’re taking my job” and “devs can’t possibly know this much about $infra” that isn’t dying off anytime soon
> Ideally, those would just be features like any other software team, as it's all API calls at the end of the day. But lots of companies have issues with structuring their platform teams like software teams because it's "not software" even though it is
Just because it's implemented as API calls at the end of the day doesn't necessarily make it not "not software" (if you'll pardon the double negative), at least in the sense that I believe you mean.
To wit, I believe you're suggesting that if something can be expressed as code, it's all "software" and can therefore be designed, written, and maintained by the same kinds of experts: software developers.
I disagree, because the nature of the infrastructure-as-code code is too different from the application software code.
One could, similarly, express an FPGA configuration in code, but a software developer would not automatically be good at programming one. This is even likely to be true for less extreme examples, such as programming expertise not automatically transferring from general software (for lack of a better term) to code that works well on, say, GPUs.
In the case of IAC "software", a more mature design is more likely to resemble traditional sysadmin/network/security best practices than application software features. It could also have significant financial side effects if there's an error, assuming public cloud, which could require more stringent standards of control, review, and quality, especially if a company ends up in SOX territory.
>Then there’s the whole protectionist “You’re taking my job” and “devs can’t possibly know this much about $infra” that isn’t dying off anytime soon
I'm sure some of this exists, but my own experience is an attitude not that devs can't know a certain amount about infrastructure but that they simply don't, often because they actually don't want to.
Perhaps they fear that if they do end up knowing that much, they'll end up being the ones to manage that core infrastructure, which you identified that they don't want to do!
I'm not convinced there was ever a need for sysadmins, especially from the point of view of programmers. Programmers have always been able to do that work themselves.
It's just that they don't want to. They think it's boring, or perhaps even beneath them. It's a bit like tax preparation.
I don't think that's really changed all that much, even today, even if it's now programming against the AWS API, judging by the number of job postings for such a role.
Also, like with tax preparation, using a good expert instead of doing it oneself could save a lot of money and/or future headaches, but it can come with "grumpy", old-fashioned-seeming nagging about procedures akin to annotating and saving receipts. Whether it's worth it depends on the situation and, far more importantly, temperament.
> I'm not convinced there was ever a need for sysadmins, especially from the point of view of programmers. Programmers have always been able to do that work themselves.
> It's just that they don't want to. They think it's boring, or perhaps even beneath them. It's a bit like tax preparation.
I'll half agree here, and half not. I'm coming at this as a programmer who has historically done ops as well, currently doing more ops than dev.
There's parts of it that programmers don't want to manage, like Apache configuration, provisioning and sizing VMs appropriately, etc. That's the boring stuff, and I totally get it. I don't like managing most of that stuff either, and have got everything set up with service discovery to handle a good chunk of the configuration, and Terraform for managing resources instead of having to click through the Azure Portal, and Nomad for scheduling tasks in the clusters. Groovy.
There's also the parts that the developers seem to have no clue about. They're very happy with all of these abstractions, but there's a lot of layers underneath. They think in terms of making HTTP requests; when those don't work, it's up to me to figure out that their library is keeping a pool of HTTP connections open, and the Azure load balancer silently drops those connections out of the NAT table after 4 minutes of idle time (without sending FIN or RST packets to either side, natch). When the app gets terminated because it's exceeded its cgroup memory limit, I'm the one helping them adjust their JVM heap size. And when their requests take too long, I'm there helping them look through their query logs to figure out what's happening.
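For the cgroup/heap case specifically, a reasonably recent JDK can size the heap off the container's memory limit instead of host RAM. A sketch, with the percentage being an arbitrary choice:

    # cap the heap at ~75% of the cgroup memory limit rather than a fraction of host RAM
    java -XX:MaxRAMPercentage=75.0 -jar app.jar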
All of these cases have happened in recent memory, and it's never been a matter of "this is boring and/or beneath me", it's been a matter of "HALP! I HAVE NO IDEA WTF IS HAPPENING"
Edit: so yes, I agree with you in the sense that I'm a developer who generally speaking doesn't need a sysadmin. I installed Linux for the first time when I was 12 years old, 22 years ago :), and I didn't have a sysadmin then. In those 22 years, I've seen a lot of weird shit happen, and most of the people I've worked with don't seem to be well equipped to dive 3 abstraction layers deeper than they're used to to figure out what's wrong.
I'm not sure you're even half disagreeing with me here, especially as you just admitted you may have the same bias as I do :)
> All of these cases have happened in recent memory, and it's never been a matter of "this is boring and/or beneath me", it's been a matter of "HALP! I HAVE NO IDEA WTF IS HAPPENING"
These two cases strike me as actually one case, with the former being the cause and the latter being the effect. At least that's my contention: they had no idea wtf is happening in those lower layers because learning them was beneath them (no pun intended). Nothing was really stopping them from learning about heap size versus process/cgroup limits.
On the other hand, you do bring up an important point about (mis)behavior of network infrastructure. That's not something the average programmer would necessarily encounter or have access to, especially at scale, and therefore wouldn't be expected to know. Still, my point still stands: there's nothing stopping programmers from gaining that knowledge if/when it becomes necessary, should they want to.
Now, I'm not trying to play dumb. I mostly don't want to read too much between the lines. Are you, in essence, saying that your half-disagreement is that sysadmins are needed, not because (average) programmers don't want to learn all those layers, but, instead, are incapable of learning them?
Very very excellent points. I think that everyone on my team would be capable of learning them, but at least in this specific case it's not so much that they feel it's beneath them, but more of a fear. They talk about C as if it's something that only dark wizards understand, likely as a result of poorly taught early CS.
I think what I'm saying is that I feel like sysadmins are frequently needed for sake of expediency, especially when hiring younger developers. And maybe, due to that same bias, I actually mean "senior developers who understand abstractions several layers deep" :)
> I think what I'm saying is that I feel like sysadmins are frequently needed for sake of expediency, especially when hiring younger developers.
Thanks. That's a point I hadn't considered.
I'm not sure that expediency translates to a need, as such, but that situation is certainly different from the one I envisioned (where a sysadmin is merely a luxury or an optimization to the programmer-DIY-ops scenario).
> I actually mean "senior developers who understand abstractions several layers deep" :)
I'm not sure you do, since you admitted to really being at least part sysadmin, earlier :)
> I'm not sure you do, since you admitted to really being at least part sysadmin, earlier :)
Hah!
I wish I could remember which talk it was. Bryan Cantrill had a good line about DevOps in one of his talks (I think it was this one: https://www.youtube.com/watch?v=30jNsCVLpAE). The gist was "you can say you're DevOps, but when the shit hits the fan, you're either going to be Dev or Ops. If you're a Dev, you're going to want to debug the problem before rebooting the failed machine. If you're Ops, you're going to want to reboot the machine as fast as you can to get it back up." Through that lens, I'm definitely pretty far over on the Dev spectrum; when something goes catastrophically wrong and someone reboots a box to "solve the problem", my first reaction is "YOU FUCKER YOU BURNED THE CORPSE"
It's such a tricky thing all around. Looking back at what I wrote earlier, I also realize that in some ways I'm facilitating the ignorance. I've got a really nice Consul and Nomad setup for the team, so they can pretty much just toss .wars and Docker containers at the cluster and they'll automatically get scheduled somewhere with spare capacity. The load balancer, the database cluster, all of the service discovery and job scheduling stuff... they've never had to get in and set any of that up. Maybe it's time to do more mentoring...
Anyway, thanks for getting me thinking. I've been a little bit grouchy lately about all of the recent experiences of people not knowing how the stuff they build actually runs. You've been a great mirror for some self-reflection :)
Run your own artifactory, have that as your docker repository too, only allow the use of jars that have been vetted ... it's what we do where I work, it's not hard, and it solves 99% of these gripes.
He's right that security right now is a bit unreliable but random sysadmins writing scripts and manually configuring things is never a good guarantee either.
As a dev, I'd like nothing more than for k8s or something similar to be the standard platform to run applications, so that everything is standardized and I don't ever have to require a sysadmin. I think this can already be more secure than the "good old days", and in the future I expect it to be more so.
Regardless of your take on the article, could we, at least, agree that all build tools suck in their own special way after a certain level of complexity is reached?
Thank heavens people are starting to point this out.
The last devops team I worked on had an obsession with shiny. Never mind that they couldn't bootstrap a new base database for their application anymore or automate the entire application deployment (even with a phased approach) on either VMware or AWS. They wanted to keep piling new tools (often with sub-1.0 version numbers) on top of an unstable foundation, and would just shrug when it fell apart in production (which it commonly did).
I tried pointing out to them that by giving commit access to their internal puppet git repository to every developer in the building, they had effectively given root access to them as well. All I received were shrugs and blank looks all around.
One thing from the article that doesn't seem to be discussed enough here in the comments is the trend of pulling random Docker images from the internet and deploying them on your infrastructure simply because it's easier to integrate random versions with feature X than to maintain your own builds, or to work with the vendor's provided packages to achieve the same result. The security implications of this in particular have been bugging me for years.
It is a mess, but one valuable part of the ever-transitioning "sysadmin role" is being able to code and understand it. That being said, good sysadmins (rare as they are) can and do.
Isn't most of this being driven by society in general though? Everyone wants everything now... we're generally feeding this trend to do whatever it takes in the shortest amount of time.
I think a lot of the stuff I used to need to know (OS kernel internals, hardware specifics, relatively deep network knowledge) is less useful in the age of public cloud, containers, and immutable infrastructure. There isn't really a need to tune a kernel for performance or do deep troubleshooting to root-cause OS issues and ensure uptime; if something seems off, kill the machine or container and let auto scaling or the container orchestrator take care of it. If something's wrong with cloud networking, call AWS, as the most we can do is prove that it's on them. Etc.
While I still know these things, I haven’t had to employ them in a while. I will probably use those even less as I start getting into management. That’s a bit saddening.
But the new stuff I’ve learned over the years, namely, designing systems like I’d design software, is amazing. Applying TDD to infrastructure isn’t something I thought I’d be doing, but here we are, and we have “modern DevOps” tooling (and lots of other things) to thank for that.
I was all ready to rant and then read the first part: "I’m not complaining about old-school sysadmins. They know how to keep systems running, manage update and upgrade paths.
This rant is about containers, prebuilt VMs, and the incredible mess they cause because their concept lacks notions of “trust” and “upgrades”."
nixpkgs isn't rigorous about reproducible builds. Hadoop is actually a great example of this. They do not build it from source, a prerequisite for calling a build reproducible. Instead, they download the binaries that the Apache project has already built and run patchelf on them to make them work.
That is true, but in my experience in small scale personal desktop and cloud computing, NixOS is in practice reasonably reproducible on the system level.
For me, in terms of interface, the strength of the Nix ecosystem is declarative system, ops, and service configs in the same language used for package and build specification. The technical strength is striving towards reproducible builds by hashing the dependency tree to build an immutable store. Yes, sometimes this means getting binaries, but you can always pin the package version or even have multiple versions in tandem. The practical upshot of this immutability is system-level rollbacks, which are generally reliable, although there are ways to break it. Yes, there is garbage collection.
Nixpkgs is quite an achievement, and yes it has its warts, but we are working hard to make it better. If we manage to shape up the data science side, I will try it out at work too. I'm very curious how it might scale.
> Essentially, the Docker approach boils down to downloading an unsigned binary, running it, and hoping it doesn’t contain any backdoor into your companies network.
I don't get why people still claim that this is "the Docker approach". This is not the docker approach, everyone hopefully knows this is an anti-pattern by now.
I'm out! I'm done. I started as a developer. I then migrated to sysadmin, then systems engineer, then devops, and back to developer. I am done with playing the platform game. None of it matters. What matters is writing code that does work leading to profits. Always be coding. CAPEX over OPEX.
> None of these "fancy" tools still builds by a traditional make command. Every tool has to come up with their own, incompatible, and non-portable "method of the day" of building.
100% this. Learning Make a few years ago was one of the best decisions I’ve ever made. It’s simple (until it’s not), straightforward and available on just about any Linux and UNIX installation (version compatibility aside).
Trying to make the case for POSIX-compatible Bash scripts has been tough, though.
I also agree re: Docker. While you can secure a container image with USER statements in the Dockerfile and knowing how to give containers just the capabilities that they need to do their job, it is WAY too easy to run them as root and give them privileged access to things. It should’ve been the other way around. Also, every container orchestration platform seems like a really elaborate hack.
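For what it's worth, the non-root pattern is only a few lines (a sketch; the base image, user name, and paths are all made up):

    FROM alpine:3.19
    RUN addgroup -S app && adduser -S -G app app    # create an unprivileged user
    COPY --chown=app:app server /usr/local/bin/server
    USER app                                        # drop root before the process starts
    ENTRYPOINT ["/usr/local/bin/server"]

Pairing that with docker run --cap-drop=ALL --read-only takes away most of what a compromised container could abuse, but none of this happens by default, which is the problem.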
I feel like this has a lot to do with the "cattle, not pets" mindset and the disposability of modern infrastructure. For instance, the author talks about patching a container. But that's not idiomatic; instead, you would include the patch in your build pipeline and deploy the new image.
Depends on the approach, you can use mutable or immutable containers. In fact, the OpenVZ VPSs that were at one time reasonably popular were just containers.
However, I do think the OP made a mistake by using Hadoop as the example. Hadoop has much more in common with an OS at this point than with a single application. The ability to download a single component of the ecosystem is mainly there to support small-scale testing and development.
You wouldn't build Debian by going and getting each piece of the Linux kernel from source, building those, then building all the pieces of middleware on top, and so forth... that is the whole point of having Debian in the first place.
Do certain complex software ecosystems need better support for fingerprinting their builds? Yes. Does all this mean the sky is falling... probably not.
There's always been proprietary code installed on systems; who knows what that big Oracle database is actually doing?
Are these sysadmins actually looking through the open source code that they're compiling to make sure that there isn't a security flaw in plain sight?
With Docker you can get containers maintained by a trusted source, as a sysadmin you don't have to deal with all the hassle of upgrading things, you can just replace the container with the latest version. With Docker content trust the containers are signed.
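For example, with content trust turned on, the client simply refuses unsigned tags (the image name here is made up):

    export DOCKER_CONTENT_TRUST=1
    docker pull myorg/service:1.2.3    # fails unless the tag carries valid signatures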
There have always been cowboys out there and that has nothing to do with the tech stack.
If a company has a threat model and a list of business goals then they would have at least a risk matrix and they might decide what to do: either go the slow way and build software they can trust or accept the risks, backdoors and all the rest. Most companies skip all of that and hope for the best. Not always a conscious decision.
Sometimes they get their unpatched servers encrypted by some ransomware, remember they don't have any backup, close shop and move on to the next business idea. I've seen that happen.
I see this as more of an opportunity than a problem. The fact that Hadoop, Kubernetes, and other platform-like systems are complex to manage properly with good attention to security implies they should be delivered as cloud services rather than having everyone run their own. This enables K8s users to focus on apps while offloading management to specialists who can focus on running the services well.
If you're operating at a large enough scale, you can bring the "cloud service" in-house.
One of the gating factors here is both the speed of Kubernetes development (move fast and break all the things) and the terrible state of the accompanying documentation.
If "they should be delivered as cloud services" is some sort of k8s apologist stance for its sorry state of maturity, then we got issues. OTOH, if it's "You shouldn't run it in house unless you have an army of people to read every new commit", that's wrong, too.
First of all, let me be clear: I'm not an expert in K8s and certainly not an apologist for bad software.
On the other hand there's a level of complexity in distributed systems that is impossible to avoid even in stable infrastructure. You have a design choice of trying to make the system as easy as possible to operate (at the cost of other features) vs. finding operating models that make it less of an issue.
Personally I would rather spend time futzing around with my applications that run on kubernetes vs. trying to run kubernetes itself. It would be sufficient if Kubernetes services were portable across a marketplace of providers so I could pick a place to run my applications.
Also as far as security is concerned it appears to me that a lot of people are deploying technology that they simply don't understand. This is not just a problem with Docker but with apps from ecosystems running on npm and pip.
You can build images securely with Docker but it requires building them yourself, using private registries, checking carefully for vulnerabilities, and testing. If you don't want to do this, pay somebody else to do it right. There's no free lunch.
> The first internet worm spreading via flawed docker images?
Good question, why don't we see exploits of all that implicit trust to the degree that, eg, the DOS shareware scene gave your PC visible virus infection, or the early internet gave us worms that would bog down the whole net?
My attempt at an answer: Because the black hats aren't hobbyists anymore. Visibility is for amateurs.
Are you saying that every new black hat is immediately a professional? Or that there aren't any new black hats? In every other activity of human life, new amateurs appear as the older ones become professionals. Where are the visible exploits from the new amateurs?
> Are you saying that every new black hat is immediately a professional
To a degree that's what I'm saying. Sentencing for "computer crimes" when perpetrated by non-corporate entities has reached epic levels. I'm sure qualified talent takes that into account.
Except.. isn't that based on the misconception that all of these "cloud" tools that "come from Google" are actually used there?
Google isn't using Kubernetes with Docker on AWS (or even GCP). I'm quite confident that they're running their own software on custom bare metal, with custom virtualization/container abstraction layers and custom management software.
Seems like the security issue is easy to fix? Only installing binaries you trust and comparing them against a hash is basic computer admin 101; it's not specific to Docker. Just make sure you trust whoever made the docker image, and that they give you a hash to compare it against.
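E.g. (a sketch; the digest and file name are placeholders):

    # pin an image by digest so you always get exactly the bits you verified
    docker pull nginx@sha256:<digest you recorded when vetting the image>
    # or, for a plain tarball/binary, compare against the checksum the vendor publishes
    sha256sum some-tool-1.2.3-linux-amd64.tar.gz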
Isn't this just an example of automation? Other than sunk cost, by what argument is it reasonable that the sysadmin job should in any way deserve to be protected or ought to continue to be a thing? To my ears, containers sound like they've kind of solved the problem.
How have containers solved maintaining production systems?
It's just moving the abstraction layer higher up the stack.
Also, who will design and build security for these systems? Or care about low-level performance?
A ton of devs don't care about prod in my experience, and just want to ship shiny features.
Containers and VMs are not very different from a deployment or maintenance standpoint. The deployment strategies used with containers could be done with (lightweight) VMs a decade ago. (Heck, jails have been used for nearly 25 years.)
Also, sysadmins have been automating stuff since forever. Without automation one inherently has a very unstable system.
No one is arguing that a sysadmin job should be protected. The article is more of a reminder that the sysadmin responsibilities don't disappear because we paper over them. Containers are a great tool, but they can trick people into thinking that certain things aren't a problem merely because they have been swept under the rug operationally.
With that in mind, I don't see sysadmins going anywhere soon. Every layer of abstraction is supposed to 'eliminate sysadmins', but inevitably it becomes yet another system that someone needs to audit and maintain.
The fact is, many open source software projects provide their own Dockerfile (and in many cases, images). Using these is akin to downloading and deploying a release tarball.
It’s ironic. I love python but the syntax for make files was a pain. They’re rather cryptic. I guess they’re a staple time to conquer them once and for all.
It's trivial to call bash syntax from within a Makefile. The Makefile just breaks these things up into rules which define steps and dependencies for these rules to occur.
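A minimal sketch, assuming a Node project for concreteness (the target names and build commands are made up, and recipe lines must start with a tab):

    # rebuild the bundle only when the sources or the manifest change
    dist/app.js: $(wildcard src/*.js) package.json
            npm ci && npm run build    # any shell command works in a recipe

    .PHONY: test clean
    test: dist/app.js
            npm test

    clean:
            rm -rf dist node_modules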
> Back then, years ago, Linux distributions were trying to provide you with a safe operating system. With signed packages, built from a web of trust. Some even work on reproducible builds.
> But then, everything got Windows-ized. “Apps” were the rage, which you download and run, without being concerned about security, or the ability to upgrade the application to the next version. Because “you only live once”.
These two a both very valid and great approaches for solving different problems. Sometimes you're just a regular user without any valuable data that just wants to do things in a quick and convenient way. And sometimes, you're a system administrator that needs to evaluate the whole build pipeline and plug all the holes for production deployment.
Both alternatives should exist, and one doesn't cancel the other.