The sad state of sysadmin in the age of containers (vitavonni.de)
970 points by Spakman on April 22, 2015 | 443 comments



This bothers me as well. Even tasks as simple as adding a repository are now being "improved" with a curl | sudo bash style setup[1].

However, installing from source with make was (and remains) a mess. It may work if you're dedicated to maintaining one application and (part of) its stack. But even then it usually leads to out of date software and tracking versions by hand.

Many people have this weird aversion to doing basic sysadmin stuff with Linux. What makes it weird is that it's really simple. Often easier than figuring out another deploy system.

(The neckbeard in me blames the popularity of OSX on dev machines.)

[1] https://nodesource.com/blog/nodejs-v012-iojs-and-the-nodesou...


I agree that the "just curl this into bash" instructions are a nightmare - on any platform.

I think a lot of this is a result of what I like to call the "Kumbaya approach to project/team management":

This is where you have a team (either for a single project or a team at a consulting agency, etc) that is effectively all development-focused staff, possibly with some who dabble in Infrastructure/Ops. In this environment, when a decision about something like "how do we get a reliable build of X for our production server deployment system" needs to be made or a system needs to be supported, no idea is "bad", because no one has the experience or confidence to be able to say "that's a stupid idea, we are not making `curl http://bit.ly/foo | sudo bash` the first line of a deployment script"[1]

[1] yes this is an exaggeration, but there are some simply shocking things happening in real environments that are not far off that mark.

Edit: to make it absolutely clear about what I was referring to with [1]:

The specific point I was making was running something they don't even see (how many people would actually look at the script before piping it to bash/sh?) from a non-encrypted source, and relying on a redirection service that could remove/change your short URL at any time.

Unfortunately I was stupid enough to ddg it (duckduckgo it, as opposed to google it) and apparently this exact use-case was previously the recommended way of installing RVM[2]

[2] http://stackoverflow.com/questions/5421800/rvm-system-wide-i...



I think part of this is because there aren't any trusted, fully open source, artifact repositories that work with the various package indices out there.

Like, most of the way deployment should work is that you come up with some collection of packages that need to be installed and you iterate through and install them. Bob's your uncle.

Thing is, all the packages you need live out in the wild internet. Ideally, you'd just be able to take a package, vet it, and put it in your local artifact store and then when your production deployment system (using apt or yum or pip or gems or maven or whatever) needs a package, it looks at your local artifact store and grabs it and goes about its business. Never knowing or touching the outside world.
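
With pip, for instance, that last hop is just pointing the client at the internal index; a minimal sketch with a made-up mirror URL:

    # sketch only: the URL and package name are placeholders
    pip install --index-url https://artifacts.internal/simple/ requests
    # or persist it for every install on the box
    printf '[global]\nindex-url = https://artifacts.internal/simple/\n' > ~/.pip/pip.conf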

And your developers would all write their apps to deploy through the normal packaging methods that everyone and their mother is already familiar with and they could just put them into the existing package index as well.

But you've gotta lay out pretty serious moola (from when I last looked into available solutions to this) or set up a half dozen different artifact stores if you want to do things that way. And good luck managing your cached and private artifacts if you do. And on top of that developers don't necessarily know how to set up a PyPi or a RPM index or whatever so that the storage is reliable and you've got the right security settings or whatever else. (I know I sure don't and I'm not really interested in reading all of the ones I'd end up needing).


"And on top of that developers don't necessarily know how to set up a PyPi or a RPM index or whatever so that the storage is reliable and you've got the right security settings or whatever else. (I know I sure don't and I'm not really "

Setting up RPM is shockingly easy. It can get more complex, but the basic system is:

  REPOBASE=/srv/www/htdocs/
  createrepo -v  $REPOBASE
  gpg -a --detach-sign --default-key "Sign Repo" $REPOBASE/repodata/repomd.xml
  gpg -a --export "Sign Repo" > $REPOBASE/repodata/repomd.xml.key
That will create a repo from all the .rpm files in the REPOBASE. Also you will of course need a GPG key pair, but that can be generated with `gpg --gen-key` where you give it a description of "Sign Repo" (or change the above commands to the key description you used).

Then you get to decide on the deployment machine if you want to trust the repo (or you don't trust any and import the key via some other process, aka direct gpg import).
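
The client side is just as small - roughly the following, with the repo name, URLs, and package name all placeholders:

    # /etc/yum.repos.d/internal.repo
    #   [internal]
    #   name=Internal packages
    #   baseurl=https://repo.internal/
    #   repo_gpgcheck=1     # verify the signed repomd.xml created above
    #   gpgcheck=1          # also verify package signatures, if you sign them
    #   gpgkey=https://repo.internal/repodata/repomd.xml.key

    rpm --import https://repo.internal/repodata/repomd.xml.key   # or distribute the key out of band
    yum install yourpackage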

Of course you can find a bunch of more detailed explanations with $SEARCHENGINE, but if it takes more than a day to figure it out, you're doing something wrong.

Building a set of RPMs isn't that much harder if you have a proper build system. But these are the kinds of things you give up when you decide to grab the latest immature hotness created by someone on their day off.


With Docker, as referenced in TFA... you can simply vet a base image and use that for your application. Upgrades? Create a new/updated base image and test/deploy against that.


And how do you "simply vet a base image"?


Same as everything: look at how it was built. Many of the images are built by CI systems according to Dockerfiles and scripts maintained in public GitHub repos. Audit those, then use them yourself if you're worried about the integrity of the services and systems between the code and the repository.
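
A rough sketch of that workflow (the repo URL, registry, and tag below are placeholders):

    # audit the Dockerfile and its build context yourself...
    git clone https://github.com/example/base-image.git && cd base-image
    less Dockerfile
    # ...then build and tag your own copy rather than trusting the hub-built image
    docker build -t registry.internal/base:audited-2015-04 .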


> Unfortunately I was stupid enough to ddg it (duckduckgo it, as opposed to google it) and apparently this exact use-case was previously the recommended way of installing RVM[2]

Not only "previously", it's the current recommended way to install rvm. From their front page:

>> curl -sSL https://get.rvm.io | bash -s stable

[1] http://rvm.io/


It's also (one of the) recommended ways to do it for Docker[1]. I've noticed a few blog posts that touch on "here's how to use Docker for X" suggest piping it straight into `sudo sh` without so much as looking at what's going to be run first. Sigh.

[1] http://get.docker.io/


Oh I agree there are problems still, but it's an improvement over the previous approach - it's using HTTPS and it's calling the RVM domain - before it was plain HTTP to bit.ly.


Also the installer is now signed via GPG.

https://rvm.io/rvm/security


But then there's a circular dependency because the GPG key is retrieved by the bash script that is wget'd.


It doesn't have to be circular. The script is secured by HTTPS (and hopefully has the key embedded in the script itself?) which can then retrieve the installer and verify it using the key.


The problem is that in this scenario, the GPG key and signature serves no practical purpose.

The whole security, whether GPG is invoked or not, relies on the security of the HTTPS connection alone.

If the HTTPS cannot be trusted alone, then everything is lost, as a compromised HTTPS connection can be used to supply both a compromised GPG key and a compromised package, or, indeed, anything at all that is legal to `| sudo bash`...

And HTTPS security boils down to:

1. The difficulty of altering (or exploiting privileged position wrt) the global routing table to setup MitM or MitS scenarios.

2. The difficulty of obtaining a valid looking certificate for an arbitrary domain.

Any situation where a government actor is the adversary poses intractable challenges to both 1 and 2 above. (And before you say NSA/GCHQ would never care about XYZ, consider China...)


Even if you trust "normal" https certificates, it's still a much more risky proposition. Those certificates only really say that somebody controls the domain - not (in general) that he actually owns it or is responsible in any way, and, more critically, they don't vet whether somebody is trustworthy or not. You can easily get some other similar-sounding domain as a malicious agent, and validly get an https certificate for that.

So even if you trust https works, it's still a tricky proposition - it's not really similar to a distro's package distribution channel.


Indeed, and I didn't even go over trusting the actual source of the bash script or the security/integrity of the server(s) it's hosted on even if the cert is all A-OK.


It gives people a way to choose the level of security they care about. Those who are willing to trust HTTPS can trust HTTPS. Those who aren't can obtain the GPG key and check its signature by another mechanism (WoT) and manually verify the package signature.


Those who would go out of their way to do the GPG check are also the same people who are horrified by `curl .... | sudo bash`


Yes, that's the point. Those who aren't horrified can do that. Those who are can get the package "by hand" and do the GPG check themselves.
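
In practice that manual path looks something like this (the key ID and URLs here are placeholders, not RVM's actual ones):

    # fetch the installer and its detached signature, verify, and only then run it
    curl -sSLO https://get.example.io/installer
    curl -sSLO https://get.example.io/installer.asc
    gpg --keyserver hkp://keys.example.org --recv-keys 0xDEADBEEF
    gpg --verify installer.asc installer && bash installer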


Same issue when people post GPG keys on their website. You can't verify them.


That only becomes applicable if you use their "manual" install steps on that security page.


Am dealing with this situation right now. Apparently wget -qO- https://get.docker.com/ | sh as root[1] is the "supported" way of installing Discourse

[1]: https://github.com/discourse/discourse/blob/master/docs/INST...


No, that's the Discourse install instructions' quick, hand-wavy way of telling you to install Docker if you don't already have it. If your cloud environment already has Docker installed, you can skip that step.

Are you really trying to say that the instructions for installing Docker should be considered in-scope for a guide to install Discourse on a cloud server?

They've included a short snippet that will get you Docker, in the way recommended by Docker, for whatever your base system is. Many production systems do not move at the pace of Docker development, so it's not practical to run Docker from your distribution's package archive. Some distros will not have distributed packaged Docker releases at all.

What's wrong with these instructions? If you are really "dealing with this" right now, it is worth noting that something like 20 or more supported platforms have specific Docker installation instructions from the Docker website.

https://docs.docker.com/installation/#installation

From a quick sample of those instructions, only the Ubuntu instruction page uses the wget|sh method, and it's using an SSL connection to Docker's own website to add an apt source with signatures in the supported way. This should work on any Debian-based or yum-based distro, and writing the instructions like this most likely saves Discourse from getting a lot of "How do I docker" issues and e-mails from their clueless users.

So, would you prefer that part just says "installing Docker is out of scope" or should the Discourse developers go through every distro and cloud system and document the specific instructions for that? To do that would completely defeat the purpose of even using Docker at all.


I concur - to elaborate, I'm not actually the one installing Discourse. I support a 'Kumbaya' group of data scientists who have never heard of docker.

Quick directions like these aren't questioned by users who just want to get things done, and they invite security risks just as the parent & article suggest.


There are many more depressing examples of this at http://curlpipesh.tumblr.com


Funny tumblr but makes me care-confused.

I understand that curl pipe sh could have security problems but I also don't see it as that much different than the "normal" and "ok" way of doing things. I would consider something like the below pretty normal.

  wget https://whatever.io/latest.tgz
  tar xzf latest.tgz
  cd whatever-stable
  ./configure && make
  sudo make install
Because of familiarity, we aren't going to be too worried about what we are doing. If we are on a secure system (like a bank or something) then we've probably already gone through a bunch of hoops (source check, research) and we mitigate it like anything else.

What is so different about

  curl https://whatever.io/installer.sh | sudo bash
We didn't check the md5s in the first example, so yolo, we don't care about the content of the tarball we just `make install`-ed. We're assuming the webserver isn't compromised and that https is protecting the transfer. Is it because the tarball hit the disk first? Does that give us a warm fuzzy? Is it because "anything could be in installer.sh!!!?! aaaaah!". Well, anything could be in Makefile too right? Anything could be in main.c or whatever.

I agree that curl sh | sudo bash makes my spidey sense tingle. But if I really cared, I would read the source and do all the normal stuff anyway. So I think it's some kind of weird familiarity phase we're all in.


Outside of a development environment, you'd run that ./configure && make install step on a build slave that creates a nice RPM or Debian package of it for you which you can install without fear that the build scripts install backdoors, download obsolete software or wipe the filesystem.

With a good build system (eg. autotools) writing an RPM spec takes almost no time at all and if you have the proper infrastructure in place for building packages, you can have something workable in a very short time.

Self-packaged RPMs also don't need to be quite as high-quality as ones you might want to include in a distribution, so if it makes sense for your use case, it's perfectly okay to have "bloat" (eg. an entire python virtualenv) in your package.
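
As a sketch of how little ceremony that is (the package name is made up; assumes the rpmdevtools helpers are installed):

    rpmdev-setuptree                   # creates ~/rpmbuild/{SPECS,SOURCES,...}
    cp mytool-1.2.tar.gz ~/rpmbuild/SOURCES/
    rpmdev-newspec mytool              # skeleton spec to fill in; %build is basically %configure and make
    rpmbuild -ba mytool.spec           # spits out both the SRPM and the binary RPM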


> With a good build system (eg. autotools)

Yikes. Have we sunk this far?


I wouldn't consider what you presented as the "normal" or "ok" way of doing things either, especially not on anything resembling a live (i.e. not development/sandbox) environment.

A distro (or official vendor, or possibly a trusted third-party) repo of pre-built, signed packages would always be my first choice.

If one of those isn't available, my next step would be to create a package for the tool in question, part of which is setting up a file for `uscan` to download new source archives, and compare against the signatures.

In this scenario we (as in the organisation) are now responsible for actually building and maintaining the package, but we can still be assured that it's built from the original sources, we can still install it on production (and even dev, staging, whatever) servers with a simple call to apt/aptitude, and dependencies, removal, upgrades, etc are still handled cleanly.
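
For the curious, the uscan part is just a tiny debian/watch file plus one command; a sketch with placeholder URLs:

    # debian/watch (version 4 format), checking upstream's detached signatures:
    #   version=4
    #   opts=pgpsigurlmangle=s/$/.asc/ \
    #   https://releases.example.org/mytool-([\d.]+)\.tar\.gz

    uscan --verbose --download    # fetch the newest upstream tarball and verify it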


About "ok". You're right. I probably used a loaded word without context. I too use whatever default package repo, followed by "extras" or whatever is available. You described a sane and nice process. I guess my point is, at some point we are are assuming "many eyes" (the binaries might be built with the previously mentioned make;configure steps) unless you are auditing all sources which is unlikely. Especially unlikely on dev machines. Even after that it seems like there is an infinite continuum of paranoia.

I find it interesting that binary packages have existed for decades and yet `rpm etc` knowledge is rare. Why did curl sh become popular? Why doesn't every project have rpm|deb download links for every distro version? Why don't github projects have binary auto-builds hosted by github? I'd argue that it's too difficult. Binary packaging didn't succeed universally. For deployment, containers are (in the end) easier.

But the original article is conflating container concepts and user behavior (not wrongly). If docker hub does end up hosting malware-laden images, it would be interesting emergent behavior but it would be orthogonal to containers. Like toolbars. Toolbars probably aren't evil. A vector for evil maybe?


> I find it interesting that binary packages have existed for decades and yet `rpm etc` knowledge is rare

What makes you think the knowledge is rare? Among developers who actively target linux distributions I would imagine the opposite is true.

Even a number of the referenced curl|bash offenders are just using that as a "shortcut" to add their own apt/yum repos and calling apt-get/yum to install their binary package(s).
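
i.e. the general shape is usually no more than this (every name and URL below is a placeholder):

    curl -fsSL https://repo.example.com/gpg | sudo apt-key add -
    echo "deb https://repo.example.com/apt stable main" | sudo tee /etc/apt/sources.list.d/example.list
    sudo apt-get update && sudo apt-get install -y example-package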


Your first example allows for:

* Using checkinstall to create a local deb/rpm which can be easily installed/removed later instead of "make install".

* What if installer.sh says "rm -rf /tmp/PACKAGE-build" and the connection is interrupted just after the first "/"? You now have "rm -rf /". Oops.

* configure will tell you what files it needs, and apt-file will tell you what dependencies to install.

* I know what make install does. I know make. Who wrote installer.sh? Do they know anything about writing good software? Steam wiped out home directories; who knows what these people do.


It's bad because `sh`, `bash`, etc. don't wait for the script that's being piped into them to finish downloading before they start executing it. So, for example, if you're running a script with something like

    # remove the old version of our data
    sudo rm -rf /usr/local/share/some_data_folder
and the network connection cuts out for whatever reason in the middle of that statement (maybe you're on a bit of a spotty wireless network), the resulting partial command will still be run. If it were to cut off at `sudo rm -rf /usr`, then your system is in all likelihood going to be hosed.
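
One mitigation some installer scripts use is to wrap everything in a function and call it only on the very last line, so a truncated download dies with a syntax error instead of half-executing; a sketch:

    main() {
        # remove the old version of our data
        sudo rm -rf /usr/local/share/some_data_folder
        # ... rest of the install ...
    }
    main "$@"    # nothing runs until the whole script has arrived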


Because now your ability to install your mission-critical software is dependent upon https://whatever.io actually being up. Which it certainly won't be forever.

Or, you know, maybe someone updated the whatever.io installer to make it 'better'. But you are trying to debug some problem and you made one image last month and another one this month and you're pulling your hair out trying to figure out why they are different. Oh, it's because some text changed on some web site somewhere.

You've taken a mandatory step and put it outside your sphere of control.


Good point. I guess you could still wget the script though. It's maybe like ./configure over http? I guess even if you could do it, it's probably not the culture. A Dockerfile would probably just curl sh the thing and not wget it. So the default culture probably does depend on whatever.io being up.


>[1] yes this is an exaggeration

No, it's not :(


It's just automated copy-pasting of commands you don't understand from the internet, which is something everyone who runs Linux (and is not a wizard) does all the time.

It's really really bad, but people will continue doing it until commands/things become so easy we can actually understand what we're doing. Unfortunately, this has never been a priority in Unix-land as far as I've gathered.


It's really really bad, but people will continue doing it until commands/things become so easy we can actually understand what we're doing.

But it isn't all that hard to understand a clean Unix. I have never copied or typed a command that I don't understand.

One problem may be that most Unices these days are not as clean anymore as, say, OpenBSD or NetBSD. E.g. the recent X stack, with D-BUS, various *Kits, etc., is quite opaque. This madness was primarily contained to the desktop and proprietary Unices, but seems to spread through server Linuxes these days as well (and no, this is not an anti-systemd rant).


> But it isn't all that hard to understand a clean Unix. I have never copied or typed a command that I don't understand.

Well, good for you. I can assure you that it's not the case for almost anyone who approached Linux after the likes of Mandrake were released and/or tried to make it work on anything different from a traditional server.

I'm all for trying to understand what one is doing (and I wholeheartedly agree with TFA's point), but the reality is that very few people in the world really understand all intricacies of one's operating system. This does not excuse poor security practices, but it explains their background.


That's why you get someone who is capable of understanding it.

You wouldn't hire some high school kid who's just about taught themselves HTML by reading a book for a week, and get them to write your web application from ground up. You'd hire someone who knows what they're doing. Why is it seen as any different for Operations work? There is a reason systems administration is a skilled field, and a reason they're paid on a par with developers.


I think the reason this happens less and less is that sysadmins are cost centers, not revenue generators. When you have developers do that work (poorly or not), you don't have a group that's purely cost. Those costs get hidden in the development group.


Whoops replied to wrong comment!

However yes, the issue of a team that "doesn't make money" is very real. Maybe it should be "marketed" like legal or accounting: it doesn't make money, it saves the money that SNAFUBAR situations would otherwise cost.


Indeed, the costs merely get hidden and a lot of system decisions boil down to one of:

1. I saw it done that way in some blog.

2. We did it like that at my last job.

3. Seems like it works.


That high school kid needs to install a web server. Is he going to hire someone? No. He's going to copy a curl command.


I expect her to say "How do I install software on this platform?" "Oh! /(apt|yum|dnf)/!"

/(apt-get|yum|dnf) install (apache2|httpd|nginx|lighttpd)/


It's probably okay for him.


I'm all for trying to understand what one is doing (and I wholeheartedly agree with TFA's point), but the reality is that very few people in the world really understand all intricacies of one's operating system.

One of the problems (as I tried to argue) is that most Unices have become far more complex. The question is if the extra complexity is warranted on a server system, especially if bare Unix (OpenBSD serves as a good example here) was not that hard to understand.

Of course, that doesn't necessarily mean that we should look back. Another possibility would be to deploy services as unikernels (see Mirage OS) that use a small, thin, well-understood library layer on top of e.g. Xen, so that there isn't really an exploitable operating system underneath.


What seems to be the source of this push is that some entity wants Windows Group Policy-like control over what users can and can't do, etc.

This is because they want to retain their ability to shop for off-the-shelf hardware, while getting away from a platform that has proven less than functional for mission-critical operations (never mind being locked to a single vendor).

What seems to be happening is that there is a growing disdain for power users and "admins". The only two classes that seems to count are developers and users, and the latter needs to be protected from themselves for their own good (and developer sanity).


> I have never copied or typed a command that I don't understand.

Note that it's trivial to change what goes into the clipboard too. Copying and pasting commands from potentially untrustworthy sites should be ruled out too, even if they are understood.


https://xkcd.com/1168/ comes to mind. And yes, I Google half of the command invocations too (but usually type them in by hand so that I can remember them faster instead of copy-pasting).


I don't get this. Tar isn't that hard.

    x = eXtract files from an archive
    f = File path to the archive
    c = Create a new archive from files
    v = print Verbose output
    z = apply gZip to the input or output
That's 99% of common tar right there. The remaining one percent is:

    j = apply bzip2 to the input or output
        (I admit, j is a weird one here, though that has made it stick in my memory)
    --list = does what's on the tin
    --exclude = does what's on the tin
    --strip-components = shortcut for dropping a leading directory from the extracted files
I haven't used a flag outside of these in recent memory.
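
Put together, that covers just about everything I reach for (paths are made up):

    tar czf backup.tar.gz --exclude='*.log' project/         # create, gzip, skip the logs
    tar xf release.tar.gz --strip-components=1 -C /opt/app   # extract, drop the leading directory
    tar --list -f release.tar.gz                             # peek inside before extracting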


It isn't, but neither are the dozens or hundreds of other commands you encounter when working with the command line. I managed to memorize a few invocations of tar (I listed them in another comment) but, for instance, I very rarely create a new archive so I'm never sure what flag I need to use.

Part of the problem is that each command line utility has its own flag language, and equivalent functions often have different letters. For instance, very often one command has "recursive" as "-r" while another has it as "-R". It's impossible to remember it all unless you're a sysadmin.


Those case differences have meaning: -r is generally not dangerous while -R is; it's capitalized to make you stop and say hmmm, should I do this. All commands have the same flag language, command -options, and are all easily documented by man command; it quite literally couldn't get any simpler, and there's no need to memorize anything since you can look up any flag on any command with the same man command. Those who find it confusing haven't spent the least bit of effort actually trying, because it's actually very simple and extremely consistent.


> Those case differences have meaning, -r is generally not dangerous while -R is; it's capitalized to make you stop and say hmmm, should I do this. All commands have the same flag language

Except with cp, -R is the safe one and -r is the dangerous one. And there are tons of little inconsistencies like this.


As I said, generally. All human languages have inconsistencies, the command line is by far one of the most consistent ones any of us deal with.


It may be more consistent, but it is not easier - humans are generous with regard to input; they can infer intentions from context. I could type "please unbork this" to a human and he'd know precisely that he has to a) untargzip it, b) change the directory structure and c) upload it to a shared directory for our team.


Welcome to working with computers that can't think; easier is not an option, they can't infer your intentions, so your point is what? Consistency is what matters when working with machines and the command line is a damn consistent language relative to other available options.


That's exactly my point.


Frankly, if you're going to rely on a magic recipe from the web for production, you should absolutely document it locally and go through the process of understanding each command.

As a former sys admin, I did that all the time. Who the hell can remember how to convert an SSL certificate to load it into a Glassfish app server? Didn't mean I couldn't step through all commands and figure out why it did that before I loaded the new cert... And next time, I just need to go to my quick hack repo for the magic incantation.


I agree with this. Despite my familiarity with so many command line tools, I do forget invocations. And so I have a wiki page I share with my coworkers to share particularly useful (or correct) invocations of dangerous tools.

On a Unix based system, tar is just used so frequently and for so many purposes, that not understanding it feels a bit like working in a shop and not knowing how to use a roll of tape.


You don't have to be a sysadmin to be comfortable with command line tools. If you want to fully utilize your *NIX system you have to learn how to use that shit, it really isn't that hard.

(I'm a developer.)


I am comfortable with command line tools. I just don't remember every switch and flag I happen to use twice a year, and the fact that command line utilities are totally inconsistent in subtle but significant ways, coupled with the overall unreadability of man pages and lack of examples in them makes this process difficult.


I'm a very proficient user of command line tools, but I don't remember everything: my shell history is set to 50,000 lines, and it's the first thing I search if I've forgotten something.

Sequences of commands sometimes get pasted into a conveniently-located text file; if I find myself repeating the operation I might turn it into a script, a shell function for my .zshrc, or an alias.

Just 10 minutes ago:

    mysqldump [args] | nc -v w.x.y.z 1234
    nc -v -l 1234 | pv | mysql [args]

(after an initial test that showed adding "gzip -1" was slower than uncompressed gigabit ethernet.)


One way to remember these commands without necessarily going "full sysadmin" is to use them on a daily basis. Whether I am developing, managing files, debugging, or really doing anything other than mindlessly browsing the web, I always have at least one (and often many) xterms open. The huge selection of tools and speed of invocation provided by a modern *nix command line is invaluable for many tasks that are not directly related to administrating a system.


I usually get tar right on the first try. I only have to remember 2 variants (extract file and create file):

    tar xf ./foo #automagically works with bz2 and gz files
    tar cf /tmp/out.tar . #add z for compression


That second one will create a tarbomb[1], which isn't necessarily wrong and maybe it's what's right for your application, but for more general usage this is friendlier:

    tar cf <mydir.tar> <mydir>
[1] http://www.linfo.org/tarbomb.html


And some of those switches are just for convenience, e.g.:

    tar c . | gzip > /tmp/out.tar.gz


Oh cool. So that works? I've already memorized:

    tar -xvvzf foo.tar.gz
    tar -xvvjf bar.tar.bz2
    tar -xvvf  baz.tar
Thanks!


I would argue that anyone who is reasonably comfortable in a command line would resort to `man command`, `command --help` or `command -h` before googling for usage.


I think, occasionally, it's a lot easier to grok a command through googling than reading the built-in help. A fair amount of built-in *nix documentation I have run across is mediocre or unhelpful.


I often find that GNU man pages are heavy on explanation of options and light on purpose and practical usage (the latter is tucked away in info pages). That's not necessarily the wrong way to do manpages, but I much prefer OpenBSD-style manpages, which seem to be better at providing practical information.


Recursively searching through all files in the current folder (aka the normal use case for grep) is accomplished by using "grep -r". It's on line 270 in "man grep". And that assumes that you know what grep is at all. Would it have hurt so much to call grep "regexsearch" instead? Maybe -r could be the default?
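
For reference, the recursive invocations are at least short once you know the flag (pattern and paths are arbitrary):

    grep -rn 'TODO' .         # recursive, with line numbers
    grep -rl 'password' src/  # recursive, list matching file names only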


I think a lot of people would hate having it be recursive by default.


If it were up to me it would be called `find` and it would have flags to find files or text within files.


All the core unix tools have the problem of predating the vowel generation (http://c2.com/cgi/wiki?VowelGeneration).


'grep' isn't a case of disemvowelment (there's an e!), it's just a weird mnemonic that's outlived its referent.


Recursive is not the default use for grep. stdin-stdout filtering is.

"regexsearch" is more work to type and more space taken up everytime 'grep' appears in a command-line. And says nothing about recursion.


Recursion is caused by either -R or -r on nearly all commands and is pretty standard, and r is virtually never the default on any command because that would be a bad idea. And yes, having to type regexsearch rather than grep would have been a bad idea; while grep isn't a great name, it's far preferable to someone who types constantly. Search or find would have been better names; names need to be both short and descriptive on the command line, and short comes first.


Use the built-in search.

Edit: the rest of my comment (somehow submitted too soon!)

    man grep

    /recurs<enter>


or use 'grep':

    $ man grep | grep recursive
                  directory,  recursively,  following  symbolic links only if they
                  Exclude  directories  matching  the  pattern  DIR from recursive
           -r, --recursive
                  Read all files  under  each  directory,  recursively,  following
           -R, --dereference-recursive
                  Read all files under each directory,  recursively.   Follow  all


Nah, man pages are usually completely useless. I use man when I remember exactly what I want to do and just am not sure if the flag was -f or -F. For everything else there's google.


Being a few years gone from working purely in tech, and having a decade of OS X desktop usage, finally made me feel I'd gotten complacent. So I installed OpenBSD. Two things of note have happened:

1. I routinely need to look things up that are a bit murky in the deep recesses of my memory.

2. I am reminded continually of how nice it is to have man pages that are well written, are easily searchable, reference appropriate other pages, and are helpful enough to remind you of big picture considerations that you didn't realize you were facing when looking for a commandline flag.


Can you give an example of what you might turn to google for (and what you'd search for) that is more productive than checking a manpage/help output?


OK, recent simple example:

Google query: git display file at revision. Immediate answer (without even having to click any links, it's in the result description): `git show revision:file`

Total time: 5 seconds

Trying to reproduce with man and help:

  man git
search for display, finds nothing

start scrolling down

notice git-show (show various types of objects); sounds like a likely candidate

  git show <revision> <file>
..no output

  git show -h

  usage: git log [<options>] [<since>..<until>] [[--] <path>...]
     or: git show [options] <object>...
.. useful

  man git show
  man git-show
OPTIONS <object>... The names of objects to show. For a more complete list of ways to spell object names, see "SPECIFYING REVISIONS" section in git-rev-parse(1).

  man git-rev-parse
a lot about specifying revisions, nothing about how to actually specify a file

Give up. Google it.


One reason to keep reading man pages is that you will likely discover new things you did not expect. Also, reading man pages helps you to understand the tool's philosophy/workflow, if the man page is well written (which is often the case). This holds for any kind of documentation as well.

When I google something, I usually do not remember the answer to my question; the only thing I remember is the keyword to put in my future query to get the same answer. You will get your answer quicker, but you won't learn much. So personally, I prefer reading man pages (when I can) rather than using google.


I too find it much easier to google for actual working examples of commands rather than the abstract documentation in the manual.

Rsync, for example, where trailing slashes make a difference and it's not obvious from skimming the manual.

Looking at working code/commands often works better than piecing it together from the manual imo.


I never use man pages, to be honest, and I'm quite comfortable on a command line. Reading long-ish things in a terminal kind of sucks, for me, and even if I end up reading a man page in Chrome it's nicely formatted and has readable serif fonts and is easily scrolled with the trackpad on my laptop.


I probably haven't read a man page "cover to cover" since high school. Usually I just need to read a couple lines about a specific flag or the location of some configuration file which I can find quickly with a simple search or by scanning the document with my eyes.


Your terminal doesn't scroll with wheel/trackpad?


The wheel or trackpad scrolls the terminal's scrollback, not the pager program that happens to be running in it.

(I can imagine some sort of hackery that determines if less or something is running and scrolls that, but it sounds like a huge mess. Is that actually what you're doing? Does it send keypresses? What if you're in a mode where those keypresses do something besides scrolling?)


No I'm talking about scrolling in the actual program running - it's most useful in a pager obviously, but it also works for editors, and it works both locally (OS X, built-in Terminal.app) and over SSH on Debian hosts.

I'll be honest - I have ~no idea~ (edit: apparently there are xterm control sequences for mouse scrolling) how it's actually implemented, but several tools have some reference to mouse support (tmux, vim, etc) in option/config files, so it's probably available for your distro/platform and just needs to be enabled.

Further edit: (or PS. or whatever):

`less` pager supports mouse scrolling. `more` pager does not!


I just tried this on debian and my mouse wheel scrolls less inside of my terminal (and returns my previous line buffer when I type 'q').


It can do continuous scrolling of the terminal or line-by-line scrolling of the pager. Both are poor options for trying to actually read prose content inside the terminal, IMO, and opening a browser is easier.


What do you mean by "continuous" versus "line-by-line" scrolling? When I use the mousewheel to scroll a man page in xterm it behaves and appears the same as when I use the mousewheel to scroll a webpage in Chrome (the content moves smoothly up and down, disappearing at the top and bottom edges of the viewport).


Some man pages are really obscure though. I am thinking of PolicyKit and find, which can be as long and as arid.


It's not the same by any measure.

When you read the script in a browser, then paste it into a terminal, you know that "scp -r ~/.ssh u@somehost.com" isn't there.



Fair point.


Okay, but this relies on CSS trickery. If you had navigated to a text URL this would not be a vector.


What's a text url? The only way I can see this not being a vector is if you browse with css (and javascript for good measure) turned off. Or use lynx.


A page of text? With Content-type: text? An example being a shell script?


Do you think the average user copying and pasting administrative commands into their shell will stop to check the content encoding of the document they are copying from? Do you trust your browser not to try rendering an ill-defined document with an ambiguous extension?


Do you check the Content-type: header of the response for text/plain before copying? If you do, you'd be in the minority.


This is why you have a strong passphrase on your ssh private key.... right?


Hum... No. It's trivial to use those scripts to do all kinds of harm. A strong passphrase only protects against this one example.

For example, it won't protect against stealing the .ssh folder and installing a keylogger on your computer.


Copy-pasting from the internet can be just fine for things like (for example) yum install <blah>, because the tool itself has built-in checks to make sure that, before executing, you have a valid, non-corrupt package from someone you trust.
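
Concretely, that check is the gpgcheck machinery, and you can run the same verification by hand (the package name below is a placeholder):

    grep -H gpgcheck /etc/yum.conf /etc/yum.repos.d/*.repo   # confirm signature checking is enabled
    rpm -K some-package.rpm                                  # verify digests and GPG signature manually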


The point is that what ends up on your clipboard can be different from what you see and if a new line is there, then the command executes before you have a chance to change your mind.


It's easy enough to download a given/checked version of the script at http://foo.com/ubuntu/install and have that copied and run inside your docker image... for that matter, it's usually adding a given repository to your repo manager, then installing a given package from that software's corporate sponsors.

I don't think the problem is as rampant as it's made out to be in TFA... that said, most people don't look at said script(s), so it's entirely possible something could have been slipped in. For that matter, I think the issues outlined in the article relate more to overly complicated Java solutions (the same happens in the .NET space) that are the result of throwing dozens of developers, some with more or less experience than others, at a project, and letting a lot of code that isn't very well integrated slide through whatever review process does or doesn't exist.


In my experience, this is mainly describing the sad state of sysadmin work at tech startups. Larger and profitable tech companies tend to take sysadmin work a bit more seriously and give more resources and authority (and pay...) to their TechOps/Devops/Security teams.


It seems reasonably common in agency-type companies - at the start their "infra" is often an account with a managed web hosting company, and when their needs grow it doesn't always become a core part of the business.


> I think a lot of this is a result of what I like to call the "Kumbaya approach to project/team management"

I'm totally stealing this :)


I honestly don't see the issue.

I see the issue with doing it for the general public ala RVM, but internally where you control everything I don't see the issue with curl into sh.


It's also the recommended way to install Kubernetes.


> Many people have this weird aversion to doing basic sysadmin stuff with Linux. What makes it weird is that it's really simple. Often easier than figuring out another deploy system.

While I agree with the article's main points, the GNU build system is far from simple: basically an arcane syntax limited to Unix-based systems, with 5 or 6 100+ page manuals to cover it.

It doesn't excuse it - but I think it's easy to see why people turn to curl | sudo bash as the author puts it.


Maintaining autoconf/automake stuff is a pain. Using it is usually as simple as "configure;make;make install".

It doesn't do dependency management though, which is an externalised cost. But that's what rpm/deb do.

I see the attraction of containers and disk image based management. It's much less time consuming. But it's very much the opposite of ISO9001-style input traceability.


> Using it is usually as simple as "configure;make;make install".

"Usually" indeed. Because if it breaks, you do need to know the implementation details to figure out what's wrong.


That's the same for "wget|sh", apt-get, npm or any other system. Now, if the argument is that configure tends to break more often and for more obscure reasons, I can tentatively agree with that.


This is the reason why all these standalone things bundle everything into their installation process.

The problem is installing 206 different pythons on my system just makes it more likely that something else is going to break.


… which is one of the pressures driving Docker adoption. Each process tree gets its own root filesystem to trash with its multitude of dependencies. DLL hell, shared library hell, JDK hell, Ruby and Python environment hell… a lot of it can be summed up as "userland hell". Docker makes it easy to give the process its own bloody userland and be done with it.


I think this falls under the heading of "I'm old", but I already have one machine to maintain. Replacing it with N machines to maintain doesn't feel like a win to me.


I'd actually disagree with that. Auto* breaks less often than wget or npm, IME.


My experience is that "configure;make;make install" has a much higher probability of success (>95% regardless of whether you are running the most up-to-date version of the OS) than something like cmake (which seems to hover around 60% if you try to build on slightly older systems).


Sorry, I don't know what ISO9001 is, but isn't deploying an image extremely conducive to traceability? No non-deterministic scripts are run on production servers.


http://www.askartsolutions.com/iso9001training/Identificatio...

ISO9001 often turns into its Dilbert parody of bureaucracy, but the core ideas are sound: if you have some sort of failure of production, it's useful to know what went into the production process and where it came from. So in the case of deploying images, then yes: you get repeatable copies of the image. Provided you know where the image came from. Images themselves aren't usually stored in a version control or configuration management system. It may not be obvious where the image came from. And, if an image is made up of numerous "parts" (ie all the installed software), you need to know what those parts are. If an SSL vulnerability is announced, what is the process for guaranteeing that you've updated all the master copies and re-imaged as necessary?


Have you ever seen it implemented in a way that added value? I agree that in theory ISO9001 makes sense, but it's been a slow-motion disaster everywhere I've seen it actually tried.


I haven't seen it successfully implemented in the software industry. Manufacturing is much more OK with it. I'm not arguing for ISO9001 itself, just that reproducibility and standardisation of "parts" are things we should consider.


Somebody has to build the containers and/or images, and it's on them to make that an automated, repeatable process.


I have never understood why so many people are not OK with using the command line.

A few years back we had an issue where a MySQL script was over the size limit for phpMyAdmin - my fairly experienced colleague was unaware that you could just log in and use mysql from the CLI.
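
For anyone in the same boat, the CLI route is one line (credentials, host, and database names are placeholders):

    mysql -u admin -p -h db.example.internal mydb < too_big_for_phpmyadmin.sql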


Could be a generational thing as well. I work with some devs who've always used Windows/OS X GUI exclusively for everything and are terrified of command-line anything. Either there's a GUI for it, or it might as well not exist. Younger guys usually.


The command line is modern-day voodoo. There are a ton of commands, each with a specific use, each with their own specific incantation, which can be mixed in extremely powerful ways. But my theory is that the main reason people would prefer not to use it is that improper usage can be harmful and sometimes destructive.

The same reason people prefer to use garbage collected and dynamically typed programming languages.


improper usage can be harmful and sometimes destructive.

It is merciful that GUI environments are immune to these deficiencies.


I'd like to think this post is an exaggeration.


Unfortunately not; there are a lot of developers who can only use phpMyAdmin or their CMS's GUI - and from what I am told, being able to code basic SQL joins is not something you can take for granted.


Crazy talk. And these 'developers' are pulling in six figures?


Insanity.


> the GNU build system is far from simple

Which is why there are alternatives – cmake, waf, …


> Which is why there's alternatives – cmake, waf

Gah. Because I want to drop a metric ton of python code into my own source tree just to build. (gnulib is bad enough...)

Personally I like make. I understand it. I've used it for something like 20 years now. If there are problem domains it doesn't work for, they aren't problem domains I encounter. (Like so much of Linux software in the past 5 years or so, I find myself saying "this seems like an interesting way to solve a problem I simply don't have".)


> Gah. Because I want to drop a metric ton of python code into my own source tree just to build. (gnulib is bad enough...)

Each their own poison. Personally I don't like them either, but pretending it's autoconf or curl|sh is an oversimplification.


Many people have enormous amounts of experience with anti-patterns yet very little self reflection to identify them.

This is an obvious example:

http://en.wikipedia.org/wiki/Inner-platform_effect

Obviously a config / deployment system, like any other system, will start small and simple and "save a lot of time", but after an infinity of features are bolted on, it'll be infinitely worse than just using a bash script. Even worse, you probably figure out your deployment by hand on one system using bash, then need to translate what worked on a command line into a crypto-wanna-be-bash config system (probably creating numerous bugs in the translation), then use wanna-be-bash to slowly and poorly imitate what you'd get if you just used bash directly...

The last straw for me was trying to integrate some FreeBSD servers, and /usr/ports had like six versions of cfengine, none of which worked perfectly with the three versions on the legacy Linux boxes. Screw all that; instead of translating bash command-line operations into pseudo-bash, I'll just use bash directly. IT is an eternally rotating wheel, and the baroque inner-platform deployment framework has had its day... and being an eternally rotating wheel, it'll have its day again in a couple years. Just not now.

Not throwing the baby out with the bathwater: a strict directory structure, and a modular, library-style approach to error handling, reporting, and logging that you can steal from the deploy systems, is a perfectly good idea.

Unix philosophy of small perfect tools means I'm using git instead of my own versioning/branching system, and using ssh to shove files around rather than implementing and static linking in my own crypto and SSL system.


I agree with you in principle, but in practice shell scripts are really not the best tool for this sort of job: they tend to be write-only (in the sense that they can be difficult to read months or years later) and can become very hairy and difficult to maintain.

I'd prefer something like scsh (or a Common Lisp or elisp version thereof) for this sort of work: access to a full-fledged programming language and easy access to the Unix environment.


"can become very hairy and difficult to maintain."

I've found that to be a social problem or management problem more so than a technical one. There's an old saying, even before my time, that a Fortran programmer can write Fortran in any language. In a bad environment a new system will always be cleaner than the old system, not because it's technologically immune to dirt - it'll dirty up as badly as the old system unless the social problems or management problems are fixed. You really can write write-only Puppet scripts. Or you can write readable bash. Or even Perl.

Also, most deployment seems to revolve around securely and successfully copying stuff around, testing files and things, and running shell commands and looking at the return code. Shells are pretty good at running shell commands like those in a maintainable, easily readable and troubleshootable fashion. It's possible that a deployment situation that more closely resembles a Clojure koan than the above might have some severely blurred lines. And there's always the issue of minimizing the impedance bump between the automated deployer and the dude writing it (probably running commands in a shell window) and the dude troubleshooting it at 2am (by looking at the deployment system in one window and running commands in a shell window next to it to isolate the problem). I would agree that cleaner library/subroutine type stuff in shell would be nice.

And you are correct, scsh is really cool but two jobs later some random dude on pager duty at 2am is more likely to know bash or tcsh. Principle of least surprise. I suppose if only scsh guys are ever hired... Then again as per above most deployment is just lots of moving stuff around and running things so its pretty self explanatory. But if the work is trivial, don't deploy a howitzer to swat a fly.

Maybe another way to look at it is if you're doing something confusing or broken, plain common language will clear things up faster and more accurately than using an ever more esoteric domain specific language. Or some folk saying like "always use the overall simplest possible solution to a complex problem".

There is the "don't reinvent the wheel" argument. I have a really good network wide logging system, a really good ssh key system for secure transfer of files, a strong distributed version control system to store branches and versions, a strong SSL infrastructure, a stable execution environment where upgrading bash probably won't kill all my scripts, a strong scheduled execution system... I don't need a tight monolithic collection of "not so good" reimplementation of the above, running that is more painful that rolling my own glue between the strong systems I already have. And using the monolith doesn't mean I get to abandon or ignore the "real" strong infrastructure, so the only thing worse than running one logging/reporting system is having to admin two, a real enterprise grade one and a deployment-only wanna be system. I did the puppet thing for many years. So sick and tired of that.


Thank You for the Wikipedia link - I was looking for the name of the "thing" people are doing when they write all those WebGL JavaScript frameworks and such. Now I know that they are creating poor replicas of things that normally run on the desktop itself.


:-)

I managed to get Hadoop running on a small cluster from scratch; Michael Noll's tutorial is a good starting point.

Full stack should mean you can use (and have used) a soldering iron in anger, and also have at least a CCNA level of networking.


When you say anger, do you mean to threaten the developer who wants to run `chmod 777 /var/www` when their just-installed php app released in 2003 won't allow uploads?

Edit: Maybe I should have added a /sarcasm to my comment?


    alias fix-permissions="chmod -R 777 /"


> alias fix-permissions="chmod -R 777 /"

  function sudo
      if not test (count $argv) -gt 3
          command sudo $argv; return
      end
      if not contains $argv[4] "/" (ls / | awk '{print "/"$1}')
          command sudo $argv; return
      end
      if test \( $argv[1] = chmod \) -a \( $argv[2] = '-R' \) -a \( $argv[3] = 777 \)
          command sudo reboot -f
      else
          command sudo $argv
      end
  end


I think branding them is going a bit too far

      ...
For a first offense


"in anger" means used in a real-life situation, not just playing around.


I think he got that.


Re-connecting a pin that broke off a CPU should be enough qualification. Anger will be present in spades.


Christ it's annoying enough just using a credit-card to fix bent pins. Hats off for reconnecting them!


I have found that a lot of these platform-as-a-service providers are way more complicated than doing things from scratch.


I've seen people copy-pasting stuff along the lines of `wget --no-check-certificate | sudo sh` into their terminals from some random internet source.

I'm pulling my hair out, asking: are you even aware of what you're doing?


What do you expect them to do, download the .tar.gz, extract it, read every line of code and then make; make install? Or just make; make install? How is that any different?


You can usually get PGP signed hashes for tarballs distributed by serious entities. If someone is distributing software and provides no way to check that it is genuine, you shouldn't run it...
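
e.g. the usual dance, with placeholder file names:

    gpg --verify foo-1.0.tar.gz.asc foo-1.0.tar.gz    # detached signature on the tarball itself, or:
    gpg --verify SHA256SUMS.asc SHA256SUMS            # signed checksum file...
    grep foo-1.0.tar.gz SHA256SUMS | sha256sum -c -   # ...then check the tarball against it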


I posted a slightly provocative tweet about this, and the CEO of NodeSource took exception... sad days.

https://twitter.com/kylegordon/status/590860756075294721


He seems to be way nicer and more professional than you..?


If by “nicer and more professional” you mean “super condescending”.


What? Only after two hours did he tell kylegordon that he (kylegordon) was cute.

And at that point kylegordon had earned it.


There's nothing wrong with curl | sudo bash style setups as long as it's over https and the certificate gets checked.

The advantages are that it's easy and you can make it work on almost all unix-like systems out there.

The only disadvantage is that you have one additional weak point: the server can get contaminated. Before, you had to contaminate one of the many developer machines / build machines.

The situation wasn't any better before. Install media always got downloaded without SSL encryption or any certificate checks. This is still the same, but at least you won't get a hacked kernel today if you use Secure Boot.


No, it's just plain bad.

To pick one example why...

Just because it's easy to run doesn't mean it's easy to support or maintain. Chances are `curl | bash` scripts aren't designed for your particular OS, so it's yet another form of software that you have to learn how to update, as opposed to using the OS-level update mechanism, such as yum, apt, or even brew to some extent. Being a good sysadmin doesn't stop at installing the software. Most of the hard (boring) work is in maintaining systems and keeping them updated and secure. Blind install scripts make this job impossible.

There is a very big difference between installing something on your dev machine to just get it started and deploying something into production. `curl | bash` is okay for setting something up on a dev machine where the only one that needs to use it is you. For productions machines, it's completely inappropriate[1].

[1] This is somewhat mitigated by things like Docker, but I'd still argue that you don't want to have an ephemeral installation method for containers either. You should have fixed versions that are installed by either a package manager or at least a Makefile.
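
i.e. instead of piping a moving target into a shell, pin the exact artifact you vetted (the package and version strings here are made up):

    apt-get install -y mytool=1.4.2-1     # Debian/Ubuntu
    yum install -y mytool-1.4.2-1.el7     # RHEL/CentOS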


Not to mention that in plenty of environments production systems don't have access to the internet to begin with, so curl/wget | bash is a non-starter.


There's nothing wrong with curl | sudo bash style setups as long as it's over https and the certificate gets checked.

Even assuming the URL's publisher is trustworthy (which is a poor assumption to make, ever), you forget that SSL/HTTPS is broken, that the NSA has established MITM on the entire internet, and that your installation process (which should be both versioned and repeatable) now has zero versioning and all the entropy of the network, plus bonus entropy.


I'm guilty of using this method in my side project (https://github.com/grn/bash-ctx). My goal was to solve the installation problem quickly. I absolutely would love to offer proper installation methods. However my experience with building *.deb packages makes me think that it's not something that I'd like to do (especially as it's a side project).

The question, therefore, is: what is the simplest alternative installation method for OS X and Linux?
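
Not an authoritative answer, but one low-effort route (assuming the project has, or gains, a make install that honours DESTDIR) is to stage the files and let fpm wrap them up, plus a Homebrew formula for OS X:

    # stage into a throwaway root
    make install DESTDIR=/tmp/stage PREFIX=/usr/local
    # turn the staged tree into packages (name/version are examples)
    fpm -s dir -t deb -n bash-ctx -v 0.1.0 -C /tmp/stage .
    fpm -s dir -t rpm -n bash-ctx -v 0.1.0 -C /tmp/stage .

It's not Debian-policy quality, but it's versioned, uninstallable, and a lot better than curl | bash.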


Some of this is self-inflicted.

Go look up how you install snort or bro on CentOS. You have to either install from source, or install an RPM from their website, which may or may not have issues. This means you lose dependency management and update management. Pure madness.


Choose your method of death:

1. Run this totally opaque command which might DTRT, and might completely pwn your system.

2. Prepare for 4 hours of dependency hell.


Alternatively, learn Gentoo.


I've decided that unless you're ok with running a very restricted set of ancient applications, don't even try to use CentOS. I've seen multiple billion dollar companies who can't seem to avoid f'ing up the yum repos on CentOS.

I'm not able to go full docker on my machines @work, but I do have some statically linked tarballs. There is a reason apps that deploy in hostile environments (skype, chrome, firefox) bundle most of their dependencies.


> Many people have this weird aversion to doing basic sysadmin stuff with Linux

Like developers who won't write SQL and insist on an ORM.


I agree that many of these convenient setups are embarrassingly sloppy, but it's the sysadmin's responsibility to insist on production deployments being far more rigorous. No one can tell you how to build hadoop? Well, figure it out. Random Docker containers being downloaded? Use a local Docker repo with vetted containers and Dockerfiles only.

I don't even allow vendor installers to run on my production systems. My employer buys some software that is distributed as binary installers. So I've written a script that will run that installer in a VM, and repackage the resulting files into something I'm comfortable working with to deploy to production.

If a sysadmin is unable to insist on good deployment practices, it's a failure of the company or organization or of his own communication skills. If a sysadmin allows sloppy developer-created deployments and doesn't make constant noise about it, then they aren't doing their job properly.
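
For the Docker side of that, a sketch of the "local repo with vetted containers only" setup (hostnames and tags are placeholders; the registry image is the stock one):

    # run a private registry on a host you control
    docker run -d -p 5000:5000 --name registry registry
    # re-tag a locally built, vetted image and push it there
    docker tag myapp:1.4.2 registry.internal.example.com:5000/myapp:1.4.2
    docker push registry.internal.example.com:5000/myapp:1.4.2
    # production hosts only ever pull from the internal registry
    docker pull registry.internal.example.com:5000/myapp:1.4.2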


> it's the sysadmin's responsibility to insist on production deployments

What decade are you from? No startups are hiring sysadmins to do any kind of work anymore. They're hiring "dev-ops" people, which seems to mean "Amateur $popularLanguage developer that deployed on AWS this one time."

That's the whole problem with the dev-ops ecosystem. None of these dev-ops people seem to have any ops experience.


> No startups are hiring sysadmins to do any kind of work anymore.

Then maybe people should be willing to work for more grown-up businesses.

HN tends to get a distorted view of what's important in the tech industry. The tech industry is way, way, way bigger than startups, and there are still plenty of companies that recognize the value of good sysadmins.

Let the startups learn their lesson in their own time.


The alternative is that many of the startups don't learn this in their own time, and they go on to become bigger, more successful companies who can set the tone and shift the market. Of course, if they're actually able to succeed by doing so, then that says something too. Although the trend of data breaches certainly wouldn't decline in that case.


>Although the trend of many data breaches certainly wouldn't decline in that case.

Exactly. "Successful" and "profitable" don't imply "secure" or "well-architected". At least until the lack of those last two comes back to bite you and starts eating into your profits.


Sony is a great example of this.


Did the PR hit actually translate into a monetary hit and eat into their profits?


I don't know about the cost of the negative PR, but the compromise itself cost them $15 million in real costs (http://www.latimes.com/entertainment/envelope/cotown/la-et-c...) and potentially much more (http://www.reuters.com/article/2014/12/09/us-sony-cybersecur...) once you count the downtime involved and potential lawsuits, settlements, and other fallout over the breach of information. IIRC there were some embarrassing emails released regarding some Hollywood big-wigs, for example.

It should be a huge cautionary tale for any big organization that doesn't have good internal security, but unfortunately this isn't the first such case in history, and it almost certainly won't be the last.

But that doesn't mean there aren't other smart businesses out there.


$15M sounds like a rounding error for Sony. It sounds like a rounding error as well when compared to the cost of brand-name IT solutions when deployed in a company of Sony's size.


> That's the whole problem with the dev-ops ecosystem. None of these dev-ops people seem to have any ops experience.

Thanks for painting all of us that do "devops" with a wide brush. If you're a dev shall we enumerate all of the XSS and SQL injection holes you've added to products over your career?


Well, in my experience XSS and SQL injection come from the "devops" kind of developer: people who claim to code without wanting to learn the basics (complexity, DB, ...).

So, nice try, but the troll doesn't work.

And startups are made by the "devops" kind of businessmen who don't care about correctly computing cost vs. price, because that is so XXth century.


You're absolutely right, about everyone in the industry. How did you become so astute with your observations?


While I think that devops can be a useful term, lately most people take it to mean 'I'm a rails developer but I know how to use docker and the aws control panel'.


Well, like I said, in this case, "it's a failure of the company".


>> No one can tell you how to build hadoop? Well, figure it out.

I get the impression that several people working on debian couldn't work this one out!


I think most people who use debian would tend to install things using debian packages, which in this case usually means adding cloudera to your apt sources list and using apt-get.

It is a pretty straightforward process:

http://www.cloudera.com/content/cloudera/en/documentation/cd...
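
The apt side of it is the usual add-key-and-repo pattern; roughly (repo line, key URL and package names are placeholders - take the real ones from Cloudera's docs):

    # add the vendor's signing key and repository, then install through apt as normal
    wget -qO - https://archive.example.com/cdh/archive.key | sudo apt-key add -
    echo "deb https://archive.example.com/cdh/ trusty-cdh5 contrib" | sudo tee /etc/apt/sources.list.d/cloudera.list
    sudo apt-get update
    sudo apt-get install hadoop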


I think the complaint was that it's difficult figuring out how to build Hadoop from source. That page you linked is how to install pre-built binaries, which you rightly point out is fairly trivial.


Sure, I agree that debian people would want to install debian packages.

What debian users/hackers/amateur admins like me really want is packages that are first class citizens, that the debian guys have picked up, sanitised, analysed and made part of the system.

I'll take software from the debian repos every time if I can. And it's pretty damning if people who are familiar with build systems and package creation can't figure it out!


Hadoop is insane. The elephant is fitting. Is it really the best choice, or has someone done something cleaner in golang or c++11?


> Is it really the best choice, or has someone done something cleaner in golang or c++11?

What does the language have to do with the program?

Hadoop is what it is because it's a complex problem with a fittingly complex solution. Simply re-writing it in your pet language won't somehow make it "better".


Go and modern C++ are both quite a bit more terse than Java. They also produce binaries which don't necessarily require a runtime to be available on every server (just ABI compatibility).

(I have no horse in this race, I am just writing what I think the grandparent comment was referring to)


> They also produce binaries which don't necessarily require a runtime to be available on every server

Just like Java[0]. It is just a matter of choosing the right compiler for the use case at hand.

[0] - http://www.excelsiorjet.com/ (one from many vendors)


Cool concept, I didn't realise this existed. Can you run Hadoop and friends under this? I've worked at companies with over 500 servers in a Hadoop cluster and literally never once heard about anything other than using Oracle's JRE aside from one proposal to use OpenJDK which was shot down pretty quickly.


I don't have experience with Hadoop.

Almost all commercial JVMs have some form of AOT or JIT caching, especially those that target embedded systems.

Sun never added support to the reference JVM for political reasons, as they would rather push for plain JIT.

Oracle is now finally thinking about adding support for it, with no official statement on whether it will make it into 9 or later.

JEP 197 is the start of those changes, http://openjdk.java.net/jeps/197

Oracle Labs also has SubstrateVM, which is an AOT compiler built with Graal and Truffle.


Way back in the day, GCC's gcj compiler would do AOT compilation of Java, however I believe it stopped being developed at jdk5 support.


If I am not mistaken, most of the developers abandoned the project to work on the Eclipse compiler and OpenJDK when those projects became available.

GCC only keeps gcj around due to its unit tests.


There are also things like exec4j, which bundles everything including a JVM into an executable which one can just run... and things like AdvancedInstaller and Install4j will also allow one to bundle a JVM.

So producing a binary which doesn't require a separate runtime really isn't a problem.


Since you mention it, Java 8 brings bundling and installers support into the reference JDK.


C++ does usually require a runtime.


C++'s runtime is small and ubiquitous. Depending on how the software is written (if it allows disabling exceptions and rtti), it might be the same size as C's runtime, which is practically (but not totally) nonexistent.

I'm not an expert on Java, but my experience with it is that its runtime is fairly huge and requires custom installation.


C++'s runtime is worse than Java's in that sense. Most JVMs can run most Java bytecode, but your libstdc++ has to be from the same version of the same compiler that your application was compiled with.


It was quite surprising for me the first time I did a little embedded work and discovered I couldn't run binaries that were compiled against glibc on my musl-libc based system, and vice-versa. I had initially thought they all just supported the same c89 spec so should work...


Yep. It's 99% ABI compatible, but that 1% will kill you.

For that matter, as you allude even C has a runtime.


Which C++ runtime is ubiquitous? I can think of at least 3 C++ runtimes (MS, libstdc++, libc++).


I spent an entire day last week attempting to build hadoop with LZO compression support. There are many outdated guides on the internet about how to do this, and I eventually gave up and spent a few hours getting the cloudera packages to install in a Dockerfile so I could reproduce my work later.

Figuring out which software packages I needed, how to modify my environment variables, which compiler to get, and where to put everything in the correct directory was the entire difficulty.

If it were written in Go instead of Java, I could have done `go get apache.org/hadoop` and it would have been done instead of giving up after hours of frustration.

Go has almost no new features that make it an interesting language from a programming language perspective. Go's win is that it makes the actual running of real software in production better. Hadoop's difficulty is exactly why InfluxDB exists at all.


> If it were written in Go instead of Java, I could have done `go get apache.org/hadoop`

This complaint is just about packaging, and not the language itself. Any project can have good or bad packaging scripts, and for Java there are plenty of ways to make it "good".

Not to mention, the BUILDING.txt document clearly states they use maven[1] and to build you just do: mvn compile

> Go's win is that it makes the actual running of real software in production better

This might just be a familiarity issue, because once you launch the program, all things are equal.

And yes, you can bundle a JVM with your Java app, which makes it exactly like Go's statically linked runtime and just as portable without any fuss.

[1] https://github.com/apache/hadoop/blob/trunk/BUILDING.txt


> no new features

Go gets us better performance and concurrency out of the box.


> Go gets us better performance

Than Java? At best, Go performs on par with Java, but is often measured 10-20% slower.[1][2][3]

This is usually attributed to the far more mature optimizing compiler in the JVM, which ultimately compiles bytecode down to native machine code, especially for hot paths. Java performance for long running applications is on par with C (one of the reasons it's a primary choice for very high performing applications such as HFT, Stock Exchanges, Banking, etc).

> concurrency out of the box.

Java absolutely supports concurrency "out of the box"...[4]

[1] http://zhen.org/blog/go-vs-java-decoding-billions-of-integer...

[2] http://stackoverflow.com/questions/20875341/why-golang-is-sl...

[3] http://www.reddit.com/r/golang/comments/2r1ybd/speed_of_go_c...

[4] http://docs.oracle.com/javase/7/docs/api/java/util/concurren...


Hell, if we look at real-world-ish applications, the techempower benchmarks show go at easily 50% slower than a bunch of different Java options.


>What does the language have to do with the program?

I happen to agree with you whole heartedly, if you spend enough time here though you'll see the inevitable comment about how anything made in php is worthless insecure garbage and anyone who spends their time developing a php application are amateurs at best.

This isn't really a comment aimed at you, just wanting to point out how often that view gets challenged.


http://www.pachyderm.io is a modern alternative.


Apache Spark is a good replacement for Hadoop now. It's written in Scala.


Spark is a good replacement for MapReduce. MapReduce != Hadoop.


Fair enough, but the original article was about Hadoop MapReduce wasn't it? It specifically says:

"without even using any of the HBaseGiraphFlumeCrunchPigHiveMahoutSolrSparkElasticsearch (or any other of the Apache chaos) mess yet."


Surely at minimum Hadoop developers could tell you!


Have you ever been on a project where the developers didn't know how to build it? It's a strange situation, with huge environments being passed from one computer to another, and treasured with more care than the code itself.


This happened to me about a decade ago. A very smart sysadmin in the company created an Acronis image for machine deployments. They very carefully documented everything they changed, and how to recreate it. Then someone else created an image from one of the imaged machines without documenting what they changed. This happened a couple dozen or so times until the image was pretty much a mess of hand-installed binaries, configuration hacks, etc. It literally took another person 6 months to untwist what was actually on the machine by md5summing the crap out of everything, guessing at versions until they found a match, and documenting it.

That sounds like the state of a lot of docker images.


Well fuck me. I just spent two weeks fiddling with Vagrant and Docker and finally got everything up and humming only to come into this thread. Going to refrain from slapping the SysAdmin title on myself for now.


Docker is awesome, but you shouldn't be using blind base images. Use Dockerfiles, they're self-documenting.


Unless you build your own base images... odds are you will be using something someone else built. Even the host OS probably wasn't compiled by you.

In general, my base images are often debian:wheezy, ubuntu:trusty or alpine:latest ... From here, a number of times I've tracked down the dockerfiles (usually on github) for a given image... for the most part, if the image is a default image, I've got a fair amount of trust in that (the build system is pretty sane in that regard)... though some bits aren't always as straightforward.

I learned a lot just from reading/tracing through the dockerfiles for iojs and mono ... What is interesting is often the dockerfile simply adds a repository, and installs package X using the base os's package manager. I'm not certain it's nearly as big of a problem as people make it out to be (with exception to hadoop/java projects, which tend to be far more complicated than they should be).

golang's onbuild containers are really interesting. I've also been playing with building in one node container with build tools, then deploying the resulting node_modules + app into another more barebones container base.


Well, you have to trust something somewhere. Unless you're always compiling from source (which you can do with Docker), and you've read the source, etc.. but even then, you have to trust the compiler and the hardware.

Anyway, yes, you can make your own base images. But images `should` be light enough that you can build them each iteration. I've done dev stacks where literally each `save/commit/run of a test` built the docker container from the dockerfile in the background! With the caching docker does it really doesn't add any overhead to the process.

> What is interesting is often the dockerfile simply adds a repository, and installs package X using the base os's package manager.

Yup! Pretty much. Other than some config stuff for very specific use cases (VPN, whatever.)


A legend at one company about 5 years ago is that the company's next world-shaking product was being built partially with a single computer that was shipped around from office to office, because no one knew how to build the build environment again. Again this was circa 2010. :-)


I think more disconcerting is the rise of "sysadmins" who think they're qualified sysadmins because they know how to use bash and Docker.


This is hardly a new problem- and in many ways, I'm not sure it's a problem at all compared to the company cultural issues brought up by skywhopper.

Whether it's programming or system administration, you're always going to have new people getting excited about the sudden power they've learned. Being able to make computers do things opens up this whole new world, and when people find themselves in that world they may end up overestimating their skills and underestimating how much they need to grow. What they fail at understanding they make up with in enthusiasm, and with experience they become more knowledgable about what they don't know.

If we waited until they were "qualified" for jobs they would never get the experience to become qualified. At the same time there is more than enough room in the current job market to support people of lower skillsets, and for some companies that's considered an investment (junior people tend to turn to senior people over time).

This is where it becomes a company culture issue. If a company is smart they'll have a few senior people making sure things are held to the right standard, and a few junior people who can get things done but need some guidance and direction. However, lots of companies (especially the smaller ones who may be more constrained by budget) go for the cheaper route and would rather hire someone junior as their main support. The problem isn't that the sysadmins aren't qualified sysadmins, it's that they're junior system admins who have been hired for the wrong job. Companies that fail to value experience tend to suffer as a result.


I've found that there isn't an easy ramp into system admin from university -- most of the talent comes from dogged self-learning in computer repair shops or subpar IT shops. All the good guys at $BIG_SOFTWARE_COMPANY seem to be in their 30s after putting in years doing /tedious/, but extremely useful, work for little pay.


My uni used student sysadmins to run hosting for Open Source projects. Great experience on production infrastructure without big dollars on the line when mistakes are made.

http://osuosl.org/about


Amen to that. When I see some of the job descriptions in postings for DevOps/Sysadmin roles, I wonder: is there really someone out there with all the skills that are asked for?


I'm reasonably certain there isn't - not for the pay band offered.


Wanted:

    3-5 years of linux system administration experience
    3-5 years of windows 2000/2010 administration experience
    3-5 years of networking level tcp/ip experience with custom protocols
    3-5 years of c++ experience
    3-5 years of .net experience
    3-5 years of .....

I think more than half the job postings out there are created by entry/mid-level HR people who find similar job descriptions on other sites and copy-paste the requirements. This has then propagated into the monster job descriptions you see now.

I noted this as well: for the pay these companies are offering, anyone with the level of experience they are asking for would laugh and move on. It's almost as if it's a trojan horse of a job post. Only those stupid enough to apply to a job post like that are the kinds of employees they are looking for.


As a hiring manager, it's very easy to filter these people out at the interview stage.

Being a system administrator requires a very specific personality type that has little to do with experience and more to do with attitude and critical thinking.

Sadly, people are right that startups are skipping past admins, thinking they're not needed anymore. Then later they need to hire one to clean up the giant mess.


It's really that easy. Pick your favorite software that happens to have broken SSL certs (such as RVM as of a few months ago), and tell them to install it. If they balk at the prospect of disabling SSL cert checking on the wget command, then they're worth their weight in gold.
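
The answer I'm fishing for is some variant of "pin the certificate or verify the download out of band", not "turn checking off". A sketch, with placeholder names:

    # fetch the CA/cert once over a channel you trust, then pin it
    wget --ca-certificate=trusted-ca.pem https://example.com/install.sh
    # or ignore the transport entirely and verify a published signature instead
    gpg --verify install.sh.asc install.sh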


most of the startups fail before any system cleanup is necessary


@skywhopper "it's a failure of the company or organization or of his own communication skills" <~ Oh man, ever had a rant from The Management like "we pay you to do what we say"? No one usually cares about a sysadmin's communication skills. Yes, it's a failure of the organisation. The sad truth is that most organisations are failures. Sysadmin today is either a marginal job at a small company, where people respect you, or a job at a medium or large company where he or she is just a peon.


make is the least-auditable build tool imaginable. You don't have to obfuscate a Makefile, they come pre-obfuscated; you could put the "own me" commands right there in "plain" Make. Not to mention that it's often easier to tell whether a Java .class file is doing anything nefarious than whether a .c file is. How many sysadmins read the entire source of everything they install anyway?

Maven, on the contrary, is the biggest single source of signed packages around. Every package in maven central has a GPG signature - the exact same gold standard that Debian follows. The problems Debian faces with packaging Hadoop are largely of their own making; Debian was happy to integrate Perl/CPAN into apt, but somehow refuses to do the same with any other language.

> Instead of writing clean, modular architecture, everything these days morphs into a huge mess of interlocked dependencies. Last I checked, the Hadoop classpath was already over 100 jars. I bet it is now 150

That's exactly what clean modular architecture means. Small jars that do one thing well. They're all signed.

Bigtop is indeed terrible for security, but its target audience is people who want a one-stop build solution - not the kind of people who want to build everything themselves and carefully audit it. If you are someone who cares about security, the hadoop jars are right there with pgp signatures in the maven central repository, and the source is there if you want to build it.
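
And checking those by hand isn't hard; every artifact in Central has a detached .asc next to it (the artifact coordinates below are just an example):

    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.0/hadoop-common-2.7.0.jar
    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.0/hadoop-common-2.7.0.jar.asc
    gpg --verify hadoop-common-2.7.0.jar.asc hadoop-common-2.7.0.jar

Whether the key that signed it means anything to you is, as others point out, a separate question.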


Makefiles don't really enter into it and getting software signed by the developer isn't that valuable or useful.

The value of debian is not that they package (or repackage) everything into deb files but that they resolve versioning and dependency conflicts, slip security fixes into old versions of libraries (when newer versions break API/ABI), and make it possible to integrate completely disparate software into a system. They also have a great track record at it.

Maven does not do any of these things; Maven does nothing to protect the system administrator from a stupid developer, it just makes it easier for their code to breed and fester.

You must understand that the sysadmin has an enormous responsibility that is difficult for programmers to fully appreciate: You don't feel responsible for your bugs, you don't feel responsible for mistakes made by the developer of a library you use, and you certainly don't feel responsible for the behaviour of some other program on the same machine as your software, after all: Your program is sufficiently modular and scalable and even if it isn't, programming is hard, and every software has bugs.

But the sysadmin does feel responsible. He is responsible for the decisions you make, so if you seem to be making decisions that help him (like making it easy for you to get your software into debian) then he finds it easier to trust you. If you make him play whackamole with dependencies, and require a server (or a container) all to yourself, and don't document how to deal with your logfiles (or even where they show up), how or when you will communicate with remote hosts, how much bandwidth you'll use, and so on: That's what Maven is. It's a surprise box that encourages shotgun debugging and using ausearch features to do upgrades. Maven is a programmer-decision that causes a lot of sysadmins grief a few months to a few years after deployment, so it shouldn't surprise you to find that the seasoned sysadmin is hostile to it.


Debian has a terrible track record. Just look at the OpenSSL/Valgrind disaster. Speaking as a former upstream developer (on the Wine project): all Linux distros found unique ways to mangle and break our software, but Debian and derived distros were by far the worst. We simply refused to do tech support for users who had installed Wine from their distribution, the level of brokenness was so high.

You may feel that developers are some kind of loose cannons who don't care about quality and Debian is some kind of gold standard. From the other side of the fence, we do care about the quality of our software and Debian is a disaster zone in which people without sufficient competence routinely patch packages and break them. I specifically ask people not to package my software these days to avoid being sucked back into that world.

As a sysadmin you shouldn't even be running Maven. It's a build tool. The moment you're running it you're being a developer, not a sysadmin. If there are bugs or deficiencies in the software you're trying to run go talk to upstream and get them fixed, don't blame the build tool for not being Debian enough.


I don't know that I agree with that.

Debian feels like a distribution maintained by a bunch of sysadmins: People who have shit to do, and who understand that the purpose of a machine is to get stuff done, not to run some software.

A lot of sysadmins believe since they are responsible for the software, they need to be able to build stuff and fix some stuff themselves (i.e. it can't wait for upstream). In my experience, it's usually something stupid (like commenting out some logspam), but it's critical enough that I can imagine a lot of shops making it mandatory to ensure they can do this.

Really proactive sysadmins do try to run fuzzers and valgrind and do try to look for bugs rather than waiting for them to strike. And sometimes they get it completely wrong, as in the OpenSSL/Valgrind disaster, but they usually ask first[1].

Now I don't agree with everything Debian do, and I don't want to defend everything they do, either, but I think programmers in general need to show a certain amount of humility when dealing with sysadmins: because when these sysadmins say that they're not going to package hadoop because the hadoop build process is bullshit, it isn't appropriate to reply "well you guys fucked up openssl, so what do you know?"

One thing that would help is if we didn't look at it as Programmers on one side of the fence and Sysadmins on the other. Programmers have problems to solve, and sysadmins have problems to solve, and maybe you can help solve each other's problems.

[1]: http://marc.info/?l=openssl-dev&m=114651085826293&w=2


I find it weird that you consider 'packaging' to be something a sysadmin should do, but 'building' to be something they should not do. Aren't they both forms of 'prepping code for use'?

And then state that you don't want your own software packaged. So, if a sysadmin is not allowed to build and not allowed to package, how are they supposed to get your code into production? "curl foo | sh"?


I don't consider packaging to be a sysadmin task. On any sane OS (i.e. anything not Linux/BSD), packaging is done by the upstream developers. That doesn't happen on Linux because of the culture of unstable APIs and general inconsistencies between distributions, but for my current app, I am providing DEBs and woe betide the distro developer who thinks it's a good idea to repackage things themselves ...


Well, we're going to have to agree to disagree there, because I think Windows packaging is fucking insane.

One of the things I loved about my move to linux and .deb land was that if I uninstalled something, I knew it was uninstalled. I didn't have to rely on the packager remembering to remove all their bits, or even remembering to include an uninstall option at all. Or rely on them not to do drive-by installs (which big names like Adobe still do, out in the open). And not have every significant program install its own "phone home" mechanism to check for updates. The crapstorm that is Windows packaging is a fantastic example of a place where developers love and care for their own product, but care not a jot for how the system as a whole should go together.


I read that the other way around. He specifically asks people to refrain from packaging his software. I think the implication is that he does want sysadmins to run his carefully constructed build scripts in order to install the application.


You really nailed it. Among my duties are systems administration for a company that works with a lot of software development vendors. We have a user acceptance team that makes sure that we get what we ordered, that the QC stays at a high level. So functional problems, that's their deal. But they're not sysadmins, they can't easily see what developer choices make administering the servers more complicated, more fragile, more expensive, or more insecure. This shifts my job from the end of the process (here, run this!) to the beginning (hey guys, let's use these tools instead, it'll make everyone's lives easier).

As such I'm very pro containers as they will eliminate a ton of deployment effort and allow me to manage different environments much more easily. But it means that there needs to be a much bigger magnifying glass on the container contents early in the process as opposed to the moment of deployment.


> The value of debian is not that they package (or repackage) everything into deb files but that they resolve versioning and dependancy conflicts, slip security fixes into old versions of libraries (when newer version break API/ABI), and make it possible to integrate completely disparate software into a system.

Maven has exactly the same capabilities as deb does - you can depend on versions, depend on a range of possible versions, exclude things that conflict and so forth. And it puts even more emphasis on fully reproducible builds (with the aid of the JVM) - in that respect it's closer to nix than apt.

> But the sysadmin does feel responsible. He is responsible for the decisions you make, so if you seem to be making decisions that help him (like making it easy for you to get your software into debian) then he finds it easier to trust you. If you make him play whackamole with dependencies, and require a server (or a container) all to yourself, and don't document how to deal with your logfiles (or even where they show up)

Wow, self-important much? Too many sysadmins seem to forget that the system exists to run the programs, not the other way around.

> If you make him play whackamole with dependencies, and require a server (or a container) all to yourself, and don't document how to deal with your logfiles (or even where they show up), how or when you will communicate with remote hosts, how much bandwidth you'll use, and so on

On the contrary, maven makes the requirements much simpler. I have literally one dependency, a JVM, so it can run on any host you like (no need to worry about shared library conflicts with some other application). It needs to download one file (shaded jar) from our internal repo, and execute one command to run it. That's it.

> That's what Maven is. It's a surprise box that encourages shotgun debugging and using ausearch features to do upgrades.

No, it's just the opposite. All the dependencies and project structure are right there in declarative XML. It's what make should have been.


> No, it's just the opposite. All the dependencies and project structure are right there in declarative XML. It's what make should have been.

When make was written most machines would have just exploded at the sight of a typical build.xml, and downloading tens or hundreds of packages from anywhere was simply out of the question.

Also, 'dependency' means something completeley different in make as opposed to maven - I don't think modern build systems do even care much for make-style deps.


> When make was written most machines would have just exploded at the sight of a typical build.xml, and downloading tens or hundreds of packages from anywhere was simply out of the question.

Sure. But the notion of doing things declaratively existed (Prolog predates make by five years). And the biggest difference between make and the scripts that preceded it is that it's more structured, with a graph of targets rather than just a list of commands.

If you add the ability to reuse libraries of targets (something that sort-of exists via implicit make rules), restrict targets to something a little more structured than random shell commands, and - yes - add the ability to fetch dependencies (including target definitions) from a repository, you end up with something very like maven.


> The value of debian is not that they package (or repackage) everything into deb files but that they resolve versioning and dependancy conflicts, slip security fixes into old versions of libraries (when newer version break API/ABI), and make it possible to integrate completely disparate software into a system. They also have a great track record at it.

In the HStack world, the analog of the aspects of Debian you're talking about here is companies like Cloudera, who, surprise, make their stuff available as debs and PPAs.

Building your own Hadoop from source and complaining that the resulting product is unvetted is sort of like doing a git pull of all the Linux dependencies and building that.


But the problem is that the ones doing the vetting (i.e. Debian) have given up on making a vettable distribution because the build is so broken.


But Debian isn't the Debian of Hadoop. Cloudera is.

Why should we assume the Debian Foundation is the sole trusted source of every type of software?


What you are saying makes no sense.

And yes, if I'm using Debian and didn't add any PPA or extra sources, then the Debian Foundation IS the sole trusted source of software. And you do that because you know they won't fuck up the system, which (and that's the whole point of this thread) the others certainly don't.

Now Debian is telling you: we see no way to distribute this software and guarantee what you are getting or that it won't fuck up the system.

Do you think I'd consider installing that junk?


So everyone who runs Hadoop is installing junk? Seems like plenty of other companies have been able to build businesses on it without adhering to your Debian-only rules...


Signed packages isn't about just being signed.

I could sign anything I like, but that doesn't make it any more secure for you to curl it into /bin/bash.

Signatures are about who signs it, and that's not something mvn has solved at all. Mvn is a free-for-all of binary code that very well could own my system, whereas debian is a curated collection of software which the debian maintainers have signed as being compiled by their systems with no malign influence and having met at least some bar.

I'd trust foo.jar signed by debian over foo.jar signed by bobTheJavaBuilder@gmail.com any day... and mvn only gives you the latter.

So yeah, sure, they're signed, but it doesn't actually matter if you don't take the time to hook into the chain of trust (and believe me, mvn does not ask you to trust your transitive jar dependency 50 down the line) or have a trusted third party (debian) do their own validations.


> whereas debian is a curated collection of software which the debian maintainers have signed as being compiled by their systems with no malign influence and having met at least some bar.

And not only that, by shipping the source and requiring that binaries can be built from the source, who signs it is no longer blind trust. Others can audit it.

Reproducible builds should improve this even further.


> whereas debian is a curated collection of software which the debian maintainers have signed as being compiled by their systems with no malign influence and having met at least some bar.

This comes with a huge tradeoff, and I guess it's that tradeoff that makes developers like myself opt to sometimes even pipe the cURL to bash. I almost never download any software I actually plan to use through official system repositories, because whatever comes out of apt-get, it's almost always two years behind the last release and missing half the features I need. Sure, I'll apt-get install that libfoo-dev dependency, because I don't care what version it is as long as it's from the last decade. But for any application I actually need to use, it's either git repo or official binary download.


> whatever comes out of apt-get, it's almost always two years behind the last release and missing half the features I need

As a sysadmin, I love that, but I've had to come to terms with the fact that some developers have the attention span of hummingbirds who had Cap'n Crunch for breakfast ("two years old" is still very new software from an administration perspective).

So, I've basically accepted the fact that whatever stack the developers use I'll build and maintain directly from upstream -- with the price being that the version and extensions/whatever used are frozen from the moment a given project starts.


"Small jars that do one thing well. "

Oh, the "unix" philosophy.


Who can't read a Makefile? Who can't at least read the output of make -n? It's terrifying to me that you're suggesting that people can't and don't.

It's not even a security thing. I've had poorly-written Makefiles that would have blown things away thanks to an unset variable on a certain platform, for example.


> Who can't read a Makefile? Who can't at least read the output of make -n? It's terrifying to me that you're suggesting that people can't and don't.

Can I read a Makefile? Sure. But 90%+ of Makefiles these days are 12000-line automatically generated monstrosities. It's not worth my time to open the Makefile in a text editor on the off chance that it isn't, and I'd be amazed if many people did.

make -n you can do I guess. But unless you're also auditing all the source code I'm not sure there's a lot of value in it.

> It's not even a security thing. I've had poorly-written Makefiles that would have blown things away thanks to an unset variable on a certain platform, for example.

Yep. Maven doesn't do that.


At my last job we used a micro-service architecture on AWS (EC2 and RDS). Using Ansible playbooks for various types of servers and roles for each service, we created a new server instance every deploy. All servers were running FreeBSD and using daemontools to control services. For testing, hotfixes, and manual checking of logs, it was easy to complement with manual ssh. We kept the old and new instances around in case something went wrong. Ansible is just a thin layer on top of shell scripts, and reasonably straightforward to understand and parameterize. It worked wonderfully in most cases (the possible exception being the build server, because of shared libraries and a complex workflow with git pull/trigger, but I don't think that was the fault of the overall architecture).

That said, I agree that sbt is an abomination and doesn't lend itself to a sane and secure workflow, unfortunately.

http://martinfowler.com/bliki/PhoenixServer.html

http://www.ansible.com/


I also learned from a place with good practices, which used daemontools to run services and a custom deploy system in bash and python which actually did the right kinds of things. (and I was fully-manually admin-ing my own linux systems for years before that)

As an early employee at a new place, I'm now using ansible and docker because nowadays people want to use that stuff, and it is a lot faster to get started with than writing a new proper deploy system from scratch.

But I build all the docker images we use and version them with the date. I also don't use ansible roles from ansible-galaxy, and I don't even organize tasks into roles, just into task-include files. Our ansible tasks use bash helper scripts wherever necessary to do things the right way, because the built-in modules are often too granular / not connected enough to fully check state. I also replaced the docker plugin for ansible, to manage state a bit better. So overall it's not too bad.

I guess my point is that, having done it all from-scratch first, using some of the modern automation stuff isn't too bad. But you have to know what not to use. People new to "devops", using all the fancy stuff now available, who didn't have the introduction I did... it's not surprising they end up in a mess and don't even recognize it.


This is the key. The OP's major complaint is with prebuilt containers from potentially untrustworthy sources, but he passes this off as a fundamental problem with containers themselves.

The reality is that you can (and probably should) build your own container rather than using a public one from docker hub. You know exactly what is in it, and can trust it completely.


In reality a dev will pass a prebuilt and non-updatable container to the sysadmin, though. So the OP is exactly right! It doesn't matter where it's coming from if you can't verify, rebuild or update it.


sbt is an abomination, but unless you're in the library business you can just use maven, which is wonderful.


I love a good rant as much as the next guy, but unfortunately, rants are rarely actionable.

> Maven, ivy and sbt are the go-to tools for having your system download unsigned binary data from the internet and run it on your computer.

The root of the problem is that out of the total number of libraries available in language X, only a small subset is packaged in Debian/RHEL. This may be more egregious with large, Java enterprisy software, but you could easily end up with the same problem in Ruby or Python.

You cannot reasonably expect developers to package and maintain all their dependencies properly. The least bad solution would be to:

- still use maven to manage dependencies

- create a Debian/RHEL package incorporating the dependencies (effectively vendoring them in the package)

Unfortunately, it is not that simple, because you need to make sure that your vendored-in-the-package dependencies are somewhere where they will not conflict with another package with the same idea and the same dependencies (or better, the same idea and a different version of the same dependencies). Which means you need to keep them out of /usr/share/java and make sure the classpath points at the right location.

However, it seems that developers tend to avoid this kind of rigmarole and instead go for the "install dependencies as a local user" approach for certain classes of application (eg, webapps) because packaging is not fun.
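
One way to square that circle (names below are made up, purely to illustrate the layout) is to keep the vendored jars under the application's own prefix and point at them from a packaged wrapper script:

    #!/bin/sh
    # /usr/bin/myapp, installed by the package; the vendored jars live under
    # /usr/share/myapp/lib rather than /usr/share/java, so they can never
    # collide with another package's copies of the same libraries
    APP_HOME=/usr/share/myapp
    exec java -cp "$APP_HOME/myapp.jar:$APP_HOME/lib/*" com.example.Main "$@"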


Maven packages aren't actually unsigned either. They're downloaded over SSL and to get into Maven Central you need to sign with a GPG key.

The problem is that you normally cannot find a path to the developers through the web of trust, of course, but that's not Maven's fault. That's the fault of the web of trust (more accurately called "handful of strings of trust").

Debian/Red Hat code signing doesn't prove very much either. All it proves is that the package came from Debian or Red Hat. But did they modify the software along the way, doing some kind of MITM attack on the upstream developers? Quite possibly! At least with Maven you don't have that problem.


> The problem is that you normally cannot find a path to the developers through the web of trust, of course, but that's not Maven's fault.

It is Maven's fault. Debian could have taken the same approach, but instead they have decided to vet prospective developers for a few years, and use one signing key for package indexes.

As a result, in Debian:

- It is feasible to actually check packages, because you don't need a large WoT.

- You know that the maintainer of a package was at least vetted for some time before they could put anything in the archive.

With Maven:

- It's infeasible to check the signatures, since nearly no one has a complete enough WoT.

- Anyone can submit a library, no vetting necessary.

- Maven does not check signatures by default.

So, basically the only thing you can do in Maven is hand-pick a small set of libraries. Verify them in some way and stick to specific versions, putting their checksums in your POM. Unfortunately, as the article already touches upon, too many Java libraries pull in half of the internet, so it may not be very practical.


> It is Maven's fault. Debian could have taken the same approach, but instead they have decided to vet prospective developers for a few years, and use one signing key for package indexes.

Yes. But this creates a much higher barrier to entry. As usual, it is a trade-off, but it is obvious that with a Debian-like system, the number of packages on Maven/CPAN/Pypi/Rubygems would be considerably smaller.


> You cannot reasonably expect developers to package and maintain all their dependencies properly.

I think that this is a good point, but it all comes down to quality control.

You wouldn't accept a new dependency into your project if it is buggy or has a bad API.

So why is bad packaging, a hacked-up build system or inability to build from an auditable source considered acceptable in many communities today?


I don't think that's acceptable, but ranting and ignoring the underlying issue doesn't help.


> You cannot reasonably expect developers to package and maintain all their dependencies properly.

Why not? This is exactly what developers are expected to do. Every developer must manage dependencies for their application to work.

What you mean is they can't be reasonably expected to do it well.

In most situations, this is truly a trivial amount of investment relative to the overall cost of developing and maintaining an application.

There are ecosystems which make the task easier or more difficult/annoying, but that cost should be accounted for when choosing which development platform to use.


> In most situations, this is truly a trivial amount of investment relative to the overall cost of developing and maintaining an application.

I wouldn't say that creating and owning packaging for, say 50 libraries (assuming you only deploy to a single platform) represents a "trivial amount of investment" for a standard small developer team.


Honestly, if a particular development environment for a project required tracking 50 libraries not supported by any distro community and my development team was so small that I couldn't do that in a sane manner...

I might reevaluate the suitability of that particular language for the project.

As an aside, it only takes marginally more time to build a package for a distro than it does to build the software by itself. If you've already got the build done, 3 experienced guys could knock out 50 packages in a week. Inexperienced (in this task), but competent, devs should be able to do it in 2 weeks.


    > You cannot reasonably expect developers to package and 
    > maintain all their dependencies properly.
What? With appropriate tooling, of course you can.


I don't know any tooling that turns (recursively) a Maven pom file into a Debian repository of Debian policy-abiding .debs, which are magically updated when the pom file changes.


If you mean Debian policy-abiding in the sense of "signed by a Debian developer in the Debian WoT" then no, but that's not something you could ever do automatically. But for the rest, I've done all the individual pieces before: it is trivial to generate a .deb from a maven pom and put it in a debian repository, it's trivial to do some operation on all the dependencies of a maven project, and it's trivial to hook something into a maven repository to happen whenever a new artifact is uploaded. You absolutely could do this if you wanted to, and it wouldn't be more than a few days' work.

(Of course it wouldn't provide any value, which is why no-one does it).


If they are not public, that's obviously not the same thing, though you're still going through a lot more complexity than "here is this .war, put it on the server and reload the webapp".


"Stack is the new term for "I have no idea what I'm actually using"." - this made my day!


Same for "framework" which is: I have no idea what I'm doing


Sometimes yes, but sometimes you started writing CGIs in C, then Perl, then you wrote your own microframework, then you decided to use a standard one. This has been my evolution, and even if I don't understand everything inside the frameworks I'm using now, I have a general idea. And furthermore, what can we do about it? Writing code from scratch or maintaining our own frameworks is more or less the way to lose customers, unless you are a Facebook and you can engineer and push a React.


If more people went through that process, the frameworks we have might be fewer and of better quality.


Judging by the sheer number of frameworks we have, everyone who went through that process actually ended up writing one, or five.


That sounds like where I am coming from. Looking at Django questions on Stack Overflow, a lot of people don't.


Thank you for this. I hate the "frameworks are for people who don't know what they're doing!" meme. Sometimes they're just for people who have thought about the trade-offs and decided a popular framework has many advantages.


Same goes for 'abstraction', it hides the essence of what is happening. Therefore every abstraction is evil.


Hopefully this is sarcasm. Code without abstraction can also very efficiently hide what is happening by having a disastrous signal-to-noise ratio, combined with all the potential for errors you get when repeating the same pattern many times.


Code is abstraction. With no abstraction you have to build your systems with lots of nand gates and a clock.


Abstractions are necessary to write software, unless you speak binary code, so calling them evil is a bit hyperbolic. My point is at some level you have to trust the abstractions of a system or else nothing would get done. That doesn't mean you shouldn't have a conceptual understanding of the lower levels, but they aren't evil!


It's possible to use abstractions that aren't inversions or leaky. Practically no one does a good job of it, so you are correct in practice and experience, although in theory it is possible and sometimes people do pull it off successfully.

http://en.wikipedia.org/wiki/Abstraction_inversion

http://en.wikipedia.org/wiki/Leaky_abstraction


I think you should strive to understand what is happening under the abstraction, but abstraction is a useful tool. It's a bit like calling something a "crutch". Sounds bad, but what if you have a broken leg? Crutches allow you to get over the problem and make progress. I have to write software that runs on any hardware from any number of vendors and multiple operating systems. Not using abstractions would destroy my effectiveness.


How are people missing the sarcasm of this comment?


Programming languages are for lazy people who don't like flipping switches to code assembly by hand.


"Framework" at least pretty reliably means some form of code generation is going on.


I was doing sysadmin the "right way" a long, long time ago, and I don't see much difference. Maybe the author regularly does full audits of the source code of every package he downloads, and of course disassembles every executable and library in the underlying OS, but most of us don't. There's no wisdom or security to be gained from the act of running "make", much less "make install".


>> Maybe the author regularly does full audits of the source code of every package he downloads, and of course disassembles every executable and library in the underlying OS, but most of us don't.

This is not the point being made. Trust is often offloaded, say to the debian people, but it is present in most modern linux systems as a basic part of the setup.

>> There's no wisdom or security to be gained from the act of running "make", much less "make install".

'make' is brought up not because everyone should be running "make" or "make install", but because it's a standard and it is understood by many people. It's brought up in the context of hadoop because the hadoop build system appears to be just so complicated and non-standard, including pulling in untrusted sources from all over the place, that it is near impossible to set it up as a well-audited, standardised package.

Given this, it is likely to be a hive of vulnerabilities, either during the setup phase (if any of the third party servers gets compromised or MITM'd) or during deployment (that java VM it pulled in during setup is never going to get patched).


There's a huge difference.

> Maybe the author regularly does full audits of the source code of every package he downloads, and of course disassembles every executable and library in the underlying OS, but most of us don't.

What matters is that the source code is auditable. It only takes one person to investigate something suspicious, raise a flag and get it fixed.

This is certainly still true for Debian - not being able to build from source is considered a release blocking bug.


Assuming you:

- trust your compiler and linker

- trust your tar extractor / package manager / whatever

- trust your editor

- trust your http library (or whatever you used to download/distribute the code)


It comes down to trusting two key things. Trusting your initial distribution download (which contains the package signing keys), and trusting the toolchain (in a Reflections on Trusting Trust way).

But the deeper you go, the harder it is for malicious code to reside there. In theory it's possible, but in practice I'd like someone to show me some code somebody could have written into the toolchain a decade ago, without hindsight, which could still exist today.

Whichever way, it's clearly far tougher for a malicious actor to compromise a system by injecting something into a distribution ecosystem than it is to inject a signed-by-unknown-reputation binary-only package into the Maven ecosystem.


>> - trust your http library (or whatever you used to download/distribute the code)

This can be overcome with signing.

We're all well aware of how deep this rabbit-hole goes, however that doesn't mean that it's a good idea to throw all trust away.


This is where reproducible builds [0] come in. We can trust our binaries much more if the same build inputs yield the same build output.

Building on that, could we find a fixed point where the OS builds the exact same bootstrap binaries that were used to bootstrap the OS to begin with? [1] That would give us even more confidence that the binaries we're using are as they should be. Interesting place for experimentation.
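
The property is also easy to check mechanically: build twice from a clean tree and compare hashes (build command and artifact name are placeholders):

    ./build.sh && sha256sum output.bin > first.sha256
    git clean -xdf && ./build.sh
    sha256sum -c first.sha256   # fails if the second build differs bit-for-bit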

[0] https://reproducible.debian.net/reproducible.html

[1] https://gnu.org/software/guix/manual/html_node/Bootstrapping...


Trust isn't binary. It's a scale of weighted risks.


> There's no wisdom or security to be gained from the act of running "make", much less "make install".

You were not doing sysadmin right. DESTDIR and checkinstall are two vital tools that grew out of the mistakes of bare make/make install.
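
e.g., for a build whose Makefile honours DESTDIR (paths are illustrative), you stage the install instead of letting it scatter files over the live system:

  ./configure --prefix=/usr
  make
  make DESTDIR=/tmp/stage install     # files land under /tmp/stage, not on the live filesystem
  sudo checkinstall --install=no      # or wrap "make install" into a trackable .deb/.rpm instead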


The contract between operations and dev (as concepts, not as people) is in need of renewal.

To my mind, that was what "devops" was supposed to be, but it's been a bit of a dogpile in the years since the term gained popularity.

Systems are opaque to most developers, and many developers wish to make their software opaque to the system on which it runs. This is a failure on behalf of our entire profession, not any one group.

Infrastructure software is in a bit of a renaissance period, but it's very early days. Packaging software is a total mystery to most developers. I don't even need to back that up with examples: most of us can recall the last time we came across a well-packaged piece of software with joy, due to its sheer rarity. I'd be very surprised to find the average age of a Debian maintainer was trending anything but upwards, and steeply.

Containers are being misused, but that's because the alternatives we've been building for ourselves have not kept up with the strong user experience narrative of web and mobile software.

We need to do better.


It would be nice to have a well known 'devops manifesto'. I google'd it and came across this: https://sites.google.com/a/jezhumble.net/devops-manifesto/. Which I think is actually pretty decent - the emphasis on cross functional product teams, for instance.

In my mind, that is largely what devops is about - team ownership of the entire product, which includes infrastructure. Instead of having a silo'd 'ops' team writing ansible scripts and doing deployment, this should be part of the team (which could mean having an opsy guy on the team).

Anyways, as it pertains to containers, I think containers are more a practice than a principle. It tends to happen naturally when you want reproducible builds and continuous delivery. It's not really about making software opaque to the system, imo, but rather making your product artifacts reproducible (if you rely on running ./configure; make at deploy time, you never know what you'll end up with since dependencies are dynamically determined).



This 1 page poorly titled wrong rant is the #2 story on this site?

"Ever tried to security update a container?" lol. you are doing it wrong.

"Essentially, the Docker approach boils down to downloading an unsigned binary, running it, and hoping it doesn't contain any backdoor into your companies network." nope https://blog.docker.com/2014/10/docker-1-3-signed-images-pro...

"»Docker is the new 'curl | sudo bash'«" no it's not. most intelligent companies are building their own images from scratch.

People that care about what's in their stack take the time to understand what's in there & how to build things.


I think you're wrong. I think most users are not installing trusted builds from their OS vendors. Piping curl to bash is incredibly common--many popular software packagers are doing it [1].

About a year and a half ago, I was playing around with Docker and made a build of memcached for my local environment and uploaded it to the registry [2] and then forgot all about it. Fast-forward to me writing this post and checking on it: 12 people have downloaded this! Who? I have no idea. It doesn't even have a proper description, but people tried it out and presumably ran it. It wasn't a malicious build but it certainly could have been. I'm sure that it would have hundreds of downloads if I had taken the time to make a legit-sounding description with b.s. promises of some special optimization or security hardening.

The state of software packaging in 2015 is truly dreadful. We spent most of the 2000's improving packaging technology to the point where we had safe, reliable tools that were easy for most folks to use. Here in the 2010's, software authors have rejected these toolsets in favor of bespoke, "kustom" installation tools and hacks. I just don't get it. Have people not heard of fpm [3]?

[1] http://output.chrissnell.com/post/69023793377/stop-piping-cu...

[2] https://registry.hub.docker.com/u/chrissnell/memcached/

[3] https://github.com/jordansissel/fpm


It appears this was finally changed in mid-March, but after its initial release in December, image signing worked as follows:

Docker’s report that a downloaded image is “verified” is based solely on the presence of a signed manifest, and Docker never verifies the image checksum from the manifest. An attacker could provide any image alongside a signed manifest.

https://news.ycombinator.com/item?id=8788770

https://titanous.com/posts/docker-insecurity

https://github.com/docker/docker/issues/9719

edit: add hn discussion, github issue.


So much truth in this.

We've been doing some work with Elastic Beanstalk lately, and - while it certainly does one or two things that are extremely clever and useful - in the end it just feels like this bizarre mix of complete magic and incredibly convoluted arcana. Everything feels very out of our control and locks us into an ecosystem that considerably limits our choices and flexibility (unless we invest the time in becoming experts in EB, which really isn't particularly something we have the time for). And, as the author of this post says, the security ramifications, while orthogonal, are also deeply troubling.


    This rant is about containers, prebuilt VMs, and the incredible mess they cause because their concept lacks notions of "trust" and "upgrades".
Prebuilt VMs? Sure, I wouldn't touch them except for evaluating a project, and for commercial software you may not have a choice.

But docker containers at least usually provide a Dockerfile that describes exactly how a binary image is built. You just clone the source repo, audit the few lines of build commands, and then build into your own private registry. It requires hardly any more trust than following the instructions in a README or INSTALL. Just because fools are pulling down pre-built images and running them in their datacentre doesn't mean that's the way you should do it. And the problem with 'old-school' sysadmins is they are often far too quick to reject new practices, citing tired excuses based on misunderstandings of the technologies.
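
Something along these lines (repository URL, registry host and tag are placeholders):

  git clone https://github.com/someorg/someimage.git && cd someimage
  less Dockerfile                                     # audit the build steps yourself
  docker build -t registry.internal:5000/someimage:1.0 .
  docker push registry.internal:5000/someimage:1.0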

    Ever tried to security update a container?
Yeah I have. It's easy if you have already built your 'stack' to scale horizontally (which means you have at least 2 or more of everything in a HA or LB config). You rebuild against a fully patched base-OS container, spin-up, send some test load to it & validate, then bring into service. Repeat for rest of nodes at that tier.
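
Roughly like this (image names, port and smoke test are illustrative):

  docker pull centos:centos7                    # pick up the patched base image
  docker build -t myapp:patched .               # Dockerfile starts with FROM centos:centos7
  docker run -d --name myapp-canary -p 8081:80 myapp:patched
  curl -fs http://localhost:8081/ && echo ok    # validate before it takes real traffic
  # then add it to the LB pool and retire the old containers one node at a time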

If you are trying to be an old-school sysadmin that expects to console or SSH in and run 'yum upgrade' or 'apt-get upgrade' your containers then you are doing containers wrong...


then you are doing containers wrong...

The old-school sysadmins I know scoff at Docker's idea of 'containers'. Linux containers were already a thing, and don't need an entire copy of an OS ported around with them. To them, containers are a way of enveloping a process to limit it, not a way of distributing packaged software. They may or may not be doing 'docker' right, but they certainly know what 'linux containers' are.


Well I should have qualified it with 'docker containers'. But yeah, those that have been around long enough in Linux container land have all dealt with vserver, openvz, lxc, etc, and all of those carried around this 'entire copy' of an OS, per container (ignoring vserver's vhashify). Docker helps you to spin up N containers running all sorts of applications based on the single master image.

Docker, whether your view is good or bad, brings something more than just another container implementation to the table...


> Linux containers were already a thing, and don't need an entire copy of an OS ported around with them.

Neither do Docker containers. You can build off scratch and put the literal bare minimum you need in it. I've done it a few different times. It's rarely done because the payoff rarely justifies the time and effort, but if your old-school sysadmins are scoffing, that's on them.
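
A minimal sketch, assuming a statically linked binary (names and port are illustrative):

  FROM scratch
  COPY myservice /myservice
  EXPOSE 8080
  ENTRYPOINT ["/myservice"]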


Response to exactly that idea from one of these guys: "Why do you need docker to just build an executable?"


Deterministic builds. Isolating shitty build scripts from my system.
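
e.g. run the build inside a throwaway container so its scripts never touch the host (the image name is a placeholder for whatever toolchain image you trust):

  docker run --rm -v "$(pwd)":/src -w /src build-toolchain-image make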


I don't. But it helps make a straightforward deployable of highly coupled libraries and tools in a way that's more comprehensible to other people.

But I'll get off your lawn now.


Ain't my lawn; I consider myself a mid-range sysadmin. I stepped out of support and into sysadmin land about 4 years ago. But I know some 'from-the-birth-of-linux' guys, who live and breathe this stuff in a way I never will. When I get home from staring at terminals, I want to watch movies and play video games, not swear at something on a breadboard :)


> If you are trying to be an old-school sysadmin that expects to console or SSH in and run 'yum upgrade' or 'apt-get upgrade' your containers then you are doing containers wrong...

A container is a chroot environment for running a service. Basically. It's perfectly possible to 'yum upgrade' or 'apt-get upgrade' them.

I think what you're trying to say is this is a bad idea because containers are not supposed to be managed individually. The magical fluffy dream of Docker is to never have to manage another individual OS again; just make one image, then kick it out to all your machines and make them use the new image. No resolving dependencies, no verifying individual file checksums, no upgrading or downgrading or conflicting different package versions for every server and service. Just do it once, and then push it out everywhere, and everything magically works.

Right?

Here's the thing: Containers don't do away with the idea that a developer might use a totally bleeding-edge piece of software to update one of your many apps, and that you may end up with multiple incompatible versions of software on your systems. In fact, Docker kind of trades on that as a feature. "Install any shit software you want and it'll never conflict with other containers!" But this lie is shown for what it is once you start looking at containers as individual physical machines.

Back in the day we had to 'yum install some-specific-architecture-and-version-of-this-package' on a particular machine to make that machine serve that software. You do the same with Docker, because you have a particular Docker container with a specific version of the package, and all the other packages and OS requirements in that container. You still have a one-off machine to install, maintain and troubleshoot. The only differences are it isn't physical anymore, and you perform updates on the image, not on the machine.

Just like you ended up with a machine (or three machines) with three different versions of BDB, you end up with three containers with three different versions of BDB. Instead of using 'apt' or 'yum' to install them, you write their Dockerfiles and build them, test them, then roll out their updated images. You do a lot more from scratch now because a Linux distro hasn't done the work for you, and so you also run into all the headaches that someone packaging software for a Linux distro usually runs into.

One of the worst things about the 'new devops way' seems to be the non-reproducibility of things like Dockerfiles. To build a Docker image we slap together whatever bleeding-edge files we had on date X-Y-Z, and whatever comes out at the end is considered 'production'. Ignoring the patches, the quality control, the stable released software distributed by distributions on reliable mirrors, and generally without tuning the software at all for the particular system you're running it on. Try to build that Dockerfile again in a year and suddenly it doesn't work the same as it used to, or a bug magically appears on your production system, or you have to apply a patch and now you need to unroll all the commands used for individual stages of the build process for a single package and figure out how to make it still work like the original was built, etc.

Containers by themselves are fine things to use. The problem is how they've effectively sold a lie to everyone that uses them: that there is no sysadmin work to be done. That don't worry, Mr. Javascript Dev, you too can build infrastructure and deploy it without learning the many lessons and best practices of an industry that has been here longer than you've been alive.

This wouldn't even be far from the truth if it was, say, a RedHat-built set of container images, or a Debian-built set of container images. Then at least there'd be an expert who's building software in a reliable uniform way and along a particular standard. They would provide you the software updates, and even provide you with tools and instructions on how to use them to manage your whole software infrastructure without having to write software yourself.

In the age of 'everyone should learn to code', everything can be fixed by writing more code, and copy-and-pasting binaries built on a developer's desktop counts as production deployment. [To be fair, developers were doing this 10 years ago with Java apps, and we hated it]


I corrected myself in my reply to the other person; I meant '[..] upgrade your docker containers'. I heavily use LXC containers (and had used openvz and vserver before that) and treat them as individual servers.

All your points about bad sysadmin practices are OS & container agnostic - they can happen on any platform, don't drag Docker into it. Sure there is a culture of 'docker run somebinaryimage [..]' but those people are the ones that do "curl | sudo bash" as well.

Your claim about non-reproducibility of Dockerfiles is bogus. The result of a Dockerfile build gives you precisely the reproducibility you desire. Every time you run a container from that image built from a Dockerfile, you'll get the same filesystem & environment.

Docker 1.6's "Content Addressable Image Identifiers" addresses your build in a year concern by allowing dockerfiles to refer to a digest to ensure you are building against exactly the image you expect (rather than the result of some build process that yum -y upgrades etc, which I think is what you were getting at).


docker can totally be dragged into this. they keep selling the lie and encouraging most terrible practices and design. talk about self inflicted...


> Every time you run a container from that image built from a Dockerfile, you'll get the same filesystem & environment

The image (and thus container) are the same. Trying to rebuild it from scratch leaves it not the same, typically because the way people put together Dockerfiles and build images does not follow a standard. And it has to do with the container culture.

Let's take a Dockerfile for the CentOS 7 version of nginx (https://github.com/CentOS/CentOS-Dockerfiles/blob/master/ngi...)

  FROM centos:centos7
  MAINTAINER The CentOS Project <cloud-ops@centos.org>
  
  RUN yum -y update; yum clean all
Right off the bat I think: what the hell? Why are they doing a yum update? A yum update today may very well leave the system in a completely different state than a year ago. There's likely been some package updated in that time, which changes the state of the system. Right off the bat we're screwed.

  RUN yum -y install epel-release tar ; yum clean all
  RUN yum -y install nginx ; yum clean all
And what the hell is this?! There's no version, no build, no checksum. What the hell did we just install? If those packages change in a year, we're screwed. Not to mention 'epel-release' and 'tar' should be set as dependencies somewhere, and the type of dependencies too.

  ADD nginx.conf /etc/nginx/nginx.conf
  RUN echo "daemon off;" >> /etc/nginx/nginx.conf
Oh, cool, just use whatever the hell this config is which may or may not be different from the one that shipped with the original package. And let's just modify it for no apparent reason, too. And definitely make sure we have no way to know what version of that file we're using for this image. Lovely.

  RUN curl https://git.centos.org/sources/httpd/c7/acf5cccf4afaecf3afeb18c50ae59fd5c6504910 \
      | tar -xz -C /usr/share/nginx/html \
      --strip-components=1
The hell? We're pulling some random git sources from a git server which i'm willing to bet has no standard mirrors? On closer inspection it doesn't really look like a git server, but a host named git which hosts files whose names are a checksum, though we don't know where this is from or what exactly it refers to. If this server or file disappears, good luck knowing what in hell was being downloaded here.

  RUN sed -i -e 's/Apache/nginx/g' -e '/apache_pb.gif/d' \ 
      /usr/share/nginx/html/index.html
  
  EXPOSE 80
  
  CMD [ "/usr/sbin/nginx" ]
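
For contrast, a sketch of what a more pinned-down version could look like (the package version string is a placeholder for whatever build you actually audited, and with Docker 1.6+ the FROM line could pin a content digest instead of a tag):

  FROM centos:centos7
  # install exact, audited package builds instead of "whatever is current today"
  RUN yum -y install epel-release && yum -y install nginx-1.6.3 && yum clean all
  # ship config and content from your own version-controlled repo,
  # so a rebuild next year uses the same bytes
  COPY nginx.conf /etc/nginx/nginx.conf
  COPY html/ /usr/share/nginx/html/
  EXPOSE 80
  CMD ["/usr/sbin/nginx", "-g", "daemon off;"]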


As a Java / Hadoop / Spark / Scala fan, all I can say is, it's a little embarrassing; not sure how the Java ecosystem around Hadoop became so sloppy (I witness it first hand on a daily basis). I wish more people who are concerned with security / ease of build would turn to contributing to maven, sbt, ivy and the hadoop project. Instead of hating the Java ecosystem, why not join it and make it better? Hadoop is ubiquitous, maven (and ivy / sbt) are the de facto dependency management and build tools for that ecosystem, and if it's broken (or alienating people who are used to just having make / rpm / deb for everything), then those people should join and try to make it better.

Whether you like java / maven / ivy / sbt or not, good chances you'll end up forced to work with Hadoop (Java) or Spark (Scala), both of which use maven / sbt for dependency and build.

I say, It's all open source, if it's broken, and you know where it's broken, I think the Hadoop / Java community will be happy to get suggestions / pull requests to improve it.


In theory you are right; unfortunately, at least the hadoop "community" is difficult to work with. Hundreds of JIRAs with patches sit in limbo for months/years b/c none of the paid developers at cloudera/horton bothers to take a look. Also the political things happening behind the scenes are way more complex than you might think. It is frustrating...


I for one like java/scala but am quite wary of maven&co.

All I need is a build tool. I can download a bunch of jar dependencies myself if needed. In the "old days" we just put them into the CVS repository or in a tarball to download along with the source and an ant script to build the whole thing.

Granted, this did lead to outdated libraries at times and thus potentially was a security threat in itself. But at least it didn't roll the concerns of obtaining dependencies and building into a single tool. Often it's nice to just get the dependencies and build the project your own way, e.g. in your IDE of choice. Or vice versa, obtain the dependencies on your own (e.g. plugging in a different version than recommended) and then use the automated build.

I think those things should be provided by separate, simple tools.


Why add all the missing pieces to the other tools (maven, etc) when the OS has the tool (rpm, deb, etc) with all these pieces already?

I think that is the reaction that most people have when they see maven or similar tools.


It seems rather easy and effective, for NSA-like agencies, to hide crude exploits in complex projects. An unintended effect of Snowden's whistleblowing is that it has become easier, because it has let them know that they no longer need plausible deniability.

Until Snowden, they were very cautious not to be caught, because, you know, what might happen if public opinion knew what a bunch of crooks they were? Now, they know that public opinion doesn't really care, and that if they're caught, they can mostly shrug it off, with politicians' complicity.

So, shoving a rather crude and detectable exploit in a messy product has become practically doable. If I were in charge of subsidies distribution for some 3-letters agency, I'd pour more money on Docker, Maven etc. than on TLS.



Every once in a while someone figures out that we could entirely solve the dependency problem by packaging all the dependencies with the application. Everyone gets excited. After a while everyone gets unexcited when the problems associated with this approach become obvious.

Docker is merely a more extreme example of the "package everything with the application" idea...


I'll bite. What are the obvious problems with it? Or, "if it's good enough for Google...."

Vendoring dependencies and static linking is quite popular in executables, not just docker. Dynamic linking and shared libraries seem to be becoming a relic, deservedly.

BTW, the extreme example of "package everything with the app" is the unikernel movement.


An interesting point that I didn't see the author bring up is the concept of how Docker images can be built in a layered fashion, and the potential for a false sense of security.

For example, you start with some sort of base image -- say phusion/baseimage-docker[1] -- and proceed to layer your application on top of it. You "trust" Phusion. They do Phusion Passenger, it's a real piece of software you heard of, and it's not some random person on the internet.

At some point, there's a bug, a problem, a security flaw, and you're waiting on them to fix it... nothing, nothing. Maybe they get hacked and their base image is now infected. I haven't bothered to look, but I'm guessing it would be a trivial amount of work to start culling the most popular base images used by public Dockerfiles, looking for the biggest trojan horse.

It seems like the whole model is ripe for pushing an understanding of what is actually running on a machine -- soup to nuts -- to the wayside, and establishing a non-existent trust in the building blocks you're using, lulling people into a false sense of security about their containers. A lot of people believe that they're already doing something much more secure by running containers, and arguably, they are... except for all of the places where malicious software can be added in, and the potential container breakout techniques.

[1] https://github.com/phusion/baseimage-docker


FWIW you can easily recreate a base image by just copy/pasting the Dockerfile for that image at the top of your own.

I did this for the Jruby images we base our stack on.

I've been doing both dev and ops work for nearly a decade. I feel for what the guy is saying, but these aren't tech problems, they're process problems.

Relying on apt packages for everything makes using more recent features ridiculously hard and slows down pushing features out. I'll trade a little security to be more nimble. I say that because, as someone who's worn the hats of operations, development, and co-founder, I realize that you can't have it all. There simply isn't enough time and bandwidth in most companies.


Sure, and that's all reasonable stuff. I mostly posted this because while encouraging people to use wildly insecure installation processes like 'curl ... | sudo bash' is terrible, it's easily recognized as being terrible. To me, the Docker ethos is, perhaps, deceptively bad in terms of security. Deceptive enough that it can lull people into a false sense of security, etc etc.

I mean, we'll see if it happens. My fears might be entirely unfounded, or phusion/baseimage-docker might get trojaned. Who knows. :P


System administration is as important as ever. Docker and other containers just simplify system administration across many different machines. The standard Unix user land tools are excellent and very flexible, but they are fucking god awful at configuration management. Docker solves the problem of "how do I make sure I have the same versions and configurations of everything on all 500 of my compute nodes without having to lock them down completely?" This question is meaningless if your base system image sucks, so you still need a proper sysadmin to build your Docker images.

Many places are rolling this type of sysadmin work up into DevOps. This scares graybeard sysadmins, because they see DevOps automating them out of a job. What they fail to see is that DevOps is a step up for them: it's an explicit admission that system administration is as important as software development, and needs to be integrated into the software development process and managed through whatever management processes and tools the core dev team uses.

The ultimate driver behind this is a shift in the way technology organizations are managed. A few years ago, you would have functional silos: development, operations, product, etc. that would all contribute to one or more products. Employees reported up through the functional lead, and incentives were doled out based on cost effectiveness. This didn't work well. So what started happening is that engineering executives began building product-focused silos instead. A development manager is no longer in charge of just software developers, but also QA, scalability and deployment. If the operations folks fuck up the deployment, the development manager gets chewed out about it. So the dev manager is going to bring as much of that under her control as she can.

Docker/Maven/etc. are the abstraction layer between the teams that manage the infrastructure (physical servers, VMWare pools, storage, network, etc) and the teams that manage the applications. This is no excuse for bad sysadmin practices; you still need good sysadmins in the DevOps role. But here's the kicker: DevOps often pays more than system administration! And if you're a SME in a very specific thing (say, Cassandra administration) you can be in a support role across a number of different teams, making sure their DevOps folks deploy Cassandra in a sane way.

(Yes, I realize all of this is centered on huge companies with massive engineering organizations. Small organizations have always required sysadmins to wear multiple hats, so none of this is new.)


Sorry, I don't see a response to the key point here. If Docker doesn't sign its containers and doesn't check signatures before applying one to a running system, then it's simply not secure. It may be one little feature of all the things Docker does for you. But that's what's lacking according to the blog post's author.


But Docker does sign containers and has since 1.3. It needs to be configured properly though. Maven signs packages by default and won't install dependencies unless the checksums match (large orgs run their own Maven repositories and only proxy trusted repositories).

This is why you still need system administrators. We just call them DevOps now. Any decently large organization would have a Docker SME whose job it is to know how to set up Docker securely.


Many places are rolling this type of sysadmin work up into DevOps. This scares graybeard sysadmins, because they see DevOps automating them out of a job.

Nope, not really. Just wait till you move to a new job and you inherit a docker/rockit/etc system. You need to patch openssl/glibc/etc; however, half the containers are built with an old build system that's been replaced. You've got 15 containers based on fedora20 which is EOL, and one of your apps relies on a bug in fedora21 which is also EOL.

Oh yeah, it's just you, you have no resources, and you're on call to fix it when it fails (yeah, devops is a nice way of saying unpaid overtime).

Oh, and you need to replace two of three physical hosts, but you can't hot-migrate containers, and you lose quorum on your cluster if you take one of the hosts down.

Look, it's as simple as this. I'm a sysadmin. I know, I know, you think I can't code, you think I know nothing about programming. This is bollocks. Two things: One, I've seen this all before. Containers? Yeah, that's just fancy batch processing. Two, you know how whenever you log in to a new machine all your files are there, and not only that, it's faster than your laptop? That's me, making things fast. It's my job.

DevOps is a step up

Not really, it seems to be a way of getting devs to do out-of-hours work. Or allowing people with no experience of programming to do programming, or people with no experience of infrastructure to do infrastructure. A decent system admin does all of the "devop" things already. If they don't have a build system and git/svn-controlled config management, then they aren't real sysadmins, they are overreaching helpdesk monkeys.

fucking god awful at configuration management

dunno what you've been using, but I can configure 5000 machines inside 15 minutes with 10 lines of code and one ssh command.
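
For the curious, the 10-lines-and-one-ssh-command claim looks roughly like this (host list and command are illustrative, and it assumes key-based ssh and sudo rights):

  while read host; do
    ssh -n -o BatchMode=yes "$host" 'sudo yum -y update openssl && sudo service nginx restart' &
  done < hosts.txt
  wait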

A few years ago, you would have functional silos

Only in certain companies. If you have politics, you'll get silos.

Docker/Maven/etc. are the abstraction layer between the teams

You can't use technology to overcome procedural problems. If your teams aren't talking, your infrastructure is going to be shit. If your teams don't think about others when they produce their products, then things will fall through the cracks.

An example: say you want 10 VMs of x size. If you have a cohesive system, you could email/phone/talk to a guy and you'll have some machines. If your provisioning team had thought ahead, they'll have made an API that spins up machines, ties them to your accounting code, and configures them for your environment. That's not technology, that's just good practice.


Bad DevOps is bad. But bad DevOps is basically no worse than what people were doing before: you'd just have a bunch of VMs running fedora20 with no way of easily patching all of them at once. Except some of the VMs may be running fedora23 because they were part of an expansion that happened 2 years after the original set and the guy who deployed them couldn't find a fedora20 image. And at least with a container, you can more easily use AWS for spare capacity/redundancy while you migrate servers. DevOps doesn't fix every sysadmin problem, but it gives you a lot more options that can be developed/deployed in a small amount of time.

DevOps is bad when you take your worst developer and say "do sysadmin tasks and still write application code". It works much better when you take an experienced sysadmin and embed them into a dev team. Make them do code reviews on deployment scripts with a developer, assign tasks within sprints, etc. Code reviews aren't because you don't know how to code -- IMO the primary benefit of code reviews is the education of the reviewer. Likewise, the sysadmin's struggles become the developers' struggles, and the developers are more likely to write applications that are easy to support if they have some role in supporting them.

Every company over a certain size has politics. It's unavoidable. Maybe Google doesn't -- I don't know. But not every company can be Google. You can't use technology to overcome process problems, but you can and should use technology as a part of a redesigned, better process. DevOps gives you more options, and has a positive effect on the culture of a development org. It asks them to think of portability and supportability as a concern.


> Update: it was pointed out that this started way before Docker

Yes, like in the 90's, at least, when people started using Java. Even prior to Maven there were jars, and we didn't really know what was in them. And prior to that, I didn't understand how every piece of software or hardware worked.

I was a big proponent of Gentoo when it came out because of building everything from source, but the fact is: I don't have time to look through and understand every line of code. Even compilers can and have injected malicious behavior in the past. Firmware cannot even be trusted.

Some level of trust and reliance on others needs to be there. While it is true that there will always be people that betray that trust, without the trust, we would be hermits living alone off the land- which may not be so bad, but that's another story.


> »Docker is the new 'curl | sudo bash'«.

Fully agree with this.

Maybe I am just another grey-bearded grumpy developer, but the new generations that grew up with GNU/Linux instead of UNIX bash the security of other OSes and then go running such commands all the time.


Sad really.

docker, and any user in the group docker - or, let's say, any user capable of sending commands to the docker daemon running as root - is root on that system.

  docker run -v /:/f -w /f yourimage /bin/bash -c "echo root:and:so:on > /f/etc/shadow"


I think the author is mixing up a few different topics. If you're going to blame container frameworks for people sharing software in insecure ways you might as well blame the fact that executables are portable between compatible systems. Might as well blame the fact that there's a network while you're at it. We run docker throughout our infrastructure, but it is a deployment and dependency management technology, not a vector for infection. We run only our own images, which are all built from source or validated binaries. So what do insecure or unreliable practices have to do with containers, specifically?


The article is not an attack on Docker, but on the way it's being used by many.


>> This rant is about containers, prebuilt VMs, and the incredible mess they cause because their concept lacks notions of "trust" and "upgrades".

Oh, ok.


For people who are interested in learning more about the problem, this is a really great paper: https://www.informatik.tu-darmstadt.de/fileadmin/user_upload...


"Maven, ivy and sbt are the go-to tools for having your system download unsigned binary data from the internet and run it on your computer." You should setup a maven repository (Nexus, Artifactory) for your organisation if you want to have more control on binaries. Seems that artifactory can host docker files: https://www.jfrog.com/confluence/display/RTF/Docker+Reposito...


Kind of what I was going to say... The article seems to blame the tools, but there are more secure ways of using these same tools.


Right, do folks really believe Maven, Ivy, Gradle, sbt are tools you use in production? These are developer tools for use on workstations and CI servers. If you want to promote your stuff to other environments like production, use your own private repository (Nexus, etc).


They may be if you have a team without much sysadmin experience. The way you develop could be the way you deploy to production.

These are the same teams that have overprivileged accounts for the database or sudo-enabled users running applications or chmod 777 all over the place.

Even things like Chef cookbooks have this going on. If you want to build from source because it's not in your repository, then you're necessarily going to need to drag in sbt or gradle. (see https://github.com/hw-cookbooks/kafka/blob/develop/recipes/d... as an example). Sure you could figure out the mirrors and download the correct binary from the website. You could also use this recipe to compile everything and then package it up to host yourself. (Both of these actions require writing custom recipes). Not everyone has time to do this, and this magical recipe you found online works great on the development server! Just add it to the production server and now we've just used sbt in production on a software team.


Is it a coincidence that all the technologies the OP complains about are Java (Hadoop, Apache Bigtop, Maven, ivy, sbt, HBaseGiraphFlumeCrunchPigHiveMahoutSolrSparkElasticsearch)?


I don't think it's a coincidence. The Java ecosystem is intentionally isolated from the Unix ecosystem, because one of Java's goals was portability in an age when Windows, Mac, and Linux were all very different operating systems with very little in common. Java has its own Java-y build infrastructure, which relies much less on the concept of "trusting the source", and much more on the simple fact that the JVM is a sandbox that can be tuned to whatever security requirements the sysadmin desires.

Running Java apps (especially Docker-ized Java apps) is less like installing a Unix package (even if it's masquerading as doing so), and more like starting an instance of some untrusted VM image on your (software-defined-)network. It can use some of your computer's resources, but it has no permissions to touch any of your data or services unless you grant them to it. It really is like an app, or a web page.


Node/npm should get a mention. On a simple static website I've seen, a couple of grunt tasks end up pulling in over 14,000 files. And a hugely nested directory structure.

Part of it is the ... interesting ... idea that individual functions should come in their own module. Some npm modules are literally 6 lines of code. But they get packaged up just like everything else. There's no concept of having a stdlib or something. (Apparently node/v8/minifiers aren't smart enough to do a good job if you use a stdlib.)


I imagine they are just involved with the java ecosystem so the examples they know involve java.


I completely agree with this except I think of it more as a problem of release engineering rather than system administration.

The trouble is, the sysadmin's job is to deploy things. The developer's job is to write code. Often release engineering isn't thought of at all, or if it is, it's given to the least qualified or least suspecting folks without any requirement from operations.

Developers aren't taught about release engineering or deployment in school at all. In fact, it seems to me most university curricula do everything possible to hide all that from students.

Compounding that is the developers desire to get new code out conflicting with the sysadmins requirement to keep things stable in the face of limited QA automation. This leads to the common conflict between dev and ops.

This is to me a large part of what has led to the DevOps movement. This gives the developers information about the deployment and perhaps even access to it or a version of it and/or a voice in deciding how things are deployed.

Hopefully we can standardize things widely enough that universities can teach this without fear of focusing on useless technologies that will be discarded in 3-5 years.


As an ex sysadmin I really like the container infrastructure. Manage the whole configuration on the main machine with puppet and deploy the blackbox applications (everything ruby and java related) with docker/rocket.


It's nice to have the option. Containers are awesome for many things/projects, but sometimes you just want to run the damn application on a server of your choice, without any container stuff.

I can't remember what the application was, but I've seen an application where the only installation instructions were for Docker. That's just plain silly.

My concern with containers is that the wrong people will use them. There is a ton of software out there which just barely runs and makes all kinds of assumptions about its environment. I fear that rather than design better, more correct software, these people/companies will start packing up their development environments as containers (more or less) and just ship those. Of course that's no reason to discourage the use of containers, we just need to be critical of what is inside them.


Maybe you can send a copy of your BRMS to your competitors while you're at it.


Today I learned that the "curl | sudo" idiom is actually a thing people really do. Truly, everything is awful.


If you want to have guaranteed runtime linkage built from trusted source, you might want to give BOSH (http://bosh.io) a look for config/release management - it insists on (or at least prefers) compiling all dependencies from source, from trusted links, with signature checks. For example with Hadoop, here is the build script:

https://github.com/cf-platform-eng/hadoop-boshrelease/tree/m...

The learning curve is a bit steep, but it's another approach to this immutable infrastructure trend that's built for large production environments, enables rolling canary upgrades, etc.


And before "curl | sh" it was download from freshmeat and run "tar xfz; cd; make install".

It's not really better. Fact is we are running huge and complicated frameworks with lots of dependencies. These technologies are new and evolve fast. Distros don't have enough volunteers to decouple this mess and thus fail to provide stable packages. There is a good chance nobody wants an old version of hadoop anyways.

Containers are a whole other problem. It has always bothered me that no one cares about building these images themselves. The documentation is there; you can build your own docker/vagrant/... containers and VMs. It's just that nobody seems to care anymore. Sometimes I don't even know where these images come from: distro, community, ...?


I think it's because people are getting worse at explaining things and writing docs.

As a student many times I wanted to learn how things work, but most tutorials/docs just ask you to type in a few magical lines without much explanation. Maybe the authors think their audience won't understand anyway, but I think it's the authors' ineptitude if they can't explain what their programs do in an accessible way.

I really hope there could be more projects like i3[1] and flask[2].

1: http://i3wm.org/docs/userguide.html 2: http://flask.pocoo.org/docs/0.10/


It's not just system administration...

Minecraft is a meta-game about downloading unsigned JARs from the 'net and running them with your own user account.


I've often thought about just offering my services as a sys-admin to the multitude of small startups that pop up locally. Many, many developers -- almost no sys-admining skills amongst them. Just fire up a server on AWS and away you go.


The 'curl | sudo bash' mention reminds me of OS X Homebrew. The one-liner installation script is still published on the home page, front and center (though it's served from a trusted source, GitHub).

http://brew.sh/

One might argue that the ease of that one-liner installer script is exactly what made Homebrew gain popularity. And a dev machine is different from a production environment in terms of installed packages. Still, I agree that proper container management on the local network and perhaps new security features from upstream container vendors would help the situation.


I agree, I think accessibility made it popular. Security and ease of use are usually opposing forces.

The article has some interesting discussion points. I don't understand the absolute fear of | bash installers. It's open source, read the script. That's the argument people make for `./configure; make; make install` programs. I think it's because it's new, or it's too easy.
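
Although "read the script" only really works if you download it first instead of piping it straight into a shell (URL is illustrative):

  curl -fsSL https://example.com/install.sh -o install.sh
  less install.sh        # actually read it
  bash install.sh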

But the article does have a point about trusted containers. But security isn't a download or a product anyway. Security isn't even guaranteed.


So, asking the obvious question: what's the solution to that?


The obvious question is: what's the real problem with that?

A container is a container; as long as docker itself has no bugs, the container can only harm the container's contents.

Most problems exist in the custom-created software in the container (e.g. web services with bugs, backdoors, ...); this will be a problem for Docker, VMs, real servers, whatever, too.

The real problem is the interoperability of different containers: if you link all your data, without any audit, into another container, you can have a problem, but this problem is not Docker-specific.


>> A container is a container; as long as docker itself has no bugs, the container can only harm the container's contents.

Presumably a container has network access of some sort? Malicious code could start probing and attacking anything exposed that way.

>> this will be a problem for Docker, VMs, real servers, whatever, too.

The implication is that you wouldn't get into this situation with a 'Real-Server' so easily, because you wouldn't just download an image and run it, without having an update/patch strategy or having much more idea of what's going on inside it.


But you assume that a container HAS full network access. A firewall must be configured, but a firewall must be configured for a VM too. My point is that there is not such a huge difference for production systems.
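
For instance (the image name is a placeholder, and this is a sketch rather than a complete policy):

  docker run --net=none myimage                          # the container gets no network at all
  iptables -I FORWARD -i docker0 ! -o docker0 -j DROP    # or block forwarding from the docker bridge to any other interface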


>> But you assume that a container HAS full network access.

No, I'm presuming it has some sort of network access, a malicious container could (for instance) still probe other containers for vulnerabilities, serve malware etc etc without full network access.

>> A firewall must be configured, but a firewall must be configured for a VM too. My point is that there is not such a huge difference for production systems.

If you're downloading VM images from somewhere and running them without checking what's in them you'll run into the same problem, sure.

The problem being pointed out here is that when applications are bundled outside of the purview of a packager like debian you -

  - don't have as much trust in the origin of the app
  - don't have an easy way to keep up on library patchlevels etc for security


You seem to ignore or downplay all the other ways a container can cause problems, including but not limited to:

- Being a backdoor to the rest of your network (sniffing network traffic, or more simply reverse ssh-tunneling to an outside server)

- All the various "fun" botnet-related activities (spam being the king here)

- Actively serving malware to the rest of your network.

EDIT: Formatting, and well, others answered your question less specifically but more eloquently.


> A container is a container, as long as docker itself has not bug, the container can only harm the containers content.

So that holds only as long as there are no bugs and the Linux kernel is free from local privilege escalation exploits. Those seem like long odds to trust in.


The same trust I have in a VM or a RM.


Not true if the software in your VM or RM is managed by a package manager and comes from a place that issues security updates, patches etc.

One of the criticisms in the article is that much of what's going on now, either with containerisation or weird build systems like Hadoop's, misses out on this.


As a Windows/VMWare/Exchange/Cisco admin, this is all completely foreign to me. Does this indirectly make a case for paying a vendor real money to manage their product properly? All the vendors we work with provide installers that handle installing any dependencies. Updates are a few clicks. Occasionally we run into a vendor who provides installation/upgrade instructions that involve manually copying files and hand-editing config files. We replace those vendors. Error-prone people should not be doing manual file copying/editing or dependency checking, tasks that computers are orders of magnitude more competent at.

This is B2B stuff where businesses should manage their product or risk getting sued out of business. The current environment seems to be that using free or open-source products is "free, with purchase of a team of consultants". Why not just pay the money to a vendor to provide, and support, a real product?

It seems backwards to call this a sad state of sysadmin. This is like Boeing providing its leftover parts and a 9000-page manual on 747 assembly, and people complaining about the "sad state of mechanics". That's backwards. Buy a 747 from Boeing if that's what you need.


Give me the command line and I'll build anything!


> Maven, ivy and sbt are the go-to tools for having your system download unsigned binary data from the internet and run it on your computer.

Not Maven.


I'm a bit puzzled. Let's say I decide to not download the binary but build it from source. Unless I actually read the source, I'm trusting the community to have read it, which consists of other people thinking I have read it.

In my view, this is true for the OS itself. So unless I read everything, I'm fucked. And I don't. Thus I'm fucked.

Am I missing something?


This is also pretty much what happened with OpenSSL. Which is why I'm amused by the holier-than-thou attitudes in here.


> And then hope the gradle build doesn't throw a 200 line useless backtrace

This is more the fault of the language Gradle chose for its build configuration. Most build scripts are between 20 and 50 lines long, but reading through those Groovy stack traces eliminates its supposed write-once-read-many-times benefits.

Hopefully Gradleware will fix this problem for Gradle 3. They've already enabled Gradle to be configured on the fly by Java code, and could be working towards allowing any dynamic language to be a build language through an API. Alternatively, they've just employed one of the ex-Groovy developers recently made jobless by Pivotal pulling funding from Groovy and Grails last month -- they might get him to write a better lightweight DSL from scratch that parses the existing syntax but isn't weighed down by all of the present cruft.


A lot of big projects are terrible to build.

Once upon a time minimising dependencies was considered good practice. Now I get a pasting if I write clean code without reusing someone else's library... even if the suggestions don't solve my problem directly or at all, and come complete with a sloppy 'no one-click build/deploy' configuration... the kind I was embarrassed to produce on my standalone projects in my teenage bedroom days.

Shame on developers everywhere for tolerating this mess. I (am lucky enough to enjoy the freedom of choice that I) would leave a job if not allowed to start fixing such a situation from day one.

That being said, good sysadmins and developers should work out these problems properly instead of shortcutting through someone else's half arsed effort via Google.


On the one hand it's nice that developers are using more libraries and writing less from scratch. On the other hand dependencies are out of control. The one thing that really bothers me is code that depends on a particular IDE. That shit drives me up the wall.


Is this the sad rabbit hole reality of attempting to abstract every last component?


Regarding curl PACKAGE | sudo bash...

Just what do you think happens when you run `yum update` or Windows Update?

If you don't trust DNS or the network then you have serious challenges which frankly aren't even solved by air gapping file transfers.


There is some truth in this, yes. On the other hand, maven (mentioned by the author) clearly was very successful at abstracting from the tools we use. I can remember how much time I wasted with build scripts and dependency management and all that before. (And I still do on some other platforms.) The problems only arise if the abstraction is not working well enough. This might indeed be true for containers - there is maybe too much complexity in there that currently can't be properly encapsulated.


Because Docker is good at containing crap, it is used to cover a multitude of sins. Before using Docker, please simplify your install and upgrade processes.


Containers and VMs are part of the solution to this problem, not a cause. Try managing the same dependencies across N platforms rather than one container!


Fully agree with everything written there, with the exception of "apps." I believe it is not a Microsoft term, but an Apple term. Is it not?


Incidentally I feel like my admin "skills" have never improved faster than since I started working with docker.

Docker lets me iterate on system configuration faster than ever, and that means learning the details and quirks of certain software faster. Then again, I usually don't use prebuilt VMs and containers, but have to prepare them for people who don't want to pay for good sysadmins..


Nicely said, I thought I was the only one who noticed this ^_^ This is one of the reasons why I tried Docker/Vagrant images a few times and said no thanks :) I would rather spend my time and install everything on a separate server myself than have an unknown set of packages or security holes. As a few articles on HN have shown, these containers are not secure at all.


I don't think any experienced organization is going to just download containers off the internet to use on their servers. Which is why there are self-hosted registry applications that corps and big companies buy to host their own images, which they build to support their applications and which are vetted through traditional corp policy.


This "working out of the box" phylosophy is the direct result of devs using Windows and OSX platforms for creating those programs. They now think "Linux and *BSD should be as easy to use as Mac.". Indeed, the majority of devs is mediocre amateur sysadmins. They know next to nothing beyond their preffered language.


70s system software is really showing its age. Containers are just a hack to make its complexity sometimes easier to manage.


Emperor Joseph II: My dear young man, don't take it too hard. Your work is ingenious. It's quality work. And there are simply too many notes, that's all. Just cut a few and it will be perfect.

Mozart: Which few did you have in mind, Majesty?


Sys admins got relegated to tech support so web devs could add sys admin to their work flow.


Looks like a description of projects with bad build systems, not a problem with e.g. Maven. Maven downloads binaries from an HTTPS server. You can always get those libraries and rebuild them from source into your internal repository.


So I like this article and want to learn more about sysadmin/devops/whatever, but where do I go? Is Docker bad? What is a good starting point? What are best practices?


Can someone tell me what realistic security problems this mode of operation introduces that can't be mitigated/avoided with sensible network and backup configurations?


Precompiled binaries from random sources is a major security concern.


I thought this post was overly cynical and full of generalizations. I don't really understand what point is trying to be made here.

"Everybody" "Nobody" "Nobody" "None of" "everything got Windows-ized" every sentence is a broad generalization on top of cynicism so it's hard to find any value in the point trying to be made.


This is the price of devops.



What is really ironic is that none of these "tools" solves any fundamental problem of so-called version hell, and none of these containers is fundamentally different from

  ./configure --prefix=/xxxx && make && make -s install
with or without following

  chroot /yyyy
The big "innovation" of having so-called "virtual env" (they call it "reproducible [development] environment) for each "hello world" (a whole python/ruby/java/etc installation with all packages and its dependencies in your [home] project directory) solves no real problem, only pushes it to the next guy (what they call devops).

Some idiots are even advocating having a whole snapshot of an OS attached to your "hello world", and even making it what they call "purely functional" or even "monadic" (why not, if someone pays for that).

Unfortunately, there is no way to ignore the complexity of versions and package dependencies or to easily push it to "devops". Creating a zillion "container images" with just your "reproducible development environment" or a whole "OS snapshot" just multiplies entities without necessity.

A programmer must be aware of which version of which API is implemented by which version of a package or library he is using, and explicitly assert and maintain these requirements, like the very few sane software projects (git, nginx, redis, postgres) do.

btw, the GNU autotools (which give us ./configure) are a somewhat evolved real-world solution - you have to explicitly check each version of each API at compile (build) time, refusing to build in case of unsatisfied dependencies, and at install time (the package manager must refuse to install in case of a mismatch). This is the only way back to sanity, however "painful" it is.
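
In shell terms, the kind of explicit check meant here is something like (library and version purely illustrative):

  pkg-config --atleast-version=1.0.1 openssl \
    || { echo "error: need openssl >= 1.0.1" >&2; exit 1; }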


> Programmer must be aware of which version of what API implemented with what version of package or library he using and explicitly assert and maintain these requirements, like all the very few sane software projects (git, nginx, redis, postgress) do.

Except that when you're doing anything that looks like an actual end-user application (as opposed to infrastructure), you end up using dozens of libraries which themselves have dependencies, so suddenly you're supposed to "explicitly assert and maintain" hundreds of different library versions, none of which is in any way relevant to the application you're building.

I myself see Docker containers as the only reasonable way for giving a service application to people to deploy on their machines, because even the programming runtime I need is 5 years out of date on Debian/Ubuntu, and installing that stuff manually is a) pain, and b) different on every operating system.


> as the only reasonable way for giving a service application to people to deploy on their machines

Take a look at how git or nginx compiles from source on any machine imaginable.

There is absolutely no fundamental problem with ./configure; make; make install.
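
For example, building nginx from a release tarball is still just this (version picked for illustration; assumes a compiler and the usual pcre/zlib dev packages are present):

  curl -O https://nginx.org/download/nginx-1.8.0.tar.gz
  tar xzf nginx-1.8.0.tar.gz && cd nginx-1.8.0
  ./configure --prefix=/opt/nginx && make && sudo make install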


That would be the same git that's still basically unusable on windows? And have you ever tried to cross-compile it?

> There is absolutely no fundamental problem with ./configure; make; make install.

The fundamental problem is incompatible versions of dependencies. Arguably it's in linux's dynamic linker rather than a problem with configure/make. But if you need to run something that depends on libfoo 2.3 and something else that depends on libfoo 2.4 on the same machine, you need something like docker.
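
A minimal illustration (binaries and library names are made up): both programs ask the dynamic linker for the same soname, but one of them needs symbols that only exist in the newer release:

  objdump -p /usr/bin/app-a | grep NEEDED    #   NEEDED   libfoo.so.2
  objdump -p /usr/bin/app-b | grep NEEDED    #   NEEDED   libfoo.so.2
  ls /usr/lib/libfoo.so.2*                   # only 2.3 is installed system-wide
  /usr/bin/app-b                             # fails: undefined symbol introduced in 2.4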


Does it work well when you have third-party dependencies?


I think you are conflating purely functional, reproducible environments with the (for lack of a better term) Docker way of managing containers, where you just make a full disk image for everything, with lots of duplication. It is very possible to have the former while avoiding the latter. Both the Nix and GNU Guix projects succeed at this. I recommend taking a look at them to see if they address your concerns.
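
For a concrete taste with Nix (attribute names may differ between nixpkgs revisions), two shells can use different versions of the same package side by side, with no disk images involved:

  # each shell resolves its own dependency closure out of /nix/store
  nix-shell -p python27 --run 'python --version'
  nix-shell -p python34 --run 'python --version'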


I am an old-school sysadmin and I still think in terms of what the ldd command tells me, how shared libraries are implemented, how various dlopen-based FFIs work, and why I need this or that.

The other approaches, such as Java's (where we "abstract out the OS"), in practice lead only to a bigger mess, because it all boils down to the very same libc, libm, libffi and friends. The JVM is an ordinary userlevel program, so it obeys the restrictions and rules of any other userlevel program. This is the sad truth for Java zealots.
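
For instance (output abbreviated, and it will vary by system), the JVM launcher is just another dynamically linked binary:

  ldd $(which java)
  #   libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
  #   libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
  #   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
  #   ...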

Basically, one cannot ignore the OS (at least as long as you still want to dlopen and call the stuff instead of re-implementing it poorly) - it is just a wrong idea, leading to all these ridiculous FS-inside-JVM implementations and other messed-up layers of unnecessary, redundant abstractions.

System administration is still hard, and it (the necessity to think, understand and analyze) cannot be eliminated by some bunch of shell or ruby scripts and wishful thinking.


I would be interested to know why you call purely functional package management "idiotic", given that one of its main goals is to solve the problem of version hell? I.e., to make it easy for a developer to specify that application X should use version V of library Y, without interfering with other applications on your system?


Very simple. There are so-called atomic operations, or transactions, which are good enough to solve the problem. Any "pure functionality" in this context is plain nonsense.


Gentoo and BSD ports exist for a reason.


I don't have a strong opinion either way about Docker, but I understand the OP's gripes.

Stack is the new term for "I have no idea what I'm actually using".

This was great. It leaves me to wonder what "full-stack" means.

For one thing, we have a culture of trust inversion. I wrote about it in a blog post about a month ago: https://michaelochurch.wordpress.com/2015/03/25/never-invent... . The "startup" brand (and it is a brand) has won and most companies trust in-house programmers less than they trust off-the-shelf solutions. This tends to be a self-fulfilling prophecy. Because few corporations will budget the time to do something well (make it fast, make it secure, make it maintainable) it only makes sense to use third-party software heavily and use one's own people to handle the glue code, integration, and icky custom work. (That, of course, leads to talent loss, and soon enough, when it comes to build vs. buy your only option is to buy, because your build-capable people are gone.) At some point, however, you end up with a large amount of nearly-organic legacy complexity in your system that no one really understands.

Although it's not limited to one language or culture, this is one of my main beefs with Java culture. It has thoroughly given up on reading code. Don't get me wrong: reading code (at least, typical code, not best-of-class code) is difficult, unpleasant, and slow and, because of this, you invariably have to trust a lot of code without manually auditing it. But I like having the idea that I can. The cultures of C, OCaml, Haskell, and to a degree Python, all still have this. People still read source code of the infrastructure that they rely upon. But the Java culture is one that has given up on the concept of reading code (except with an IDE that, one hopes, does enough of your thinking for you to get you to the right spot for the bug you are fighting) and understanding anything in its entirety is generally not done.


The problem is you old sysadmins are so passé. Software has replaced you, and you need to get over it. Developers are finally liberated to move at full speed without hearing "NO"


There certainly are sysadmins who build their authority and power only on having exclusive access to the root account.


So much truth spoken in the linked text. Thanks.

It has to be said. Damn the containers and the Windows-ization of Linux.


Containers are very helpful for isolating closed-source programs. I don't like to run Steam within my normal Debian system.


Except that Docker explicitly allows and encourages signing of core infrastructure containers.


Not even close. Docker now has some terrible attempts at signing images on their registry, iirc (docker inc signs them for the docker client).

There is no option for me, as a user, to build and sign my own image with my own pgp key, afaik. My organization might already have a chain of trust, and docker is asking me to ignore that and just trust their signatures (which also only work on dockerhub as of docker 1.5... I don't know about 1.6, because you can't use docker for at least a month after a release, else security holes galore).

Docker did nothing to encourage signing containers. At 1.0 they had no capability to do any signing or verification whatsoever. It's being added as an afterthought, and poorly.

If you look at the AppContainer spec, signatures (pgp-based) were built in from the very beginning: it lets me create my own chain of trust (including incorporating others' keys), sign my own images, and trust someone else's signature; it does not trust the transport or storage medium, and it has integration with the clients.

If you want to convince me docker cares, you're going to have to give me examples of where they didn't fuck up...

Tell me how I can use docker's tools to sign my own images, optionally trust my friend Alice, and securely download images that she uploaded to her own registry or dockerhub but signed with her gpg key without me having to trust docker inc.

To my knowledge, all docker has right now is a 'tarsum' of images, which assumes the registry is trusted and, even given that, can be downgraded fairly trivially for backwards-compatibility reasons.


Docker didn't fuck up when they hired the Square guys. http://blog.docker.com/2015/03/secured-at-docker-diogo-monic...

I agree that security and provenance are real issues in Docker. They are, however, being worked on, and they will be solved. Presumably we will end up with some sort of app-store-like framework with proper signatures and verification.

Docker can't do everything at once. Give them a chance. The new version of the registry is a major step forward in this regard.

In the meantime, what you can do is take Red Hat's advice. Rather than using a registry to get your images, operate a download site which stores archives of docker images that you can import with `docker load`. You can then also store signatures and check them yourself.
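
A sketch of that workflow (URLs and filenames hypothetical): publish the image archive with a detached signature, verify it, then load it:

  curl -fsSLO https://downloads.example.com/myimage-1.2.tar
  curl -fsSLO https://downloads.example.com/myimage-1.2.tar.asc
  gpg --verify myimage-1.2.tar.asc myimage-1.2.tar \
    && docker load -i myimage-1.2.tar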


Cool, they hired some people, but I haven't noticed better security for it yet.

Security is a real issue in docker and it is being worked on, but I don't think "give them a chance" is a justifiable response. They're not focusing on it strongly. They should already have focused on it, and didn't. Their entire codebase was written without a security design in place, so there's likely deep-seated refactoring that'll need to be done before any new security-related features should be trusted.

They're working harder on monetizing and pushing docker as a production-ready standard as far as I can tell... I can understand not doing security before functionality, but it absolutely should be there before 1.0 or before you encourage others to use your software.

Docker has already lost any chance of me trusting their security with their lack of focus on it and I don't think it's excusable.

And if I'm doing what you say at the end, why the hell would I be using docker anyway? I can already turn a tarballed fs into a linux container without docker (ty lxc); I thought the whole point of docker was sharing images and building on them and ... and having massive security flaws. Right.


I agree that security should have been in place before they went 1.0. However, if you look at the work on the new version of the registry (docker/distribution on GitHub), they are taking things more seriously and trying to get the basics right.

I find your last point a bit strange. We all know the Docker development experience is a lot better than raw lxc. I'm saying you can (and probably should) be more careful about provenance than the Docker Hub is. Note that there are alternatives to the Hub with better provenance stories, e.g. https://access.redhat.com/search/#/container-images (from https://securityblog.redhat.com/2014/12/18/before-you-initia...). This might make things a bit more awkward than they were before, but it's still not the same as raw LXC.

I feel your anger and I think it's understandable, but that doesn't mean things won't get better.


>> Security is a real issue in docker and it is being worked on, but I don't think "give them a chance" is a justifiable response.

For a public facing system I would agree that means it's not ready for production use yet...


haha you're my new hero

YOU ONLY LIVE ONCE MAN! trust the (maven) system


[deleted]


    > As far as I know, it's also still standard practice in
    > most companies to either read the source code of open-
    > source stuff before deploying it to production (binary
    > or build) or get a support contract from someone else
    > who has
I'm afraid I have no better, more cogent response for this than 'lol'.



Thanks for the share. =)

I'll queue this to read later, but just reading the executive statement, it seems to jibe with a lot of the discussion here.

The issue isn't that code review /shouldn't/ be taking place, or even that there aren't directives stating that it should be done. It's that it isn't being done, and that's a problem.


At the risk of my karma I'll have to maintain that for companies who are subject to regulation (publicly-traded companies, banks, etc.) what I said is still standard. Unless you have any specific instances to the contrary you're willing to offer?


Coming from my background (DoD, don't laugh), code review of anything other than in-house-developed applications never occurs. In any instance where open source is used, it is mandated that it come with a support contract (per DISA STIG), which provides the support and accountability the organization is looking for.

So, with regard to your assertion: the second clause (support contract)? Definitely. The first (code review)? Never.

That's the view from my side of the fence, anyways.


    > for companies who are subject to regulation (publicly-
    > traded companies, banks, etc.) 
Well for starters, that's not most companies, or even that many companies as a percentage of the whole.

The only places I've ever seen (or even heard of) this being done are banks and defense.

It ain't in ISO27001, and so nobody cares.


In defense, it's too expensive to do in-house. However, security software and some operating systems, either closed source or open, are evaluated under Common Criteria (https://www.commoncriteriaportal.org/). But the evaluation process is rather long and often lags behind the current version by a year or more in some cases.


I've never worked in a place that insisted on source code being read for FOSS components.

I've worked in places where there's a restricted list of 'approved' open source stuff, but that's been more to do with licensing.


I've never seen anyone evaluating code in the environments I've been in. In many ways, I run into the reverse, where it's a given that these build processes are obtuse, and so they're simply distrusted rather than pulled apart. Administrators rely either on paid products or on long, manual processes.

I've unfortunately had more than one mind-numbing conversation where another administrator will tell me that it's a "script" and therefore I don't know what it does, and that there "may be" some hidden black magic that will instantly pwn all of our systems. Attempting to explain that I verified the functionality line by line brings blank stares, as if it were utterly impossible for someone to derive the dark magics that are code.

Thankfully, I haven't run into many instances where software is blindly installed as the author relates, but his point that "no one knows how it works", combined with the lack of attribution, creates an environment of mistrust which greatly limits our ability to take advantage of open source software.


> As far as I know, it's also still standard practice in most companies to either read the source code of open-source stuff before deploying it to production (binary or build) or get a support contract from someone else who has.

Reading the source code might be the case if you are in a large enterprise that can afford to keep those programmers busy or actually needs to vet the code, but I honestly doubt it. Just check the LOC count on something as common as Hadoop (close to 1 million [1]), and that is excluding the "stack" described in the article. I wanna bet you, because we all know how volatile the job market is, that even the guys building or supporting the software did not write or read even half of that code.

[1] https://www.openhub.net/p/Hadoop/analyses/latest/languages_s...


I suspect this is why, when many companies decide they want to use open source stuff, they contract with companies like GitHub, who absorb the risk of using git, etc., rather than trying to vet the code themselves. I still have a hard time envisioning anyone in any position of authority in any credible company accepting the idea of installing open source software without vetting it one way or the other. If the shit did hit the fan, their career would be pretty much over.


companies like GitHub, who absorb the risk of using git

Except they don't do that at all?

If a bug in git would cost your company a lot of money, why do you think GitHub would be accountable if you have some enterprise deal there?

I still have a hard time envisioning anyone in any position of authority in any credible company accepting the idea of installing open source software without vetting it one way or the other.

Well, that is an entirely different thing. People TEST stuff before deploying it. But that has little to do with code reading or mandatory support contracts.


it's also still standard practice in most companies to either read the source code of open-source stuff before deploying it to production

Eh, no.


I'd love a specific example. Because in many countries, if they're publicly traded or subject to other regulations (such as Basel, etc.), any company that didn't would be breaking the law.


I'd love a specific counter-example. I've never in my life encountered a sysadmin who read all or even part of the source code of any major package before deploying it. I mean, do you read the source code of the Linux kernel, PostgreSQL, nginx, OpenSSL, ... before installing it?

If Basel requires that level of auditing, I'd love to see a cite. There are some pretty heavy-handed regulations out there, but this would be clearly unworkable.


Given the context, it's probably a bad idea to name employers, but I've literally never seen this happen at any company I worked for, and I did consulting, so that's quite a few.

I wonder what laws you're referring to that require companies to have read the innards of the products they deploy.

Even the "getting a support contract from someone who does". Do you think for example Red Hat has read every single line of code shipped in RHEL?


I've worked in the USA, UK and Australia.

I've never seen what you describe. And one of the companies is a huge multinational.


Which laws are you referring to? Is this a strange interpretation of Sarbanes-Oxley or HIPAA?


I'd be interested to hear if you have an example of a company that did read the source code of the apps it used?

Also which law do you think they would be breaking by not reading the source code of products used?


Excellence in security practices (open source or otherwise) is the exception, not the norm.

I've worked in multiple large companies, and even getting package signing turned on requires a lead pipe. Docker and similar tools can enable an org to move those types of responsibilities "over/down" to the developer as well, so that now there is no neckbeard-encrusted gate at all.


I want to work where you've been working.

I've never, ever seen this happen.


I want to work there too, except if that means that I will be the one having to read through all open-source code before it is deployed.


And would you be liable for any missed bugs that cause production to break?



