A thing that this article is hinting at that I think might be more fundamental to making good automation principles: idempotency.
Most of unix's standard set of tools (both the /bin programs and the standard C libraries) are written to make changes to state - but automation tools need to ensure that you reach a certain state. Take "rm" as a trivial example - when I say `rm foo.txt`, I want the file to be gone. What if the file is already gone? Then it throws an error! You have to either wrap it in a test, which means you introduce a race condition, or use "-f" which disables other, more important, safeguards. An idempotent version of rm - `i_rm foo.txt` or `no_file_called! foo.txt` - would include that race-condition-avoiding logic internally, so you don't have to reinvent it, and bail only if anything funny happened (permission errors, filesystem errors). It would not invoke a solver to try to get around edge cases (e.g., it won't decide to remount the filesystem writeable so that it can change an immutable fs...)
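A minimal sketch of what I mean by `no_file_called!`, in Python (the name is from my made-up example above): attempt the unlink and treat "already gone" as success, so there's no check-then-act race, while every other error still propagates.

```python
import os
import tempfile

def no_file_called(path):
    """Ensure `path` does not exist. No pre-check, so no TOCTOU race:
    we attempt the removal and treat "already gone" as success."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass  # already gone - that's exactly the state we asked for
    # anything else (EACCES, EROFS, EISDIR...) propagates: bail loudly

# Demo: calling it twice is fine; the second call is a no-op, not an error.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
open(path, "w").close()
no_file_called(path)
no_file_called(path)
```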
Puppet attempts to create idempotent actions to use as primitives, but unfortunately they're written in a weird dialect of ruby and tend to rely on a bunch of Puppet internals in poor separation-of-concern ways (disclaimer: I used to be a Puppet developer) and I think that Chef has analogous problems.
Ansible seems to be on the right track. It's still using Python scripts to wrap the non-idempotent unix primitives - but at least it's clean, reusable code.
Are package managers idempotent the way they're currently written? Yes, basically. But they have a solver, which means that when you say "install this" it might say "of course, to do that, I have to uninstall a bunch of stuff" - which is dangerous. So Kožar's proposal is a step in the right direction - since it seems like you wouldn't ever (?) have to uninstall things - but it's making some big changes to the unix filesystem to accomplish it, and then it's not clear to me how you know which versions of what libs to link to and stuff like that. There's probably smaller steps we could take today, when automating systems. Is there a "don't do anything I didn't explicitly tell you to!" flag for apt-get?
The article is hinting at referential transparency for packaging and configuration.
> it's not clear to me how you know which versions of what libs to link to and stuff like that
You'd typically link to the most recent version which you've tested against, and record its base32 hash in your package definition. That is, a package by default contains the exact identities of all of its dependencies - there is no "fuzzy matching" of packages based on a name and version range. The point here is that the packager of the application should know what he is doing, and by specifying exact dependencies, he is removing the "hidden knowledge" that often goes into building software. (In many cases this is just ./configure && make && make install, but it can be massively more difficult to reproduce a build, particularly if the dependencies aren't well specified.)
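To make "record its base32 hash" concrete, the identity is just a cryptographic hash of the dependency's exact contents. This is only an illustrative stdlib sketch - Nix actually hashes the whole build recipe (sources, flags, dependency identities), uses its own base32 alphabet, and `dependency_id` is a made-up name here:

```python
import base64
import hashlib

def dependency_id(path):
    # Identity = hash of the exact bytes. Any change to the dependency
    # yields a different id, so there is nothing "fuzzy" to match on -
    # no name, no version range, just the content itself.
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    return base64.b32encode(digest).decode().rstrip("=").lower()
```

Two packagers who specify the same id are provably talking about byte-identical artifacts; that's the "hidden knowledge" being made explicit.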
The Nix build system knows which version to build against because there is only one version to build against in the chrooted environment where the build occurs - which is the one whose identity you specified in the nixpkg.
> there is only one version to build against in the chrooted environment where the build occurs
This is all rather new to me. Would it be fair to make this analogy: the build process is not a portable/cross-platform event, so you basically distribute a BuildFoo.exe with statically-linked libraries included.
You're roughly guaranteed that the BuildFoo.exe will run (they've got those libraries), and the user gets Foo in the end (either dynamically-linked or statically).
Yes, it's a fair analogy. Nix doesn't require static linking, but it does require the exact dependencies to be present for shared libraries. You can run the Nix package manager on top of another system like Debian, but you'll need to build most of the core packages again with Nix, such as glibc, gcc etc. (they live alongside your system's packages in /nix/store, and can be linked in /usr/local). This basically works as long as the kernel you're running supports the features of the packages you install.
With NixOS you get the additional advantage of configuration management: everything, including the kernel, is handled by the package manager, which provides stronger guarantees that things will work as expected.
Everything is reproducible. Things that have no reason to be tangled up are, in fact, not tangled up. If that doesn't sound advantageous, I don't know what else can be said.
> If that doesn't sound advantageous, I don't know what else can be said
I mean specifically with regards to configuration management: that is, managing the part of software that developers intend to be modified so as to change the behavior of the program.
Maybe I just don't understand, but I don't see how this does anything to advance current config management dilemmas like how to merge a new upstream version of a configuration file with your site-specific changes; or how to deploy similar changes to large numbers of nodes at a time.
Modifying files in a git repo which are deployed to $ETC by ansible when a modification triggers a run, versus modifying files in a git repo which are used as "inputs" to a functional operating system, seems like a largely cosmetic difference to me.
Offtopic, but: what's an example of a situation where using rm -f is bad compared to rm in practice? That is, an example where rm would save you but rm -f would make your life upsetting?
On topic: idempotency may be a red herring in this context. Unfortunately filesystems are designed with the assumption that every modification is inherently stateful. (It may be possible to design a different type of filesystem without this assumption, but every filesystem currently operates as a sequence of commits that alter state.) So installing a library or a program is necessarily stateful. What do you do if the program fails to install? Trying again probably won't help: the failure is probably due to some other missing or corrupted state. So idempotency won't help you because there's no situation in which a retry loop would be helpful. That is, if something fails, then whatever operation you were trying to accomplish is probably doomed anyway (if it's automated).
I think docker is the right answer. It sidesteps the problem by letting you create containers with guaranteed state. If you perform a sequence of steps, and those steps succeeded once, then they'll always succeed (as long as errors like network connectivity issues are taken into account, but you'd have to do that anyway). EDIT: I disagree with myself. Let's say you write a program to set up a docker container and install a web service. If at some future time some component that the web service relies upon releases an update that changes its API in a way that breaks the web service, then your supercool docker autosetup script will no longer function. The only way around this is to install known versions of everything, but that's a horrible idea because it makes security updates impossible to install.
It's a tough problem in general. Everyone agrees that hiring people to set up and manually configure servers isn't a tenable solution. But we haven't really agreed what should replace an intelligent human when configuring a server.
well, the rm example is overly simple on purpose - the only thing that -f is actually going to do that's remotely dangerous is removing files that have the readonly bit set. I've never actually been bitten by that. In general though, I think this pattern scales poorly - the more complicated your task is, the more dangerous the "force it" mode becomes.
---
On the subject of what to do when something goes wrong:
Sometimes retrying installing a package does fix the problem: if there was a network error, for example, and you downloaded an incomplete set of files, the next time you run it it will be fine.
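In other words, it's worth separating plausibly-transient failures (where a retry helps) from persistent ones (where it just masks a broken state). A crude sketch - `retry` and the exception list are my own illustration, with `step` standing in for your download/install action:

```python
import time

# Errors that plausibly go away on their own (flaky network, timeouts).
TRANSIENT = (ConnectionError, TimeoutError)

def retry(step, attempts=3, delay=1.0):
    """Retry `step` only on transient errors; anything else indicates
    bad state, and retrying would just hide the real problem."""
    for i in range(attempts):
        try:
            return step()
        except TRANSIENT:
            if i == attempts - 1:
                raise  # give up and report the recurring failure
            time.sleep(delay * (2 ** i))  # simple exponential backoff
```

The key design choice is the allow-list: everything not explicitly marked transient fails immediately and loudly, which matches the "file a bug, fix it manually" stance below.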
If your package manager goes off the rails and gets your system into an inconsistent state, then you have a decision to make. Is this going to happen again? If not, just fix the stupid thing manually: there's no point in automating a one-time task. If it is probably recurring, then you need to write some code to fix it (and file a bug report with your distro!). I do not believe that there is a safe, sane way to pre-engineer your automation to fix problems you haven't seen yet!
In the meantime maybe your automation framework stupidly tries to run the install script every 20 minutes and reports recurring failure. The cost of that is low.
Docker is awesome, for sure, and I'll definitely use it on my next server-side project. It isn't a magic bullet, though - you still have to configure things, they still have dependencies. Just, hopefully, failures are more constrained.
---
and on the point of upgrading for security fixes: the sad reality is that even critical fixes for security holes must be tested on a staging environment. No upgrade is ever really, truly guaranteed to be safe. I guess if the bug is bad enough you just shut down Production entirely until you can figure out whether you have a fix that is compatible with everything.
> well, the rm example is overly simple on purpose - the only thing that -f is actually going to do that's remotely dangerous is removing files that have the readonly bit set.
Since you originally outlined the requirements as:
> Take "rm" as a trivial example - when I say `rm foo.txt`, I want the file to be gone.
then the file should be gone even if "the readonly bit" was set.
This is not only a contrived example, but a bad one, for system management. rm is an interactive command line tool, with a user interface that is meant to keep you from shooting yourself in the foot. rm is polite in that it checks that the file is writable before attempting to remove it and gives a warning. System management tools I would expect to call unlink(2) directly to remove the file, which doesn't have a user-interface, rather than run rm.
However, the system management tool doesn't start with no knowledge of the current state of the system, but rather one that is known (or otherwise discoverable/manageable). And then attempt to transform the system into a target state. They can not be expected to transform any random state into a target state. As such, the result of unlink(2) should be reported, and the operator should have the option of fixing up the corner cases where it is unable to perform as desired. If you've got 100 machines and 99 of them are able to be transformed into the target state by the system management tool and one of them is not, this isn't a deficiency of the system management tool, but most likely a system having diverged in some way. Only the operator can decide if the divergence is something that can/should be handled on a continuous basis, by changing what the tool does (forcing removal of a file that is otherwise unable to be removed, for example), or fixing that system, after investigation.
The other option is to only ever start with a blank slate for each machine and build it from scratch into a known state. If anything diverges, scrap it and start over. This is an acceptable method of attack to keep systems from diverging, but not always the pragmatic one.
it's probably safer to just remember to use mv instead, because there's a very high chance that you'll do the wrong thing on a terminal that doesn't have that alias available.
> Take "rm" as a trivial example - when I say `rm foo.txt`, I want the file to be gone. What if the file is already gone? Then it throws an error!
Simply rm the file and handle the particular error case of the file not existing by ignoring it. Other errors go through fine.
I've been doing this at work to try to wrangle a sense of control out of our various projects. I'm using Sprinkle, which is basically a wrapper around SSH.
What I'm finding is that most decent projects include idempotent ways to configure them. Apache, for instance, at least on Ubuntu allows you to write configs to a directory and then run a command to enable them. Sudo also has the sudo.d directory, cron has cron.d. Just write a file.
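The "just write a file" step can itself be made idempotent and atomic: compare contents first, then write to a temp file in the same directory and rename it over the target, so nothing ever reads a half-written config. A sketch (the function name and config contents are made up):

```python
import os
import tempfile

def write_config(path, content):
    """Idempotently install a config file. Returns True if anything
    changed, False if the file already had the desired content."""
    try:
        with open(path) as f:
            if f.read() == content:
                return False  # already in the target state: no-op
    except FileNotFoundError:
        pass
    # Write then rename: rename(2) is atomic within a filesystem, so a
    # reader sees either the old file or the new one, never a torn write.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.replace(tmp, path)
    return True
```

The changed/unchanged return value is what lets a wrapper like Sprinkle decide whether to run the "enable" or "reload" command afterwards.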
> Is there a "don't do anything I didn't explicitly tell you to!" flag for apt-get ?
I would consider this to be overly tight coupling. We should let dpkg manage the OS packages, and if the system's state needs to be changed, you can simply re-build it and run an updated version of your management scripts.
You don't really want to start getting into the game of trying to abstract over the entire domain of systems engineering. CM, in my opinion, should solve one and only one problem, moving system state between the infrastructure/cloud provider defaults and a state where application deployment scripts can take over. Every necessary change to get from point A to point B gets documented in a script. There are only two points on the map, and only one direction to go.
So CM is a provisioning tool? I thought of it as being more of "ensure trusted compute environment" tool. But all the existing tool sets require additional engineering to revert changes that aren't in their dynamically rendered file set.
>Simply rm the file and handle the particular error case of the file existing by ignoring it
How do I differentiate, so as to ignore one error and not others? Matching a string? What if this is supposed to be portable? Hard-code strings for every version of rm ever made?
>What I'm finding is that most decent projects include idempotent ways to configure them
Those are modifications debian makes. Lots of software supports including files, which lets debian do that easily. But sudo has nothing to do with you having a sudo.d directory, that is entirely your OS vendor. And having that doesn't solve the problem. What happens when I want to remove X and add Y? You need to have the config be a symlink, so you can do the modifications completely offline, then in a single atomic action make it all live.
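A sketch of that symlink swap (all the names here are made up): stage the entire new config tree offline, then flip a single symlink, which rename(2) does atomically.

```python
import os

def activate(new_config_dir, live_link="config"):
    # Build everything under new_config_dir first, offline. Then flip
    # `live_link` in one atomic rename: anything reading through the
    # link sees the old tree or the new tree, never a mix of both.
    tmp = live_link + ".tmp"
    if os.path.lexists(tmp):
        os.unlink(tmp)  # clean up a leftover from a crashed run
    os.symlink(new_config_dir, tmp)
    os.replace(tmp, live_link)  # atomically replaces the old symlink
```

Removing X and adding Y is then one operation - you prepare a tree without X and with Y, and the swap makes both changes live at the same instant.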
Your configuration management is going to have to be OS-dependent. Nothing is going to be so portable that you'll be able to use the same commands on different distros. POSIX is too leaky an abstraction to rely on.
I'm not sure if you are agreeing or disagreeing. Configuration management tools already exist that work across multiple operating systems. You can't rely on posix, but you also can't rely on anything else. There's no standard, sane way to get "what error happened" information from typical unix tools.
Idempotency is a nice goal. I tend to run into issues where, say, somebody changes a Chef attribute and re-runs chef-client (update) on the machine. Say that was a filepath that got changed. Without knowing about the previous filepath, the only thing that can be done is to work with the new path. It's technically idempotent in that if I run it twice without a config change it will not change anything on the second run, but unless on every attribute/recipe change I throw away the old machine and provision a new one, there is leftover state. That being said, I recreate instances fairly regularly as I believe there are always chaos monkeys lurking :)