Defensive Bash programming (kfirlavi.com)
127 points by urlwolf on May 29, 2014 | hide | past | favorite | 57 comments


The article mentions many topics, but misses almost all of the important ones.

* First of all, use proper quoting. There are so many possibilities for file names, command line arguments, etc. that every unquoted usage of a variable is essentially a security risk.

* Then, start your script with "set -e", which stops the script whenever one of the commands fails, instead of blindly continuing and messing things up. This is the most important option for robust shell scripts.

* Also use "set -u" which makes the script stop on undefined variables. This includes $1, $2, etc., so it provides checks for missing arguments for free.

* In addition to "set -e", also use "set -o pipefail"; otherwise a pipe will only fail if its last command fails, while with "set -o pipefail" the pipe fails whenever any command in the pipe fails.

* After that, you may continue with spacing issues in "for" loops, and that you should not pipe the "find" output directly (instead, use either "-print0" + "xargs -0", or use "-exec"), and similar stuff.

When you have got all of this right, and only then!, you may start worrying about the (relatively) minor issues mentioned in the article.
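Condensed into a boilerplate header, the points above might look like this (a sketch; the file pattern and directory argument are purely illustrative):

```shell
#!/usr/bin/env bash
# Illustrative boilerplate combining the suggestions above.
set -euo pipefail    # -e: stop on errors, -u: stop on unset variables, pipefail: see above

# Proper quoting: every expansion is quoted.
dir="${1:-.}"        # a :- default keeps set -u happy when $1 is missing

# Don't pipe find output through a plain loop; use -exec (or -print0 | xargs -0):
find "$dir" -maxdepth 1 -type f -name '*.log' -exec ls -l {} +
```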


Unfortunately, you cannot depend on `set -e` (or `set -o errexit`) to work as expected in all cases. This usually means any non-trivial bash script is forced to do its own manual error handling, e.g. via ` || die`.

http://mywiki.wooledge.org/BashFAQ/105
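For reference, the ` || die` pattern is usually a tiny helper like this (the name `die` is a convention, not a builtin; the commands are illustrative):

```shell
#!/usr/bin/env bash
# Conventional `die` helper: report and abort.
die() {
    echo "fatal: $*" >&2
    exit 1
}

# Every command that might fail gets explicit handling:
workdir=$(mktemp -d)        || die "could not create a temp dir"
cd "$workdir"               || die "could not cd to $workdir"
printf 'hello\n' > greeting || die "could not write greeting"
```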


That's interesting, but I disagree. The cases in the linked article are uncommon. Most of the time set -e does what you want.

As to the examples on that page: I am unconvinced by examples 1, 3, and 4. In (1), it's your business to know what the exit status of let is, and to behave accordingly. In (3), you wrote the function f(), so it's your business to make its exit status correct. Similarly for (4).

These are unconvincing examples that seem to deliberately walk into minefields.


FWIW, I can assure you I did not deliberately walk into minefields while developing my recent bash project, and yet I've struggled with `errexit` and `errtrace` to the point where I've ceased to use them, preferring to just ` || die` everything which may possibly fail. Although you may say that simply choosing to use bash is self-defeating already...

Have you reviewed the examples on this page, linked from the FAQ?

http://fvue.nl/wiki/Bash:_Error_handling


Respectfully, this is nuts. You consider it a problem with 'set -e' that, if someone runs your script via

  % bash foo.sh
then a '-e' on the shebang line of foo.sh will be ignored. That's a more general issue, separate from -e. You can say users who do that deserve what they get (not unreasonable IMHO), or you can use a 'set -e' within the script. But don't say it's a problem with -e. It's actually a problem with running command interpreters on arbitrary files.
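To make that behaviour concrete, a small self-contained demonstration (the script body is invented for illustration): option flags on the shebang line apply only when the kernel executes the script directly, while `bash script` starts a fresh interpreter that treats the shebang as an ordinary comment.

```shell
#!/usr/bin/env bash
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash -e
false
echo "still running"
EOF
chmod +x "$script"

"$script" || echo "direct run aborted with status $?"  # -e applied via shebang
bash "$script"                                         # -e ignored: prints "still running"
```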

I can't read any more of these examples. I think you need more experience with shell, or maybe we have incommensurate approaches/expectations.

I'm not of the view that using bash/sh in itself is a bad choice. There are lots of problems that it's well-suited for, and I myself use sh scripts a lot (with a linear flow that can go on for hundreds of lines). In particular, I don't think python is a good replacement for my use cases.


What are you talking about? Are you reading the same page as me?

Let me quote http://fvue.nl/wiki/Bash:_Error_handling Caveat 1:

    #!/bin/bash -e
    echo begin
    (false)
    echo end
Executing the script above gives:

    $ ./caveat1.sh
    begin
    end
    $
This has nothing to do with "ignoring the '-e' on the shebang line."

This is bash ignoring the exit status of the explicit subshell, even though `errexit` is set.


Sorry, my impatience led me to misread the page.


Exactly. It makes you think about what you want to happen if that command fails. Die hard, or try to salvage it. You have to write shell scripts with set -e in mind from the get-go, and that is a good thing.


It would be a good thing if `set -e` worked as expected. The point of the FAQ is that `set -e` does not catch every failure, so you need to do your own checking anyway.

http://mywiki.wooledge.org/BashFAQ/105/Answers


I wrote my first shell script that actually matters a few weeks ago (in other words, I'm a total newb). Its purpose is to validate the configuration of another app. It seems to me that "set -o pipefail" would get in my way, because I'm often doing things like this:

    adapter_properties_home_community_id=$(cat ${EXTERNAL_VARIABLE}/adapter.properties \
    | grep -P '^HomeCommunityId=' | sed 's/HomeCommunityId=//g' \
    || err "line $LINENO: Could not find HomeCommunityId in the adapter.properties")
If I "set -o pipefail", won't any failure in this chain of pipes halt the program? That will prevent the user from seeing the useful error message that says "Could not find HomeCommunityId in the adapter.properties"

I'm also concerned about "set -u". That EXTERNAL_VARIABLE is set externally to my validator script and available to the shell. Any appliance with the app should have that variable set. I have a check to make sure it's set and valid, but if I had "set -u", wouldn't that prevent me from checking its value when it hasn't been defined in the script? What's the workaround for this, because I can see how "set -u" would be very useful in most other situations.
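For what it's worth, the usual answer to the `set -u` question is a `:-` default in the expansion, which is legal even when the variable is unset. A sketch (EXTERNAL_VARIABLE stands in for the externally supplied name):

```shell
#!/usr/bin/env bash
set -u

# ${VAR:-} expands to "" when VAR is unset, so this test does not trip set -u.
if [ -z "${EXTERNAL_VARIABLE:-}" ]; then
    echo "EXTERNAL_VARIABLE is not set or empty" >&2
else
    echo "EXTERNAL_VARIABLE=${EXTERNAL_VARIABLE}"
fi
```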


Some specific remarks on your shell script at the end...

> If I "set -o pipefail", won't any failure in this chain of pipes halt the program?

No, due to your use of the || operator:

    foo | bar || baz
...as a whole will exit 0 whenever baz succeeds, even if foo or bar exits non-zero. The precedence there is actually

    ( foo | bar ) || baz
I.e., the pipe operator binds tighter than either of the logic operators. Also, `set -o pipefail` doesn't cause your shell script to exit on the first error; you need `set -e` or explicit exit status checking for that. What `set -o pipefail` actually does is change the way the exit status of a pipeline as a whole works. Consider the following pipeline:

    foo | bar | baz
Without `set -o pipefail`, if either foo or bar exit non-zero, but baz exits zero, this pipeline will appear to have succeeded, because by default a pipeline gets the exit status of its last element. Ignoring errors is bad. So `set -o pipefail` makes the pipeline take the exit status of the last element that failed (if any failed).
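Concretely, checking the two behaviours side by side:

```shell
#!/usr/bin/env bash

false | true
echo "without pipefail: $?"    # 0: a pipeline reports its last command's status

set -o pipefail
false | true
echo "with pipefail:    $?"    # 1: the rightmost failing command's status wins
```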

Regarding `set -u`, I don't personally use it, I just check that kind of stuff at the beginning of the script.

Now, on to your shell script example: in practice you'll never see an error because both grep and sed return true even when no matching or substitution is done. You also have a useless use of cat [1] at the beginning; just use grep directly on that file.

I would suggest simply doing

    sed -ne 's/^HomeCommunityId=//p' "${EXTERNAL_VARIABLE}"/adapter.properties
for grabbing that variable. Then see if the variable is empty with a `-z` test.
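Put together, that suggestion might look like the following sketch (run against a throwaway sample file, since the real adapter.properties isn't available here):

```shell
#!/usr/bin/env bash
# Sketch of the suggestion above, using an invented sample file.
properties=$(mktemp)
printf 'Foo=1\nHomeCommunityId=urn:oid:2.16\nBar=2\n' > "$properties"

# -n suppresses sed's default printing; the trailing p on s/// prints a line
# only when the substitution matched, so this emits just the value.
id=$(sed -ne 's/^HomeCommunityId=//p' "$properties")

if [ -z "$id" ]; then
    echo "Could not find HomeCommunityId in $properties" >&2
else
    echo "HomeCommunityId is $id"
fi
```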

Hope this helps. Shell is weird but definitely worth learning!

[1] http://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat


I should have mentioned that

  1. I am already using set -e
  2. That `err` function has an `exit 1` as its last step,
     which forces the program to stop running when combined with the first point.

> Now, on to your shell script example: in practice you'll never see an error because both grep and sed return true even when no matching or substitution is done.

That's not what I'm seeing:

    > echo "hi" | grep "bye"
    > echo $?
    1
I'm typing that into my shell.

Thanks for the reply, it was very helpful.


Ah, you're correct about this:

    echo "hi" | grep "bye" ; echo $?
    1


You're absolutely right, and this very list makes me think: Just program any non-trivial script in Python and use "subprocess" + "argparse". Heck, maybe even use Python if it's trivial -- since things have a tendency to accrue complexity over time you might as well avoid the "management won't let me rewrite in $DIFFERENT_LANGUAGE" trap.


The first rule of defensive bash programming should be: quote everything. Incredibly, the article doesn't mention quoting at all, and doesn't even use it silently in the examples.
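For the record, here is what an unquoted expansion does to a perfectly legal file name:

```shell
#!/usr/bin/env bash

f='my file.txt'                 # a legal file name containing a space

printf '%s\n' $f   | wc -l      # unquoted: split into two words -> 2
printf '%s\n' "$f" | wc -l      # quoted: stays one word         -> 1
```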


He also doesn't set -eu or -o pipefail.


I always recommend people use zsh for scripting instead of bash--it gets word splitting right. Most of my sysadminny scripts start with #!/bin/zsh these days. It would be really nice if Linux distros came with zsh installed automatically!

For some reason the shell quoting behavior never seems to get brought up when comparing zsh and bash. It is, IMO, the most important distinction between the two shells.


The tricky thing about 'sysadminny' scripts is that they generally need to be written for the least common denominator, which usually ends up being a very old version of $(software). I frequently have to use Python 2.4 to reach many of the RHEL-ish 5 machines in our environment.


As someone who's never used zsh - you don't need to quote stuff in zsh?


By default, no. Word splitting only happens when you request it: ${=words}. It reminds me of ,@list in Lisp, which is a warm and fuzzy feeling.


It depends on options set. There is one that causes word splitting only to happen when explicitly asked for.


Actually the first rule of bash programming should be: Don't use bash.

If your script is longer than 10 lines then you should probably write it in a less error prone language.


Generally, bash is for quick and dirty things I want to automate. I'll go to perl or python if I need anything more complex.

The amount of effort put into these examples is already way higher than my personal sniff test for "Should I be doing this in something besides bash?"


That might be chicken-and-egg. By following more rigorous practices, bash might become more acceptable to you for more complex tasks.


Nope. The amount of rigor required to sanitize Bash or any other descendant of sh or csh is not worth it. There are problems such as the lack of real return values from functions that no amount of discipline can solve, and the rigor required to solve things like crazy word splitting is too hard to consistently get right.


Such as managing complex build processes.


If you do have the choice, the first rule of defensive bash programming is to not program in bash. But otherwise it was a great article.


I freaking love "set -x" and wish every language had an equivalent. When I switch from bash to another language I miss "set -x" deep in my soul.



"set -x" enables debug-mode, for anyone else heading off to Google to investigate.

So given the script:

    #!/bin/bash
    set -x
    echo "Hello World"
The output would be:

    + echo 'Hello World'
    Hello World
Where the "+ ..." is the command executed. And it can be disabled with "set +x".
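Not mentioned there: the "+ " prefix is the PS4 variable, which can be customised, for example to include the current line number. A small sketch:

```shell
#!/usr/bin/env bash
# PS4 is re-expanded before every traced line, so ${LINENO} stays current.
PS4='+ line ${LINENO}: '
set -x
msg="Hello World"
echo "$msg"
set +x
```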


I agree, and refer to it as a 'call trace'. Not to be confused with stack traces, which are the same thing inside out.


You can use the pipe | as a continuation at the end of a line, so there's no need for the backslash escape; i.e. you should never need to use | \ at the end of a line.


Right, this is a style choice. He explicitly mentions '| \' as a bad example.

https://google-styleguide.googlecode.com/svn/trunk/shell.xml...


Defining a function is_file for [[ -f $blah ]] seems like defining a function increment_one() for var++ ...


I get why that seems silly, and I write shell all day (for provisioning Vagrant machines where I don't need Puppet; I've still yet to work out Ansible properly...), so I know exactly what that command does.

But to coworkers who don't write shell very often at all, such a wrapper would make the script far clearer, which is useful as they need to be able to edit it on the fly.


Encouraging coworkers to learn the proper syntax seems a better choice than hiding it with personally chosen abstractions (albeit more readable).


I don't disagree, but in some contexts that trade off is worth it, in my opinion.


You should strive for better readability for intended audience in every single program you write. It's like writing prose, really. It's ok to write "notes to self" with words shortened to single letters and strange symbols all around the place. It's not ok to do the same in a report from a meeting.

You should always remember who you're writing for and write accordingly. Hint: you almost never write for the computer.


Yeah, definitely better to keep it idiomatic. Also:

$ test -f "$file"

is pretty clear I think.


The "proper" syntax is dumb. I've never considered writing functions to get around it, but seeing this makes me want to contribute some builtins to bash.


The correct answer. ^^^


Then just use test instead.


And yet he pipes from ls...


And all those functions wrapping [[, ugh.

Maybe I should start doing this in Python:

    def plus (a,b):
        return a+b


What is wrong with piping `ls`?


It's usually redundant. for i in * <--> for i in `ls`


Usually it's cleaner (from a theoretical standpoint) to use filename generation, aka globbing. You don't need a subprocess.
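A sketch of the globbing approach; `nullglob` (a bash option) makes an unmatched pattern expand to nothing rather than to itself, and the directory and file names below are invented for illustration:

```shell
#!/usr/bin/env bash
shopt -s nullglob            # unmatched globs expand to nothing, not themselves

dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b with space.txt"

count=0
for f in "$dir"/*.txt; do    # no `ls`, no subprocess, spaces survive intact
    count=$((count + 1))
done
echo "found $count files"
```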



Yes, of course. Use more bashisms, wait for bash to change the behaviour of those bashisms (it has already happened, and not that rarely), and happy debugging.


Given the current glacial pace of bash development?

The problem is that the only other 'standard' that shell scripts can rely on is POSIX/Bourne, which is a bit anemic. Back in the day, this was the realm where the Korn shell was supposed to reign. And amongst commercial unices it actually did help (the license of ksh88/93 prohibited widespread BSD/Linux use). But sadly, most of the features that zsh/bash copied or invented themselves concern the UI, not programming capabilities.

I've seen some pretty big programs in ksh93 and it was half-way decent. Then again, quite often you shelled out to awk/grep/ed and had to fight their incompatibilities... I don't really miss those days.

On the other hand, sometimes I wish Linux scripters/programmers would be forced to use a slightly incompatible system now and then so that they don't assume that the whole world is GNU. I'm not misanthropic enough to say that said system should be IBM AIX, though.


> the only other 'standard' that shell scripts can rely on is POSIX/Bourne

Have you seen POSIX? Have you read Bourne shell documentation? I haven't. The former is expensive, the latter is not that easy to get. I have read Single UNIX Specification, though, which is available on-line.

> But sadly most of the features that zsh/bash copied or invented themselves concern the UI, not programming capabilities.

Shell is not intended for regular programming. It's intended for small automation scripts. If you need to write anything bigger, choosing shell over Perl, Python or Ruby is a fundamentally bad idea. BTDTGTT.

And do you know how much of the SUS-guaranteed shell syntax you use, anyway? Do you know what is a bashism and what is in the specification, so that you can claim the SUS-specified shell language is too weak?


> Have you seen POSIX? Have you read Bourne shell documentation? I haven't. The former is expensive, the latter is not that easy to get. I have read Single UNIX Specification, though, which is available on-line.

What is the distinction you are trying to make?

Single UNIX® Specification, Version 4, 2013 Edition

Technically identical to IEEE Std 1003.1, 2013 Edition and ISO/IEC 9945:2009 including ISO/IEC 9945:2009/Cor 1:2013(E), with the addition of X/Open Curses.

https://www2.opengroup.org/ogsys/catalog/t101

IEEE Std 1003.1 is also known as POSIX.1. You can read it online. It documents the shell command language. http://ddg.gg/?q=1003.1-2013

See also the POSIX FAQ: http://www.opengroup.org/austin/papers/posix_faq.html

EDIT: Source to bourne shell's manual is here, http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man1...

There's also a PDF version, look in volume 1: http://plan9.bell-labs.com/7thEdMan/bswv7.html


I thought that UPPER_CASE variables were a bad idea? Doesn't the bash world generally accept that you should use uppercase only for environment variables, and lowercase for variables in the script's context?


The author specifies using upper case for global variables – not for local variables. I’d consider that to be a useful technique and practice it myself. I also agree with minimising the use of global variables in the first place.


Is there some kind of secret code embedded in the typos?


Corporate gateway seems to not like this domain:

This web site ( www.kfirlavi.com ) has been blocked because it has been determined by Web Reputation Filters to be a security threat to your computer or the corporate network. This web site has been associated with malware/spyware. Reputation Score for www.kfirlavi.com: -7.1

Reputation scores can range from -10 (worst) through 10 (best).


I can't see anything there that looks troubling. Maybe it was hacked at some point or the domain used to be used for evil before the current owner got it.

Or your corporate filter is having an off day or is just rubbish...



