Using different delimiters in sed (2010)

teddyh · on Dec 26, 2018

The common alternate character, for those cases where normal slashes are not suitable, is a comma character; ‘,’. With Sed code, readability is always an issue, and it can be improved by not using just any character and limiting yourself to as few alternative modes of syntax as possible.

With a bit of exposure, commands like

s/foo/bar/

and

s,foo,bar,

can be read with equal ease.
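
A minimal sketch of where the alternate delimiter pays off, assuming a POSIX shell and any POSIX sed. The input line and paths are made up for illustration.

```shell
# Hypothetical input line containing a path.
line='prefix=/usr/local/bin'

# With "/" as the delimiter, every slash in the pattern must be escaped.
with_slash=$(printf '%s\n' "$line" | sed 's/\/usr\/local/\/opt/')

# With "," as the delimiter, the same substitution needs no escaping.
with_comma=$(printf '%s\n' "$line" | sed 's,/usr/local,/opt,')

printf '%s\n%s\n' "$with_slash" "$with_comma"
```

Both commands perform the same substitution; only the amount of escaping differs.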

cyphar · on Dec 26, 2018

> The common alternate character, for those cases where normal slashes are not suitable, is a comma character; ‘,’.

I've always used "|" since it's so visually similar, and you always want to quote regular expressions anyway.

teddyh · on Dec 26, 2018

Yes, I’ve seen the vertical bar character used for this too. But in my experience the comma character is more commonly used, and my point was that we should all try to use the most commonly used syntax possible. Therefore I encourage everybody to use the comma character as their usual alternative character in order for the world of Sed code to be slightly less confusing.

ajross · on Dec 26, 2018

My experience, colored more by perl styles than sed per se, is that the vertical bar is the overwhelmingly typical usage, actually. It's what Wall picked for the examples in the camel book, for example.

laumars · on Dec 26, 2018

My experience has been that there isn’t a de facto standard outside of /.

Arguing that people must use some standard you’ve adopted is a little like saying people should use tabs instead of spaces.

Someone · on Dec 26, 2018

Doesn’t that confuse you because it is the alternation symbol (https://regular-expressions.mobi/alternation.html) in many, if not all, regular expression syntaxes?

cyphar · on Dec 27, 2018

Given the whole "sed" vs "sed -E" distinction, I'd say most metacharacters are confusing anyway. Not to mention usually when I'm using "|", I'm going to be replacing paths and don't really use alternation that often.

michaelcampbell · on Dec 26, 2018

It's rare that I need a sedex with both paths and alternation, but if I do I'll use something other than / or |. That's the point of this article; if what you're using is going to hinder obviousness, USE SOMETHING ELSE. Because... you can.

marcosdumay · on Dec 26, 2018

And comma appears in repetition limits (like in /a{1,2}/). I don't see much difference.

bonzini · on Dec 26, 2018

Repetition in sed would be written \{1,2\}, it's unwieldy enough that it's rarely used.
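
A small sketch of the two spellings, assuming a sed that accepts `-E` for extended regular expressions (GNU and BSD sed both do). The input string is made up.

```shell
# Basic regular expressions (plain sed): interval braces are backslash-escaped.
bre=$(printf 'aaa\n' | sed 's/a\{1,2\}/X/')

# Extended regular expressions (sed -E): the braces are written bare.
ere=$(printf 'aaa\n' | sed -E 's/a{1,2}/X/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both forms match one or two leading `a` characters and replace them with `X`.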

yiyus · on Dec 26, 2018

Using "|" can be confusing when the sed command is part of a pipeline. I agree with the other user that the comma character is more used. It is slightly easier to type too.

falsedan · on Dec 26, 2018

I prefer

because it is so weird and unexpected people are unlikely to misread the command and think it does something with multiple commands

tho perl’s qr (quote regex) operator is miles ahead, it even supports natural pairs of opening/closing delimiters

laumars · on Dec 26, 2018

I really like doing the following in Perl:

    s{/u/r/i}{do something}

Sadly, sed doesn’t support that (or at least none of the seds I’ve used). So I fall back to using hash (#) instead (for no particular reason other than that’s what I’ve always used).

agumonkey · on Dec 26, 2018

s,,,g is my favorite because it avoids /\//\\/ epilepsy

jolmg · on Dec 26, 2018

I tend to use : as the alternate delimiters.

devy · on Dec 26, 2018

On a related note, I found Bruce Barnett's sed tutorial to be one of the best Sed guides out there: http://www.grymoire.com/Unix/Sed.html

mdaniel · on Dec 26, 2018

It was really good, thank you! I learned a ton of new tricks from it

Separately, I feel like I could have a full time job submitting PRs to fix the GNU-ism of using `sed -i"" ...` versus the BSD/macOS syntax of `sed -i "" ...`. I don't know which of those two camps broke from the other, but holy hell it _broke_ for sure.

mmt · on Dec 26, 2018

This strikes me as a bit like a "doctor, it hurts when I go like that" problem.

What's the use case [1] for using a zero-length suffix argument (i.e. editing the file "in place" with no backup)?

It's not as if sed is operating on the file actually in place. It uses a temporary file anyway. Varying levels of reliability/portability can be achieved by controlling that temporary file (or subsequent artifacts) oneself, the first level being just to "rm" the backup.

[1] Assuming this is in scripting, since the interactive situation is easily enough adjusted on the fly

mdaniel · on Dec 26, 2018

Heh, if I had a dollar for every time I have encountered "what are you doing" while dealing with shell, or (worse!) Makefile, tomfoolery in open source projects, I'd be retired. So, in some sense you're right, but ultimately in the "but it doesn't matter" way because I'm not the only one writing shell, nor the only machine upon which things have to build.

mmt · on Dec 27, 2018

At some point, though, if you're the one complaining about a specific detail, in a way, you're the (only) one.

I've seen plenty of un-portable shell scripting, too, but my professional experience includes a time where, essentially, no software [1] could be assumed to be portable, and I did get some dollars for every time, since it was part of my job to ensure as consistent a build environment as possible.

In light of your clarification, my question becomes: isn't it actually good to have such assumption-breaking differences in that they call attention to something that is likely to have a broader pattern of non-portability, in which case a broadly effective [2] workaround can be applied?

[1] Even/especially GNU tools, where there was something of an assumption that the OS would provide at least fairly complete BSD-compatibility. The existence, and evolution, of libiberty and the autotools, among others, should be instructive.

[2] e.g. installing (all the) GNU tools and putting them first in the path on a system that otherwise uses "traditional" syntax

thisacctforreal · on Dec 26, 2018

How does the GNU-ism work at a shell level?

In your first example $0 is "sed", $1 is "-i", $2 is "...", no? What significance does the -i"" have in bash apart from 'concat -i to ""'?

The BSD syntax is plain, $1 is the option "-i", $2 is the value for that option: "", $3 is "...".

Edit: off by one

mmt · on Dec 27, 2018

Indeed, appending an empty string to -i doesn't modify it in any way, AFAIK.

It's that the GNU version doesn't require an argument, but, if an argument is to be provided, it must be done as part of the flag, not as a separate element of argv. The MacOS version allows either way of providing an argument, so -i.orig tends to be portable (assuming the -i flag is supported in the first place).

Differences in how "traditional" (be they sysv or bsd) versus gnu utilities handle flags and arguments [1] are very well rooted in history, and are hardly unique to sed.

I suspect the main reason this has been forgotten is that Linux, which ships with gnu utilities, has been so dominant for so long, though, even before that, it was difficult to find a then-current unix on which gnu utilities couldn't be installed.

[1] As the sibling comment points out, the source of difference is getopt, of which there were more than just two versions.. including not even using a library
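
The attached-suffix form described above can be sketched as follows. This assumes a sed that supports `-i` at all and a system with `mktemp -d`; the file name and contents are made up, and the backup is removed afterwards for cases where it is not wanted.

```shell
# Scratch directory so the sketch does not touch real data.
tmpdir=$(mktemp -d)
file="$tmpdir/demo.txt"
printf 'foo\n' > "$file"

# Suffix attached directly to the flag: accepted by both GNU and BSD/macOS sed.
sed -i.orig 's,foo,bar,' "$file"

edited=$(cat "$file")
backup=$(cat "$file.orig")

# Drop the backup if it is not wanted.
rm -f "$file.orig"

printf 'edited=%s backup=%s\n' "$edited" "$backup"
```

Removing the backup explicitly sidesteps the `-i""` versus `-i ""` difference entirely.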

mdaniel · on Dec 26, 2018

I don't believe it's a bash-ism, I would guess it's GNU getopt defining the `-i` as only optionally requiring an argument, versus the built-in getopt that BSD is using:

https://git.savannah.gnu.org/cgit/sed.git/tree/sed/sed.c?h=v...

https://github.com/freebsd/freebsd/blob/release/7.0.0/usr.bi...

So, while your interpretation of the shell splitting is correct, it's the different getopt declarations that cause this pain

SubiculumCode · on Dec 26, 2018

For one-offs, I like to change up my delimiter because a) it reminds me that I can and b) I think the slashes are ugly and I don't like escaping. I prefer pipes.

giobox · on Dec 26, 2018

I’ve seen so many hilariously bad workarounds for this in old shell scripts from lazy engineers who apparently can’t read a man page and don’t realise the delimiter character is substitutable with more or less any character you like.

This is one of those extremely useful features that a surprisingly large number of users don’t seem to know about.

michaelcampbell · on Dec 26, 2018

Indeed. `/` covers 99% of my cases, `|` covers 99% of the remaining. I think i might have had to use something "exotic" once or twice in 30+ years now.

keyle · on Dec 26, 2018

Well, I have some bash scripts to improve... great post, another useful fact on HN.

finnh · on Dec 26, 2018

tilde has always been my go-to; in fact i tend to use it in preference to / even if there is no literal / in my statement.

s~foo~bar~

I still use / when pointing out typos in PRs, tho :)

s/calulate/calculate/

_ugfj · on Dec 26, 2018

The address one is really useful; I wish I had known it earlier this very week. Thanks!
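
Assuming "the address one" means custom delimiters in address patterns, here is a minimal sketch. POSIX sed allows this when the delimiter is introduced with a backslash. The list of paths is made up.

```shell
# Hypothetical list of paths, one per line.
paths='/usr/bin/sed
/home/user/notes.txt'

# In an address, a non-slash delimiter must be introduced with a backslash.
kept=$(printf '%s\n' "$paths" | sed '\,^/usr/,d')

printf '%s\n' "$kept"
```

The equivalent with the default delimiter would be `/^\/usr\//d`, with every slash escaped.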