Using different delimiters in sed (2010) (backreference.org)
59 points by navigaid on Dec 26, 2018 | 29 comments


The common alternate character, for those cases where normal slashes are not suitable, is a comma character; ‘,’. With Sed code, readability is always an issue, and it can be improved by not using just any character and limiting yourself to as few alternative modes of syntax as possible.

With a bit of exposure, commands like

s/foo/bar/

and

s,foo,bar,

can be read with equal ease.
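
A minimal sketch of why the alternate delimiter matters once the pattern itself contains slashes. The input path is made up for illustration; both commands are meant to perform the same substitution.

```shell
# Hypothetical input line; the paths are invented for this example.
line='/usr/local/bin/tool'

# Slash delimiter: every / in the pattern and replacement must be escaped.
with_slash=$(printf '%s\n' "$line" | sed 's/\/usr\/local\/bin/\/opt\/bin/')

# Comma delimiter: the same substitution with no escaping.
with_comma=$(printf '%s\n' "$line" | sed 's,/usr/local/bin,/opt/bin,')

printf '%s\n%s\n' "$with_slash" "$with_comma"
```

Both variables should end up holding the same rewritten path; only the readability of the command differs.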


> The common alternate character, for those cases where normal slashes are not suitable, is a comma character; ‘,’.

I've always used "|" since it's so visually similar, and you always want to quote regular expressions anyway.


Yes, I’ve seen the vertical bar character used for this too. But in my experience the comma is more common, and my point was that we should all try to use the most widely used syntax possible. Therefore I encourage everybody to use the comma as their usual alternative delimiter, so that the world of Sed code is slightly less confusing.


My experience, colored more by perl styles than sed per se, is that the vertical bar is the overwhelmingly typical usage, actually. It's what Wall picked for the examples in the camel book, for example.


My experience has been that there isn’t a de facto standard outside of /.

Arguing that people must use some standard you’ve adopted is a little like saying people should use tabs instead of spaces.


Doesn’t that confuse you because it is the alternation symbol (https://regular-expressions.mobi/alternation.html) in many, if not all, regular expression syntaxes?


Given the whole "sed" vs "sed -E" distinction, I'd say most metacharacters are confusing anyway. Not to mention usually when I'm using "|", I'm going to be replacing paths and don't really use alternation that often.


It's rare that I need a sedex with both paths and alternation, but if I do I'll use something other than / or |. That's the point of this article; if what you're using is going to hinder obviousness, USE SOMETHING ELSE. Because... you can.
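
As a rough illustration of that case, here is a substitution that needs both paths and alternation, so neither / nor | is a comfortable delimiter. The paths are hypothetical, and `#` is just one possible choice; `-E` is assumed to be available, as it is in GNU and BSD sed.

```shell
# Hypothetical input; two made-up paths, one per line.
paths='/usr/local/bin/tool
/usr/opt/bin/tool'

# With -E, | means alternation and the pattern is full of slashes,
# so # is used as the delimiter instead of / or |.
result=$(printf '%s\n' "$paths" | sed -E 's#/usr/(local|opt)/bin#/bin#')

printf '%s\n' "$result"
```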


And comma appears in repetition limits (like in /a{1,2}/). I don't see much difference.


Repetition in sed would be written \{1,2\}, it's unwieldy enough that it's rarely used.
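
A small sketch of the interval syntax mentioned here, in both the basic and the `-E` flavor. Since the interval contains a comma, a delimiter other than the comma keeps the command simple; `|` is used below only as an example.

```shell
# Basic regular expressions: the braces are escaped.
bre=$(printf 'aaa\n' | sed 's|a\{1,2\}|X|')

# Extended regular expressions (-E): the braces are bare.
ere=$(printf 'aaa\n' | sed -E 's|a{1,2}|X|')

# Each command replaces the first run of one or two "a" characters.
printf '%s\n%s\n' "$bre" "$ere"
```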


Using "|" can be confusing when the sed command is part of a pipeline. I agree with the other user that the comma character is more used. It is slightly easier to type too.


I prefer

  @
because it is so weird and unexpected people are unlikely to misread the command and think it does something with multiple commands

tho perl’s qr (quote regex) operator is miles ahead, it even supports natural pairs of opening/closing delimiters


I really like doing the following in Perl:

    s{/u/r/i}{do something}
Sadly sed doesn’t support that (or at least none of the seds I’ve used). So I fall back to using hash (#) instead (for no particular reason other than that’s what I’ve always used).


s,,,g is my favorite because it avoids /\//\\/ epilepsy


I tend to use : as the alternate delimiters.


On a related note, I found Bruce Barnett's sed tutorial to be one of the best sed guides out there: http://www.grymoire.com/Unix/Sed.html


It was really good, thank you! I learned a ton of new tricks from it

Separately, I feel like I could have a full time job submitting PRs to fix the GNU-ism of using `sed -i"" ...` versus the BSD/macOS syntax of `sed -i "" ...`. I don't know which of those two camps broke from the other, but holy hell it _broke_ for sure.


This strikes me a bit like a "doctor, it hurts when I go like that" problem.

What's the use case [1] for using a zero-length suffix argument (i.e. editing the file "in place" with no backup)?

It's not as if sed is operating on the file actually in place. It uses a temporary file anyway. Varying levels of reliability/portability can be achieved by controlling that temporary file (or subsequent artifacts) oneself, the first level being just to "rm" the backup.

[1] Assuming this is in scripting, since the interactive situation is easily enough adjusted on the fly
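
One way to read "controlling that temporary file oneself" is the sketch below, which avoids `-i` entirely. The file names are hypothetical, and `mktemp -d` is assumed to be available (it is on GNU and BSD systems, though it is not in POSIX).

```shell
# Scratch directory and file; the names are made up for this example.
workdir=$(mktemp -d)
file="$workdir/config.txt"
printf 'foo\n' > "$file"

# Write the edited copy to a temporary file, then replace the original
# only if sed succeeded.
tmp="$file.tmp"
sed 's,foo,bar,' "$file" > "$tmp" && mv "$tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Unlike `-i`, this form does not depend on which sed is installed, at the cost of handling the temporary file by hand.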


Heh, if I had a dollar for every time I have encountered "what are you doing" while dealing with shell, or (worse!) Makefile, tomfoolery in open source projects, I'd be retired. So, in some sense you're right, but ultimately in the "but it doesn't matter" way because I'm not the only one writing shell, nor the only machine upon which things have to build.


At some point, though, if you're the one complaining about a specific detail, in a way, you're the (only) one.

I've seen plenty of un-portable shell scripting, too, but my professional experience includes a time where, essentially, no software [1] could be assumed to be portable, and I did get some dollars for every time, since it was part of my job to ensure as consistent a build environment as possible.

In light of your clarification, my question becomes: isn't it actually good to have such assumption-breaking differences in that they call attention to something that is likely to have a broader pattern of non-portability, in which case a broadly effective [2] workaround can be applied?

[1] Even/especially GNU tools, where there was something of an assumption that the OS would provide at least fairly complete BSD-compatibility. The existence, and evolution, of libiberty and the autotools, among others, should be instructive.

[2] e.g. installing (all the) GNU tools and putting them first in the path on a system that otherwise uses "traditional" syntax


How does the GNU-ism work at a shell level?

In your first example $0 is "sed", $1 is "-i", $2 is "...", no? What significance does the -i"" have in bash apart from 'concat -i to ""'?

The BSD syntax is plain, $1 is the option "-i", $2 is the value for that option: "", $3 is "...".

Edit: off by one
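
The shell-level difference can be checked without sed at all. In this sketch `count_args` is a hypothetical helper that only reports how many arguments it received; the sed script and file name are placeholders.

```shell
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { printf '%s\n' "$#"; }

# -i"" is concatenated by the shell into the single word -i.
gnu_style=$(count_args -i"" 's/foo/bar/' file.txt)

# -i "" is two words: -i and a separate empty string.
bsd_style=$(count_args -i "" 's/foo/bar/' file.txt)

printf 'GNU-style: %s args, BSD-style: %s args\n' "$gnu_style" "$bsd_style"
```

So the program sees three arguments in the first form and four in the second; what it does with them is up to its own option parsing.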


Indeed, appending an empty string to -i doesn't modify it in any way, AFAIK.

It's that the GNU version doesn't require an argument, but, if an argument is to be provided, it must be done as part of the flag, not as a separate element of argv. The MacOS version allows either way of providing an argument, so -i.orig tends to be portable (assuming the -i flag is supported in the first place).

Differences in how "traditional" (be they sysv or bsd) versus gnu utilities handle flags and arguments [1] is very well rooted in history, and is hardly unique to sed.

I suspect the main reason this has been forgotten is that Linux, which ships with gnu utilities, has been so dominant for so long, though, even before that, it was difficult to find a then-current unix on which gnu utilities couldn't be installed.

[1] As the sibling comment points out, the source of difference is getopt, of which there were more than just two versions.. including not even using a library
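
A sketch of the attached-suffix form described above, followed by removing the backup. The file names are hypothetical, `mktemp -d` is assumed to be available, and `-i` itself is an extension rather than POSIX, so this only covers seds that support it.

```shell
# Scratch directory and file; the names are made up for this example.
workdir=$(mktemp -d)
file="$workdir/notes.txt"
printf 'foo\n' > "$file"

# The suffix is attached to -i, so it is a single argv element.
sed -i.orig 's,foo,bar,' "$file"

edited=$(cat "$file")
backup=$(cat "$file.orig")
printf 'edited: %s, backup: %s\n' "$edited" "$backup"

# Drop the backup if it is not wanted, then clean up.
rm "$file.orig"
rm -rf "$workdir"
```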


I don't believe it's a bash-ism; I would guess it's GNU getopt declaring `-i` as taking an optional argument, versus the built-in getopt that BSD is using:

https://git.savannah.gnu.org/cgit/sed.git/tree/sed/sed.c?h=v...

https://github.com/freebsd/freebsd/blob/release/7.0.0/usr.bi...

So, while your interpretation of the shell splitting is correct, it's the different getopt declarations that cause this pain


For one-offs, I like to change up my delimiter because a) it reminds me that I can and b) I think the slashes are ugly and don't like escaping them. I prefer pipes.


I’ve seen so many hilariously bad workarounds for this in old shell scripts from lazy engineers who apparently can’t read a man page and don’t realise the delimiter character is substitutable with more or less any character you like.

This is one of those extremely useful features that a surprisingly large number of users don’t seem to know about.


Indeed. `/` covers 99% of my cases, `|` covers 99% of the remaining. I think i might have had to use something "exotic" once or twice in 30+ years now.


Well, I have some bash scripts to improve... great post, another useful fact on HN.


Tilde has always been my go-to; in fact I tend to use it in preference to / even if there is no literal / in my statement.

s~foo~bar~

I still use / when pointing out typos in PRs, tho :)

s/calulate/calculate/


The address one is really useful; I wish I had known it this very week. Thanks!
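
"The address one" presumably refers to using a custom delimiter in an address, which sed allows when the opening delimiter is preceded by a backslash. A minimal sketch, with made-up paths as input:

```shell
# Hypothetical input; two invented paths, one per line.
# \,regex, is an address that uses a comma instead of / as the delimiter.
result=$(printf '%s\n' '/usr/local/bin' '/opt/bin' | sed '\,^/usr/local,d')

# Only the line that does not start with /usr/local remains.
printf '%s\n' "$result"
```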



