
Which version of RegEx? I've "learned" RegEx two or three times and then switched language/platform and had everything I previously learned no longer work reliably.

You might think I am just talking about Microsoft's quirky implementation, but even in the Linux sphere it isn't consistent. See:

http://www.greenend.org.uk/rjk/tech/regexp.html

You take a complex format string that was designed to use the fewest characters rather than to be clear, then every major application and library diverges on basic support and spec for features, and then all of them hack on Unicode support in their own unique way.

Regular expressions likely won't ever die, but I for one would happily switch to an alternative with better readability, Unicode support from day zero, and fewer niche features to keep things uniform. I'm tired of re-learning RegEx only to have everything I've learned either be forgotten or stop working the second I switch apps.




That's an overstatement of the differences between various regex engines. They all follow the basic standards, with [] being character classes, () being submatches, * being "0 or more", + being "1 or more", etc.
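As a quick illustration of those shared basics, here is a minimal sketch using Python's re module (Python is my choice for the example, not something from the comment above). The pattern and input strings are made up for the demonstration.

```python
import re

# [a-c] is a character class, (...) is a submatch (capturing group),
# + means "1 or more", * means "0 or more".
PATTERN = r"([a-c]+)(\d*)"

# Letters followed by digits: both groups capture something.
groups_with_digits = re.match(PATTERN, "abc123").groups()

# Letters only: \d* is allowed to match zero characters.
groups_without_digits = re.match(PATTERN, "abc").groups()

print(groups_with_digits)     # ('abc', '123')
print(groups_without_digits)  # ('abc', '')
```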

The two main differences between various engines are which characters are "literal" and which are "magic" (Vim's engine is particularly annoying here), and how to write the "convenience character classes" (like what the shorthand for "alphanumeric character class" is). But these are minor issues: once you've learned how to write a regex, they are trivial to look up.
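To make the "convenience character class" difference concrete: POSIX engines spell the alphanumeric class [[:alnum:]], while Python's re module does not support POSIX bracket classes and offers \w instead, which also matches the underscore. The sketch below is a hypothetical helper (expand_posix_classes and its table are my own names, ASCII-only by assumption) that rewrites a few POSIX classes into explicit ranges Python accepts.

```python
import re

# Assumed ASCII-only equivalents for a handful of POSIX bracket classes.
POSIX_CLASSES = {
    "[:alnum:]": "A-Za-z0-9",
    "[:alpha:]": "A-Za-z",
    "[:digit:]": "0-9",
    "[:space:]": r" \t\n\r\f\v",
}

def expand_posix_classes(pattern):
    """Replace POSIX class names with explicit ranges (plain text substitution)."""
    for name, ascii_range in POSIX_CLASSES.items():
        pattern = pattern.replace(name, ascii_range)
    return pattern

translated = expand_posix_classes("[[:alnum:]]+")   # "[A-Za-z0-9]+"
alnum_hits = re.findall(translated, "a_1 b2")       # underscore splits the match
word_hits = re.findall(r"\w+", "a_1 b2")            # \w also matches "_"

print(translated, alnum_hits, word_hits)
```

The two shorthands look interchangeable but are not: the translated class yields ['a', '1', 'b2'] while \w+ yields ['a_1', 'b2'].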

Knowledge of regular expressions transfers from one engine to another just fine.


I generally include either \v or \V in my vim regex, at which point I no longer have to think about which characters are magic. I suppose this means that I agree that vim's default is annoying here, but imho vim more than makes up for that by making magic configurable.


> They all follow the basic standards, with [] being character classes, () being submatches

You've already described a feature which has different syntax in one of the primary regex dialects I use (Emacs).


Those are the syntactic differences, but there are also semantic ones.

Most notably, the choice operator can either be ordered like in PEGs (if the first branch matches, the other isn't evaluated) or pick the branch that produces the longest match, CFG-like.
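A small sketch of the difference, in Python. Python's re uses ordered choice, so the first function shows real engine behavior; the second is a hypothetical emulation of longest-match semantics (it only handles a top-level alternation tried at the start of the text, and both function names are mine).

```python
import re

def ordered_choice(alternatives, text):
    """Python's re tries branches left to right and keeps the first that matches."""
    pattern = "|".join(f"(?:{alt})" for alt in alternatives)
    m = re.match(pattern, text)
    return m.group() if m else None

def longest_choice(alternatives, text):
    """Emulated longest-match choice: try every branch, keep the longest result."""
    matches = []
    for alt in alternatives:
        m = re.match(alt, text)
        if m:
            matches.append(m.group())
    return max(matches, key=len) if matches else None

print(ordered_choice(["a", "ab"], "ab"))  # a
print(longest_choice(["a", "ab"], "ab"))  # ab
```

Same pattern, same input, different match: with ordered choice the branch order is part of the pattern's meaning.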


For the most part, it's just a matter of knowing if you're using POSIX Basic Regular Expressions (BRE), POSIX Extended Regular Expressions (ERE), or Perl regular expressions.

Learn those, or at least the main differences between them, and the vast majority of the regular expression engines in software you use will become more recognizable.
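One of the main BRE/ERE differences is which side of the backslash the grouping and interval operators live on: BRE writes \( \) \{ \} for the operators and treats the bare characters as literals, while ERE (and Perl-style syntax) does the opposite. BRE also treats bare +, ? and | as literals. Below is a rough sketch of a converter, written in Python; bre_to_ere is a hypothetical helper, and it deliberately ignores bracket expressions, back-references, and GNU extensions such as \+.

```python
import re

def bre_to_ere(pattern):
    """Rough BRE -> ERE rewrite for groups, intervals, and + ? | only."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # BRE \( \) \{ \} are operators; ERE writes them without the backslash.
            out.append(nxt if nxt in "(){}" else ch + nxt)
            i += 2
        elif ch in "(){}+?|":
            # These are literal in BRE but special in ERE, so escape them.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

group_pattern = bre_to_ere(r"\(ab\)\{2\}")   # (ab){2}
literal_pattern = bre_to_ere("a+b")          # a\+b

# Python's re accepts the ERE-style result for these constructs.
print(group_pattern, bool(re.fullmatch(group_pattern, "abab")))
print(literal_pattern, bool(re.fullmatch(literal_pattern, "a+b")))
```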


You wouldn't be programming in regex, and for small things a Google search for the platform quirks is usually faster than writing a parser, isn't it?

I'd agree with you if everything weren't so easy to look up.



