These regexps might be useful as first approximations, but they are not as robust as the article suggests (for example, numbers can contain a decimal point or a decimal comma depending on your language, and parsing email addresses is a subject for an essay; I've heard the correct Perl regexp for that takes about a page of text). This article is harmful, because those who can benefit from this kind of basic regexp example are also those who will not understand the limitations.
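The sketch below makes the decimal-separator point concrete. Both patterns and both sub names are hypothetical illustrations, not the article's regexps. The first pattern accepts only a decimal point. The second accepts a point or a comma. Neither handles signs, exponents or thousands separators.

```perl
use strict;
use warnings;

# Hypothetical naive check: digits with an optional decimal-point fraction.
sub looks_like_number_dot {
    my ($s) = @_;
    return $s =~ /^\d+(?:\.\d+)?$/ ? 1 : 0;
}

# Hypothetical broader check: '.' or ',' as the decimal separator.
# Still no signs, exponents or thousands separators.
sub looks_like_number_any {
    my ($s) = @_;
    return $s =~ /^\d+(?:[.,]\d+)?$/ ? 1 : 0;
}

for my $s ('3.14', '3,14', '1.234,5') {
    printf "%-8s dot-only: %d  point-or-comma: %d\n",
        $s, looks_like_number_dot($s), looks_like_number_any($s);
}
```

The string '3,14' passes only the second check. A German-style '1.234,5', which uses a thousands separator, fails both checks. A beginner copying either pattern would be unlikely to anticipate that limitation.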
> because those who can benefit from this kind of basic regexp examples are also those who will not understand the limitations.
I share your sentiment. The article is certainly useful for learning regexps, but IMHO it should also point to the correct way of doing things. Often the correct way is to use a module, and the resulting code is not much longer than the code in the article. For email address validation:
use strict; use warnings;
use Email::Valid;
# address() returns the address if it looks valid, undef otherwise
print( Email::Valid->address('john@example.com') ? "valid\n" : "invalid\n" );
Point taken. What I primarily wanted to show is that the correct way is not much longer than the "looks like" solution. Yes, the regexp is long, but it is nicely encapsulated in the Email::Valid module.
Yeah, I had read only the first sentence. I think many of those who find that article via Google, and even use that code, will not pay much attention to that weak disclaimer either. Also, these were not the only problems with his code; see the comments on that page (in particular http://www.catonmat.net/c/35784).
But the more important point is that an article that sounds so authoritative should be of much higher quality.
But I doubt someone new to regexps would understand what "looks like" means in that context. For instance, they might think "OK, so something like 'abc@efg.xyz' matches, even though it's not a real email address." They might not realize that a full sentence like "hey, I'll see you tomorrow @ 2. Can't wait!" also matches.
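The sketch below reproduces that failure mode. The naive pattern /.+\@.+\..+/ is an assumed stand-in for a quick "looks like an email address" regexp and is not necessarily the article's exact pattern. The stricter variant and both sub names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical "looks like an email address" check, similar in spirit to the
# quick regexps under discussion (not necessarily the article's exact pattern).
# The @ is escaped as \@ so Perl does not try to interpolate an array.
sub naive_looks_like_email {
    my ($s) = @_;
    return $s =~ /.+\@.+\..+/ ? 1 : 0;
}

# Hypothetical stricter, anchored variant: no whitespace anywhere in the string.
sub stricter_looks_like_email {
    my ($s) = @_;
    return $s =~ /^\S+\@\S+\.\S+$/ ? 1 : 0;
}

for my $s ('abc@efg.xyz', q{hey, I'll see you tomorrow @ 2. Can't wait!}) {
    printf "%-45s naive: %d  stricter: %d\n",
        $s, naive_looks_like_email($s), stricter_looks_like_email($s);
}
```

Both checks accept 'abc@efg.xyz', which need not be a real address. Only the naive check matches the sentence. The stricter variant is still only a "looks like" test. For example, it rejects a quoted local part containing a space, such as "john doe"@example.com, which the RFC grammar permits.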
That said, Perl one-liners are certainly useful, so thanks to the OP for putting this together. I just think it would add a lot of value to include examples of where one is likely to go wrong.
Mail::RFC822::Address is a Perl module to validate email addresses according to the RFC 822 grammar. (...) Implementing validation with regular expressions somewhat pushes the limits of what it is sensible to do with regular expressions, although Perl copes well:
I think this collection clearly shows the simplicity and utility of short, potentially imperfect regular expressions. Reading and writing expressions like these, even if you never use them in your code, is a skill that almost every programmer would benefit from.
I approach these articles as a way to show off the expressive abilities of a language, not to provide code snippets for cut-and-paste.
As others have mentioned (and as the articles themselves repeatedly mention), most of the time, if you have a common problem, you should find and use a CPAN module already built to solve it.
I have never used a code snippet from a one-liner compilation in one of my programs, but I have learned a lot of new constructs by reading them.
Man, I like Perl, and learning how to use these sorts of regular expressions is useful, but a lot of the "one-liners" are infested with magic variables and the sort of freakish syntactic constructs that give the language a bad name.
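Here is an illustration of the magic-variable complaint, using a hypothetical one-liner. The command perl -ne 'print if /\d/' prints lines that contain a digit. It relies on -n wrapping the code in LINE: while (<>) { ... }, and on both print and the bare match defaulting to $_. The sketch below implements the same filter with every variable named explicitly. It is written as a sub so it can be reused, and lines_with_digits is a hypothetical name.

```perl
use strict;
use warnings;

# Explicit equivalent of:  perl -ne 'print if /\d/'
# -n would supply the while (<>) loop and $_; here the filehandle and the
# line variable are named, and the matching lines are returned as a list.
sub lines_with_digits {
    my ($fh) = @_;
    my @matches;
    while (my $line = <$fh>) {    # reads one line at a time, newline included
        push @matches, $line if $line =~ /\d/;
    }
    return @matches;
}
```

Usage: open my $fh, '<', 'log.txt' or die $!; print lines_with_digits($fh); (the file name is only an example). The explicit version is longer, but nothing in it depends on a default variable.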
Good Perl is basically 72% of Ruby. (Less syntactic sugar. Less-structured reflection. And mildly crufty sigils - not that you'll notice those after your second week, though.)