A GPT 3.5 powered tool for generating regex

gregmac · on March 30, 2023

Pretty neat, and it definitely can generate some complex regex.. But I have to say I don't trust it!

When I add a regex to something, I pretty much always also add a bunch of unit test cases for it, and I request the same if I see one in a PR. It's much easier to just see a bunch of test cases that help validate it, and frankly, half the time when I write my own cases I think of situations that could be better handled.

Human- or GPT-generated doesn't really matter IMHO; it still needs tests.

Interestingly, chatGPT is pretty good at generating test cases for regex. It would be really cool to see that functionality integrated into this tool.
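
A minimal sketch of what table-driven test cases for a regex could look like in Python. The pattern (`DATE_RE`, a YYYY-MM-DD shape check), the case table, and the helper `failing_cases` are all hypothetical, picked only for illustration; they are not from the tool.

```python
import re

# Hypothetical pattern under test: a date shaped like YYYY-MM-DD.
# It checks the shape only, not whether the date actually exists.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Each case is (input text, whether the whole string should match).
CASES = [
    ("2023-03-30", True),
    ("1999-12-31", True),
    ("0000-00-00", True),     # passes the shape check; arguably it shouldn't
    ("", False),
    ("2023-3-30", False),     # month is missing its leading zero
    ("30-03-2023", False),    # wrong field order
    ("2023/03/30", False),    # wrong separator
    ("2023-03-30 ", False),   # trailing space
    ("2023-03-30\n", False),  # trailing newline
]


def failing_cases(pattern, cases):
    """Return the (text, expected) pairs that the pattern gets wrong."""
    return [
        (text, expected)
        for text, expected in cases
        if (pattern.fullmatch(text) is not None) != expected
    ]


if __name__ == "__main__":
    print(failing_cases(DATE_RE, CASES))
```

Writing out the `0000-00-00` row is the kind of thing that makes you notice a situation that could be handled better.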

l33t233372 · on March 30, 2023

> chatGPT is pretty good at generating test cases for regex

I think this is the biggest part. Automatically generated test cases for regex will make verifying regex way easier.

Even better, when you come across some ridiculous unknown legacy regex you can ask for an explanation of it and for test cases to verify that explanation.

BMc2020 · on March 30, 2023

Rate limit reached for default-gpt-3.5-turbo in organization org-cy3MEIpOsyQxMokN4SQON5gb on requests per min. Limit: 20 / min. Please try again in 3s. Contact support@openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.

Ironically, it showed us the real future of AI

jwestbury · on March 31, 2023

I used to work for a tech company which ran our incident management stack on Google Cloud. One day, our incident management tool was unavailable due to some billing issue.

Turns out, someone during the test phase set up a limit of $1/day usage, and nobody ever changed it. We'd just finally hit that limit.

What I'm saying is, this isn't an AI-only problem, but it's comforting to know it's already impacting AI-backed tools.

williamstein · on March 30, 2023

ChatGPT is also very useful for just explaining what a regexp you already have actually does.

abracadaniel · on March 30, 2023

There are some neat tools for visualizing a regex like https://regexper.com/

vector_spaces · on March 30, 2023

There's also regexr.com which explains in more or less plain English what's happening

imilk · on March 30, 2023

Personally a fan of https://regex101.com/

thomasahle · on March 30, 2023

This is what it gives to select names: /^[A-Z][a-z]+(?: [A-Z][a-z]+)*$/

I'm pretty sure quite a lot of people's names wouldn't be accepted by that. It helps if you add "even weird and foreign names", but who knows if that's actually enough to capture everything.
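
A quick sketch to check that claim, running the pattern above through Python's `re` module. The sample names and the variable names are assumptions chosen for illustration.

```python
import re

# The pattern quoted above, minus the surrounding / delimiters.
NAME_RE = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)*$")

# Sample names picked for illustration only.
names = [
    "John Smith",
    "José García",      # accented letters are outside [a-z]
    "Jean-Luc Picard",  # hyphen
    "O'Brien",          # apostrophe
    "McDonald",         # capital letter in the middle of a word
    "bell hooks",       # intentionally lowercase
    "李小龙",            # not Latin script at all
]

rejected = [name for name in names if NAME_RE.fullmatch(name) is None]

if __name__ == "__main__":
    print(rejected)
```

Of these samples, only "John Smith" gets through.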

zamnos · on March 31, 2023

It's not. I'll be the first one to link to the list of falsehoods programmers believe about names: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

It should be required reading before being allowed to write code that'll operate on user profiles, but there's no such thing as a programming license, so this link will have to do.

rjh29 · on March 31, 2023

There is basically no regex to validate names except for maybe "is not an empty string".

travisjungroth · on March 30, 2023

It might be good to say what you want in natural language or have it come up with that from your goal.

A regex that worked for all names would be extremely weak. I can’t imagine it being much more than testing valid characters.

totoglazer · on March 30, 2023

Have you tried GPT-4? I suspect it would do better for these types of characters.

themineraria · on March 31, 2023

I think I broke it pretty badly asking for "A ll(1) grammar validator" as it answered `^(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:[A-Z]|[a-z])(?:[A-Z]|[a-z]|[0-9])):(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:[A-Z]|[a-z])(?:[A-Z]|[a-z]|[0-9])):)+(?:(?:(?

lol

id: chatcmpl-6zxdwNH3hW7DuZqnCkx4Ukx9rXiuA token usage: 367 date: 1680225296

mike_hock · on March 30, 2023

> /^(?:\/(?:\\\/|[^\/\n])+\/[gimuy]{0,5}|\\[^\n]|[^\n\\\/])+$/

What would you say was the query for this?

thamer · on March 31, 2023

The regular version of ChatGPT can explain it pretty well. I asked it about your regex and it gave me a detailed breakdown; it's a bit too long to post it here but just to quote the [gimuy] part:

> [gimuy]{0,5}: This part of the pattern matches optional regex flags that can follow the closing forward slash of the regex pattern. The allowed flags are g (global), i (case-insensitive), m (multiline), u (unicode), and y (sticky). This part of the pattern matches zero to five of these flags, without repetition.

Full response: https://gist.github.com/nicolasff/3bbfb4cb8a514f58e140b887bf...
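
One detail in the quoted explanation is worth testing: a character class with a `{0,5}` quantifier does not rule out repeated flags on its own. A minimal Python sketch follows; the names `FLAGS_RE` and `UNIQUE_FLAGS_RE` are made up here, and the lookahead variant is just one possible way to forbid repeats, not something from the original regex.

```python
import re

# The flags fragment exactly as it appears in the regex above.
FLAGS_RE = re.compile(r"[gimuy]{0,5}")

# Hypothetical stricter variant: the negative lookahead fails the match
# if any character occurs twice in the rest of the string.
UNIQUE_FLAGS_RE = re.compile(r"(?!.*(.).*\1)[gimuy]{0,5}")

if __name__ == "__main__":
    for flags in ["", "gi", "gimuy", "gg", "ggggg"]:
        print(
            repr(flags),
            FLAGS_RE.fullmatch(flags) is not None,
            UNIQUE_FLAGS_RE.fullmatch(flags) is not None,
        )
```

The original fragment accepts `gg` and `ggggg`; only the lookahead variant rejects them.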

TradingPlaces · on March 31, 2023

Finally, something useful to me. If it worked. My experience was spotty in that regard.

sw1sh · on March 30, 2023

Is it time to revive the HTML regex meme yet?

sv123 · on March 30, 2023

I once had a problem I solved with regex, now I have two problems.

Shindi · on March 30, 2023

If you're using a regex it's a smell that there is a better tool for what you need to do. A ton of if statements is a much more readable way of writing code than regex.

zamnos · on March 30, 2023

Respectfully, no. If you're trying to write a full-blown language parser then regexps are the wrong tool, but a 50 line function doing the job of a regexp has a really funky code smell. Comment your regexps*, and use a parser when your use case outgrows regexps, but a blanket ban on regexps smells of "I'm not smart enough to understand them".

* https://www.oreilly.com/library/view/regular-expressions-coo...
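
As a sketch of what a commented regexp can look like, here is the names pattern from elsewhere in this thread rewritten with Python's `re.VERBOSE` flag, which ignores bare whitespace and allows `#` comments inside the pattern. The name `NAME_RE` is an assumption for illustration.

```python
import re

NAME_RE = re.compile(
    r"""
    [A-Z][a-z]+        # first word: one capital, then lowercase letters
    (?:                # zero or more further words, each one:
        [ ]            #   preceded by a single space (in brackets,
                       #   because VERBOSE ignores bare whitespace)
        [A-Z][a-z]+    #   shaped like the first word
    )*
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    for text in ["Ada Lovelace", "ada lovelace", "Ada  Lovelace"]:
        print(repr(text), NAME_RE.fullmatch(text) is not None)
```

The comments don't change what the pattern matches; they only record the intent next to each piece.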

encryptluks2 · on March 30, 2023

Unfortunately regex is the best we have. A bunch of if statements is also a whole lot less efficient and more prone to errors than a regex statement. Once you start to learn regex, it actually isn't that bad. The challenging part is that a lot of people cut and paste without really understanding it, and you end up with a bunch of bad regex examples.

jwestbury · on March 31, 2023

Eh, I think the challenging part is understanding a complex regex that's already been written. You really do need good comments and unit tests to go along with a regex in order to understand the intent, otherwise they rapidly become unmaintainable.