A GPT 3.5 powered tool for generating regex (fuckregex.dev)
38 points by DefineOutside on March 30, 2023 | 23 comments


Pretty neat, and it definitely can generate some complex regex. But I have to say I don't trust it!

When I add a regex to something, I pretty much always also add a bunch of unit test cases for it, and I request the same if I see one in a PR. It's much easier to just see a bunch of test cases that help validate it, and frankly, half the time when I write my own cases I think of situations that could be better handled.

Human- or GPT-generated doesn't really matter IMHO; it still needs tests.

Interestingly, chatGPT is pretty good at generating test cases for regex. It would be really cool to see that functionality integrated into this tool.
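
As a rough illustration of the kind of test cases meant here, a minimal table-driven sketch in Python. The date pattern and the helper name `failing_cases` are hypothetical, picked only for the example; they are not something the tool produces:

```python
import re

# Hypothetical pattern under test: a loose ISO-style date (YYYY-MM-DD).
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

# (input, should_match) pairs; the cases document the intent of the pattern.
CASES = [
    ("2023-03-30", True),
    ("2023-3-30", False),    # month must be two digits
    ("2023-03-30 ", False),  # trailing whitespace is rejected by fullmatch
    ("", False),
    ("2023-13-99", True),    # shape only: the pattern does not check ranges
]


def failing_cases(pattern, cases):
    """Return the cases whose actual match result differs from the expectation."""
    return [
        (text, expected)
        for text, expected in cases
        if (pattern.fullmatch(text) is not None) != expected
    ]


if __name__ == "__main__":
    # An empty list means every case behaved as expected.
    print(failing_cases(DATE_RE, CASES))
```

Writing the last case down is the useful part: it makes explicit that the pattern checks shape, not calendar validity.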


> chatGPT is pretty good at generating test cases for regex

I think this is the biggest part. Automatically generated test cases for regex will make verifying regex way easier.

Even better, when you come across some ridiculous unknown legacy regex you can ask for an explanation of it and for test cases to verify that explanation.


Rate limit reached for default-gpt-3.5-turbo in organization org-cy3MEIpOsyQxMokN4SQON5gb on requests per min. Limit: 20 / min. Please try again in 3s. Contact support@openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.

Ironically, it showed us the real future of AI


I used to work for a tech company which ran our incident management stack on Google Cloud. One day, our incident management tool was unavailable due to some billing issue.

Turns out, someone during the test phase set up a limit of $1/day usage, and nobody ever changed it. We'd just finally hit that limit.

What I'm saying is, this isn't an AI-only problem, but it's comforting to know it's already impacting AI-backed tools.


ChatGPT is also very useful for just explaining what a regexp you already have actually does.


There are some neat tools for visualizing a regex like https://regexper.com/


There's also regexr.com which explains in more or less plain English what's happening


Personally a fan of https://regex101.com/


This is what it gives to select names: /^[A-Z][a-z]+(?: [A-Z][a-z]+)*$/

I'm pretty sure quite a lot of people's names wouldn't be accepted by that. It helps if you add "even weird and foreign names", but who knows if that's actually enough to capture everything.
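
A quick way to see this is to run the quoted pattern against a few names. This is only a sketch: the pattern is translated to Python's `re` syntax, and the sample list is an assumption chosen for illustration:

```python
import re

# The pattern quoted above, translated from the /.../ literal to Python.
NAME_RE = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)*$")

# Illustrative sample names (an assumption, not an exhaustive test set).
samples = [
    "John Smith",
    "José García",      # accented letters fall outside [a-z]
    "O'Brien",          # apostrophe
    "Jean-Luc Picard",  # hyphen
    "McDonald",         # capital letter in the middle of a word
    "van Gogh",         # lowercase first letter
]

rejected = [name for name in samples if NAME_RE.match(name) is None]

if __name__ == "__main__":
    for name in rejected:
        print("rejected:", name)
```

Of these samples, only "John Smith" gets through.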


It's not. I'll be the first one to link to the list of falsehoods programmers believe about names: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

It should be required reading before being allowed to write code that'll operate on user profiles, but there's no such thing as a programming license, so this link will have to do.


There is basically no regex to validate names except for maybe "is not an empty string".


It might be good to say what you want in natural language or have it come up with that from your goal.

A regex that worked for all names would be extremely weak. I can’t imagine it being much more than testing valid characters.


Have you tried GPT-4? I suspect it would do better for these types of characters.


I think I broke it pretty badly asking for "A ll(1) grammar validator" as it answered `^(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:[A-Z]|[a-z])(?:[A-Z]|[a-z]|[0-9])):(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:[A-Z]|[a-z])(?:[A-Z]|[a-z]|[0-9])):)+(?:(?:(?

lol

id: chatcmpl-6zxdwNH3hW7DuZqnCkx4Ukx9rXiuA token usage: 367 date: 1680225296


> /^(?:\/(?:\\\/|[^\/\n])+\/[gimuy]{0,5}|\\[^\n]|[^\n\\\/])+$/

What would you say was the query for this?


The regular version of ChatGPT can explain it pretty well. I asked it about your regex and it gave me a detailed breakdown; it's a bit too long to post it here but just to quote the [gimuy] part:

> [gimuy]{0,5}: This part of the pattern matches optional regex flags that can follow the closing forward slash of the regex pattern. The allowed flags are g (global), i (case-insensitive), m (multiline), u (unicode), and y (sticky). This part of the pattern matches zero to five of these flags, without repetition.

Full response: https://gist.github.com/nicolasff/3bbfb4cb8a514f58e140b887bf...
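
One detail in the quoted explanation is worth checking: a character class with a quantifier does not by itself forbid repeated characters. A small sketch in Python (the `UNIQUE_FLAGS_RE` variant is a hypothetical alternative written for this example, not part of the original regex):

```python
import re

# The flags fragment from the regex under discussion.
FLAGS_RE = re.compile(r"[gimuy]{0,5}")

# The same flag five times still matches, so repetition is allowed.
repeats_allowed = FLAGS_RE.fullmatch("ggggg") is not None

# Hypothetical stricter variant: a negative lookahead with a backreference
# rejects any string in which some character occurs twice.
UNIQUE_FLAGS_RE = re.compile(r"(?!.*(.).*\1)[gimuy]{0,5}")

if __name__ == "__main__":
    print(repeats_allowed)
    print(UNIQUE_FLAGS_RE.fullmatch("gim") is not None)
    print(UNIQUE_FLAGS_RE.fullmatch("gig") is not None)
```

So the explanation is mostly right, but the "without repetition" claim does not hold for the pattern as written.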


Finally, something useful to me. If it worked. My experience was spotty in that regard.


Is it time to revive the HTML regex meme yet?


I once had a problem I solved with regex, now I have two problems.


If you're using a regex it's a smell that there is a better tool for what you need to do. A ton of if statements is a much more readable way of writing code than regex.


Respectfully, no. If you're trying to write a full-blown language parser then regexps are the wrong tool, but a 50-line function doing the job of a regexp has a really funky code smell. Comment your regexps*, and use a parser when your use case outgrows regexps, but a blanket ban on regexps smells of "I'm not smart enough to understand them".

* https://www.oreilly.com/library/view/regular-expressions-coo...
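
For what commenting a regexp can look like in practice, here is a minimal sketch using Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. The version-string pattern is a hypothetical example, not taken from the linked book:

```python
import re

# Hypothetical example: a "MAJOR.MINOR.PATCH" version string.
VERSION_RE = re.compile(
    r"""
    (?P<major>\d+)   # major version: one or more digits
    \.               # literal dot
    (?P<minor>\d+)   # minor version
    \.               # literal dot
    (?P<patch>\d+)   # patch version
    """,
    re.VERBOSE,
)

match = VERSION_RE.fullmatch("1.20.3")
parts = match.groupdict() if match else None

if __name__ == "__main__":
    print(parts)
```

Named groups do some of the documenting too, since the caller reads `parts["minor"]` instead of a positional group index.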


Unfortunately regex is the best we have. A bunch of if statements is also a whole lot less efficient and more prone to errors than a regex. Once you start to learn regex, it actually isn't that bad. The challenging part is that a lot of people cut and paste without really understanding it, and you end up with a bunch of bad regex examples.


Eh, I think the challenging part is understanding a complex regex that's already been written. You really do need good comments and unit tests to go along with a regex in order to understand the intent, otherwise they rapidly become unmaintainable.



