
Hey, awesome work. Do you have any generated regexes? It would be nice to see some examples, especially any that are surprising in some way.


Sure!

Some positive ones:

1) Spot-on prediction:

  PROMPT: lines with 3 or more characters or lower-case letters

  PRED: ((.)|([a-z])){3,}

  GOLD: ((.)|([a-z])){3,}

2) Learned to generalize and produced a simpler regex:

  PROMPT: lines with a character and the string 'dog'

  PRED: .*(.)&(dog).*

  GOLD: .*((.)+)&(dog).*

3) Also learned to generalize and produced a simpler regex without duplicate logic:

  PROMPT: lines not containing a letter

  PRED: .*~(([A-z])+).*

  GOLD: (.*)(.*~([A-z]).*)

4) Handling multiple references correctly:

  PROMPT: lines using 'su' after 'sun' or 'soon'.

  PRED: .*(sun|soon).*su.*

  GOLD: .*(sun|soon).*su.*

Though I find the mistakes interesting as well!

1) Issues counting properly:

  PROMPT: lines containing a 5 letter word beginning with 't'

  PRED: .*\bt[A-z]{5}\b.*

  GOLD: .*\bt[A-z]{4}\b.*

2) Misallocation of parentheses (to be fair, the prompt is slightly ambiguous):

  PROMPT: lines with 'dog' follwed by 'truck' and a lower-case

  PRED: (dog).*((truck)&([a-z])).*

  GOLD: (dog.*truck.*)&(.*[a-z].*)

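To make the parenthesis problem in mistake 2 concrete, here is a minimal Python sketch. It assumes that `A&B` in this regex DSL means intersection (the line must match both A and B); that reading is my assumption, and Python's `re` has no `&` operator, so the sketch emulates it with two full-line matches. The helper name `matches_all` is made up for illustration.

```python
import re

# Assumption: `A&B` in the DSL means "the whole string matches A and also B".
# Python's re has no intersection operator, so we check each side separately.

def matches_all(parts, text):
    """Return True if the whole text matches every pattern in parts."""
    return all(re.fullmatch(p, text) is not None for p in parts)

# GOLD: (dog.*truck.*)&(.*[a-z].*)  -> two line-level constraints
GOLD_PARTS = [r"dog.*truck.*", r".*[a-z].*"]

# Inner group of PRED: (truck)&([a-z])
# One side needs exactly "truck", the other exactly one lower-case letter,
# so under the intersection reading no string can satisfy both.
PRED_INNER_PARTS = [r"truck", r"[a-z]"]

if __name__ == "__main__":
    for line in ["dog and truck", "dog TRUCK", "my dog truck"]:
        print(repr(line), matches_all(GOLD_PARTS, line))
    for candidate in ["truck", "t", "k"]:
        print(repr(candidate), matches_all(PRED_INNER_PARTS, candidate))
```

Under that assumption the inner group of PRED can never match, which is why moving the parentheses changes the meaning so much.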

Mistake 1 looks like the classic off-by-one! Definitely the boundary point for a Chomsky grammar type. Modifications to the code for processing problems like the Sorites paradox would be interesting.

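The off-by-one in mistake 1 is easy to check, since both patterns there are plain regexes that Python's `re` accepts as written. A quick sketch, with test lines I made up and a hypothetical helper `line_matches`:

```python
import re

# Patterns copied verbatim from mistake 1 above.
# Note: [A-z] also covers the ASCII characters between 'Z' and 'a'
# ([, \, ], ^, _ and `), not only letters.
PRED = re.compile(r".*\bt[A-z]{5}\b.*")  # 't' plus 5 more characters = 6 total
GOLD = re.compile(r".*\bt[A-z]{4}\b.*")  # 't' plus 4 more characters = 5 total

def line_matches(pattern, line):
    """Return True if the entire line matches the compiled pattern."""
    return pattern.fullmatch(line) is not None

if __name__ == "__main__":
    # "tiger" has 5 letters, "tomato" has 6 (sample lines are made up).
    for line in ["the tiger ran", "a tomato"]:
        print(repr(line), "PRED:", line_matches(PRED, line),
              "GOLD:", line_matches(GOLD, line))
```

The `{n}` only counts the characters after the leading `t`, so a 5 letter word needs `{4}`; the prediction accepts 6 letter words instead.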


