I kind of agree. I’ve had very mixed experiences with LLMs and DSLs.
I was writing an NRQL query (New Relic’s log query language) and wanted to essentially do a GROUP BY date_trunc. It kept giving me options that looked like exactly what I wanted, and then the functions it gave me just didn’t exist. After about four rounds of me telling it that the functions it was giving me didn’t exist, it finally worked.
Then I needed it to split a string on the second forward slash and just give me the first piece. It gave me enough of a foundation to fill in the gaps myself, but the LLM never got it right on its own.
In that case, I assume it’s a lack of training data since NRQL is pretty niche.
I catch myself swinging from “holy shit this is impressive” to “wow this sucks” and back regularly for code.
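For reference, the string split described above is easy to pin down outside NRQL. Here is a minimal Python sketch of the intended behavior, assuming "the first piece" means everything before the second forward slash. The function name `before_second_slash` and the sample paths are hypothetical, and this is not NRQL syntax; it only illustrates the result the query was supposed to produce.

```python
def before_second_slash(path: str) -> str:
    """Return everything before the second '/' in path.

    Assumption: if the string has fewer than two slashes,
    it is returned unchanged.
    """
    # maxsplit=2 splits on the first two slashes only,
    # so we get at most three pieces.
    parts = path.split("/", 2)
    if len(parts) < 3:
        return path
    # Rejoin the first two pieces; the remainder is discarded.
    return "/".join(parts[:2])


print(before_second_slash("api/v1/users/42"))  # api/v1
print(before_second_slash("/api/v1/users"))    # /api
```

Note that a leading slash counts as the first slash, which may or may not be what you want for URL paths.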
This is similar to my experience with LLMs and DSLs. They tend to hallucinate functions that would magically work in the situation you are describing. My pet theory here is that they are fooled by the many forum posts/issues asking "why doesn't a function called ABC exist in this DSL that does this?"