
That's a valid concern. I think that, just like with any other software, you need to write tests against the AI model to continuously check that your prompts are working as intended. Basic unit tests would work well in this case.
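Something like this rough sketch is what I mean (the complete() helper is just a stand-in for whatever SDK you use; the test pins down properties of the reply rather than exact wording, since responses aren't deterministic):

    import json

    def complete(prompt: str) -> str:
        # Placeholder: swap in a call to your provider's SDK here.
        return '{"order_id": "A1234"}'

    def test_prompt_extracts_order_id():
        prompt = (
            'Extract the order id and reply only with JSON like '
            '{"order_id": "..."}.\n\n'
            "Customer message: Hi, where is my package? "
            "Order #A1234 was due Monday."
        )
        reply = complete(prompt)
        data = json.loads(reply)   # reply must be valid JSON
        assert data.get("order_id") == "A1234"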



The differences here compared to ordinary unit tests are that the breakage is outside of your control and the updating process is tedious. Also, the testing requires making real API calls rather than using stubs, so it needs additional infrastructure.
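One way to keep that infrastructure pain contained (rough sketch, assuming pytest and a made-up "live_llm" marker name) is to split the live-model tests from the stub-based ones and only run them on a schedule or on demand:

    # Register the marker in pyproject.toml (or pytest.ini):
    #   [tool.pytest.ini_options]
    #   markers = ["live_llm: tests that hit the real model API"]
    import pytest

    @pytest.mark.live_llm
    def test_prompt_against_real_model():
        # Hits the real API; run separately with:  pytest -m live_llm
        ...

    def test_prompt_formatting_with_stub():
        # Pure-Python check against a stubbed reply; runs on every push
        # (exclude live tests with:  pytest -m "not live_llm")
        ...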


That's all true, but things sometimes break simply because some package gets bumped, too. I'd still like to know whether my app is basically broken because an LLM has changed somehow.

You are right that fine-tuning would probably help to minimize the risks, but the risk can probably never be zero. New tests will also be needed when customers find new edge cases that break our assumptions.

Testing LLM prompts is a new paradigm that we'll have to learn to deal with.




