The article doesn’t provide any alternatives? I think there are indeed many chal...

The article doesn’t provide any alternatives?

I think there are indeed many challenges when evaluating Compound AI Systems (http://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems...)

But evals in complex systems are the best we have at the moment. It’s a “best-practice” just like all the forms of testing in the “test pyramid” (https://martinfowler.com/articles/practical-test-pyramid.htm...)

Nothing is a silver bullet. Just hard won, ideally automated, integrated quality and verification checks, built deep into the system and SDLC.