
This seems useful for issues early in the convo, but what if the AI's responses diverge from the recorded convo before the issue is hit?


That’s a great point! We do our best to reproduce the recorded case exactly, but if the agent responds differently, we focus on whether the key evaluator or goal for that replay set passes or fails. We treat that result as the source of truth, and we flag the exact moment where the conversation diverges from the expected flow.
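
As a rough illustration of that idea (not the actual implementation), here is a minimal Python sketch. The names `first_divergence` and `check_replay`, the evaluator callback, and the use of exact string comparison between turns are all assumptions made for the example; a real system would likely compare turns more loosely.

```python
from typing import Callable, List, Optional, Tuple


def first_divergence(recorded: List[str], replayed: List[str]) -> Optional[int]:
    """Return the index of the first turn where the replay differs, or None."""
    for i, (old, new) in enumerate(zip(recorded, replayed)):
        # Assumption: turns match only if their text is identical after trimming.
        if old.strip() != new.strip():
            return i
    # One conversation ended early: divergence starts at the first missing turn.
    if len(recorded) != len(replayed):
        return min(len(recorded), len(replayed))
    return None


def check_replay(
    recorded: List[str],
    replayed: List[str],
    evaluator: Callable[[List[str]], bool],
) -> Tuple[bool, Optional[int]]:
    """Evaluator result is the source of truth; divergence index is diagnostic."""
    return evaluator(replayed), first_divergence(recorded, replayed)


# Hypothetical data: the replayed agent words its second turn differently.
recorded = ["Hi, how can I help?", "Your order ships Monday."]
replayed = ["Hi, how can I help?", "Your order ships on Tuesday."]

passed, diverged_at = check_replay(
    recorded,
    replayed,
    evaluator=lambda turns: any("ships" in t for t in turns),
)
print(passed, diverged_at)
```

Here the replay still passes the (hypothetical) goal check even though it diverges at turn index 1, which is the situation described above: pass or fail comes from the evaluator, and the divergence point is reported alongside it.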



