We need better LLM evaluations (generatingconversation.substack.com)
2 points by mehulashah 11 months ago | 1 comment



I’ve been thinking about this. We seem to be overfitting LLMs to benchmarks that are meant to represent “intelligence”, but I’m not sure they represent the kinds of tasks that small businesses and enterprises want done. Is it time for another Anon et al. that sets up a benchmark and a council à la TPC? Hopefully without the politics.



