I’ve been thinking about this. We seem to be overfitting LLMs to benchmarks that represent “intelligence”, but I’m not sure that they represent the kinds of tasks that small businesses and enterprises want done. Is it time for another Anon et al. that sets up a benchmark and council à la TPC? Hopefully, without the politics.