People are trying to use gen AI in more and more use-cases, it used to fall flat on its face at trivial stuff, now it got past trivial stuff but still scratching the boundaries of being useful. And that is not an attempt to make the gen AI tech look bad, it is really amazing what it can do - but it is far from delivering on hype - and that is why people are providing critical evaluations.
Lets not forget the OpenAI benchmarks saying 4.0 can do better at college exams and such than most students. Yet real world performance was laughable on real tasks.
> Lets not forget the OpenAI benchmarks saying 4.0 can do better at college exams and such than most students. Yet real world performance was laughable on real tasks.
That's a better criticism of college exams than the benchmarks and/or those exams likely have either the exact questions or very similar ones in the training data.
The list of things that LLMs do better than the average human tends to rest squarely in the "problems already solved by above average humans" realm.
Lets not forget the OpenAI benchmarks saying 4.0 can do better at college exams and such than most students. Yet real world performance was laughable on real tasks.