I suspect people on LLM Arena don't ask complex questions very often, and reasoning models seem to perform worse than simpler models when the goal is just casual conversation or retrieving embedded knowledge. Reasoning models probably 'overthink' in such cases, and they're slower, too.
LLM Arena deletes your prompt when you restart, so what's the point of writing a complicated prompt and testing it against an exhaustive number of pairs?
It's easy to pin this on the users, but that website is hostile to putting in any effort.
This is something I've noticed a lot, actually. Many AI projects just give you an input field and call it a day, expecting the user to do the heavy lifting.