LM arena public voting is not objective for LLM evaluation

ClassyJacket · 2025-01-24T00:00:06

The post has been deleted.

lostmsu · 2025-01-24T05:58:52

TL;DR: A Reddit user claimed to have run a script that posed as a human voter on Chatbot Arena, upvoted Gemini thousands of times, and likewise downvoted OpenAI's rival model. The script reportedly identified the anonymous models by recognizing their model-specific responses to artificially restricted topics and to certain kinds of gibberish prompts.