> And I don't buy the lmsys leaderboard results where Google somehow shoved a mysterious gemini-pro model to be better than GPT-4.
What do you mean by "don't buy"? Do you think lmsys is lying and the leaderboard does not reflect the results? Or that Google is lying to lmsys and has a better model that it serves exclusively to lmsys but not to others? Or something else?
Most likely the latter. Either Google has a better model which they disguise as Bard to make up for the bad press Bard has received, or Google doesn't really have a better model, just a Gemini Pro fine tuned on GPT-4 data to sound like GPT-4 and rank high on the leaderboard.
> Either Google has a better model which they disguise as Bard
Why wouldn't they use this model in Bard then?
Anyway, this is an easily verifiable claim: are there any prompts that consistently work on lmsys but not in the Bard interface?
> fine tuned on GPT-4 data to sound like GPT-4 and rank high
This I don't get. Why would many different random people rank a bad model that sounds like GPT-4 higher than a good model that doesn't? What does "better model" even mean in such a setting, if not user preference?