I did this with Claude over the holidays. Putting Claude in the role as a guesse...

suveen_ellawela · 2025-01-22T17:19:43 1737566383

That's a nice experiment! I think Codenames could definitely be an evaluation method for LLMs.

gordonhart · 2025-01-25T12:50:19 1737809419

Elo on different card games/board games would be a great eval metric now that the systems are general enough to play Codenames, chess, poker…
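For anyone wanting to try this, here is a rough sketch of the standard Elo update applied to head-to-head games between two models. The K-factor of 32, the 1500 starting rating, and the function names are illustrative assumptions, not something from an existing eval harness.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k=32 is an assumed K-factor; real rating systems tune this value.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two hypothetical models start at an assumed rating of 1500; A wins one game.
model_a, model_b = update_elo(1500.0, 1500.0, score_a=1.0)
```

For a team game like Codenames you would need to decide what counts as a "player" (for example one spymaster and guesser pairing per rating), which this sketch leaves open.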

suveen_ellawela · 2025-01-25T23:38:00 1737848280

totally agree!

__MatrixMan__ · 2025-01-25T14:59:40 1737817180

It would be fun to build one, perhaps mediated by an app, where you have to guess whether your spymaster is a human or an AI based on the quality of their choices.

zeroonetwothree · 2025-01-25T15:05:48 1737817548

The average human is quite bad. It really works well when the spymaster is (a) experienced and (b) familiar with the other players.

__MatrixMan__ · 2025-01-25T15:21:58 1737818518

It's the (b) case I'm interested in. Like the spymaster loses if they can't subtly indicate to their friends that they're the real deal. Otherwise the robots win.

suveen_ellawela · 2025-01-25T23:39:41 1737848381

I thought of adding a feature where you can get your own spymaster. You can give it all your personal info and the clues would be customized. The bottleneck is that the other human spymaster has to help with updating the game state, because I (the guesser) can't look at the spymaster view.