
I did this with Claude over the holidays. I put Claude in the role of guesser and compared its guesses to those of an experienced human player. It turns out they both matched each other.


That's a nice experiment! I think Codenames could definitely be an evaluation method for LLMs.


Elo on different card games/board games would be a great eval metric now that the systems are general enough to play Codenames, chess, poker…
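For anyone curious what an Elo-based eval would involve, here is a minimal sketch of the standard Elo update for a single game. It assumes each Codenames side (for example an LLM spymaster plus guesser versus a human pair) is treated as one "player", and the K-factor of 32 and the example ratings are arbitrary choices for illustration, not anything from this thread.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of side A against side B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 if side A wins, 0.5 for a draw, 0.0 if side A loses.
    k (assumed 32 here) controls how far ratings move after each game.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Hypothetical example: an LLM side rated 1500 beats a human side rated 1600.
    llm, human = update_elo(1500.0, 1600.0, score_a=1.0)
    print(round(llm, 1), round(human, 1))
```

Because the winner gains exactly what the loser gives up, the total rating pool stays constant; an upset win by the lower-rated side moves both ratings more than an expected win would.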


totally agree!


It would be fun to build one, perhaps mediated by an app, where you have to guess whether your spymaster is a human or an AI based on the quality of their choices.


The average human is quite bad. It really works well when the spymaster is (a) experienced and (b) familiar with the other players.


It's the (b) case I'm interested in. Like the spymaster loses if they can't subtly indicate to their friends that they're the real deal. Otherwise the robots win.


I thought of adding a feature where you can get your own spymaster. You could give it all your personal info and the clues would be customized. The bottleneck is that the other human spymaster has to help with updating the game state, because I (the guesser) can't look at the spymaster view.



