Leela played Stockfish over 200 games and won 106-94 [1]. Not sure which versi...
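For scale: 106 points out of 200 is a 53% score. Under the standard logistic Elo model that works out to roughly a 21-point gap. This assumes the usual 400-point scale; chess.com doesn't say how (or whether) they rate these matches. Here's a quick sketch of the arithmetic:

```python
import math

def elo_diff_from_score(points: float, games: int) -> float:
    """Elo difference implied by a match score, assuming the logistic Elo model."""
    score = points / games  # fraction of available points won (draws count as 0.5)
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    # Invert E = 1 / (1 + 10**(-d/400)) to solve for the rating difference d.
    return -400.0 * math.log10(1.0 / score - 1.0)

# Leela's 106 points out of 200 games
print(round(elo_diff_from_score(106, 200), 1))  # → 20.9
```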

dmurray · on Jan 13, 2021

It's difficult to organise Leela-Stockfish as a fair fight, because they run on different hardware (CPU vs GPU) and both get substantial improvements by playing on better hardware.

Traditionally this wasn't a big problem, as every engine was more or less optimised for a fast Intel CPU with a moderate to large amount of RAM. The organisers would decree the specs of the championship hardware some time in advance. Now (at least for TCEC, the other major engine tournament), they pick two hardware configurations, one CPU-heavy and one GPU-heavy, and let each team choose.

How do you balance those? It's been suggested you should make them equal in terms of watts of power, or dollar cost to buy, but neither is obviously best. In practice the TCEC organisers pick something close to what they picked last time but shade it against the engines that won, to make the contest more even. Chess.com likely do something similar, though they're less rigorous about the details.

criddell · on Jan 13, 2021

> It's been suggested you should make them equal in terms of watts of power, or dollar cost to buy, but neither of those are obviously best.

Do they give these computers a rating like they do human players?

dmurray · on Jan 13, 2021

Yes, there are several computer chess rating lists, but you run into the same problems once you want to compare a CPU engine with a GPU one: how do you account for the hardware? Really a rating, as a measure of playing strength, should apply to a hardware/software combination, but we almost always want to compare the software. The most respected lists used to be the SSDF [0] and CCRL [1] lists, but they both ignore the GPU issue. There's now the CEGT list [2], which I don't know anything about, but it does seem to use heterogeneous hardware.

Note that the numbers aren't calibrated to human players, so there's no claim that an engine with a rating of 3100 on some particular hardware should score 85% against a human rated 2800, or whatever.
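The 85% figure is just what the usual Elo formula gives for a 300-point gap, and that's the point: the formula only means something when both ratings come from the same pool. A minimal sketch, assuming the standard logistic curve with a 400-point scale (individual rating lists may fit their own models):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# The hypothetical 3100 engine vs 2800 human from the comment above,
# *if* both ratings were on the same scale (which they aren't).
print(f"{expected_score(3100, 2800):.1%}")  # → 84.9%
```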

[0] http://ssdf.bosjo.net/list.htm
[1] https://ccrl.chessdom.com/ccrl/404FRC/
[2] http://www.cegt.net/40_40%20Rating%20List/40_40%20SingleVers...

zone411 · on Jan 13, 2021

This match was played before the NNUE version of Stockfish was introduced. Stockfish NNUE beat LC0 in TCEC season 19: https://www.chessprogramming.org/TCEC_Season_19

thomasahle · on Jan 13, 2021

The chess.com version is the old Stockfish from mid-2020. The NNUE architecture was only put in place around August. If you look at https://github.com/glinscott/fishtest/wiki/Regression-Tests you'll notice that it gave a very significant (100+ Elo) boost.

There is a tournament going on right now at tcec-chess.com which Stockfish has been leading so far, but I see Leela has just caught up in the head-to-head.

Of course Leela also keeps evolving.

kohlerm · on Jan 13, 2021

FYI Leela is in the lead ATM