The Pounding Ends: Stockfish Beats Houdini 59-41 in the TCEC Season 11 Superfinal
That's 20-2 in decisive games. One of Houdini's two wins came in the revenge game for a 3...Be7 Tarrasch French, so one can blame that on the opening, and the other win was a 95-move win on the white side of a Dutch...so you can also blame that one on the opening. (No doubt the Stockfish programmers have been hard at work figuring out how they didn't manage to win their white game against the Dutch.)
Those were Houdini's high points, all two of them. Stockfish won 20 games, 13 with White and an impressive 7 with Black. (My two favorites were game 31, its white win in the 3...Be7 Tarrasch French; and game 81, which bore a little resemblance to the sorts of games it lost to AlphaZero a few months back.) There were many other games where Stockfish had the upper hand, and it wasn't clear whether Houdini would manage to save them. So as convincing as the victory was, it could have been even worse.
Congratulations to the Stockfish team, and I hope Houdini's programmer Robert Houdart doesn't wait a year or two to make improvements, but springs back into action as soon as real life permits. (Likewise, I hope the Komodo team makes some big improvements as well.) Stockfish is doing great, but if they don't feel any pressure they might slacken the pace.
TCEC archives (the games can be downloaded from here).
Reader Comments (4)
Did you look at the 7 Black wins? The question is whether the openings chosen were 'unfair' to White, e.g. gave Black a slight advantage. Judging by the engines' own evaluations straight out of the book, they thought so for at least two KGA games. Maybe others too...
Then it'd be a bit like flipping the colors for the pair of games with the opening.
[DM: A good point - and I did note the two KGAs when perusing the games, thinking back to ChessBase's April Fool's joke several years ago in which they claimed that some engine had worked out the KG to a loss. The other Black wins were in a KIA vs. the French, a Nimzo-English with 4.g4, a Classical KID (it also won with White), a Hedgehog, and a 4.Qc2 Nimzo-Indian. Real openings!]
Dennis, thanks for the coverage. The score translates to an Elo difference of only about 65, roughly the difference between Magnus Carlsen and Karjakin, but I have to agree with you that Stockfish played a whole lot stronger. I would have preferred a more serious selection of openings. They played a Bird's Opening, the Polish, and a French Wing Gambit, for example. Maybe fun in a human speed game, but rather silly when the tactical skill of both engines made it clear why these openings are not played at the highest levels. This was a championship; I would have liked to see them stick to openings the top masters play when going for a win.
[DM: I don't mind the occasional odd opening - it's instructive to see how they handle such positions, which real people do play (I've faced and/or played them all in serious games against opponents rated 2200-2500, and the "bad" opening wasn't always punished!), and you'll note that those games wound up as draws. But I'm glad that most of the games featured fairly conventional lines. (Besides, everyone and his second is analyzing the snot out of the main lines already.)]
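The Elo figure quoted in the comment above can be sanity-checked with a short calculation. Under the standard logistic Elo model (a common approximation; the exact conversion TCEC or the commenter used is not stated), a 59% score over 100 games implies a rating gap of roughly 63 points, close to the ~65 mentioned:

```python
import math

def elo_diff(score: float) -> float:
    """Rating gap implied by an expected score under the logistic Elo model.

    score: fraction of points won (0 < score < 1).
    """
    return 400 * math.log10(score / (1 - score))

# Stockfish scored 59/100 against Houdini in the superfinal.
print(round(elo_diff(0.59)))  # → 63
```

This also shows why a lopsided 20-2 result in decisive games can still be a modest rating gap: the 78 draws pull the overall score back toward 50%.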
It also was clear that Stockfish used its time very wisely. Houdini was nearly always way behind on the clock. Even though Houdini was searching faster (nodes/sec) and wider (total nodes) on each move, it was taking longer to select a move, and as the results showed, not the best one. It would be interesting to see whether the losing moves came in the later stages of the games, when Houdini was getting short on time.
[DM: Its time usage was very efficient, but we're mostly able to say that because it played better. If Houdini was making better moves and whipping Stockfish, we might complain about Stockfish's haste. It should have played more slowly! It's hard to avoid anthropomorphizing computer chess.]
An Elo difference of 65 is actually a steep hill to climb for engines today, when programmers are struggling mightily to come up with improvements that pass testing and achieve any Elo gain at all. Most attempted improvements fail.
[DM: Actually, Stockfish has been gaining rating points hand over fist lately.]
[Snip!]
I'm guessing Stockfish's jump in performance is largely due to the stacks of computer time donated by a mystery source in China recently (mentioned on TCEC chat). If that continues, it'll also encourage them not to slacken the pace, because they'll want to have new things to test so they don't waste it.
Maybe whoever has the most computer power for testing wins.