Maybe AlphaZero Isn't *Quite* As Strong As It Seems?
It's still a great achievement, especially for four hours' work on the engine's part, but a commenter on the Chess24 story thinks that AlphaZero's +28=72-0 crush of Stockfish 8 isn't nearly as impressive as it seems on the surface. Here's what "maelic" writes:
It is a nice step in a different direction, perhaps the start of a revolution, but AlphaZero is not yet better than Stockfish, and if you bear with me I will explain why. Most people are very excited now and hoping for a sensation, so they don't really read the paper or think about what it says, which leads to uninformed opinions.
The testing conditions were terrible. One minute per move is not really a suitable time control for engine testing, but you could tolerate that. What is intolerable, though, is the hash table size: with the 64 cores Stockfish was given, you would expect around 32 GB or more; otherwise the table fills up very quickly, leading to a marked reduction in strength. Stockfish was given 1 GB, which is far from an ideal value. Nor was it given any endgame tablebases, which are the current norm for any serious computer chess setup.
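For reference, all three of the settings the commenter objects to are ordinary UCI options. A hypothetical sketch of the kind of configuration he has in mind (option names as in Stockfish; the tablebase path is purely illustrative):

```
setoption name Threads value 64
setoption name Hash value 32768
setoption name SyzygyPath value /path/to/syzygy
```

Here 32768 MB is the 32 GB of hash the commenter suggests, and SyzygyPath would point the engine at Syzygy endgame tablebases; the 1 GB actually used in the match corresponds to `Hash value 1024`.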
The computational power behind each side was very different. While Stockfish was given 64 CPU threads (really a lot, I've got to say), AlphaZero was given 4 TPUs. A TPU is a specialized chip for machine learning and neural-network calculations. Its estimated power compared to a classical CPU is roughly 1 TPU ≈ 30× an E5-2699v3 (an 18-core machine), so AlphaZero had behind it the power of roughly 2,000 Haswell cores. That is nowhere near a fair match. And even though the result was dominant, it was not what you would see if Stockfish faced itself at 2,000 cores versus 64 cores; in that case the win percentage would be far more heavily in favor of the stronger hardware.
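The commenter's back-of-envelope hardware estimate, and the Elo gap implied by the +28=72-0 score, can both be checked in a few lines. Note that the 30× TPU-per-Xeon figure is the commenter's own assumption, not a measured number:

```python
import math

# Commenter's assumption: 1 TPU ~ 30x an 18-core E5-2699v3 (Haswell)
tpus = 4
cores_per_xeon = 18
tpu_vs_xeon = 30
equivalent_cores = tpus * tpu_vs_xeon * cores_per_xeon
print(equivalent_cores)  # 2160, i.e. the "~2,000 Haswell cores" quoted above

# Elo gap implied by the 100-game score of +28 =72 -0
score = (28 + 0.5 * 72) / 100           # expected score 0.64 for AlphaZero
elo_gap = -400 * math.log10(1 / score - 1)
print(round(elo_gap))                    # ~100 Elo
```

So under the standard logistic Elo model, the match score corresponds to roughly a 100-point gap under these (arguably lopsided) conditions.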
From these observations we can draw a conclusion: AlphaZero is not as close in strength to Stockfish as Google would like us to believe. The incorrect match settings suggest either a lack of knowledge about classical brute-force engines and how they are properly configured, or an intention to create conditions under which Stockfish would be defeated.
With all that said, it is still an amazing achievement and definitely a breath of fresh air in computer chess, most welcome these days. But for a new computer chess champion we will have to wait a little longer.
Computer specialists, what say you?
Reader Comments (4)
This is really an apples-and-oranges question. The TPU is dedicated neural-network hardware that couldn't run SF, so it's not easy to make a "fair" comparison. The truly amazing thing here is that the AlphaZero approach appears to work across different games with virtually no domain knowledge added. This is basically the reason why chess programs were never really considered intelligent; they could not generalize to anything outside of their original narrow scope. AlphaZero is nowhere near general AI, of course, but it does represent a significant step in that direction.
Indeed, it seems rather meaningless to compare hardware in this case, when the two AI programs are based on such totally different architectures; note that the paper indicates a search speed for AlphaZero of only 80 thousand positions per second versus Stockfish's 70 million. On the other hand, I also got the feeling that the paper was written a little hastily, maybe. I think DeepMind is aware of it too: Hassabis already tweeted that the "full paper" will appear soon.
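To put that search-speed gap in concrete terms at the match's one minute per move, the raw node counts work out as follows (a quick sketch using the figures quoted above):

```python
# Search speeds reported in the paper:
az_nps = 80_000        # AlphaZero: ~80k positions/second
sf_nps = 70_000_000    # Stockfish: ~70M positions/second

move_time_s = 60       # the match's 1 minute per move

az_nodes = az_nps * move_time_s   # ~4.8 million positions per move
sf_nodes = sf_nps * move_time_s   # ~4.2 billion positions per move

print(sf_nodes // az_nodes)       # Stockfish examines ~875x as many positions
```

Despite examining roughly 875× fewer positions per move, AlphaZero won the match, which is precisely why the architectural comparison is more interesting than the hardware one.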
Take figure 1, for example, which is quite mysterious if you only read the caption. AlphaZero's Elo increases as a function of training time until it catches and slightly exceeds Stockfish (after around 300K steps, which took 4 hours), and then, curiously, as soon as its strength passes Stockfish by just a tiny bit, it flatlines there for another 400K steps of training. You look at this figure, and then you look at the match result, and you scratch your head.

Fortunately, elsewhere in the paper we learn that AlphaZero played 1 sec/move games against Stockfish after each iteration during training to estimate the improvement in Elo. This solves part of the mystery. It's interesting, though, that Stockfish must have been holding its own throughout the training phase in the 1 sec/move games. But why continue the training when you see that AlphaZero is not getting any stronger (which is what figure 1 shows)? Probably because you know it must be getting stronger, even though that is not reflected in the 1 sec/move games.

And sure enough, after the training phase is over, during which AlphaZero has exhibited no increase in strength for the last 5 hours of a 9-hour training run, you start a match with longer time controls against the same opponent that was used for Elo benchmarking, and now a "fully trained" AlphaZero crushes this opponent. Why was the 700K-step mark considered "fully trained"? Maybe simply arbitrarily, but it leaves a lot of interesting questions open about the self-improvement AlphaZero managed during the training phase. The point is, it's not the version at 300K steps (i.e. 4 hours) that crushes Stockfish in the match, but the one after 700K steps (9 hours), and we don't know how the one at 300K (or at 500K) would perform at 1 min/move. The Elo graph in figure 1 seems misleading, at least to me.
Anyway, these are only details related to the initial short paper, and I am sure the upcoming full paper will shed more light on many of the open questions. It would be fascinating to have a totally transparent match between a strongly configured asmFish with tablebases, etc., and an AlphaZero with a very long training time. It almost has the allure of the man vs. machine matches of the past :)
I would summarize my understanding of AlphaZero's play as follows:
AlphaZero uses a policy neural network to shortlist the most promising moves to analyze in each position. In this match, the developers reported that it evaluated about 80,000 positions per second, while Stockfish examined nearly 1,000 times as many.
AlphaZero also has a value neural network, which can judge a position as won/lost/drawn just by "looking" at it, without playing it through to the end.
In short, the policy network acts a bit like a traditional engine's opening book, and the value network acts a bit like its endgame tablebases.
But AlphaZero needs something to connect the opening to the ending, and for this it uses Monte Carlo Tree Search: it explores many continuations from each position, guided by the policy network, until the value network's judgement of the resulting positions is clear enough. (Unlike older Monte Carlo engines, it does not actually play games out to random finishes; the value network replaces those playouts.)
In this way, AlphaZero is able to "see" 20-30 moves ahead in the lines it explores, whereas traditional engines typically search 9-15 moves ahead.
In other words Narrow-but-Deep analysis as opposed to Broad-but-Shallow analysis.
The most mind-boggling thing for me here is the tree search itself: instead of playing sample games out to the end, AlphaZero evaluates around 80,000 positions every second with its value network while choosing its next move, roughly 4.8 million positions per minute of thinking time, each one "judged" by a neural network. Mind-boggling!
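The selection rule that drives this narrow-but-deep search can be sketched in a few lines. This is a toy illustration of AlphaZero-style PUCT selection, not DeepMind's code, and every number in the example is made up:

```python
import math

def puct_select(priors, visit_counts, mean_values, c_puct=1.5):
    """Pick the child move maximizing Q + U, AlphaZero-style.

    priors       -- policy-network probabilities for each candidate move
    visit_counts -- how often each child has been explored so far
    mean_values  -- running average of value-network evaluations per child
    """
    total_visits = sum(visit_counts)
    best, best_score = None, -float("inf")
    for i, (p, n, q) in enumerate(zip(priors, visit_counts, mean_values)):
        # Exploration bonus: high for moves the policy likes but the
        # search has not yet visited much.
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        if q + u > best_score:
            best, best_score = i, q + u
    return best

# Toy example with three candidate moves (illustrative numbers only):
priors = [0.6, 0.3, 0.1]      # policy network strongly prefers move 0
visits = [10, 2, 1]           # move 0 already explored heavily
values = [0.05, 0.10, -0.20]  # value-network averages from each subtree
print(puct_select(priors, visits, values))  # picks move 1: the bonus
                                            # outweighs the lower prior
```

Repeating this selection many thousands of times per move is what concentrates the search on a handful of deep, promising lines rather than the broad-but-shallow sweep of an alpha-beta engine.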
AlphaZero must be reevaluated:
https://medcraveonline.com/OAJMTP/OAJMTP-01-00005.pdf