Want to Replace the Elo System? Here's Your Chance
Wednesday, August 4, 2010 at 6:52PM
From Adrian Petrescu:
Thought you might find the following link interesting. The guy behind ChessMetrics is apparently hosting a competition to devise a replacement for Elo. You write up your algorithm and train it on several thousand historical games, and then you upload it to his server, where it will automatically be "tested" against 7,000 other games with known results, to see how your predictions do.
I'm skeptical that (even if something good comes out of this) the inertia of Elo will ever be overcome. But I thought I'd let you know about it :)
A relevant link:
http://games.slashdot.org/story/10/08/04/2014202/Chess-Ratings-mdash-Move-Over-Elo
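For a feel of what an entry involves, here is a minimal sketch of the baseline any replacement has to beat: Elo's expected-score formula plus the standard update rule, replayed over toy data. The starting ratings, K-factor, and toy games are illustrative assumptions, not the competition's actual parameters or scoring metric.

```python
# Minimal Elo-style predictor sketch. Starting rating (1500), K-factor (10),
# and the toy games are assumptions for illustration only.

def expected_score(r_a, r_b):
    """Elo's predicted score for A vs. B: 1 / (1 + 10^((Rb - Ra) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def update(r_a, r_b, score_a, k=10):
    """Standard Elo update after one game; score_a is 1, 0.5, or 0."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * (e_a - score_a)

# "Training" is just replaying known results; "testing" would compare
# expected_score() against the held-out games' actual outcomes.
games = [("Kasparov", "Karpov", 1.0), ("Karpov", "Kasparov", 0.5)]  # toy data
ratings = {}
for white, black, score in games:
    r_w, r_b = ratings.get(white, 1500.0), ratings.get(black, 1500.0)
    ratings[white], ratings[black] = update(r_w, r_b, score)
print(ratings)
```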
Reader Comments (2)
There have been plenty of developments in rating systems since Elo - TrueSkill from Microsoft Research Cambridge, and both BayesElo and 'Whole History Rating' (WHR) from Rémi Coulom.
Elo takes no account of the time dimension; WHR recognises that a player's performance level is more uncertain if the player has been inactive for a long time.
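Roughly, WHR treats a player's rating as a Wiener process, so the variance of the belief about it grows linearly with time since the last game. A small sketch of just that time prior; the variance-growth rate below is an assumed value, not Coulom's fitted one:

```python
import math

W2 = 60.0  # variance added per year of inactivity (Elo points^2) -- assumed

def rating_uncertainty(sigma, years_inactive, w2=W2):
    """Std. dev. of the rating belief after a spell of inactivity,
    under a Wiener-process prior: sigma' = sqrt(sigma^2 + w2 * t)."""
    return math.sqrt(sigma ** 2 + w2 * years_inactive)

print(rating_uncertainty(50.0, 0.0))  # 50.0 -- an active player
print(rating_uncertainty(50.0, 4.0))  # ~52.3 -- four years out: less certain
```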
I don't know of any system that actually assumes performance declines with inactivity. More accurate dating of games would be a great help here, and maybe there should be some crowd-sourcing to improve ChessBase etc. on this score.
TrueSkill, BayesElo and WHR seem to have a well-argued theoretical basis: I'm not sure what the theoretical justification of ChessMetrics actually is.
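For what it's worth, BayesElo and WHR both rest on the Bradley-Terry model (TrueSkill uses Gaussian factor graphs instead): each player has a strength gamma > 0, P(i beats j) = gamma_i / (gamma_i + gamma_j), and ratings are the strengths that best explain the observed results. Elo's curve is the special case gamma = 10^(R/400). A minimal sketch, ignoring draws:

```python
import math

def p_win(gamma_i, gamma_j):
    """Bradley-Terry win probability for player i over player j."""
    return gamma_i / (gamma_i + gamma_j)

def log_likelihood(results, gammas):
    """Log-probability of (winner, loser) results under given strengths."""
    return sum(math.log(p_win(gammas[w], gammas[l])) for w, l in results)

results = [("A", "B"), ("A", "B"), ("B", "A")]  # toy data
print(log_likelihood(results, {"A": 2.0, "B": 1.0}))  # ~-1.91, better fit
print(log_likelihood(results, {"A": 1.0, "B": 1.0}))  # ~-2.08, worse fit
```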
Guy Haworth has proposed that players should be rated on the basis of their moves, i.e. their competence in absolute terms, rather than on the basis of their results; see for example 'Performance and Prediction' and 'Gentlemen, Stop Your Engines' at: http://centaur.reading.ac.uk/view/creators/90000763.html
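As a toy illustration of the move-based idea (not Haworth's actual model, which fits a stochastic agent to engine evaluations), one could rate a player by the average evaluation loss of their moves against an engine's preferred choice; the numbers below are invented:

```python
def average_loss(move_evals):
    """Mean centipawn loss: engine-best eval minus eval of the move played,
    both from the mover's point of view."""
    losses = [best - played for best, played in move_evals]
    return sum(losses) / len(losses)

# A player whose three moves cost 0, 15, and 40 centipawns vs. the engine:
print(average_loss([(30, 30), (10, -5), (0, -40)]))  # ~18.3 cp per move
```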
It will be interesting to see if anything comes out of this public competition.
As one of Guy Haworth's partners in the effort you mention, let me note one important provision of the Elo competition terms (emphasis mine):
"Competitors train their rating systems using a training dataset of over 65,000 recent *results* for 8,631 top players. Participants then use their method to predict the outcome of a further 7,809 games."
Thus the algorithms are not being trained on the games, but only on the results together with player data. Hence, as Guy and I surmised by e-mail today, this seems not to concern us.
As for training on the games, to make judgments about "top players", IMHO one needs to spend at least 1 hour per processor core per game. On a single quad-core 64-bit PC, this means you can get about 100 games per day. Thus to analyze almost 70,000 games in a week, one needs a team with 100 quad-core PCs; or, over the 100-day period of the competition, one needs 7 such PCs. If spending 6-7 hours per processor per game (i.e. more time than the players took) is needed, you're talking 50 such PCs over the 100 days. Considering unplayed alternatives fully could multiply that by another factor of 10: what I'm saying is that blitzing through games, Guid-Bratko style, with Crafty to depth (only) 12 is just not gonna cut it for predictions. Anyone using that kind of resources?
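For anyone who wants to check that arithmetic, a quick back-of-envelope script; the constants mirror the assumptions in the comment (quad-core PCs, roughly 70,000 games, a 100-day window), not measured benchmarks:

```python
CORES = 4
GAMES = 70_000

def games_per_day(hours_per_core_per_game):
    """Games one PC finishes per day, one game per core at a time."""
    return CORES * 24 / hours_per_core_per_game

def pcs_needed(hours_per_core_per_game, days):
    return GAMES / (games_per_day(hours_per_core_per_game) * days)

print(games_per_day(1))      # 96   -> "about 100 games per day" per PC
print(pcs_needed(1, 7))      # ~104 -> ~100 PCs to finish in a week
print(pcs_needed(1, 100))    # ~7.3 -> ~7 PCs over the 100 days
print(pcs_needed(6.5, 100))  # ~47  -> ~50 PCs at 6-7 hours per core-game
```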