lichess.org

Exact Ratings for Everyone on Lichess

I have to push back a bit on this.
@MyPoorRook said in #7:
> I believe the author had good intentions when he wrote this article.
>
> "An annoying aspect of online play, especially at very high levels, is when players care too much about ratings. "
>
> I know a much more effective and simpler method that allows you to not worry too much about rankings. You need to go to your profile settings and select “do not show ratings.” Simple and effective.

That makes you stop caring about your own rating, but it doesn't change anything about the other players. The point of this post was to get more accurate ratings for the general population, since those can be distorted by the issues mentioned in the article.

>
> "It instead uses Glicko-2 which produces more accurate ratings."
>
> The system may be better in some respects, but using it on Lichess doesn't seem to produce such good results. Sometimes Lichess ratings look like a joke. For example, I know two players: an IM with a stable Lichess rating of about 2500 and a CM with a stable Lichess rating of about 2750. The first of them plays not just stronger, but significantly stronger. Their FIDE ratings reflect this; their Lichess ratings do not. I suspect that the desire to make Glicko-2 ratings converge quickly inevitably makes them more vulnerable to farming (though the validity of this claim needs to be explored).
>

The fact that Glicko-2 is generally more accurate than Elo for chess is not really disputed as far as I am aware, so I am not sure what the claim here is. These two players don't get different ratings because of some mysterious flaw in the Glicko system but because the CM scores better on Lichess than the IM, simple as that. Why does the CM score better? I don't know; that depends on the players. Maybe the IM plays on Lichess just for fun and therefore doesn't focus on the games, or hides his true opening repertoire so as not to give OTB opponents a clear view of it. Maybe the IM simply isn't as good at blitz as the CM. Can you farm weaker players? Yes, you can. In the FIDE Elo system it's actually easier than in Glicko because of the 400-point difference rule (which is about to go back to its full effect), and people have done it before. Online blitz makes it easier only insofar as you can easily play hundreds or even thousands of games. That has nothing to do with the rating system, however.
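
To make the 400-point argument concrete, here is a rough sketch using the logistic Elo curve (FIDE actually uses a lookup table based on the normal distribution, so the numbers are only approximate). It assumes K = 10 and models the rule as a plain clamp of the rating difference; the exact scope of FIDE's rule has changed over time, and the function names are my own, so treat this as illustrative only.

```python
def expected_score(diff, cap=None):
    """Logistic Elo expected score for a player `diff` points above the opponent.

    If `cap` is given, |diff| is clamped to it first, mimicking a 400-point rule.
    """
    if cap is not None:
        diff = max(-cap, min(cap, diff))
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def gain_per_win(rating, opp_rating, k=10, cap=None):
    # Rating points gained for one win: K * (1 - expected score).
    return k * (1.0 - expected_score(rating - opp_rating, cap))


def drift_per_game(rating, opp_rating, k=10, cap=400):
    # Average change per game if the uncapped curve gives the true expected score:
    # K * (true expected score - expected score the capped formula assumes).
    diff = rating - opp_rating
    return k * (expected_score(diff) - expected_score(diff, cap))


print(gain_per_win(2800, 2000), gain_per_win(2800, 2000, cap=400))
print(drift_per_game(2800, 2000))
```

With an 800-point gap, a win is worth about 0.10 points under the uncapped curve but about 0.91 points once the difference is clamped to 400, so if the stronger player really scores about 99%, his rating drifts upward by roughly 0.8 points per game.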

It is worth noting, though, that Lichess does not use the original Glicko-2 algorithm but a modified version of it, so it is possible that some inaccuracy is introduced there. Certainly not enough for a 250-point swing, however.
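
For reference, below is a sketch of the original, unmodified Glicko-2 update for one rating period, following the steps in Mark Glickman's description of the system (τ = 0.5 is an assumed system constant). It does not reproduce whatever modifications Lichess makes; it only shows which quantities (rating, deviation, volatility) the system tracks and how one period of games updates them. The function names are my own, and A, B, C mirror the notation of Glickman's volatility-iteration step.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def g(phi):
    # Discount factor for an opponent's rating deviation.
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """One rating-period update; results is a list of (opp_rating, opp_rd, score)."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # No games this period: only the deviation grows.
        return rating, SCALE * math.sqrt(phi ** 2 + vol ** 2), vol

    # Estimated variance v and the sum of g * (score - expected) over all games.
    v_inv, score_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, g_j = (opp_rating - 1500.0) / SCALE, g(opp_rd / SCALE)
        e_j = 1.0 / (1.0 + math.exp(-g_j * (mu - mu_j)))
        v_inv += g_j ** 2 * e_j * (1.0 - e_j)
        score_sum += g_j * (score - e_j)
    v = 1.0 / v_inv
    delta = v * score_sum  # estimated improvement in rating

    # New volatility: root of f(x) via the Illinois (regula falsi) iteration.
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    f_a, f_b = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * f_a / (f_b - f_a)
        f_c = f(C)
        if f_c * f_b <= 0:
            A, f_a = B, f_b
        else:
            f_a /= 2.0
        B, f_b = C, f_c
    new_vol = math.exp(A / 2.0)

    # New deviation and rating, converted back to the Glicko scale.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * score_sum
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_vol


print(glicko2_update(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
```

On Glickman's worked example (a 1500/200 player with volatility 0.06 who beats a 1400/30 player and loses to 1550/100 and 1700/300 players), his description reports a new rating of about 1464.06, RD 151.52 and volatility 0.05999, which this sketch should match up to rounding.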

> The big problem with both Elo and Glicko-2 (and, as I understand from your description, Ordo is no better) is that neither of these systems takes into account the simple fact that a chess game has three possible outcomes, not two. A draw is a natural outcome of the game, and the higher the level of the players, the higher the probability of a draw. That is why players with high FIDE ratings are forced to avoid relatively low-rated opponents. That is why it is almost impossible to reach a super-grandmaster rating (2700+) without playing in elite super tournaments (I do not consider the possibility of doing it by cheating). A 2500-rated GM can draw twice with a 2300-rated FM who was desperately trying to dry up the position from the very first moves, and will be penalized for it as if he had scored one win and one loss, even though he may never have been worse in either game and, perhaps, in a hypothetical 10-game match his opponent would not have managed a single win. It is pointless. In theory this should work, but in practice the curve relating the average result of a game to the rating difference between the opponents differs significantly from the exponential formula at the heart of the rating system.

This is a fair complaint; however, I think it is more a problem of the game of chess itself. In Go you don't naturally get this sort of deflation at the top, as there are no draws in Go. It's not really the fault of the rating system that chess is a drawish game. If you set the value of a draw at 0.5 and count draws just like other results, i.e. count two draws the same as a win and a loss, then a player scoring 50% draws and 50% losses will, and should, get the same rating as one scoring 25% wins and 75% losses. If you define that to correspond to a 200-point rating difference, then that is what you will get here.
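
The arithmetic behind "two draws count the same as a win and a loss" can be checked with a small sketch of the Elo update on the logistic curve. K = 10 (FIDE's factor for players who have reached 2400) and the 2500 vs 2300 pairing from the quoted example are the assumptions here, and the function names are my own; Glicko-2 scales things differently but also scores a draw as 0.5.

```python
import math


def expected_score(diff):
    # Logistic Elo curve: expected score for a player `diff` points above the opponent.
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def elo_change(rating, opp_rating, scores, k=10):
    # Sum of K * (actual - expected) over several games against the same opponent.
    e = expected_score(rating - opp_rating)
    return sum(k * (s - e) for s in scores)


def rating_gap_for_score(p):
    # Invert the logistic curve: the rating difference that predicts score fraction p.
    return 400.0 * math.log10(p / (1.0 - p))


two_draws = elo_change(2500, 2300, [0.5, 0.5])
win_and_loss = elo_change(2500, 2300, [1, 0])
print(two_draws, win_and_loss)       # both results score 1 point out of 2
print(rating_gap_for_score(0.25))    # 50% draws + 50% losses, or 25% wins + 75% losses
```

Both results cost the 2500 player the same, about 5.2 points, and a 25% score corresponds to a gap of roughly 191 points on this curve, close to the 200 mentioned above.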
Or perhaps the claim is instead that this works fine at 200 rating points, but that two such 200-point steps lead to a less decisive score than a 400-point difference would imply. For FIDE's Elo implementation that might be true, given that they felt the need to introduce the 400-point difference rule; for Glicko, however, I would first like to see some evidence of that.
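
What the model assumes here can be stated precisely: on the logistic curve used by Elo-style systems, the win odds for successive rating gaps multiply, so two 200-point steps are by construction equivalent to one 400-point step. The sketch below only checks that identity; whether real results actually follow it is the empirical question, and this code says nothing about that. Glicko-2's expected score also discounts by the opponents' rating deviations, which is ignored here.

```python
import math


def expected_score(diff):
    # Logistic Elo expected score for a player rated `diff` points above the opponent.
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def odds(p):
    # Convert an expected score into odds (score : opponent's score).
    return p / (1.0 - p)


step = expected_score(200)    # A vs B, and B vs C, each 200 points apart
direct = expected_score(400)  # A vs C, 400 points apart

# Under the logistic model the odds multiply: odds(200) * odds(200) == odds(400).
assert math.isclose(odds(step) ** 2, odds(direct))
print(round(step, 3), round(direct, 3))
```

A 200-point favorite is expected to score about 0.76, and a 400-point favorite about 0.91.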

I mean, bro, if only Kramnik would put as much effort into his research as this fellow just did... :)

@korobok3 said in #33:
> Axis text isn't readable
Sorry, I will make better graphs for December. Lichess really compressed those.

My Ordo rating is lower than my rating on Lichess.

This is one of the coolest blog posts on rating statistics I have seen. One reason is that it goes beyond the ratings themselves to examine the defects of the current system, and it proposes a very sensible-looking solution that addresses most of the issues we are all concerned about.

For an organization like FIDE, which publishes ratings at long intervals, this really is the solution to a lot of problems.

An interesting case in point is Alireza's current effort to bump up his rating for the Candidates. I am guessing this would be more difficult under this approach, because the averaging effect would make ratings change more slowly. If the goal is accurate ratings, then this is the way to go. If you want an entertaining, controversy-ridden system, then the current system is better.

It seems like Ordo, or something equivalent, needs to become more efficient computationally. But even so, calculating such an "optimal rating" once a month for all rated players ought to be doable for FIDE. They could speed it up with approximations for players who play few games (their rating should hardly change from the previous moving average).
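
As a rough idea of what such a monthly "optimal rating" computation involves, here is a sketch of a maximum-likelihood fit of logistic ratings (a Bradley-Terry model) using the classic minorization-maximization iteration, with draws counted as half a win for each side. This is not necessarily how Ordo does it; the function name, the anchor of 1500 and the fixed iteration count are assumptions for illustration. Each iteration touches every pairing once, so the cost grows with the number of distinct pairings times the number of iterations.

```python
import math
from collections import defaultdict


def fit_ratings(games, iterations=500, anchor=1500.0):
    """Maximum-likelihood logistic ratings from (player_a, player_b, score_a) tuples.

    Draws count as 0.5 for each side. Assumes every player has scored strictly
    between 0% and 100%; otherwise the maximum-likelihood rating does not exist.
    """
    points = defaultdict(float)                       # total score per player
    meetings = defaultdict(lambda: defaultdict(int))  # meetings[i][j] = games between i and j
    for a, b, s in games:
        points[a] += s
        points[b] += 1.0 - s
        meetings[a][b] += 1
        meetings[b][a] += 1

    strength = {p: 1.0 for p in meetings}  # Bradley-Terry strength = 10 ** (rating / 400)
    for _ in range(iterations):
        # MM update: points_i / sum_j (n_ij / (strength_i + strength_j)).
        new = {
            i: points[i] / sum(n / (strength[i] + strength[j]) for j, n in opps.items())
            for i, opps in meetings.items()
        }
        # Only rating differences are identified, so fix the scale by
        # normalizing the geometric mean of the strengths to 1.
        log_mean = sum(math.log(x) for x in new.values()) / len(new)
        strength = {p: x / math.exp(log_mean) for p, x in new.items()}

    return {p: anchor + 400.0 * math.log10(x) for p, x in strength.items()}


games = [("A", "B", 1.0), ("A", "B", 1.0), ("A", "B", 0.5), ("A", "B", 0.5)]
print(fit_ratings(games))
```

For example, if A scores 3 out of 4 against B, the fitted ratings end up about 191 points apart (400 · log10 3), centered on the anchor.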

One thing that confused me about your article:

"I compared Lichess and Ordo ratings for the Blitz games of November. Lichess ratings predicted 58.423% of decisive game outcomes correctly, Ordo predicted 59.527% correctly. Remember that a coin flip can predict 50% of decisive outcomes correctly, so 58% is already very good. If Lichess were using something as archaic and unreliable as FIDE's Elo version, I doubt the predictions would be accurate more than 53% of the time, pretty much random. Yet Ordo performed 13.1% better than Glicko-2."

Probably I am misunderstanding something, but you say Lichess ratings are 58.4% accurate at predicting game outcomes and the Ordo ratings are 59.5% accurate. The difference seems quite modest, only about a 1% improvement for Ordo. But then you say at the end of the paragraph that Ordo performed 13.1% better than Glicko-2. I assume by "Glicko-2" you mean the Lichess rating.

Could you explain that?
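
One calculation that reproduces the 13.1% figure (an assumption, since the article doesn't spell it out) compares how far each system beats the 50% coin-flip baseline rather than the raw accuracies: (59.527 - 50) / (58.423 - 50) ≈ 1.131. The function names below are my own.

```python
def edge_over_baseline(accuracy, baseline=50.0):
    # Percentage points above the coin-flip baseline for decisive games.
    return accuracy - baseline


def relative_gain_over_baseline(acc_new, acc_old, baseline=50.0):
    # How much larger the new system's edge over chance is, as a fraction.
    return edge_over_baseline(acc_new, baseline) / edge_over_baseline(acc_old, baseline) - 1.0


lichess, ordo = 58.423, 59.527  # accuracies quoted from the article
print(f"raw difference: {ordo - lichess:.3f} percentage points")
print(f"relative to raw accuracy: {ordo / lichess - 1:.1%}")
print(f"relative to edge over a coin flip: {relative_gain_over_baseline(ordo, lichess):.1%}")
```

Measured this way, Ordo's edge over chance is about 13.1% larger than Glicko-2's, even though the raw accuracies differ by only about 1.1 percentage points (about 1.9% in relative terms).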

And a minor criticism, which applies to all blog posts about ratings I have seen (not just yours): the plot axis numbers and labels are way too small to read. They should be readable at a zoom level that shows the entire plot, and ideally they should be readable as the plot appears within the blog.

I looked at your Makefile; it looks pretty good. One idea: you could try -Ofast instead of -O3. It allows less accurate (for some purposes, incorrect) floating-point math in exchange for a potentially big gain in performance.

Of course, you'd need to check how it affects the accuracy of the results, but it might even improve (less overfitting), so it could be interesting.

> I got tired of caring about ratings. So I spent a whole week analyzing all ratings on Lichess.
Lol! This has to be the weirdest reason to do all that stuff!

I think the clouds actually convey the density of the points really well. Don't worry.

Always good to take a peek from outside the arena, I say. No need for justification, just hygienic thought in the morning.