Exploring how Lichess' players spend their playing time, Part 1 of 2

Interesting data, thanks for that!

I have a question about the section "Are active players versatile or specialised?". If for example someone plays one game of bullet and one game of rapid, they would be included in the squares (2,4) and (4,2) of figure 3, correct?

If so I think it would be interesting to create the same matrix with B only including players who play a certain TC the most (measured by no idea what, maybe just number of games) and looking at how many of them also play other TCs. That way each player only shows up in one square and you can maybe see what TC leads to the most variety and which one is the most specialized.

As it is right now, I feel like we cannot see whether playing less popular TCs leads those players to also play mainstream ones, or the other way around. So if we get to request something for part 2, that is my wish :)
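A minimal sketch of the matrix suggested above, assuming the games are available as (player, time control) pairs and that "plays the most" means the highest number of games. The function name and the sample data are hypothetical, and this is not taken from the blog's code.

```python
from collections import Counter, defaultdict


def primary_tc_matrix(games):
    """Count, for each primary TC, how many of its players also play each TC.

    games: iterable of (player, time_control) pairs, one pair per game.
    A player's primary TC is the one with the most games (assumption);
    ties go to the TC seen first.
    """
    per_player = defaultdict(Counter)
    for player, tc in games:
        per_player[player][tc] += 1

    matrix = defaultdict(Counter)
    for counts in per_player.values():
        primary = max(counts, key=counts.get)
        # Each player contributes to exactly one row: their primary TC.
        for tc in counts:
            matrix[primary][tc] += 1
    return matrix


# Hypothetical sample: alice is mainly a blitz player, bob mainly bullet.
games = (
    [("alice", "blitz")] * 12
    + [("alice", "bullet")]
    + [("bob", "bullet")] * 30
    + [("bob", "blitz")] * 2
)
matrix = primary_tc_matrix(games)
```

Here `matrix["blitz"]["bullet"]` is the number of primarily-blitz players who also played bullet, so dividing a row by its diagonal entry would give the share of that group playing each other TC.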

Thanks again for your work!
I hope part 3 compares ratings across time controls. Would be interesting to quantify the average rating in rapid of a 1700 blitz player, for instance. Or the average rating in bullet of a 2000 classical player.

If you want to go a lot deeper, I think it would be super interesting to see how players do in 3+0 vs 3+2 or 15+10 vs 10+0. I could imagine the different pools show different distributions even within one "category." Showing that 2000 classical players do better at 5+3 than 3+2 than 3+0 for instance would be enlightening if true.

Great post!
@Toscani said in #5:
> Just because players have an hour each does not mean they will be playing for 2 hours. It's the total time to complete one game that should have been looked at and not each category of clock settings.

I think the database has the post-game move times with the games. Summing those up should be a good estimate.
@NoseKnowsAll said in #13:
> I hope part 3 compares ratings across time controls. Would be interesting to quantify the average rating in rapid of a 1700 blitz player, for instance. Or the average rating in bullet of a 2000 classical player.
>

And others. I think this lay of the land, even with only one month of data, could let such a tool-set connect to questions about rating evolution. On the time scale of learning something as complex as the art (or science) of chess, one month is like an instantaneous rating snapshot, given the pool size and the actual limits of human learning (there are many types and stages of learning, and rating is a coarse tool, but the only one we can trust so far).

I would give this project some slack so that the ground-level quantities available from the database are seen first and, if the community is meant to be part of the project, keep the comments coming in. I hope NOT to be interfering with the OP's intent; that would simply be my approach. Blogs are not peer-reviewed papers but more like proposals or, in my own favourite idea-sharing format, proposals for discussion. That lets blogs compensate for the loss of community visibility on the Lichess landing page, called the lobby.
This is really an interesting essay and a great effort.

Even so, at times I had the feeling that I did not yet fully understand the criteria used to form the different groups.

"Players of a time control are defined as all players that have played at least one game in that specific TC."

Without being an expert in statistics, I felt that this selection may bring a substantial bias to the data. This may be supported by one of the graphs presented (no. of players vs. no. of games): a substantial number of players may have tried a time control just once or twice, but that does not necessarily mean they are regularly active in this TC, no?

Therefore I wondered why the data were selected as presented and what the results would look like with a more stringent selection. Would it make sense to use a higher "min number of games" cutoff to get a more robust baseline?
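To make the cutoff idea concrete, here is a minimal sketch, assuming the games are available as (player, time control) pairs. The function name, the `min_games` parameter and the sample data are hypothetical and not taken from the blog's code.

```python
from collections import Counter


def players_by_time_control(games, min_games=1):
    """Return {time_control: set of players with at least min_games games}.

    games: iterable of (player, time_control) pairs, one pair per game.
    min_games=1 reproduces the quoted definition ("at least one game").
    """
    counts = Counter(games)
    players = {}
    for (player, tc), n in counts.items():
        if n >= min_games:
            players.setdefault(tc, set()).add(player)
    return players


# Hypothetical sample: each player mostly sticks to one time control.
games = (
    [("alice", "blitz")] * 12
    + [("alice", "bullet")]
    + [("bob", "bullet")] * 30
    + [("bob", "blitz")] * 2
)
loose = players_by_time_control(games, min_games=1)
strict = players_by_time_control(games, min_games=10)
```

With the loose definition both players count as blitz and bullet players; with a cutoff of 10 games each of them is kept only in the time control they play regularly.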

Any feedback on my thoughts will be appreciated.
I did not see two simple figures that interest me and are the reason I read this post:

1. How much time does the average player spend playing over a given period (month/year)?

2. How many games does the average player complete over a given period (month/year)?
Can you let us know the ratio, among all games, between time + 0 seconds and time + X seconds? In other words, how many games are played with an increment?

3+0 vs 3+2
5+0 vs 5+3
10+0 vs 10+5

Thank you.
May I know how you got this data? Is there an API to access the Lichess database?
Thanks for the comments! Will try to address them all.

@Toscani said in #5:
> A blitz game can last a few seconds to it's total time control. Even if one player has 3 minutes+2 seconds and the other has 3 minutes + 2 seconds, nothing is stopping the game from lasting 6 minutes and some seconds. It's the total time to complete one game that should have been looked at and not each category of clock settings.

That's exactly how "real time spent playing" was computed! (It's described in the first paragraph.)

> The "Real time spent", which is the total time they spent playing this TC, based on the time left on the clock at the end of the game, and taking into account increment and berserk.

In more mathematical terms, the formula used was `base time - finish time + increment * nb_plies`
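As a rough illustration, the formula can be written as a small function. This is only a sketch: the function and parameter names are hypothetical, all times are assumed to be in seconds, `nb_plies` is assumed to count the moves for which the player received an increment, and berserk is not modelled.

```python
def real_time_spent(base_time, finish_time, increment, nb_plies):
    """Apply `base time - finish time + increment * nb_plies` to one clock.

    base_time:   starting clock time in seconds (assumption)
    finish_time: time left on the clock when the game ended
    increment:   seconds added per move
    nb_plies:    number of moves that earned an increment (assumption)
    """
    return base_time - finish_time + increment * nb_plies


# Hypothetical 3+2 game: 45 seconds left after 30 moves with increment.
spent = real_time_spent(base_time=180, finish_time=45, increment=2, nb_plies=30)
```

In this example the player used 180 - 45 + 2 * 30 = 195 seconds, which is more than the 3-minute base time because of the increment.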

@ForeverBetter said in #8:
> I'd love to see data on what proportion of games are analyzed, whether players are more likely to analyze their wins or their losses, what percentage of the time they are able to correct their blunders in one or two attempts using the 'Learn from your mistakes' tab, what percentage of moves are blunders in each time control, average accuracy in each time control, whether quality of play in the opening, middlegame, or endgame is hurt the most by time pressure, etc.

Interesting indeed, but it would not work for all players as titled players have all their games analysed (notably to generate puzzles).
In total about 6% of all rated games are analysed, data from: database.lichess.org/#notes
The rest of the questions are also interesting, and while I don't have the data for the general population, one can find it for a particular player using Lichess Insights. For example, to get the accuracy by time control: lichess.org/insights/ForeverBetter/acpl/variant/variant:ultraBullet,bullet,blitz,rapid,classical
(replace the username in the URL with yours to see it).
For a more general look at ACPL by rating, there's a great blog post (written for antichess, but the methodology could be applied to chess, taking extra care because different Stockfish versions result in different ACPL): lichess.org/@/ErinYu/blog/are-antichess-ratings-deflating/J64MfIqX
Awesome job, congrats!

What steps did you follow, from downloading the raw data to obtaining the clean dataframe for which you showed a few rows? I looked in the GitHub repo and failed to find them, apologies!

I think that writing about these steps could inspire others to squeeze a little bit more out of these data :D