Understanding the Elo rating system in chess

The Elo rating system is a method used in chess and other competitive games to calculate players’ skill levels based on game outcomes. Named after its creator, Arpad Elo, a Hungarian-American physics professor, this system offers a dynamic way to track a player’s relative strength as they win or lose matches. Unlike static ranking systems, the Elo rating adjusts after each game, providing a constantly updated measure of a player’s performance compared to others.

How Elo rating works in chess

The Elo system begins by assigning a numerical rating to each player; the starting value depends on the organization, but around 1200 is typical for beginners. As players compete, ratings adjust after every game, taking into account both players’ ratings before the game and the result those ratings predicted. The winner gains points and the loser gives up points, while a draw moves points from the higher-rated player to the lower-rated one.

The size of each rating change depends on how surprising the result was. If a higher-rated player defeats a much lower-rated player, the gain is small, because the win was expected. Conversely, if an underdog defeats a much stronger opponent, the underdog gains many more points, because the victory was unlikely.

Formula and Calculation

The change in a player’s Elo rating is determined by a formula, which includes:

K-factor: a constant that determines the maximum number of points a player can gain or lose in a single game. Higher K-factors (like 40) are often used for new players, allowing their ratings to change more rapidly as they develop their skill. For experienced players, a lower K-factor (e.g., 10 or 20) provides more stable ratings.
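Expected score: Calculated from the difference in ratings between the two players, the expected score is a number between 0 and 1 that combines the probability of winning with half the probability of drawing. The rating change is the K-factor multiplied by the difference between the actual score (1 for a win, 0.5 for a draw, 0 for a loss) and the expected score.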

The exact formula for expected score and rating adjustment might vary slightly depending on the chess organization, but the underlying principles remain consistent.
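
As an illustration, the two ingredients above can be sketched in a few lines of Python. This uses the commonly published logistic form of the expected score with a 400-point scale; the function names and the default K of 20 are choices made for this example, and a given federation or website may apply different constants or extra rules.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, opponent: float, score: float, k: float = 20.0) -> float:
    """Return the new rating after one game.

    score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor (20 here is an assumption for the example).
    """
    return rating + k * (score - expected_score(rating, opponent))


# An 1800-rated favourite beats a 1400-rated player: a small gain.
print(round(update_rating(1800, 1400, 1.0), 1))  # 1801.8
# The 1400-rated underdog wins instead: a much larger gain.
print(round(update_rating(1400, 1800, 1.0), 1))  # 1418.2
```

With the same K-factor on both sides, whatever one player gains the other loses, so the total number of rating points in the game stays constant.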

Interpreting Elo Ratings

An Elo rating provides an approximate idea of a player’s skill level. Here are some general benchmarks:

Beginner: Ratings under 1200
Intermediate: Ratings between 1200–1600
Advanced: Ratings between 1600–2000
Expert: 2000–2200
Master: 2200 and above
Grandmaster (GM): Typically 2500 and above; the FIDE title requires reaching a 2500 rating and also earning norms in qualifying tournaments, so rating alone is not enough.

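The difference between two players’ ratings also indicates the approximate odds in a match. A difference of 200 rating points means the higher-rated player is expected to score about 76% of the points, or roughly three points out of every four games. Because draws count as half a point, this is an expected score rather than a pure win probability.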
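
To see how the expected score grows with the rating gap, the short sketch below tabulates a few values using the standard logistic Elo formula. The function name is invented for this example, and organizations that use lookup tables may publish slightly different figures.

```python
def expected_score_for_gap(rating_gap: float) -> float:
    """Expected score of the higher-rated player for a given rating gap."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


for gap in (0, 100, 200, 400):
    print(f"{gap:>4} points: {expected_score_for_gap(gap):.0%}")
# Prints:
#    0 points: 50%
#  100 points: 64%
#  200 points: 76%
#  400 points: 91%
```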

Advantages and limitations of Elo

The Elo system has proven effective for ranking players of similar skill levels and is widely used in online platforms and organizations like FIDE (the International Chess Federation). However, it has some limitations. For example, ratings are less accurate when players have few games recorded, and the system assumes that players’ skills don’t fluctuate greatly over short periods.

Elo rating in modern chess

The Elo system is essential in the competitive chess world. Websites like Chess.com and Lichess use Elo-style rating systems (in practice, members of the Glicko family, which builds on Elo) to match players of comparable skill for a fair and engaging experience. As players win, their ratings rise and they are paired with stronger opponents; after losses, their ratings fall and they are paired with opponents closer to their current level.

Overall, the Elo system provides a robust, adaptable way to evaluate chess skills, promoting fair competition and motivating players to improve.

While Elo is the best-known rating system in chess, other systems have emerged to address its limitations and to rate players across different levels, formats, or contexts. Here are some of the main alternatives:

1. Glicko and Glicko-2 Systems

Glicko and its updated version Glicko-2 were created by Mark Glickman and are popular in online chess. These systems improve on Elo by introducing a rating deviation (RD), which measures how uncertain a player’s rating is. For example, a player who hasn’t played in a while will have a high RD, so their rating moves more sharply over their next few games until the system is confident again. Glicko-2 also includes a rating volatility factor, which adjusts for how erratically a player’s results have been changing. Chess.com uses a version of Glicko and Lichess uses Glicko-2, as tracking uncertainty helps match players of more similar current skill levels.
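
The effect of the rating deviation can be seen in a minimal sketch of the original Glicko update for a single rating period, following the formulas in Glickman’s published description. This is the first-generation system only (no volatility term, no RD growth during inactivity), the function names are chosen for this example, and it is not a reproduction of any website’s actual implementation.

```python
import math

Q = math.log(10) / 400.0  # scaling constant used throughout Glicko


def g(rd: float) -> float:
    """Dampening factor: the less certain the opponent's rating, the smaller g."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def glicko_update(rating, rd, results):
    """Update a rating and RD from one rating period.

    results is a list of (opponent_rating, opponent_rd, score) tuples,
    with score 1.0 / 0.5 / 0.0 for win / draw / loss.
    Returns (new_rating, new_rd).
    """
    if not results:
        return rating, rd
    d2_inv = 0.0  # accumulates 1 / d^2
    delta = 0.0   # accumulates g * (actual - expected)
    for opp_rating, opp_rd, score in results:
        g_j = g(opp_rd)
        e_j = 1.0 / (1.0 + 10 ** (-g_j * (rating - opp_rating) / 400.0))
        d2_inv += Q**2 * g_j**2 * e_j * (1.0 - e_j)
        delta += g_j * (score - e_j)
    denom = 1.0 / rd**2 + d2_inv
    return rating + (Q / denom) * delta, math.sqrt(1.0 / denom)


# Same win against the same opponent, but different uncertainty:
uncertain, _ = glicko_update(1500, 350, [(1500, 50, 1.0)])
settled, _ = glicko_update(1500, 50, [(1500, 50, 1.0)])
print(round(uncertain - 1500), round(settled - 1500))
```

The player with the high RD gains far more from the same win than the player whose rating is already well established, and in both cases the RD shrinks after the game.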

2. Chessmetrics

Chessmetrics, developed by Jeff Sonas, is an alternative system used mainly for historical comparisons. Instead of updating after every game, Chessmetrics ratings are calculated on a rolling basis over set timeframes (e.g., every month). This system accounts for both the performance level over time and the frequency of games, which provides a consistent view of players’ peak performance periods. It’s particularly helpful in comparing players from different eras.

3. TrueSkill

TrueSkill is a system developed by Microsoft and widely used in online gaming. Similar to Glicko, it assesses both skill level and confidence in that skill (akin to Glicko’s RD). TrueSkill is particularly effective in multiplayer or team settings, but it has been adapted for some chess use cases, especially in experimental platforms. TrueSkill’s emphasis on match quality and skill variance makes it more adaptable than Elo in games involving multiple players.

4. Universal Rating System (URS)

The Universal Rating System (URS) was developed with funding from the Grand Chess Tour, the Kasparov Chess Foundation, and the Saint Louis Chess Club, aiming to bridge classical, rapid, and blitz formats into one comprehensive rating. URS uses a statistical approach related to Elo but accounts for the time control, giving each player a single rating that reflects their overall performance across formats. Interest in this kind of unified rating has grown as rapid and blitz have become more prominent.

5. Performance Ratings

Performance ratings don’t track a player over time but instead summarize a single event or tournament. A performance rating is the rating a player would need for their actual score against that field to match the expected score. For instance, if a player defeats several highly rated opponents in a tournament, their performance rating for that event will be high, even if their overall rating remains lower. This measure is useful for comparing how players perform in specific contexts rather than as a long-term rating.
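
One way to compute such a figure is to search for the rating whose expected total score against the opponents equals the score actually achieved. The sketch below does this with bisection and the standard logistic Elo formula. It is one common definition, written for illustration with invented function names and search bounds; FIDE, for example, uses a table-based method instead, so official numbers may differ.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard logistic Elo expected score."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_rating(opponents, total_score, lo=0.0, hi=4000.0):
    """Rating at which the expected total score equals total_score.

    opponents is a list of opponent ratings; total_score counts
    1 per win and 0.5 per draw. A perfect or zero score has no
    finite solution, so those cases raise ValueError.
    """
    if not opponents:
        raise ValueError("at least one opponent is required")
    if not 0.0 < total_score < len(opponents):
        raise ValueError("performance rating is unbounded for 0% or 100% scores")
    for _ in range(60):  # bisection: expected total rises with rating
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, opp) for opp in opponents) < total_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Scoring 3 out of 4 against a field averaging 2475:
print(round(performance_rating([2400, 2450, 2500, 2550], 3.0)))
```

A 50% score returns roughly the average rating of the field, and higher scores push the result above it.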

These alternative systems each bring unique benefits, such as more accurately reflecting a player’s current skill (Glicko), providing historical comparisons (Chessmetrics), or unifying formats (URS). However, Elo rating remains the primary standard in chess due to its simplicity and established reputation.