A data-driven exploration of the evolution of chess: Popularity of openings over time

For the third installment in my series of blog posts exploring a data set of over 650,000 chess tournament games dating back to the 15th century, I wanted to look at how chess openings have grown and waned in popularity over time. Again, I only have reliable data on chess games back to 1850, so 1850 will be my starting point.

The first few moves of a chess game, known as the chess opening, are one of the most-studied aspects of the game, largely because of how important they can be. If you don’t start off with a good opening, you could doom yourself to defeat before the game really even begins. It’s therefore no surprise that one of the key steps to becoming a skilled chess player is studying and memorizing the many varieties of openings. Hundreds of openings have been developed since 1850, so it should make for an interesting exercise to see how these openings have evolved since then.

Each chess game is recorded in PGN (Portable Game Notation) format, which stores metadata about the game (the event, the players, the result, and so on) along with every move each player made. Here’s an example game in PGN format:

[Event "Hoogovens A Tournament"]
[Site "Wijk aan Zee NED"]
[Date "1999.01.20"]
[EventDate "?"]
[Round "4"]
[Result "1-0"]
[White "Garry Kasparov"]
[Black "Veselin Topalov"]
[ECO "B06"]
[WhiteElo "2812"]
[BlackElo "2700"]
[PlyCount "87"]

1. e4 d6 2. d4 Nf6 3. Nc3 g6 4. Be3 Bg7 5. Qd2 c6 6. f3 b5
7. Nge2 Nbd7 8. Bh6 Bxh6 9. Qxh6 Bb7 10. a3 e5 11. O-O-O Qe7
12. Kb1 a6 13. Nc1 O-O-O 14. Nb3 exd4 15. Rxd4 c5 16. Rd1 Nb6
17. g3 Kb8 18. Na5 Ba8 19. Bh3 d5 20. Qf4+ Ka7 21. Rhe1 d4
22. Nd5 Nbxd5 23. exd5 Qd6 24. Rxd4 cxd4 25. Re7+ Kb6
26. Qxd4+ Kxa5 27. b4+ Ka4 28. Qc3 Qxd5 29. Ra7 Bb7 30. Rxb7
Qc4 31. Qxf6 Kxa3 32. Qxa6+ Kxb4 33. c3+ Kxc3 34. Qa1+ Kd2
35. Qb2+ Kd1 36. Bf1 Rd2 37. Rd7 Rxd7 38. Bxc4 bxc4 39. Qxh8
Rd3 40. Qa8 c3 41. Qa4+ Ke1 42. f4 f5 43. Kc1 Rd2 44. Qa7 1-0

With a bit of text parsing, I can count the number of times each chess opening was used on a per-game basis. For this analysis, I’ll look at the openings in four classes: White’s first move, Black’s first move, White’s second move, and Black’s second move.
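
As a rough illustration of that text parsing, here is a minimal Python sketch. It is not the exact code behind this post: the function names `opening_moves` and `count_openings` and the two-game sample are made up for the example, and it assumes simple PGN movetext without nested variations or annotation glyphs. The four classes above correspond to the first 1, 2, 3, and 4 half-moves of each game.

```python
import re
from collections import Counter

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def opening_moves(pgn_game, n_plies):
    """Return the first n_plies moves of one PGN game as a tuple of strings."""
    # Keep only the movetext: header lines look like [Tag "value"].
    movetext = " ".join(
        line for line in pgn_game.splitlines()
        if not line.strip().startswith("[")
    )
    # Strip {comments}, then move numbers such as "1." or "1...".
    movetext = re.sub(r"\{[^}]*\}", " ", movetext)
    movetext = re.sub(r"\d+\.+", " ", movetext)
    moves = [tok for tok in movetext.split() if tok not in RESULTS]
    return tuple(moves[:n_plies])

def count_openings(games, n_plies):
    """Count how often each opening (first n_plies moves) occurs."""
    return Counter(opening_moves(game, n_plies) for game in games)

# Hypothetical two-game sample; the real data set would be read from PGN files.
games = [
    '[Result "1-0"]\n\n1. e4 d6 2. d4 Nf6 3. Nc3 g6 1-0',
    '[Result "0-1"]\n\n1. e4 c5 2. Nf3 d6 0-1',
]
white_first = count_openings(games, 1)   # White's first move
black_first = count_openings(games, 2)   # White's first move + Black's reply
```

Grouping these counts by the year in each game’s Date tag would give the per-year popularity shown in the charts.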

White’s first move

It’s well known that White has a small advantage at the beginning of the game. To maintain it, White should try to take control of the middle of the board as quickly as possible. The most popular first White moves from 1850-2014 are shown below. Note that all of these are fairly aggressive openings that build toward control of the middle of the board.

w-first-moves-over-time

In 1850, White openings were fairly homogeneous: Most chess experts played King’s Pawn. Chess players didn’t begin to explore alternatives to the King’s Pawn in earnest until the 1890s, when Queen’s Pawn (moving a Pawn to d4) started to replace King’s Pawn in some players’ repertoires. The 1920s saw another burst of innovation with the rising popularity of the Zukertort Opening (moving the Knight to f3) and the English Opening (moving a Pawn to c4), which completed the set of staple first-turn openings that are still in common use today.

Black’s first move

Many of Black’s opening moves are more defensive in nature and attempt to undermine White’s initial advantage. In 1850, it was standard fare for Black to match the ever-popular King’s Pawn by moving a Pawn to e5 (the Open Game). Although I typically group unpopular openings into the “Other” category, I wanted to point out the short-lived spike in popularity of the Pirc Defence in the 1850s. Though the Pirc Defence is typically thought of as a relatively new opening, Moheschunder Bannerjee used this opening almost exclusively in his 50+ games against John Cochrane, winning 40% of the games (far above his overall 24% win rate as Black).

wb-first-moves-over-time

Moreover, the rise of the Queen’s Pawn in the 1890s brought with it the rise of the Closed Game. Black openings similarly saw a burst of innovation in the 1920s, with the development of the Indian Defence in response to the Queen’s Pawn, and the introduction of the ever-popular Sicilian Defence in response to the standard King’s Pawn. By 2014, the Open Game is well past its glory days, and seems to be on its way out.

The French Defence seems to have been a staple Black opening for the past 164 years, consistently comprising 5%-10% of all chess games. Amusingly, the French Defence has a reputation for solidity and resilience, which is also reflected in its historical usage.

White’s second move

Here’s where things get complicated. I noted in the first section that the most popular first moves for White have historically been King’s and Queen’s Pawn, so that’s why the more popular second moves for White exclusively start with them. The Zukertort and English Openings simply haven’t become popular enough yet for their followup moves to show up here.

wbw-second-move-over-time

With the waning popularity of the Open Game over time, it’s no surprise that the responses to it have similarly declined. By 2014, the typical response to the Open Game is to play the King’s Knight, with the once-popular King’s Gambit and Vienna Game becoming all but extinct. The Sicilian Defence’s explosive rise to popularity is again reflected here, with the Open Sicilian (Knight to f3) becoming White’s standard response. Again, White’s response to Black’s French Defence (moving a Pawn to d4) has remained consistently popular over time, rarely dropping below 5% of the games played each year.

To avoid being overly wordy here, I’ll allow the visualization to speak for itself and leave the reader to explore the remaining trends as they please.

Black’s second move

If you’re familiar with chess, you know how quickly the set of possible moves grows with each move a player makes. After White and Black’s first turn, the board will be in one of 400 unique positions (each side has 20 legal first moves, and 20 × 20 = 400). After their second turn, there are 197,281 possible move sequences. And after only 3 turns, there are more than 119 million. This means that if you play enough chess, it’s highly likely that you will play a game that no one has ever played in the history of our universe. You can only imagine how difficult it would be to visualize all possible chess moves even up to the third turn.

Despite the vast number of possibilities in chess, there appears to be a strong bias toward a small subset of openings. In this data set, there were roughly 4,000 unique openings, and the 30 most popular ones account for 70% of all chess games. Below is a visualization of the distribution of those 30 most popular openings from 1850-2014.
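
The share covered by the most popular openings is simple to compute once the openings have been counted. The sketch below is only an illustration: `top_n_share` is a made-up helper name, and the counts are toy numbers rather than values from the data set.

```python
from collections import Counter

def top_n_share(opening_counts, n):
    """Fraction of all games accounted for by the n most common openings."""
    total = sum(opening_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in opening_counts.most_common(n))
    return top / total

# Toy counts for illustration only (not taken from the real data set).
toy_counts = Counter({"Open Game": 6, "Sicilian Defence": 3, "French Defence": 1})
share = top_n_share(toy_counts, 2)  # 9 of the 10 toy games
```

With the real counts, calling this helper with n set to 30 would give the fraction quoted above.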

(Have any thoughts on a better way to visualize this data? Please leave them in the comments! I’ve already reached the limit of what area charts can effectively visualize by Black’s second move.)

Interestingly, chess appears to be becoming more diverse over time. Whereas there were fewer than 100 unique openings by the end of both players’ second turn in 1850, there were over 1,000 unique openings by 2014. This may be an artifact of the data set, however, because there are far more games recorded in the 21st century in this data set.
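
One way to probe that caveat is a diversity measure that weights each opening by how often it is played, such as Shannon entropy, instead of a raw count of unique openings. This is a sketch of the idea and not something computed for this post; `shannon_entropy` is a made-up helper name and the counts are toy numbers. Entropy is still somewhat sensitive to sample size, so it reduces the problem without removing it.

```python
import math
from collections import Counter

def shannon_entropy(opening_counts):
    """Shannon entropy (in bits) of the distribution of openings."""
    total = sum(opening_counts.values())
    if total == 0:
        return 0.0
    probs = [count / total for count in opening_counts.values() if count > 0]
    return -sum(p * math.log2(p) for p in probs)

# Toy counts for illustration only: one dominant opening vs. an even spread.
dominated = Counter({"e4 e5": 97, "d4 d5": 1, "c4 e5": 1, "Nf3 d5": 1})
even = Counter({"e4 e5": 25, "d4 d5": 25, "c4 e5": 25, "Nf3 d5": 25})

low = shannon_entropy(dominated)   # close to 0: almost every game is the same
high = shannon_entropy(even)       # 2 bits: four equally likely openings
```

Both toy years have four unique openings, but the entropy separates the year in which one opening dominates from the year in which all four are played equally often.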

That’s it for today. In the next installment, I’ll be looking at higher-level features of player strategy over time.

Randy is a PhD candidate in Michigan State University's Computer Science program. As a member of Dr. Chris Adami's research lab, he studies biologically-inspired artificial intelligence and evolutionary processes.

Posted in analysis, data visualization
  • http://forthgo.com/blog/ Xan Gregg

    Fun database to explore.

    > Have any thoughts on a better way to visualize this data?

    I’d try overlaid smoothed lines of popularity by year of the top 10-15 lines of play. As you saw, stacked area charts start to lose their utility when the baselines get too varied. With the smoother, this may be the case where less detail equals more insight. Possibly you can color the lines by White’s success rate.

    • http://www.randalolson.com Randy Olson

      Interesting idea. My primary concern with using overlaid lines is that many of the openings comprise 1% or less of the data set each year, whereas a select few comprise 15% or more at any given time. Thus the select few would stand out on top, whereas the majority of the openings would be fighting for space in the 0-1% region.

      I could of course solve this by plotting the logged values, but then that would be hard to interpret, e.g., what does -8 in log space mean for the real fraction?

  • https://sites.google.com/site/beheim/ Bret Beheim

    Hi Randy,

    Great work! If you are interested, I have a few suggestions and ideas that you might want to try out.

    First and foremost, I recommend exploring the connection between a move’s prevalence and its performance – that’s the big message of my analysis of Go openings I think. Players seem to be quite attentive to moves that preceded victory (either their own, or the wins of their colleagues). Whether or not there’s actually a causal connection is outside my expertise, but I can say that from the data it certainly looks like the players think there is.

    As far as move diversity goes, in biology the big problem with doing unique type counts (species, alleles, etc.) is that they miss the relative abundance of each type. A way to reduce the data is to deploy a diversity metric – the two most popular are Shannon’s entropy and Simpson’s diversity index. It would be interesting to see if the Shannon entropy for opening moves was increasing along with unique counts (it does in Go).

    A third aspect, if you can get it, is player metadata like age – are there big differences between young players and older players in their willingness to adopt new openings?

    The fourth is demographic – is a move becoming more popular each year because it’s spreading epidemiologically from player to player, or because players who enter the dataset are already using it more than the population average the previous year?

    Bret

  • Pabitra

    I do not know, but in the late 60s and early 70s, a lot of theoretical work was done in Russia on some openings like the Ruy Lopez and Giuoco Piano. These got analysed up to something like 30 moves.
    Those were the years when the literature was available only in Russian and Russia (or the USSR, as it was known) dominated the chess scene. In the mid 70s, Fischer brought the Sicilian Defence into popularity.
    Currently, an opening is defined by up to 15 moves. Analysing the first 4 to 6 moves may not give true insight, even though computer chess has resulted in lookup tables up to that many moves.

    By the way, Bret, where is that analysis of Go game moves, you quote?

  • Daniel Gomez

    I was wondering if you could make separate charts for each of the common openings. For example, for white’s second move, I would like to have seen 5 separate charts each with the popularity of the response based on which one of the five main responses black made in the first move. That way you can see what has become popular to do by situation instead of just kind of re-viewing the popularity of the past moves over time. In this same way, you could do a black’s second move based on the 9 common white second moves you have and have 9 charts, each not being all that complicated.

    • http://www.randalolson.com Randy Olson

      I like this idea. I think it would actually work best as an interactive chart. e.g., it starts out with White’s first move. Then you could click in the area where you want to zoom (e.g., King’s Pawn) and it would show a new area chart for Black’s first move in response to King’s Pawn. And so on.

      Any interactive data viz wizards want to help make this happen? :-)

  • http://adabrowka.wordpress.com Andrzej Dąbrówka

    In literary history there is a discussion on the “authenticity” of games described in narratives, like De Cessolis’s treatise Ludus scaccorum, translated afterwards into many European languages. It would be interesting to identify their moves as belonging to the historical types of openings.

  • Alex S

    For how to visualize, check out this visualization from the Times. Similar challenge of showing multiple increasing pathways. But they don’t have to deal with the extra dimension of time.

    http://www.nytimes.com/interactive/2012/11/02/us/politics/paths-to-the-white-house.html?ref=politics

  • http://www.rutmanip.com jeremy rutman

    tree structure would be good for viz. as already pointed out. Maybe change color of nodes to indicate popularity and color of edges to indicate win % for that branch, whole thing evolves in time

  • Paul Zrimsek

    I got to this interesting article following a link from an economics blog, and I noticed that this sentence– In this data set, there were roughly 4,000 unique openings, and the 30 most popular ones comprise 70% of all chess games– was in exactly the same form people often use to complain about income or wealth inequality. Plotting a Gini coefficient of opening frequency over time might be a good way of showing how the vast Open Game fortune has gradually trickled down to the have-nots. (It also yields a single number no matter how many moves into the game you look.)

    • http://www.randalolson.com Randy Olson

      Great idea! Plotting the entropy of openings over time would likely also show a similar trend.

  • Chris

    Interesting post. Thank you.

    Could you change the third graph so identical moves up to the second move are grouped together ? It seems to me the French Defense shouldn’t sit between the King’s Knight, King’s Gambit and the Vienna Game because the other three all begin with 1. e4 e5. It’s a small point but since you asked…

    Look forward to reading more of your stuff.

  • Adrian Meli

    This is pretty interesting data. Why do you think in the White’s first move chart the diversity of moves increases for the first few decades and then levels out?

    • http://www.randalolson.com Randy Olson

      Taking a semi-educated guess: I’d imagine that chess theorists have shown that the four 1-move openings are really the only viable moves for developing effective openings after move 1. Now the evolution of moves is taking place more after turns 1 and 2.

  • Cary Utterberg

    The early popularity of the open games (1.e4,e5) is due to the fact that the pieces have a more fluid development in that opening, and forcing tactics happen more easily–this was a result of limited chess understanding, mainly because relatively few games were published until the latter 19th century, so chess was slow to develop. As positional play (long term strategic ideas) developed in the latter 19th-early 20th century, the closed game (1.d4,d5) became more popular. The characteristic of the 20th century opening (the hypermodern masters of the 1920s, the early Soviets in the 1940s, etc.) is that asymmetrical defenses (other than 1.e4,e5 or 1.d4,d5) became much more popular, partly because Black has more hope of counterplay when the position is asymmetrical, but especially because it was gradually discovered that Black need not immediately stake an identical claim in the center by mirroring White’s first move. Regarding your stats, keep in mind that in modern chess, closed games such as the Queen’s Gambit often arise from the English or Zukertort (Reti) opening, e.g. 1.c4,e6 2.Nc3,d5 3.d4, or 1.Nf3,d5 2.c4,e6 3.d4.

  • Stephan

    One thing to keep in mind is that sometimes the same position is reached by a different sequence of moves. For example, some top players like to play 1. Nf3 with the intention of playing an early d4 or c4 later. These games often transpose into positions that could also have started with 1. d4 or 1. c4. Top players often reach certain positions by a different sequence of moves than the “book” order in order to avoid certain unfavorable openings.

    • http://www.randalolson.com Randy Olson

      That’s correct. The same opening that was reached by different paths would be counted as a different opening. The goal here was to map out specific paths used in openings; I’ve found another method to identify all openings regardless of path taken.

  • Larry

    Did your analysis factor out logically equivalent positions at the end of two moves? For example after two moves 1. e4, e5 2. Nf3, Nc6 is equivalent to 1. e4, Nc6 2. Nf3, e5

    • http://www.randalolson.com Randy Olson

      It didn’t, no. Those would be counted as separate openings.

  • Dictatortot

    Virtually none of my games has ever been played before in the history of our universe. And there’s a good reason for that.

  • Nick B

    Here’s a crude version of how you might do nested stacked area charts. Data are of course fake, but it shows over 6 time periods the popularity of the top four opening moves by white, the top four responses to each opening by black, and the top two third moves by white. Better choice of colors etc would probably make it more readable, but even in the crappy excel sketch, you can look into subsets without losing track of the whole. It would be cool to see the whole dataset in such a format.

    http://i.imgur.com/JwGSXc7.png
