Abstract
The International Chess Federation (FIDE) imposes a voluminous and complex set of player pairing criteria in Swiss-system chess tournaments and endorses computer programs that are able to calculate the prescribed pairings. The purpose of these formalities is to ensure that players are paired fairly during the tournament and that the final ranking corresponds to the players’ true strength order. We contest the official FIDE player pairing routine by presenting alternative pairing rules. These can be enforced by computing maximum weight matchings in a carefully designed graph. We demonstrate by extensive experiments that a tournament format using our mechanism (1) yields fairer pairings in the rounds of the tournament and (2) produces a final ranking that reflects the players’ true strengths better than the state-of-the-art FIDE pairing system.
Funding source: Magyar Tudományos Akadémia
Award Identifier / Grant number: János Bolyai Research Fellowship
Funding source: Nemzeti Kutatási Fejlesztési és Innovációs Hivatal
Award Identifier / Grant number: K128611
- Research ethics: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The authors state no conflict of interest.
- Research funding: Ágnes Cseh was supported by the János Bolyai Research Fellowship.
- Data availability: The raw data can be obtained on request from the corresponding author.
Appendix A: Ranking quality
In the following we discuss additional simulation experiments that measure the obtained ranking quality for various parameter settings.
A.1 Different tournament sizes
We start with experimental results demonstrating that our findings on the ranking quality remain valid for tournaments of different sizes in terms of number of players and number of rounds.
It is usually expected that a player who wins all matches also wins the tournament outright, without being tied for first place. This can be ensured by playing at least ⌈log₂ n⌉ rounds: four rounds for 16 players, five rounds for 32 players, and six rounds for 64 players. Most tournaments are five or seven rounds long, according to data from chess-results.com (Herzog 2020a).
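As a quick illustration of this bound (a sketch of ours, not code from the tournament engines compared here), the minimum round count can be computed directly:

```python
import math

def min_rounds_for_sole_leader(num_players: int) -> int:
    """Minimum number of rounds so that at most one player can finish
    with a perfect score: each round at least halves the number of
    players who still have a 100% record."""
    return math.ceil(math.log2(num_players))

for n in (16, 32, 64):
    print(n, "players ->", min_rounds_for_sole_leader(n), "rounds")
```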
In general, more rounds lead to higher ranking quality, although with diminishing effect, as Figure A1 shows. In terms of the achieved ranking quality, the MWM engine with Burstein outperforms Dutch BBP in all cases, except for the unrealistic case of a tournament with only two rounds.

Figure A1: Ranking quality after 1–9 rounds, 32 or 64 players with strength range 1400–2200. Results for Burstein are shown in blue, Dutch BBP results in orange.
A.2 Different strength range sizes
Here we vary the strength range size, i.e., we sample the player strengths from intervals of different width. A smaller strength range size corresponds to a tournament among players of similar strength, while larger strength range sizes model tournaments with more heterogeneous players. The results depicted in Figure A2 show that, also for different strength range sizes, the MWM engine with Burstein or Random2 outperforms Dutch BBP in terms of ranking quality, and that Dutch is on a par with Dutch BBP.

Figure A2: Ranking quality measured by normalized Kendall τ for different strength range sizes.
A larger strength range size results in higher ranking quality and lower variance. The increase in ranking quality can be explained by the higher mean strength difference between paired players that a larger strength range size induces; the variance decreases because match results become more predictable.
The difference in ranking quality between Burstein and Dutch BBP is much larger for a strength range size of 400 than for 800 or 1200. With a small strength range size, every match paired by Dutch BBP is comparatively likely to be won by the weaker player, while under Burstein at least some matches remain predictable.
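The connection between strength difference and predictability can be made concrete with the standard Elo expected-score formula; the exact win-probability model used in the simulations is not restated in this appendix, so the following is only an illustrative sketch:

```python
def elo_expected_score(strength_a: float, strength_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((strength_b - strength_a) / 400.0))

# A strength range of size 400 caps the gap between any two players at
# 400 points, so even the most lopsided pairing is won by the stronger
# player only about 91% of the time; a 1200-point gap is near-certain.
print(elo_expected_score(1800, 1400))  # ≈ 0.91
print(elo_expected_score(2600, 1400))  # ≈ 0.999
```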
A.3 Different player strength distributions
We provide additional experimental results indicating that our findings hold independently of the employed player strength distribution, i.e., we observe the same behavior also for non-uniform distributions. Since no data is available that lets us estimate what realistic player strength distributions look like, we focus on several natural candidates that deviate strongly from the uniform distribution.
First, we consider player strength distributions derived from exponential distributions: Figure A3 shows a case with many strong players and only a few weak players, and Figure A4 a case with many weak players and only a few strong players within the given strength range. We also consider player strength distributions derived from a normal distribution with mean exactly in the middle of the strength range and a standard deviation of a quarter of the strength range size. See Figure A5 for the corresponding results.
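Such truncated distributions can be sampled, for example, by rejection. The helper below is a sketch of ours with assumed parameterizations (the exponential means of 2000 and 1600 match the figure captions), not the actual sampling code used in the experiments:

```python
import random

def sample_strengths(n, lo=1400, hi=2200, dist="uniform", seed=None):
    """Sample n player strengths from [lo, hi], rejecting draws that
    fall outside the range."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        if dist == "uniform":
            x = rng.uniform(lo, hi)
        elif dist == "normal":      # mean mid-range, sd = range size / 4
            x = rng.gauss((lo + hi) / 2, (hi - lo) / 4)
        elif dist == "exp_strong":  # many strong players, mean near 2000
            x = hi - rng.expovariate(1 / (hi - 2000))
        elif dist == "exp_weak":    # many weak players, mean near 1600
            x = lo + rng.expovariate(1 / (1600 - lo))
        else:
            raise ValueError(f"unknown distribution: {dist}")
        if lo <= x <= hi:
            out.append(x)
    return out
```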
Figure A3: Ranking quality measured by normalized Kendall τ for 32 players with an exponential player strength distribution in the range [1400, 2200] with mean at 2000.
Figure A4: Ranking quality measured by normalized Kendall τ for 32 players with an exponential player strength distribution in the range [1400, 2200] with mean at 1600.
Figure A5: Ranking quality measured by normalized Kendall τ for 32 players with a normally distributed player strength distribution in the range [1400, 2200] with mean at 1800 and standard deviation of 200.
Finally, we investigate a player strength distribution obtained by uniformly sampling player strengths from the real-world distribution of Elo scores of all 363,275 players listed by FIDE,[2] restricted to the desired strength range. Figure A6 shows very similar results for this case as well.
Figure A6: Ranking quality measured by normalized Kendall τ for 32 players uniformly sampled from the real-world distribution of Elo scores restricted to the range [1400, 2200].
A.4 Ranking quality via Spearman ρ and NDCG
For comparison, we also evaluate the achieved ranking quality via the Spearman ρ and the normalized discounted cumulative gain (NDCG) measures.
Besides Kendall τ, Spearman ρ is commonly used for comparing rankings. Here, we use a normalized variant of Spearman ρ, similar to the normalized Kendall τ.
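For concreteness, the sketch below computes a normalized Spearman ρ for two rankings given as permutations of the player set. The rescaling from [−1, 1] to [0, 1] via (1 + ρ)/2 is our assumption, chosen by analogy with the normalized Kendall τ, and may differ from the exact normalization used in the experiments:

```python
def normalized_spearman(ranking, true_ranking):
    """Spearman rho between two rankings (lists of player ids in rank
    order, no ties), rescaled from [-1, 1] to [0, 1] via (1 + rho) / 2."""
    n = len(ranking)
    pos = {p: i for i, p in enumerate(true_ranking)}
    # Classic tie-free formula: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    d2 = sum((i - pos[p]) ** 2 for i, p in enumerate(ranking))
    rho = 1 - 6 * d2 / (n * (n * n - 1))
    return (1 + rho) / 2

print(normalized_spearman([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(normalized_spearman([4, 3, 2, 1], [1, 2, 3, 4]))  # 0.0
```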
The NDCG measure is not commonly used for comparing rankings; rather, it is used to evaluate search engines by assigning a relevance rating to each document and awarding a higher score if highly relevant documents are listed early. Applied to our setting, NDCG emphasizes ranking the top players correctly, while the order of the lowest-ranked players is essentially irrelevant.
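A minimal NDCG implementation along these lines, using the common logarithmic discount; the exact gain and discount conventions used in our evaluation are assumptions here:

```python
import math

def ndcg(ranking, relevance):
    """Normalized discounted cumulative gain. `ranking` lists player ids
    from first to last place; `relevance[p]` is the graded relevance of
    player p (e.g. derived from true strength). Position i is discounted
    by log2(i + 2), so errors near the top cost more."""
    dcg = sum(relevance[p] / math.log2(i + 2) for i, p in enumerate(ranking))
    ideal = sorted(relevance.values(), reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg

rel = {"a": 3, "b": 2, "c": 1}
print(ndcg(["a", "b", "c"], rel))  # 1.0 – perfect order
# Swapping the top two players hurts more than swapping the bottom two.
print(ndcg(["b", "a", "c"], rel))
print(ndcg(["a", "c", "b"], rel))
```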
As shown in Figures A7 and A8, the results with normalized Spearman ρ and NDCG look almost identical to the results for normalized Kendall τ in Figure 7. Also, for different strength ranges or range sizes we get consistent results, see Figures A9, A10, A11 and A12.

Figure A7: Ranking quality measured by normalized Spearman ρ.

Figure A8: Ranking quality measured by the normalized discounted cumulative gain (NDCG).

Figure A9: Ranking quality measured by normalized Spearman ρ.

Figure A10: Ranking quality measured by the normalized discounted cumulative gain (NDCG).

Figure A11: Ranking quality measured by normalized Spearman ρ.

Figure A12: Ranking quality measured by the normalized discounted cumulative gain (NDCG).
Appendix B: Fairness
Here we present additional simulation results that measure the achieved fairness, i.e., results regarding the compliance with the quality criteria (Q1) and (Q2).
B.1 Number of float pairs
We consider the obtained number of float pairs for different strength ranges and different strength range sizes. Figures B1 and B2 show that we get consistent results in both settings. Burstein yields by far the lowest number of float pairs, but Random2 and Dutch also perform slightly better than Dutch BBP. Figure B3 shows a direct comparison of the obtained number of float pairs for Burstein and Dutch BBP for different numbers of players and different tournament lengths.

Figure B1: Number of float pairs for different strength ranges.

Figure B2: Number of float pairs for different strength range sizes.

Figure B3: Number of float pairs for different tournament sizes and lengths. The results for Burstein are shown in blue, results for Dutch BBP in orange.
Here, too, we consistently find that Burstein produces far fewer float pairs than Dutch BBP.
B.2 Absolute color difference
The measured absolute color difference increases slightly with the number of rounds and also with the number of players, as Figure B4 shows.

Figure B4: Absolute color difference in rounds 1–9, 16–64 players with strength range 1400–2200. Results for Burstein are shown in blue, Dutch BBP results in orange.
Note that in every odd round the absolute color difference must be at least n; this lower bound is clearly visible in the figure, and all investigated pairing systems almost always meet it in odd rounds. Interestingly, Dutch BBP seems to perform slightly better than Burstein in tournaments with at most four rounds, but this tiny advantage vanishes once at least six rounds are played. We obtain similar results when comparing with Random2, Dutch, Random, and Monrad.
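The bound of n for odd rounds follows from parity: a player's white-minus-black count changes by ±1 in each game, so after an odd number of games it is odd and therefore at least 1 in absolute value, and summing over all n players yields at least n. A small sanity check of this argument (a standalone sketch, independent of any pairing engine):

```python
import random

def total_color_difference(color_histories):
    """Sum over all players of |#white games - #black games|."""
    return sum(abs(h.count("W") - h.count("B")) for h in color_histories)

# After any odd number of games, each player's color difference is odd,
# hence at least 1, so the total is at least the number of players n --
# regardless of how the colors were assigned.
rng = random.Random(0)
n, rounds = 16, 5
histories = ["".join(rng.choice("WB") for _ in range(rounds)) for _ in range(n)]
assert total_color_difference(histories) >= n
print(total_color_difference(histories))
```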
References
Appleton, D.R. (1995). May the best man win? J. R. Stat. Soc. – Ser. D Statistician 44: 529–538. https://doi.org/10.2307/2348901.
Beutel, A., Chen, J., Doshi, T., Qian, H., Wei, L., Wu, Y., Heldt, L., Zhao, Z., Hong, L., Chi, E.H., et al. (2019). Fairness in recommendation ranking through pairwise comparisons. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining (KDD '19). Association for Computing Machinery, New York, pp. 2212–2220. https://doi.org/10.1145/3292500.3330745.
Bierema, J. (2017). BBP pairings, a Swiss-system chess tournament engine. Available at: <https://github.com/BieremaBoyzProgramming/bbpPairings> (Accessed 17 May 2022).
Bimpikis, K., Ehsani, S., and Mostagir, M. (2019). Designing dynamic contests. Oper. Res. 67: 339–356. https://doi.org/10.1287/opre.2018.1823.
Biró, P., Fleiner, T., and Palincza, R. (2017). Designing chess pairing mechanisms. In: 10th Japanese–Hungarian symposium on discrete mathematics and its applications. Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest, pp. 77–86.
Brandt, F., Brill, M., Seedig, H.G., and Suksompong, W. (2018). On the structure of stable tournament solutions. Econ. Theor. 65: 483–507. https://doi.org/10.1007/s00199-016-1024-x.
Brandt, F., Conitzer, V., Endriss, U., Lang, J., and Procaccia, A.D. (2016). Introduction to computational social choice. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9781107446984.002.
Brandt, F. and Fischer, F.A. (2007). PageRank as a weak tournament solution. In: Deng, X. and Graham, F.C. (Eds.), Internet and network economics, third international workshop, WINE (Lecture Notes in Computer Science), Vol. 4858. Springer, San Diego, pp. 300–305. https://doi.org/10.1007/978-3-540-77105-0_30.
Castaño, F. and Velasco, N. (2020). Exact and heuristic approaches for the automated design of medical trainees rotation schedules. Omega 97: 102107. https://doi.org/10.1016/j.omega.2019.102107.
Chatterjee, K., Ibsen-Jensen, R., and Tkadlec, J. (2016). Robust draws in balanced knockout tournaments. In: Proceedings of the 25th international joint conference on artificial intelligence. AAAI Press, New York, pp. 172–179.
Chen, X., Bennett, P.N., Collins-Thompson, K., and Horvitz, E. (2013). Pairwise ranking aggregation in a crowdsourced setting. In: Proceedings of the sixth ACM international conference on web search and data mining (WSDM '13). Association for Computing Machinery, New York, pp. 193–202. https://doi.org/10.1145/2433396.2433420.
Csató, L. (2013). Ranking by pairwise comparisons for Swiss-system tournaments. Cent. Eur. J. Oper. Res. 21: 783–803. https://doi.org/10.1007/s10100-012-0261-8.
Csató, L. (2017). On the ranking of a Swiss system chess team tournament. Ann. Oper. Res. 254: 17–36. https://doi.org/10.1007/s10479-017-2440-4.
Csató, L. (2021). Tournament design: how operations research can improve sports rules. Springer Nature, Switzerland. https://doi.org/10.1007/978-3-030-59844-0.
Dagaev, D. and Suzdaltsev, A. (2018). Competitive intensity and quality maximizing seedings in knock-out tournaments. J. Combin. Optim. 35: 170–188. https://doi.org/10.1007/s10878-017-0164-7.
Dezső, B., Jüttner, A., and Kovács, P. (2011). LEMON – an open source C++ graph template library. Electron. Notes Theor. Comput. Sci. 264: 23–45. https://doi.org/10.1016/j.entcs.2011.06.003.
Dirac, G.A. (1952). Some theorems on abstract graphs. Proc. Lond. Math. Soc. 3: 69–81. https://doi.org/10.1112/plms/s3-2.1.69.
Edmonds, J. (1965). Paths, trees, and flowers. Can. J. Math. 17: 449–467. https://doi.org/10.4153/CJM-1965-045-4.
Elmenreich, W., Ibounig, T., and Fehérvári, I. (2009). Robustness versus performance in sorting and tournament algorithms. Acta Polytech. Hungar. 6: 7–18.
Elo, A.E. (1978). The rating of chessplayers, past and present. Arco Pub., London.
Fehérvári, I. and Elmenreich, W. (2009). Evolutionary methods in self-organizing system design. In: Arabnia, H.R. and Solo, A.M.G. (Eds.), Proceedings of the 2009 international conference on genetic and evolutionary methods, GEM 2009, July 13–16, 2009, Las Vegas, Nevada, USA. CSREA Press, pp. 10–15.
FIDE (2020). FIDE handbook. Available at: <https://handbook.fide.com/> (Accessed 17 May 2022).
FIDE (2023). FIDE handbook, D. Regulations for specific competitions/02. Chess Olympiad. Available at: <https://handbook.fide.com/chapter/OlympiadPairingRules2022> (Accessed 26 July 2023).
FIDE SPP Commission (2020). Probability for the outcome of a chess game based on rating. Available at: <https://spp.fide.com/2020/10/23/probability-for-the-outcome-of-a-chess-game-based-on-rating/>.
Friendly, M. and Denis, D. (2005). The early origins and development of the scatterplot. J. Hist. Behav. Sci. 41: 103–130. https://doi.org/10.1002/jhbs.20078.
GitHub (2022). Suboptimal exchange in Remainder #7. Available at: <https://github.com/BieremaBoyzProgramming/bbpPairings/issues/7>.
Glickman, M.E. and Jensen, S.T. (2005). Adaptive paired comparison design. J. Stat. Plann. Inference 127: 279–293. https://doi.org/10.1016/j.jspi.2003.09.022.
Gupta, S., Roy, S., Saurabh, S., and Zehavi, M. (2018). When rigging a tournament, let greediness blind you. In: Proceedings of the 27th international joint conference on artificial intelligence. IJCAI, Stockholm, pp. 275–281. https://doi.org/10.24963/ijcai.2018/38.
Guse, J., Schweigert, E., Kulms, G., Heinen, I., Martens, C., and Guse, A.H. (2016). Effects of mentoring speed dating as an innovative matching tool in undergraduate medical education: a mixed methods study. PLoS One 11: e0147444. https://doi.org/10.1371/journal.pone.0147444.
Harbring, C. and Irlenbusch, B. (2003). An experimental study on tournament design. Lab. Econ. 10: 443–464. https://doi.org/10.1016/s0927-5371(03)00034-4.
Henery, R.J. (1992). An extension to the Thurstone-Mosteller model for chess. J. R. Stat. Soc. – Ser. D Statistician 41: 559–567. https://doi.org/10.2307/2348921.
Herzog, H. (2020a). Chess-Results.com, the international chess-tournaments-results-server. Available at: <https://chess-results.com/> (Accessed 07 December 2021).
Herzog, H. (2020b). Swiss-Manager. Available at: <http://www.swiss-manager.at/> (Accessed 07 December 2021).
Hintze, J.L. and Nelson, R.D. (1998). Violin plots: a box plot-density trace synergism. Am. Statistician 52: 181–184. https://doi.org/10.1080/00031305.1998.10480559.
Hofmann, H., Wickham, H., and Kafadar, K. (2017). Value plots: boxplots for large data. J. Comput. Graph Stat. 26: 469–477. https://doi.org/10.1080/10618600.2017.1305277.
Hoshino, R. (2018). A recursive algorithm to generate balanced weekend tournaments. In: Proceedings of the AAAI conference on artificial intelligence, Vol. 32. AAAI Press, New Orleans, pp. 6195–6201. https://doi.org/10.1609/aaai.v32i1.12076.
Hudry, O. (2009). A survey on the complexity of tournament solutions. Math. Soc. Sci. 57: 292–303. https://doi.org/10.1016/j.mathsocsci.2008.12.002.
Irving, R. (1985). An efficient algorithm for the "stable roommates" problem. J. Algorithm. 6: 577–595. https://doi.org/10.1016/0196-6774(85)90033-1.
Karpov, A. (2018). Generalized knockout tournament seedings. Int. J. Comput. Sci. Sport 17: 113–127. https://doi.org/10.2478/ijcss-2018-0006.
Kendall, M.G. (1945). The treatment of ties in ranking problems. Biometrika 33: 239–251. https://doi.org/10.1093/biomet/33.3.239.
Kim, M.P., Suksompong, W., and Williams, V.V. (2017). Who can win a single-elimination tournament? SIAM J. Discrete Math. 31: 1751–1764. https://doi.org/10.1137/16m1061783.
Kim, M.P. and Williams, V.V. (2015). Fixing tournaments for kings, chokers, and more. In: Proceedings of the 24th international joint conference on artificial intelligence. AAAI Press, New Orleans, pp. 561–567.
Kolmogorov, V. (2009). Blossom V: a new implementation of a minimum cost perfect matching algorithm. Math. Program. Comput. 1: 43–67. https://doi.org/10.1007/s12532-009-0002-8.
Korte, B. and Vygen, J. (2011). Combinatorial optimization: theory and algorithms. Springer, Berlin. https://doi.org/10.1007/978-3-642-24488-9.
Kujansuu, E., Lindberg, T., and Mäkinen, E. (1999). The stable roommates problem and chess tournament pairings. Divulgaciones Matemáticas 7: 19–28.
Lambers, R., Goossens, D., and Spieksma, F.C.R. (2023). The flexibility of home away pattern sets. J. Sched. 26: 413–423. https://doi.org/10.1007/s10951-022-00734-w.
Larson, J., Johansson, M., and Carlsson, M. (2014). An integrated constraint programming approach to scheduling sports leagues with divisional and round-robin tournaments. In: Simonis, H. (Ed.), Integration of AI and OR techniques in constraint programming. Springer International Publishing, Cham, pp. 144–158. https://doi.org/10.1007/978-3-319-07046-9_11.
Laslier, J.-F. (1997). Tournament solutions and majority voting. Springer, Berlin. https://doi.org/10.1007/978-3-642-60805-6.
Milvang, O. (2016). Probability for the outcome of a chess game based on rating. Available at: <https://pairings.fide.com/images/stories/downloads/2016-probability-of-the-outcome.pdf>.
Moulin, H. (1986). Choosing from a tournament. Soc. Choice Welfare 3: 271–291. https://doi.org/10.1007/bf00292732.
Muurlink, O. and Poyatos Matas, C. (2011). From romance to rocket science: speed dating in higher education. High Educ. Res. Dev. 30: 751–764. https://doi.org/10.1080/07294360.2010.539597.
Ólafsson, S. (1990). Weighted matching in chess tournaments. J. Oper. Res. Soc. 41: 17–24. https://doi.org/10.1057/jors.1990.3.
Paraschakis, D. and Nilsson, B.J. (2020). Matchmaking under fairness constraints: a speed dating case study. In: Boratto, L., Faralli, S., Marras, M., and Stilo, G. (Eds.), Bias and social aspects in search and recommendation. Springer International Publishing, Cham, pp. 43–57. https://doi.org/10.1007/978-3-030-52485-2_5.
Saile, C. and Suksompong, W. (2020). Robust bounds on choosing from large tournaments. Soc. Choice Welfare 54: 87–110. https://doi.org/10.1007/s00355-019-01213-6.
Scarf, P., Yusof, M.M., and Bilbao, M. (2009). A numerical study of designs for sporting contests. Eur. J. Oper. Res. 198: 190–198. https://doi.org/10.1016/j.ejor.2008.07.029.
Sinuany-Stern, Z. (1988). Ranking of sports teams via the AHP. J. Oper. Res. Soc. 39: 661–667. https://doi.org/10.2307/2582188.
Spearman, C. (1904). The proof and measurement of association between two things. Am. J. Psychol. 15: 72–101. https://doi.org/10.2307/1412159.
Stanton, I. and Williams, V.V. (2011). Manipulating stochastically generated single-elimination tournaments for nearly all players. In: Chen, N., Elkind, E., and Koutsoupias, E. (Eds.), Internet and network economics – 7th international workshop, WINE (Lecture Notes in Computer Science), Vol. 7090. Springer, Singapore, pp. 326–337. https://doi.org/10.1007/978-3-642-25510-6_28.
Sziklai, B.R., Biró, P., and Csató, L. (2022). The efficacy of tournament designs. Comput. Oper. Res. 144: 105821. https://doi.org/10.1016/j.cor.2022.105821.
Van Bulck, D. and Goossens, D. (2019). Handling fairness issues in time-relaxed tournaments with availability constraints. Comput. Oper. Res. 115: 104856. https://doi.org/10.1016/j.cor.2019.104856.
Voong, T.M. and Oehler, M. (2019). Auditory spatial perception using bone conduction headphones along with fitted head related transfer functions. In: 2019 IEEE conference on virtual reality and 3D user interfaces (VR). IEEE, pp. 1211–1212. https://doi.org/10.1109/VR.2019.8798218.
Wei, L., Tian, Y., Wang, Y., and Huang, T. (2015). Swiss-system based cascade ranking for gait-based person re-identification. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence (AAAI'15). AAAI Press, pp. 1882–1888. https://doi.org/10.1609/aaai.v29i1.9454.
Wikipedia (2023). Swiss-system tournament – pairing procedure. Available at: <https://en.wikipedia.org/wiki/Swiss-system_tournament> (Accessed 08 June 2023).
© 2024 Walter de Gruyter GmbH, Berlin/Boston