Abstract
The International Chess Federation (FIDE) imposes a voluminous and complex set of player pairing criteria in Swiss-system chess tournaments and endorses computer programs that are able to calculate the prescribed pairings. The purpose of these formalities is to ensure that players are paired fairly during the tournament and that the final ranking corresponds to the players’ true strength order. We contest the official FIDE player pairing routine by presenting alternative pairing rules. These can be enforced by computing maximum weight matchings in a carefully designed graph. We demonstrate by extensive experiments that a tournament format using our mechanism (1) yields fairer pairings in the rounds of the tournament and (2) produces a final ranking that reflects the players’ true strengths better than the state-of-the-art FIDE pairing system.
Funding source: Magyar Tudományos Akadémia
Award Identifier / Grant number: János Bolyai Research Fellowship
Funding source: Nemzeti Kutatási Fejlesztési és Innovációs Hivatal
Award Identifier / Grant number: K128611
- Research ethics: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The authors state no conflict of interest.
- Research funding: Ágnes Cseh was supported by the János Bolyai Research Fellowship.
- Data availability: The raw data can be obtained on request from the corresponding author.
Appendix A: Ranking quality
In the following we discuss additional simulation experiments that measure the obtained ranking quality for various parameter settings.
A.1 Different tournament sizes
We start with experimental results demonstrating that our findings on the ranking quality remain valid for tournaments of different sizes in terms of number of players and number of rounds.
It is usually expected that a player who wins all matches also wins the tournament outright, without being tied for first place. This can be ensured by playing at least ⌈log₂ n⌉ rounds: four rounds for 16 players, five rounds for 32 players, and six rounds for 64 players. Most tournaments are five or seven rounds long, according to data from chess-results.com (Herzog 2020a).
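As a quick illustration of this bound (a sketch of ours, not code from the tournament engines compared here), the minimum round count can be computed directly:

```python
import math

def min_rounds_for_sole_leader(num_players: int) -> int:
    """Minimum number of rounds so that at most one player can finish
    with a perfect score: each round at least halves the number of
    players who still have a 100% record."""
    return math.ceil(math.log2(num_players))

for n in (16, 32, 64):
    print(n, "players ->", min_rounds_for_sole_leader(n), "rounds")
```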
In general, more rounds lead to higher ranking quality, although with diminishing effect, as Figure A1 shows. In terms of the achieved ranking quality, the MWM engine with Burstein outperforms Dutch BBP in all cases, except for the unrealistic case of a tournament with only two rounds.

Figure A1: Ranking quality after 1–9 rounds, 32 or 64 players with strength range 1400–2200. Results for Burstein are shown in blue, Dutch BBP results in orange.
A.2 Different strength range sizes
Here we vary the strength range size, i.e., we sample the player strengths from intervals of different width. A smaller strength range size corresponds to a tournament among players of similar strength, while larger strength range sizes model tournaments with more heterogeneous players. The results depicted in Figure A2 show that, also for different strength range sizes, the MWM engine with Burstein or Random2 outperforms Dutch BBP in terms of ranking quality, and that Dutch is on a par with Dutch BBP.

Figure A2: Ranking quality measured by normalized Kendall τ for different strength range sizes.
A larger strength range size results in higher ranking quality and lower variance. The increase in ranking quality can be explained by the higher mean strength difference between paired players that a larger strength range size induces; the variance decreases because match results become more predictable.
The difference in ranking quality between Burstein and Dutch BBP is much larger for a strength range size of 400 than for 800 or 1200. With a small strength range size, every match paired by Dutch BBP is comparatively likely to be won by the weaker player, while under Burstein at least some matches remain predictable.
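The connection between strength difference and predictability can be made concrete with the standard Elo expected-score formula; the exact win-probability model used in the simulations is not restated in this appendix, so the following is only an illustrative sketch:

```python
def elo_expected_score(strength_a: float, strength_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((strength_b - strength_a) / 400.0))

# A strength range of size 400 caps the gap between any two players at
# 400 points, so even the most lopsided pairing is won by the stronger
# player only about 91% of the time; a 1200-point gap is near-certain.
print(elo_expected_score(1800, 1400))  # ≈ 0.91
print(elo_expected_score(2600, 1400))  # ≈ 0.999
```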
A.3 Different player strength distributions
We provide additional experimental results indicating that our findings hold independently of the employed player strength distribution, i.e., we observe the same behavior also for non-uniform distributions. Since no data is available that lets us estimate what realistic player strength distributions look like, we focus on several natural candidates that deviate strongly from the uniform distribution.
First, we consider player strength distributions derived from exponential distributions: Figure A3 shows a case with many strong players and only a few weak players, and Figure A4 a case with many weak players and only a few strong players within the given strength range. We also consider player strength distributions derived from a normal distribution with mean exactly in the middle of the strength range and a standard deviation of a quarter of the strength range size. See Figure A5 for the corresponding results.
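Such truncated distributions can be sampled, for example, by rejection. The helper below is a sketch of ours with assumed parameterizations (the exponential means of 2000 and 1600 match the figure captions), not the actual sampling code used in the experiments:

```python
import random

def sample_strengths(n, lo=1400, hi=2200, dist="uniform", seed=None):
    """Sample n player strengths from [lo, hi], rejecting draws that
    fall outside the range."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        if dist == "uniform":
            x = rng.uniform(lo, hi)
        elif dist == "normal":      # mean mid-range, sd = range size / 4
            x = rng.gauss((lo + hi) / 2, (hi - lo) / 4)
        elif dist == "exp_strong":  # many strong players, mean near 2000
            x = hi - rng.expovariate(1 / (hi - 2000))
        elif dist == "exp_weak":    # many weak players, mean near 1600
            x = lo + rng.expovariate(1 / (1600 - lo))
        else:
            raise ValueError(f"unknown distribution: {dist}")
        if lo <= x <= hi:
            out.append(x)
    return out
```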
Figure A3: Ranking quality measured by normalized Kendall τ for 32 players with an exponential player strength distribution in the range [1400, 2200] with mean at 2000.
Figure A4: Ranking quality measured by normalized Kendall τ for 32 players with an exponential player strength distribution in the range [1400, 2200] with mean at 1600.
Figure A5: Ranking quality measured by normalized Kendall τ for 32 players with a normally distributed player strength distribution in the range [1400, 2200] with mean at 1800 and standard deviation of 200.
Finally, we investigate a player strength distribution obtained by uniformly sampling player strengths from the real-world distribution of Elo scores of all 363,275 players listed by FIDE,[2] restricted to the desired strength range. Figure A6 shows very similar results for this case as well.
Figure A6: Ranking quality measured by normalized Kendall τ for 32 players uniformly sampled from the real-world distribution of Elo scores restricted to the range [1400, 2200].
A.4 Ranking quality via Spearman ρ and NDCG
For comparison, we also evaluate the achieved ranking quality via the Spearman ρ and the normalized discounted cumulative gain (NDCG) measures.
Besides Kendall τ, Spearman ρ is commonly used for comparing rankings. Here, we use a normalized variant of Spearman ρ, similar to the normalized Kendall τ.
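For concreteness, the sketch below computes a normalized Spearman ρ for two rankings given as permutations of the player set. The rescaling from [−1, 1] to [0, 1] via (1 + ρ)/2 is our assumption, chosen by analogy with the normalized Kendall τ, and may differ from the exact normalization used in the experiments:

```python
def normalized_spearman(ranking, true_ranking):
    """Spearman rho between two rankings (lists of player ids in rank
    order, no ties), rescaled from [-1, 1] to [0, 1] via (1 + rho) / 2."""
    n = len(ranking)
    pos = {p: i for i, p in enumerate(true_ranking)}
    # Classic tie-free formula: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    d2 = sum((i - pos[p]) ** 2 for i, p in enumerate(ranking))
    rho = 1 - 6 * d2 / (n * (n * n - 1))
    return (1 + rho) / 2

print(normalized_spearman([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(normalized_spearman([4, 3, 2, 1], [1, 2, 3, 4]))  # 0.0
```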
The NDCG measure is not commonly used for comparing rankings; rather, it is used to evaluate search engines by assigning a relevance rating to each document and awarding a higher score if highly relevant documents are listed early. Applied to our setting, NDCG emphasizes ranking the top players correctly, while the order of the lowest-ranked players is essentially irrelevant.
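A minimal NDCG implementation along these lines, using the common logarithmic discount; the exact gain and discount conventions used in our evaluation are assumptions here:

```python
import math

def ndcg(ranking, relevance):
    """Normalized discounted cumulative gain. `ranking` lists player ids
    from first to last place; `relevance[p]` is the graded relevance of
    player p (e.g. derived from true strength). Position i is discounted
    by log2(i + 2), so errors near the top cost more."""
    dcg = sum(relevance[p] / math.log2(i + 2) for i, p in enumerate(ranking))
    ideal = sorted(relevance.values(), reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg

rel = {"a": 3, "b": 2, "c": 1}
print(ndcg(["a", "b", "c"], rel))  # 1.0 – perfect order
# Swapping the top two players hurts more than swapping the bottom two.
print(ndcg(["b", "a", "c"], rel))
print(ndcg(["a", "c", "b"], rel))
```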
As shown in Figures A7 and A8, the results with normalized Spearman ρ and NDCG look almost identical to the results for normalized Kendall τ in Figure 7. Also, for different strength ranges or range sizes we get consistent results, see Figures A9, A10, A11 and A12.

Figure A7: Ranking quality measured by normalized Spearman ρ.

Figure A8: Ranking quality measured by the normalized discounted cumulative gain (NDCG).

Figure A9: Ranking quality measured by normalized Spearman ρ.

Figure A10: Ranking quality measured by the normalized discounted cumulative gain (NDCG).

Figure A11: Ranking quality measured by normalized Spearman ρ.

Figure A12: Ranking quality measured by the normalized discounted cumulative gain (NDCG).
Appendix B: Fairness
Here we present additional simulation results that measure the achieved fairness, i.e., results regarding the compliance with the quality criteria (Q1) and (Q2).
B.1 Number of float pairs
We consider the obtained number of float pairs for different strength ranges and different strength range sizes. Figures B1 and B2 show that we get consistent results in both settings. Burstein yields by far the lowest number of float pairs, but Random2 and Dutch also perform slightly better than Dutch BBP. Figure B3 shows a direct comparison of the obtained number of float pairs for Burstein and Dutch BBP for different numbers of players and different tournament lengths.

Figure B1: Number of float pairs for different strength ranges.

Figure B2: Number of float pairs for different strength range sizes.

Figure B3: Number of float pairs for different tournament sizes and lengths. The results for Burstein are shown in blue, results for Dutch BBP in orange.
Here, too, we consistently find that Burstein produces far fewer float pairs than Dutch BBP.
B.2 Absolute color difference
The measured absolute color difference increases slightly with the number of rounds and also with the number of players, as Figure B4 shows.

Figure B4: Absolute color difference in rounds 1–9, 16–64 players with strength range 1400–2200. Results for Burstein are shown in blue, Dutch BBP results in orange.
Note that in every odd round the absolute color difference must be at least n; this lower bound is clearly visible in the figure, and all investigated pairing systems almost always meet it in odd rounds. Interestingly, Dutch BBP seems to perform slightly better than Burstein in tournaments with at most four rounds, but this tiny advantage vanishes once at least six rounds are played. We obtain similar results when comparing with Random2, Dutch, Random, and Monrad.
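The bound of n for odd rounds follows from parity: a player's white-minus-black count changes by ±1 in each game, so after an odd number of games it is odd and therefore at least 1 in absolute value, and summing over all n players yields at least n. A small sanity check of this argument (a standalone sketch, independent of any pairing engine):

```python
import random

def total_color_difference(color_histories):
    """Sum over all players of |#white games - #black games|."""
    return sum(abs(h.count("W") - h.count("B")) for h in color_histories)

# After any odd number of games, each player's color difference is odd,
# hence at least 1, so the total is at least the number of players n --
# regardless of how the colors were assigned.
rng = random.Random(0)
n, rounds = 16, 5
histories = ["".join(rng.choice("WB") for _ in range(rounds)) for _ in range(n)]
assert total_color_difference(histories) >= n
print(total_color_difference(histories))
```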
References
Appleton, D.R. (1995). May the best man win? J. R. Stat. Soc. – Ser. D Statistician 44: 529–538. https://doi.org/10.2307/2348901.
Beutel, A., Chen, J., Doshi, T., Qian, H., Wei, L., Wu, Y., Heldt, L., Zhao, Z., Hong, L., Chi, E.H., et al. (2019). Fairness in recommendation ranking through pairwise comparisons. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining (KDD '19). Association for Computing Machinery, New York, pp. 2212–2220. https://doi.org/10.1145/3292500.3330745.
Bierema, J. (2017). BBP pairings, a Swiss-system chess tournament engine. Available at: <https://github.com/BieremaBoyzProgramming/bbpPairings> (Accessed 17 May 2022).
Bimpikis, K., Ehsani, S., and Mostagir, M. (2019). Designing dynamic contests. Oper. Res. 67: 339–356. https://doi.org/10.1287/opre.2018.1823.
Biró, P., Fleiner, T., and Palincza, R. (2017). Designing chess pairing mechanisms. In: 10th Japanese–Hungarian symposium on discrete mathematics and its applications. Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest, pp. 77–86.
Brandt, F., Brill, M., Seedig, H.G., and Suksompong, W. (2018). On the structure of stable tournament solutions. Econ. Theor. 65: 483–507. https://doi.org/10.1007/s00199-016-1024-x.
Brandt, F., Conitzer, V., Endriss, U., Lang, J., and Procaccia, A.D. (2016). Introduction to computational social choice. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9781107446984.002.
Brandt, F. and Fischer, F.A. (2007). PageRank as a weak tournament solution. In: Deng, X. and Graham, F.C. (Eds.), Internet and network economics, third international workshop, WINE (Lecture Notes in Computer Science), Vol. 4858. Springer, San Diego, pp. 300–305. https://doi.org/10.1007/978-3-540-77105-0_30.
Castaño, F. and Velasco, N. (2020). Exact and heuristic approaches for the automated design of medical trainees rotation schedules. Omega 97: 102107. https://doi.org/10.1016/j.omega.2019.102107.
Chatterjee, K., Ibsen-Jensen, R., and Tkadlec, J. (2016). Robust draws in balanced knockout tournaments. In: Proceedings of the 25th international joint conference on artificial intelligence. AAAI Press, New York, pp. 172–179.
Chen, X., Bennett, P.N., Collins-Thompson, K., and Horvitz, E. (2013). Pairwise ranking aggregation in a crowdsourced setting. In: Proceedings of the sixth ACM international conference on web search and data mining (WSDM '13). Association for Computing Machinery, New York, pp. 193–202. https://doi.org/10.1145/2433396.2433420.
Csató, L. (2013). Ranking by pairwise comparisons for Swiss-system tournaments. Cent. Eur. J. Oper. Res. 21: 783–803. https://doi.org/10.1007/s10100-012-0261-8.
Csató, L. (2017). On the ranking of a Swiss system chess team tournament. Ann. Oper. Res. 254: 17–36. https://doi.org/10.1007/s10479-017-2440-4.
Csató, L. (2021). Tournament design: how operations research can improve sports rules. Springer Nature, Switzerland. https://doi.org/10.1007/978-3-030-59844-0.
Dagaev, D. and Suzdaltsev, A. (2018). Competitive intensity and quality maximizing seedings in knock-out tournaments. J. Combin. Optim. 35: 170–188. https://doi.org/10.1007/s10878-017-0164-7.
Dezső, B., Jüttner, A., and Kovács, P. (2011). LEMON – an open source C++ graph template library. Electron. Notes Theor. Comput. Sci. 264: 23–45. https://doi.org/10.1016/j.entcs.2011.06.003.
Dirac, G.A. (1952). Some theorems on abstract graphs. Proc. Lond. Math. Soc. 3: 69–81. https://doi.org/10.1112/plms/s3-2.1.69.
Edmonds, J. (1965). Paths, trees, and flowers. Can. J. Math. 17: 449–467. https://doi.org/10.4153/CJM-1965-045-4.
Elmenreich, W., Ibounig, T., and Fehérvári, I. (2009). Robustness versus performance in sorting and tournament algorithms. Acta Polytech. Hungar. 6: 7–18.
Elo, A.E. (1978). The rating of chessplayers, past and present. Arco Pub., London.
Fehérvári, I. and Elmenreich, W. (2009). Evolutionary methods in self-organizing system design. In: Arabnia, H.R. and Solo, A.M.G. (Eds.), Proceedings of the 2009 international conference on genetic and evolutionary methods, GEM 2009, July 13–16, 2009, Las Vegas, Nevada, USA. CSREA Press, pp. 10–15.
FIDE (2020). FIDE handbook. Available at: <https://handbook.fide.com/> (Accessed 17 May 2022).
FIDE (2023). FIDE handbook, D. Regulations for specific competitions/02. Chess Olympiad. Available at: <https://handbook.fide.com/chapter/OlympiadPairingRules2022> (Accessed 26 July 2023).
FIDE SPP Commission (2020). Probability for the outcome of a chess game based on rating. Available at: <https://spp.fide.com/2020/10/23/probability-for-the-outcome-of-a-chess-game-based-on-rating/>.
Friendly, M. and Denis, D. (2005). The early origins and development of the scatterplot. J. Hist. Behav. Sci. 41: 103–130. https://doi.org/10.1002/jhbs.20078.
GitHub (2022). Suboptimal exchange in Remainder #7. Available at: <https://github.com/BieremaBoyzProgramming/bbpPairings/issues/7>.
Glickman, M.E. and Jensen, S.T. (2005). Adaptive paired comparison design. J. Stat. Plann. Inference 127: 279–293. https://doi.org/10.1016/j.jspi.2003.09.022.
Gupta, S., Roy, S., Saurabh, S., and Zehavi, M. (2018). When rigging a tournament, let greediness blind you. In: Proceedings of the 27th international joint conference on artificial intelligence. IJCAI, Stockholm, pp. 275–281. https://doi.org/10.24963/ijcai.2018/38.
Guse, J., Schweigert, E., Kulms, G., Heinen, I., Martens, C., and Guse, A.H. (2016). Effects of mentoring speed dating as an innovative matching tool in undergraduate medical education: a mixed methods study. PLoS One 11: e0147444. https://doi.org/10.1371/journal.pone.0147444.
Harbring, C. and Irlenbusch, B. (2003). An experimental study on tournament design. Lab. Econ. 10: 443–464. https://doi.org/10.1016/s0927-5371(03)00034-4.
Henery, R.J. (1992). An extension to the Thurstone-Mosteller model for chess. J. R. Stat. Soc. – Ser. D Statistician 41: 559–567. https://doi.org/10.2307/2348921.
Herzog, H. (2020a). Chess-Results.com, the international chess-tournaments-results-server. Available at: <https://chess-results.com/> (Accessed 07 December 2021).
Herzog, H. (2020b). Swiss-Manager. Available at: <http://www.swiss-manager.at/> (Accessed 07 December 2021).
Hintze, J.L. and Nelson, R.D. (1998). Violin plots: a box plot-density trace synergism. Am. Statistician 52: 181–184. https://doi.org/10.1080/00031305.1998.10480559.
Hofmann, H., Wickham, H., and Kafadar, K. (2017). Value plots: boxplots for large data. J. Comput. Graph Stat. 26: 469–477. https://doi.org/10.1080/10618600.2017.1305277.
Hoshino, R. (2018). A recursive algorithm to generate balanced weekend tournaments. In: Proceedings of the AAAI conference on artificial intelligence, Vol. 32. AAAI Press, New Orleans, pp. 6195–6201. https://doi.org/10.1609/aaai.v32i1.12076.
Hudry, O. (2009). A survey on the complexity of tournament solutions. Math. Soc. Sci. 57: 292–303. https://doi.org/10.1016/j.mathsocsci.2008.12.002.
Irving, R. (1985). An efficient algorithm for the "stable roommates" problem. J. Algorithm. 6: 577–595. https://doi.org/10.1016/0196-6774(85)90033-1.
Karpov, A. (2018). Generalized knockout tournament seedings. Int. J. Comput. Sci. Sport 17: 113–127. https://doi.org/10.2478/ijcss-2018-0006.
Kendall, M.G. (1945). The treatment of ties in ranking problems. Biometrika 33: 239–251. https://doi.org/10.1093/biomet/33.3.239.
Kim, M.P., Suksompong, W., and Williams, V.V. (2017). Who can win a single-elimination tournament? SIAM J. Discrete Math. 31: 1751–1764. https://doi.org/10.1137/16m1061783.
Kim, M.P. and Williams, V.V. (2015). Fixing tournaments for kings, chokers, and more. In: Proceedings of the 24th international joint conference on artificial intelligence. AAAI Press, New Orleans, pp. 561–567.
Kolmogorov, V. (2009). Blossom V: a new implementation of a minimum cost perfect matching algorithm. Math. Program. Comput. 1: 43–67. https://doi.org/10.1007/s12532-009-0002-8.
Korte, B. and Vygen, J. (2011). Combinatorial optimization: theory and algorithms. Springer, Berlin. https://doi.org/10.1007/978-3-642-24488-9.
Kujansuu, E., Lindberg, T., and Mäkinen, E. (1999). The stable roommates problem and chess tournament pairings. Divulgaciones Matemáticas 7: 19–28.
Lambers, R., Goossens, D., and Spieksma, F.C.R. (2023). The flexibility of home away pattern sets. J. Sched. 26: 413–423. https://doi.org/10.1007/s10951-022-00734-w.
Larson, J., Johansson, M., and Carlsson, M. (2014). An integrated constraint programming approach to scheduling sports leagues with divisional and round-robin tournaments. In: Simonis, H. (Ed.), Integration of AI and OR techniques in constraint programming. Springer International Publishing, Cham, pp. 144–158. https://doi.org/10.1007/978-3-319-07046-9_11.
Laslier, J.-F. (1997). Tournament solutions and majority voting. Springer, Berlin. https://doi.org/10.1007/978-3-642-60805-6.
Milvang, O. (2016). Probability for the outcome of a chess game based on rating. Available at: <https://pairings.fide.com/images/stories/downloads/2016-probability-of-the-outcome.pdf>.
Moulin, H. (1986). Choosing from a tournament. Soc. Choice Welfare 3: 271–291. https://doi.org/10.1007/bf00292732.
Muurlink, O. and Poyatos Matas, C. (2011). From romance to rocket science: speed dating in higher education. High Educ. Res. Dev. 30: 751–764. https://doi.org/10.1080/07294360.2010.539597.
Ólafsson, S. (1990). Weighted matching in chess tournaments. J. Oper. Res. Soc. 41: 17–24. https://doi.org/10.1057/jors.1990.3.
Paraschakis, D. and Nilsson, B.J. (2020). Matchmaking under fairness constraints: a speed dating case study. In: Boratto, L., Faralli, S., Marras, M., and Stilo, G. (Eds.), Bias and social aspects in search and recommendation. Springer International Publishing, Cham, pp. 43–57. https://doi.org/10.1007/978-3-030-52485-2_5.
Saile, C. and Suksompong, W. (2020). Robust bounds on choosing from large tournaments. Soc. Choice Welfare 54: 87–110. https://doi.org/10.1007/s00355-019-01213-6.
Scarf, P., Yusof, M.M., and Bilbao, M. (2009). A numerical study of designs for sporting contests. Eur. J. Oper. Res. 198: 190–198. https://doi.org/10.1016/j.ejor.2008.07.029.
Sinuany-Stern, Z. (1988). Ranking of sports teams via the AHP. J. Oper. Res. Soc. 39: 661–667. https://doi.org/10.2307/2582188.
Spearman, C. (1904). The proof and measurement of association between two things. Am. J. Psychol. 15: 72–101. https://doi.org/10.2307/1412159.
Stanton, I. and Williams, V.V. (2011). Manipulating stochastically generated single-elimination tournaments for nearly all players. In: Chen, N., Elkind, E., and Koutsoupias, E. (Eds.), Internet and network economics – 7th international workshop, WINE (Lecture Notes in Computer Science), Vol. 7090. Springer, Singapore, pp. 326–337. https://doi.org/10.1007/978-3-642-25510-6_28.
Sziklai, B.R., Biró, P., and Csató, L. (2022). The efficacy of tournament designs. Comput. Oper. Res. 144: 105821. https://doi.org/10.1016/j.cor.2022.105821.
Van Bulck, D. and Goossens, D. (2019). Handling fairness issues in time-relaxed tournaments with availability constraints. Comput. Oper. Res. 115: 104856. https://doi.org/10.1016/j.cor.2019.104856.
Voong, T.M. and Oehler, M. (2019). Auditory spatial perception using bone conduction headphones along with fitted head related transfer functions. In: 2019 IEEE conference on virtual reality and 3D user interfaces (VR). IEEE, pp. 1211–1212. https://doi.org/10.1109/VR.2019.8798218.
Wei, L., Tian, Y., Wang, Y., and Huang, T. (2015). Swiss-system based cascade ranking for gait-based person re-identification. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence (AAAI'15). AAAI Press, pp. 1882–1888. https://doi.org/10.1609/aaai.v29i1.9454.
Wikipedia (2023). Swiss-system tournament – pairing procedure. Available at: <https://en.wikipedia.org/wiki/Swiss-system_tournament> (Accessed 08 June 2023).
© 2024 Walter de Gruyter GmbH, Berlin/Boston