Abstract
In this work, we deal with the problem of rating in sports, where the skills of the players/teams are inferred from the observed outcomes of the games. Our focus is on on-line rating algorithms that estimate the skills after each new game by exploiting probabilistic models that (i) relate the skills to the outcome of the game and (ii) describe how the skills evolve in time. We propose a Bayesian approach which may be seen as an approximate Kalman filter and which is generic in the sense that it can be used with any skills-outcome model and can be applied in individual as well as group sports. We show how the well-known Elo, Glicko, and TrueSkill algorithms may be seen as instances of the one-fits-all approach we propose. To clarify the conditions under which the gains of the Bayesian approach over simpler solutions can actually materialize, we critically compare the known and new algorithms by means of numerical examples using synthetic and empirical data.
Funding source: NSERC
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Appendix A: Proof of Proposition 1
Our goal is to find the Gaussian distribution
The gradient of (114) with respect to μ is zeroed for
Now, assume that we use the vector-covariance model, i.e., we have to find
where Var[θ_m] is the variance of θ_m.
Zeroing the derivative of (115) with respect to v_m yields v_m = Var[θ_m], that is, v = di(Cov[θ]), which proves (21).
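Finally, if we adopt the scalar-covariance model, the same steps yield an objective whose derivative with respect to v is zeroed if v equals the average of the variances Var[θ_m] taken over all m.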
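The moment-matching results of this appendix can be cross-checked numerically. The sketch below is illustrative only: the function names and the example moments are hypothetical. It relies on the standard fact that, for a Gaussian with diagonal covariance, the divergence from any distribution f depends on f only through its marginal means and variances (up to a constant, the entropy of f, which is dropped here).

```python
import math

def cross_entropy(mean, var, mu, v):
    """-E_f[log N(theta; mu, diag(v))] for any f with the given marginal means/variances."""
    return sum(
        0.5 * (math.log(2.0 * math.pi * v_m) + (s_m + (m_m - mu_m) ** 2) / v_m)
        for m_m, s_m, mu_m, v_m in zip(mean, var, mu, v)
    )

def project_vector(mean, var):
    """Vector-covariance model: match each mean and each variance."""
    return list(mean), list(var)

def project_scalar(mean, var):
    """Scalar-covariance model: match the means, share the average variance."""
    v = sum(var) / len(var)
    return list(mean), [v] * len(var)

# Hypothetical moments of the distribution being approximated.
mean = [0.3, -1.2, 0.8]
var = [0.5, 2.0, 1.1]

mu_vec, v_vec = project_vector(mean, var)
mu_sc, v_sc = project_scalar(mean, var)
best_vec = cross_entropy(mean, var, mu_vec, v_vec)
best_sc = cross_entropy(mean, var, mu_sc, v_sc)

# Perturbing any variance (or the shared variance) should only raise the objective.
for scale in (0.8, 0.9, 1.1, 1.25):
    for m in range(len(var)):
        v_try = list(v_vec)
        v_try[m] *= scale
        assert cross_entropy(mean, var, mu_vec, v_try) > best_vec
    assert cross_entropy(mean, var, mu_sc, [scale * v_sc[0]] * len(var)) > best_sc

print("vector model variances:", v_vec)
print("scalar model variance:", v_sc[0])
```

The vector model can never do worse than the scalar one, since the scalar solution is a special case of a diagonal covariance.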
Appendix B: Completing the square
Finding the relationship between V_t, μ_t and V_{t,t−1}, μ_{t,t−1} is a multidimensional version of completing the square (Barber 2012, Section 8.4.1).
Using (32) in (28) yields the following:
where the equality in (119) must hold because Q(θ) is a quadratic form.
The quadratic coefficient (matrix) is immediately isolated as follows:
where (121) is obtained by the matrix inversion lemma (Moon and Stirling 2000, Section 4.11) and it is the same as (38).
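The role of the matrix inversion lemma can be illustrated with a small numerical check. The sketch below assumes that the quadratic coefficient has the rank-one form V⁻¹ + h x xᵀ, with a symmetric prior covariance V, a vector x, and a scalar h ≥ 0; this form, the function names, and the 2 × 2 example values are assumptions made for illustration and are not copied from (120) and (121).

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def matvec(a, x):
    """Matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def rank_one_update(V, x, h):
    """(V^{-1} + h x x^T)^{-1} via the matrix inversion lemma, with no matrix inverse.

    Assumes V is symmetric, so that V x x^T V = (V x)(V x)^T.
    """
    Vx = matvec(V, x)
    omega = sum(x_i * vx_i for x_i, vx_i in zip(x, Vx))  # x^T V x
    scale = h / (1.0 + h * omega)
    n = len(x)
    return [[V[i][j] - scale * Vx[i] * Vx[j] for j in range(n)] for i in range(n)]

def direct_update(V, x, h):
    """Same quantity computed by explicit inversion (2x2 only), for comparison."""
    P = inv2(V)
    A = [[P[i][j] + h * x[i] * x[j] for j in range(2)] for i in range(2)]
    return inv2(A)

# Illustrative values: two players, one playing at home (+1) and one away (-1).
V_prior = [[2.0, 0.5], [0.5, 1.0]]
x = [1.0, -1.0]
h = 0.25

V_lemma = rank_one_update(V_prior, x, h)
V_direct = direct_update(V_prior, x, h)
print(V_lemma)
print(V_direct)
```

The lemma-based form needs only matrix-vector products, which is why a rank-one update of this kind is cheap even when the number of players is large.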
To find μ_t, we note that it minimizes Q(θ) and thus zeroes its gradient
solving it as
Appendix C: Proof of Proposition 2
For brevity, let us use the symbol
The proof is done by induction: by construction, the initialization satisfies the Proposition, i.e.,
and (44), as
This ends the proof for the KF algorithm. By extension, all other algorithms derived from the KF algorithm must satisfy the claims of Proposition 2, which may also be proven with the steps shown above applied to the vSKF, sSKF, and fSKF algorithms.
Appendix D: Models for multilevel games
The Davidson model (Davidson 1970; Szczecinski and Djebbi 2020) uses y_t = 0 (away win), y_t = 1 (draw), and y_t = 2 (home win). The likelihood function and the corresponding functions g(⋅; ⋅) and h(⋅; ⋅) are given by
where
Note that by setting κ = 0, i.e., removing the possibility of draws, we obtain F_L(z) = G_Dav(z/2), and we recover the equations of the Bradley–Terry model with a halved scale. A simple but less obvious observation is that, by setting κ = 2, we obtain G_Dav(z) = F_L(z), and thus g(z; y_t) in (75) is half of (128). A direct consequence, observed in Szczecinski and Djebbi (2020), is that even if the Bradley–Terry and Davidson models are different, the algorithms which use only the gradient (e.g., the SG updates (62)) may be identical.
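The two special cases above can be verified numerically. The sketch below assumes a parameterization in which the probabilities of an away win, a draw, and a home win are proportional to e^(−z), κ, and e^(z), respectively; this form is an assumption, chosen because it reproduces the two identities stated in the text. The function names are hypothetical, and g is taken here as the derivative of the log-likelihood with respect to z.

```python
import math

def logistic(z):
    """F_L(z), the logistic function used by the Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(-z))

def davidson_probs(z, kappa):
    """Assumed Davidson probabilities for y = 0 (away win), 1 (draw), 2 (home win)."""
    den = math.exp(z) + kappa + math.exp(-z)
    return [math.exp(-z) / den, kappa / den, math.exp(z) / den]

def g_dav(z, kappa):
    """G_Dav(z): expected value of y/2 under the assumed Davidson model."""
    den = math.exp(z) + kappa + math.exp(-z)
    return (math.exp(z) + 0.5 * kappa) / den

def davidson_gradient(z, y, kappa):
    """Derivative of log L(z; y) with respect to z, written as 2 * (y/2 - G_Dav(z))."""
    return 2.0 * (y / 2.0 - g_dav(z, kappa))

for z in (-1.5, -0.2, 0.0, 0.7, 2.0):
    # kappa = 0: no draws, Bradley-Terry with a halved scale.
    assert abs(logistic(z) - g_dav(z / 2.0, 0.0)) < 1e-12
    # kappa = 2: G_Dav coincides with the logistic function.
    assert abs(g_dav(z, 2.0) - logistic(z)) < 1e-12
    # kappa = 2: the Davidson gradient is twice the Bradley-Terry form (y/2 - F_L(z)).
    for y in (0, 1, 2):
        assert abs(davidson_gradient(z, y, 2.0) - 2.0 * (y / 2.0 - logistic(z))) < 1e-12

print("identities hold on the test grid")
```

With κ = 2, a gradient-only update therefore differs from the Bradley–Terry one only by a factor of two, which can be absorbed into the step size.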
The Skellam model studied in Karlis and Ntzoufras (2008) and Lasek and Gagolewski (2020) models the goal difference y_t ∈ {…, −2, −1, 0, 1, 2, …} using the Skellam distribution (Karlis and Ntzoufras 2008, Section 2.2):
where I_v(⋅) is the modified Bessel function of the first kind of order v and
are the means of the home and away goals (functions of the skills difference z), and c is a constant offset that should be fit to the data.
The negated log-likelihood is then given by
and we easily find the functions required by the Kalman rating algorithms:
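A minimal sketch of such functions is given below. It assumes that the home and away means take the log-linear form e^(c+z) and e^(c−z); this parameterization, the function names, and the sign convention (g as the derivative of the log-likelihood, h as its negated second derivative) are assumptions and are not taken from the text. Under this assumption the product of the two means does not depend on z, so the Bessel term drops out of both derivatives.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind for integer order >= 0 (power series)."""
    return sum(
        (x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
        for k in range(terms)
    )

def skellam_means(z, c):
    """Assumed log-linear means of the home and away goals."""
    return math.exp(c + z), math.exp(c - z)

def skellam_pmf(y, z, c):
    """Skellam probability of the goal difference y (home minus away)."""
    mu_h, mu_a = skellam_means(z, c)
    return (
        math.exp(-(mu_h + mu_a))
        * (mu_h / mu_a) ** (y / 2.0)
        * bessel_i(abs(y), 2.0 * math.sqrt(mu_h * mu_a))
    )

def skellam_g(z, y, c):
    """First derivative of log P(y | z) with respect to z."""
    return y - 2.0 * math.exp(c) * math.sinh(z)

def skellam_h(z, y, c):
    """Negated second derivative of log P(y | z); it does not depend on y."""
    return 2.0 * math.exp(c) * math.cosh(z)

# Illustrative values of the skills difference and of the offset.
z, c = 0.3, 0.2
total = sum(skellam_pmf(y, z, c) for y in range(-30, 31))
print("sum of probabilities:", total)
print("g for a 2-goal home win:", skellam_g(z, 2, c))
print("h:", skellam_h(z, 2, c))
```

Because h is positive for every z and y under this assumption, the log-likelihood is concave in z, which is the property the second-order updates rely on.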
Appendix E: Grid-search results
As complementary results, we show in Table 2 examples of the grid search that led to the choice of the parameters (v_0, ɛ) in Table 1.
Grid-search: values of
(a)

v_0 \ ɛ | 3.0 ⋅ 10^−6 | 6.0 ⋅ 10^−6 | 1.5 ⋅ 10^−5 | 3.0 ⋅ 10^−5 | 6.0 ⋅ 10^−5 | 1.5 ⋅ 10^−4 |
---|---|---|---|---|---|---|
1.5 ⋅ 10^−3 | 1.0669 | 1.0663 | 1.0652 | 1.0645 | 1.0650 | 1.0695 |
2.1 ⋅ 10^−3 | 1.0658 | 1.0654 | 1.0647 | 1.0643 | 1.0651 | 1.0696 |
3.0 ⋅ 10^−3 | 1.0650 | 1.0647 | 1.0643 | 1.0643 | 1.0653 | 1.0698 |
4.5 ⋅ 10^−3 | 1.0646 | 1.0645 | 1.0644 | 1.0646 | 1.0657 | 1.0701 |
6.0 ⋅ 10^−3 | 1.0647 | 1.0646 | 1.0647 | 1.0650 | 1.0661 | 1.0703 |
(b)

v_0 \ ɛ | 10^−11 | 10^−10 | 10^−9 | 10^−8 | 10^−7 | 10^−6 |
---|---|---|---|---|---|---|
0.02 | 0.9761 | 0.9761 | 0.9761 | 0.9761 | 0.9761 | 0.9761 |
0.03 | 0.9744 | 0.9744 | 0.9744 | 0.9744 | 0.9744 | 0.9744 |
0.04 | 0.9738 | 0.9738 | 0.9738 | 0.9738 | 0.9738 | 0.9739 |
0.06 | 0.9744 | 0.9744 | 0.9744 | 0.9744 | 0.9744 | 0.9744 |
0.08 | 0.9754 | 0.9754 | 0.9754 | 0.9754 | 0.9754 | 0.9754 |
References
Agresti, A. 1992. “Analysis of Ordinal Paired Comparison Data.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 41: 287–97. https://rss.onlinelibrary.wiley.com/doi/abs/10.2307/2347562.
Barber, D. 2012. Bayesian Reasoning and Machine Learning. New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511804779.
Bishop, C. 2006. Pattern Recognition and Machine Learning. Singapore: Springer.
Boshnakov, G., T. Kharrat, and I. G. McHale. 2017. “A Bivariate Weibull Count Model for Forecasting Association Football Scores.” International Journal of Forecasting 33: 458–66. https://doi.org/10.1016/j.ijforecast.2016.11.006.
Bradley, R. A., and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39: 324–45. https://doi.org/10.2307/2334029.
Cattelan, M. 2012. “Models for Paired Comparison Data: A Review with Emphasis on Dependent Data.” Statistical Science 27: 412–33. https://doi.org/10.1214/12-sts396.
David, H. 1963. The Method of Paired Comparisons. Frome and London: Charles Griffin & Co. Ltd.
Davidson, R. R. 1970. “On Extending the Bradley–Terry Model to Accommodate Ties in Paired Comparison Experiments.” Journal of the American Statistical Association 65: 317–28. https://doi.org/10.1080/01621459.1970.10481082.
Elo, A. E. 1978. The Rating of Chessplayers, Past and Present. New York: Arco Publishing Inc.
eloratings.net. 2020. “World Football Elo Ratings.” https://www.eloratings.net/.
Fahrmeir, L. 1992. “Posterior Mode Estimation by Extended Kalman Filtering for Multivariate Dynamic Generalized Linear Models.” Journal of the American Statistical Association 87: 501–9. https://doi.org/10.1080/01621459.1992.10475232.
Fahrmeir, L., and G. Tutz. 1994. “Dynamic Stochastic Models for Time-dependent Ordered Paired Comparison Systems.” Journal of the American Statistical Association 89: 1438–49.
FIDE. 2019. “International Chess Federation: Ratings Change Calculator.” https://ratings.fide.com/calculator_rtd.phtml.
FIFA. 2018. “Revision of the FIFA/Coca-Cola World Ranking.” https://digitalhub.fifa.com/m/f99da4f73212220/original/edbm045h0udbwkqew35a-pdf.pdf.
FIVB. 2020. “New Senior World Rankings.” https://www.fivb.com/en/volleyball/rankings.
FiveThirtyEight. 2020. “How Our NFL Predictions Work.” https://fivethirtyeight.com/methodology/how-our-nfl-predictions-work/.
Glickman, M. E. 1993. Paired Comparison Models with Time-Varying Parameters. PhD thesis. Harvard University. https://doi.org/10.21236/ADA272016.
Glickman, M. E. 1995. “Chess Rating Systems.” American Chess Journal 3: 59–102.
Glickman, M. E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 48: 377–94. https://doi.org/10.1111/1467-9876.00159.
Goddard, J. 2005. “Regression Models for Forecasting Goals and Match Results in Association Football.” International Journal of Forecasting 21: 331–40. https://doi.org/10.1016/j.ijforecast.2004.08.002.
Held, L., and R. Vollnhals. 2005. “Dynamic Rating of European Football Teams.” IMA Journal of Management Mathematics 16: 121–30. https://doi.org/10.1093/imaman/dpi004.
Herbrich, R., and T. Graepel. 2006. “TrueSkill(TM): A Bayesian Skill Rating System.” Technical report. https://www.microsoft.com/en-us/research/publication/trueskilltm-a-bayesian-skill-rating-system-2/. https://doi.org/10.7551/mitpress/7503.003.0076.
Herbrich, R., T. Minka, and T. Graepel. 2008. “TrueSkill through Time: Revisiting the History of Chess.” In Advances in Neural Information Processing Systems 20, 931–8. MIT Press. https://www.microsoft.com/en-us/research/publication/trueskill-through-time-revisiting-the-history-of-chess/.
Ingram, M. 2021. “How to Extend Elo: A Bayesian Perspective.” Journal of Quantitative Analysis in Sports 17: 203–19. https://doi.org/10.1515/jqas-2020-0066.
Karlis, D., and I. Ntzoufras. 2008. “Bayesian Modelling of Football Outcomes: Using the Skellam’s Distribution for the Goal Difference.” IMA Journal of Management Mathematics 20: 133–45. https://doi.org/10.1093/imaman/dpn026.
Király, F. J., and Z. Qian. 2017. “Modelling Competitive Sports: Bradley–Terry-Elo Models for Supervised and On-Line Learning of Paired Competition Outcomes.” arXiv e-prints, arXiv:1701.08055.
Knorr-Held, L. 2000. “Dynamic Rating of Sports Teams.” Journal of the Royal Statistical Society. Series D (The Statistician) 49: 261–76. https://doi.org/10.1111/1467-9884.00236.
Koopman, S. J., and R. Lit. 2015. “A Dynamic Bivariate Poisson Model for Analysing and Forecasting Match Results in the English Premier League.” Journal of the Royal Statistical Society A 178: 167–86. https://doi.org/10.1111/rssa.12042.
Koopman, S. J., and R. Lit. 2019. “Forecasting Football Match Results in National League Competitions Using Score-Driven Time Series Models.” International Journal of Forecasting 35: 797–809. https://doi.org/10.1016/j.ijforecast.2018.10.011.
Kuk, A. Y. C. 1995. “Modelling Paired Comparison Data with Large Numbers of Draws and Large Variability of Draw Percentages Among Players.” Journal of the Royal Statistical Society. Series D (The Statistician) 44: 523–8. https://doi.org/10.2307/2348900.
Lasek, J., and M. Gagolewski. 2020. “Interpretable Sports Team Rating Models Based on the Gradient Descent Algorithm.” International Journal of Forecasting 37: 1061–71. https://doi.org/10.1016/j.ijforecast.2020.11.008.
Ley, C., T. Van de Wiele, and H. Van Eetvelde. 2019. “Ranking Soccer Teams on the Basis of Their Current Strength: A Comparison of Maximum Likelihood Approaches.” Statistical Modelling 19: 55–73. https://doi.org/10.1177/1471082X18817650.
Maher, M. J. 1982. “Modelling Association Football Scores.” Statistica Neerlandica 36: 109–18. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9574.1982.tb00782.x.
Manderson, A. A., K. Murray, and B. A. Turlach. 2018. “Dynamic Bayesian Forecasting of AFL Match Results Using the Skellam Distribution.” Australian & New Zealand Journal of Statistics 60: 174–87. https://onlinelibrary.wiley.com/doi/abs/10.1111/anzs.12225.
Microsoft. 2005. “Trueskill Ratings System.” Technical report. https://www.microsoft.com/en-us/research/project/trueskill-ranking-system/.
Moon, T. K., and W. C. Stirling. 2000. Mathematical Methods and Algorithms for Signal Processing. New Jersey: Prentice Hall.
Paleologu, C., J. Benesty, and S. Ciochină. 2013. “Study of the General Kalman Filter for Echo Cancellation.” IEEE Transactions on Audio, Speech, and Language Processing 21: 1539–49. https://doi.org/10.1109/tasl.2013.2245654.
Rao, P. V., and L. L. Kupper. 1967. “Ties in Paired-Comparison Experiments: A Generalization of the Bradley–Terry Model.” Journal of the American Statistical Association 62: 194–204. https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1967.10482901.
Sayed, A. H. 2008. Adaptive Filters. Hoboken, New Jersey: John Wiley & Sons.
Szczecinski, L. 2022. “G-elo: Generalization of the Elo Algorithm by Modeling the Discretized Margin of Victory.” Journal of Quantitative Analysis in Sports 18 (1): 1–14. https://doi.org/10.1515/jqas-2020-0115.
Szczecinski, L., and A. Djebbi. 2020. “Understanding Draws in Elo Rating Algorithm.” https://www.degruyter.com/document/doi/10.1515/jqas-2019-0102/html.
Thurstone, L. L. 1927. “A Law of Comparative Judgment.” Psychological Review 34: 273–86. https://doi.org/10.1037/h0070288.
Wheatcroft, E. 2020. “A Profitable Model for Predicting the Over/under Market in Football.” International Journal of Forecasting 36: 916–32. https://doi.org/10.1016/j.ijforecast.2019.11.001.
© 2023 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- A Bayesian analysis of the time through the order penalty in baseball
- Parking the bus
- Bayesian analysis of Formula One race results: disentangling driver skill and constructor advantage
- Simplified Kalman filter for on-line rating: one-fits-all approach
- The evolution of seeding systems and the impact of imbalanced groups in FIFA Men’s World Cup tournaments 1954–2022