Simplified Kalman filter for on-line rating: one-fits-all approach

Leszek Szczecinski; Raphaëlle Tihon

doi:10.1515/jqas-2021-0061

Published/Copyright: June 26, 2023

From the journal Journal of Quantitative Analysis in Sports, Volume 19, Issue 4

Abstract

In this work, we deal with the problem of rating in sports, where the skills of the players/teams are inferred from the observed outcomes of the games. Our focus is on the on-line rating algorithms that estimate skills after each new game by exploiting the probabilistic models that (i) relate the skills to the outcome of the game and (ii) describe how the skills evolve in time. We propose a Bayesian approach which may be seen as an approximate Kalman filter and which is generic in the sense that it can be used with any skills-outcome model and can be applied in the individual as well as in the group sports. We show how the well-known Elo, Glicko, and TrueSkill algorithms may be seen as instances of the one-fits-all approach we propose. To clarify the conditions under which the gains of the Bayesian approach over simpler solutions can actually materialize, we critically compare the known and new algorithms by means of numerical examples using synthetic and empirical data.

Keywords: Elo rating; Glicko algorithm; Kalman filter; rating algorithms; TrueSkill algorithm

Corresponding author: Leszek Szczecinski, Institut National de la Recherche Scientifique, Montreal, Canada, E-mail: leszek.szczecinski@inrs.ca

Funding source: NSERC

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix A: Proof of Proposition 1

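Our goal is to find the Gaussian distribution $\tilde{f}(\boldsymbol{\theta}) = \mathcal{N}(\boldsymbol{\theta}; \boldsymbol{\mu}, \mathbf{V})$ in the form (18) that minimizes the KL divergence with a given distribution $f(\boldsymbol{\theta})$:

$$D_{\mathrm{KL}}\big(f \,\|\, \tilde{f}\big) = \int f(\boldsymbol{\theta}) \ln \frac{f(\boldsymbol{\theta})}{\tilde{f}(\boldsymbol{\theta})} \,\mathrm{d}\boldsymbol{\theta} \tag{113}$$

$$\propto \frac{1}{2} \ln \det(2\pi \mathbf{V}) + \frac{1}{2} \int f(\boldsymbol{\theta}) (\boldsymbol{\theta} - \boldsymbol{\mu})^{\mathrm{T}} \mathbf{V}^{-1} (\boldsymbol{\theta} - \boldsymbol{\mu}) \,\mathrm{d}\boldsymbol{\theta}. \tag{114}$$

The gradient of (114) with respect to $\boldsymbol{\mu}$ is zero for $\boldsymbol{\mu} = \mathbb{E}[\boldsymbol{\theta}]$ regardless of the form of $\mathbf{V}$; this proves (19). It is equally well known that, to minimize (114), we must also use $\mathbf{V} = \mathrm{Cov}[\boldsymbol{\theta}] = \mathbb{E}\big[(\boldsymbol{\theta} - \boldsymbol{\mu})(\boldsymbol{\theta} - \boldsymbol{\mu})^{\mathrm{T}}\big]$, which is the claim in (20).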

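Now, assume that we use the vector-covariance model, i.e., we have to find $\tilde{f}(\boldsymbol{\theta}) = \mathcal{N}(\boldsymbol{\theta}; \boldsymbol{\mu}, \mathrm{diag}(\boldsymbol{v}))$. Then, (114) becomes

$$D_{\mathrm{KL}}\big(f \,\|\, \tilde{f}\big) \propto \frac{1}{2} \sum_{m=1}^{M} \ln v_m + \sum_{m=1}^{M} \frac{\mathrm{Var}[\theta_m]}{2 v_m}, \tag{115}$$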

where Var[θ_m] is the variance of θ_m.

Zeroing the derivative of (115) with respect to v_m yields v_m = Var[θ_m], that is, v = di(Cov[ θ ]) which proves (21).

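Finally, if we adopt the scalar-covariance model $\tilde{f}(\boldsymbol{\theta}) = \mathcal{N}(\boldsymbol{\theta}; \boldsymbol{\mu}, v\mathbf{I})$, (115) becomes

$$D_{\mathrm{KL}}\big(f \,\|\, \tilde{f}\big) \propto \frac{M}{2} \ln v + \frac{1}{2v} \sum_{m=1}^{M} \mathrm{Var}[\theta_m], \tag{116}$$

whose derivative with respect to $v$ is zero if $v = \frac{1}{M} \sum_{m=1}^{M} \mathrm{Var}[\theta_m]$, and this proves (22).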
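The three moment-matching results can be checked numerically. The sketch below is only an illustration: it evaluates the terms of (114) that depend on $(\boldsymbol{\mu}, \mathbf{V})$ for a randomly generated mean and covariance, and compares the full, vector, and scalar fits. The function names `kl_objective` and `moment_match`, and the random test distribution, are assumptions made for this example and do not come from the article.

```python
import numpy as np

def kl_objective(mean_f, cov_f, mu, V):
    """Terms of (114) that depend on (mu, V), given the mean and covariance of f."""
    d = mean_f - mu
    # E[(theta - mu)(theta - mu)^T] under f
    second_moment = cov_f + np.outer(d, d)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * V)
    return 0.5 * logdet + 0.5 * np.trace(np.linalg.solve(V, second_moment))

def moment_match(cov_f):
    """Covariance choices corresponding to (20), (21) and (22)."""
    var = np.diag(cov_f)
    return {
        "full": cov_f,                            # V = Cov[theta]
        "vector": np.diag(var),                   # v_m = Var[theta_m]
        "scalar": var.mean() * np.eye(len(var)),  # v = average variance
    }

# Hypothetical test distribution: random mean and positive-definite covariance.
rng = np.random.default_rng(0)
M = 4
A = rng.normal(size=(M, M))
mean_f = rng.normal(size=M)
cov_f = A @ A.T + 0.1 * np.eye(M)

fits = moment_match(cov_f)
J = {name: kl_objective(mean_f, cov_f, mean_f, V) for name, V in fits.items()}
print(J)
```

Because the scalar model is a special case of the vector model, which is itself a special case of the full model, the printed objective values should be non-decreasing from "full" to "vector" to "scalar".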

Appendix B: Completing the square

Finding the relationship between $\mathbf{V}_t$, $\boldsymbol{\mu}_t$ and $\mathbf{V}_{t,t-1}$, $\boldsymbol{\mu}_{t,t-1}$ is a multidimensional version of completing the square (Barber 2012, Section 8.4.1).

Using (32) in (28) yields the following:

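$$-\ln \hat{f}(\boldsymbol{\theta}_t \mid \underline{y}_t) \approx \frac{g_t}{s} \mathbf{x}_t^{\mathrm{T}} (\boldsymbol{\theta} - \boldsymbol{\mu}_{t,t-1}) + \frac{1}{2} (\boldsymbol{\theta} - \boldsymbol{\mu}_{t,t-1})^{\mathrm{T}} \left[ \frac{h_t}{s^2} \mathbf{x}_t \mathbf{x}_t^{\mathrm{T}} + \mathbf{V}_{t,t-1}^{-1} \right] (\boldsymbol{\theta} - \boldsymbol{\mu}_{t,t-1}) \tag{117}$$

$$= Q(\boldsymbol{\theta}) + \mathrm{Const.} \tag{118}$$

$$= \frac{1}{2} (\boldsymbol{\theta} - \boldsymbol{\mu}_t)^{\mathrm{T}} \mathbf{V}_t^{-1} (\boldsymbol{\theta} - \boldsymbol{\mu}_t) + \mathrm{Const.}, \tag{119}$$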

where the equality in (119) must hold because Q( θ ) is a quadratic form.

The quadratic coefficient (matrix) is immediately isolated as follows:

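$$\mathbf{V}_t = \left[ \frac{h_t}{s^2} \mathbf{x}_t \mathbf{x}_t^{\mathrm{T}} + \mathbf{V}_{t,t-1}^{-1} \right]^{-1} \tag{120}$$

$$= \mathbf{V}_{t,t-1} - \mathbf{V}_{t,t-1} \mathbf{x}_t \mathbf{x}_t^{\mathrm{T}} \mathbf{V}_{t,t-1} \frac{h_t}{s^2 + h_t \mathbf{x}_t^{\mathrm{T}} \mathbf{V}_{t,t-1} \mathbf{x}_t}, \tag{121}$$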

where (121) is obtained by the matrix inversion lemma (Moon and Stirling 2000, Section 4.11) and it is the same as (38).

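To find $\boldsymbol{\mu}_t$ we note that it minimizes $Q(\boldsymbol{\theta})$, and thus zeros its gradient:

$$\nabla_{\boldsymbol{\theta}} Q(\boldsymbol{\theta}) \big|_{\boldsymbol{\theta} = \boldsymbol{\mu}_t} = \frac{g_t}{s} \mathbf{x}_t + \frac{h_t}{s^2} \mathbf{x}_t \mathbf{x}_t^{\mathrm{T}} (\boldsymbol{\mu}_t - \boldsymbol{\mu}_{t,t-1}) + \mathbf{V}_{t,t-1}^{-1} (\boldsymbol{\mu}_t - \boldsymbol{\mu}_{t,t-1}) = \mathbf{0};$$

solving for $\boldsymbol{\mu}_t$ gives

$$\boldsymbol{\mu}_t = \boldsymbol{\mu}_{t,t-1} - \frac{g_t}{s} \mathbf{V}_t \mathbf{x}_t \tag{122}$$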

and using (121) yields (37).
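The two steps of this derivation are easy to verify numerically. The following sketch, under illustrative assumptions (random positive-definite $\mathbf{V}_{t,t-1}$, arbitrary values of $g_t$, $h_t$ and $s$, and a hypothetical helper named `kf_update`), compares the matrix-inversion-lemma form (121) with the direct inverse (120), and checks that the mean from (122) zeros the gradient of $Q(\boldsymbol{\theta})$.

```python
import numpy as np

def kf_update(mu_pred, V_pred, x, g, h, s):
    """One update step written from (121) and (122); V_pred is assumed symmetric."""
    Vx = V_pred @ x
    omega = x @ Vx  # x^T V_{t,t-1} x
    V = V_pred - np.outer(Vx, Vx) * h / (s**2 + h * omega)  # (121)
    mu = mu_pred - (g / s) * (V @ x)                        # (122)
    return mu, V

# Arbitrary illustrative inputs (not taken from the article).
rng = np.random.default_rng(0)
M = 5
A = rng.normal(size=(M, M))
V_pred = A @ A.T + np.eye(M)
mu_pred = rng.normal(size=M)
x = np.array([1.0, -1.0, 0.0, 0.0, 0.0])  # home player 0 against away player 1
g, h, s = -0.4, 0.2, 2.0

mu, V = kf_update(mu_pred, V_pred, x, g, h, s)

# Direct evaluation of (120) and of the gradient of Q at mu.
precision = (h / s**2) * np.outer(x, x) + np.linalg.inv(V_pred)
V_direct = np.linalg.inv(precision)
grad_at_mu = (g / s) * x + precision @ (mu - mu_pred)

print(np.max(np.abs(V - V_direct)), np.max(np.abs(grad_at_mu)))
```

Both printed values should be at the level of floating-point rounding error. The practical appeal of (121) is that it replaces the $M \times M$ inversion in (120) with a rank-one correction.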

Appendix C: Proof of Proposition 2

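For brevity, let us use the symbol $\check{(\cdot)}$ to denote only the scaled variables, e.g., $\check{\mathbf{V}} \equiv \mathbf{V}(s, s^2 v_0, s^2 \epsilon)$; the unscaled variables are used without the symbol, e.g., $\mathbf{V} \equiv \mathbf{V}(1, v_0, \epsilon)$.

The proof is done by induction: by construction, the initialization satisfies the Proposition, i.e., $\check{\mathbf{V}}_0 = s^2 v_0 \mathbf{I} = s^2 \mathbf{V}_0$, and we suppose that $\check{\mathbf{V}}_{t-1} = s^2 \mathbf{V}_{t-1}$ and $\check{\boldsymbol{\mu}}_{t-1} = s \boldsymbol{\mu}_{t-1}$ hold. Then we must have $\beta_t \mathbf{x}_t^{\mathrm{T}} \check{\boldsymbol{\mu}}_{t-1} = s \beta_t \mathbf{x}_t^{\mathrm{T}} \boldsymbol{\mu}_{t-1}$, and $g_t$ and $h_t$ are not affected by scaling. Then we also have $\check{\mathbf{V}}_{t,t-1} = \beta_t^2 \check{\mathbf{V}}_{t-1} + s^2 \epsilon_t \mathbf{I} = s^2 \mathbf{V}_{t,t-1}$ and $\check{\omega}_t = s^2 \omega_t$, so (43) can be written as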

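$$\check{\boldsymbol{\mu}}_t = \beta_t \check{\boldsymbol{\mu}}_{t-1} - \check{\mathbf{V}}_{t,t-1} \mathbf{x}_t \frac{s g_t}{s^2 + h_t \check{\omega}_t} \tag{123}$$

$$= \beta_t s \boldsymbol{\mu}_{t-1} - s^2 \mathbf{V}_{t,t-1} \mathbf{x}_t \frac{s g_t}{s^2 + s^2 h_t \omega_t} = s \boldsymbol{\mu}_t, \tag{124}$$

and (44), as

$$\check{\mathbf{V}}_t = \check{\mathbf{V}}_{t,t-1} - \check{\mathbf{V}}_{t,t-1} \mathbf{x}_t \mathbf{x}_t^{\mathrm{T}} \check{\mathbf{V}}_{t,t-1} \frac{h_t}{s^2 + h_t \check{\omega}_t} \tag{125}$$

$$= s^2 \mathbf{V}_{t,t-1} - s^4 \mathbf{V}_{t,t-1} \mathbf{x}_t \mathbf{x}_t^{\mathrm{T}} \mathbf{V}_{t,t-1} \frac{h_t}{s^2 + s^2 h_t \omega_t} = s^2 \mathbf{V}_t. \tag{126}$$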

This ends the proof for the KF algorithm. By extension, all other algorithms derived from the KF algorithm must satisfy the claims of Proposition 2, which may also be proven with the steps shown above applied to the vSKF, sSKF, and fSKF algorithms.
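The scaling property can also be observed empirically. The sketch below runs the recursion (123) and (125) twice on the same synthetic games, once with parameters $(1, v_0, \epsilon)$ and once with $(s, s^2 v_0, s^2 \epsilon)$, and compares the results. Several ingredients are assumptions made only for this illustration: the natural-logarithm logistic model in `g_h_logistic`, the zero initial mean, the constant $\beta$, the random game schedule, and the function name `run_kf`.

```python
import numpy as np

def g_h_logistic(z, y):
    """Assumed outcome model: first and second derivatives of -y*z + ln(1 + e^z)."""
    p = 1.0 / (1.0 + np.exp(-z))
    return p - y, p * (1.0 - p)

def run_kf(games, M, s, v0, eps, beta):
    """Recursion following (123) and (125), starting from mu_0 = 0 and V_0 = v0 * I."""
    mu = np.zeros(M)
    V = v0 * np.eye(M)
    for x, y in games:
        mu_pred = beta * mu
        V_pred = beta**2 * V + eps * np.eye(M)
        g, h = g_h_logistic(x @ mu_pred / s, y)  # skills difference divided by the scale
        Vx = V_pred @ x
        omega = x @ Vx
        mu = mu_pred - Vx * s * g / (s**2 + h * omega)
        V = V_pred - np.outer(Vx, Vx) * h / (s**2 + h * omega)
    return mu, V

# Synthetic schedule: random pairs of players with random binary outcomes.
rng = np.random.default_rng(1)
M = 6
games = []
for _ in range(50):
    i, j = rng.choice(M, size=2, replace=False)
    x = np.zeros(M)
    x[i], x[j] = 1.0, -1.0
    games.append((x, int(rng.integers(0, 2))))

s, v0, eps, beta = 400.0, 0.5, 0.01, 0.99
mu_unit, V_unit = run_kf(games, M, 1.0, v0, eps, beta)
mu_scaled, V_scaled = run_kf(games, M, s, s**2 * v0, s**2 * eps, beta)

print(np.max(np.abs(mu_scaled - s * mu_unit)))
print(np.max(np.abs(V_scaled - s**2 * V_unit)))
```

Both differences should be negligible relative to the magnitudes of the scaled quantities, in line with the claim that the scale $s$ only changes the units of the skills.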

Appendix D: Models for multilevel games

The Davidson model (Davidson 1970; Szczecinski and Djebbi 2020) uses y_t = 0 (away win), y_t = 1 (draw), and y_t = 2 (home win). The likelihood function and the corresponding functions g(⋅; ⋅) and h(⋅; ⋅) are given by

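$$L(z; y_t) = \begin{cases} F_{\mathrm{Dav}}(-z) & y_t = 0 \\ \kappa \sqrt{F_{\mathrm{Dav}}(-z) F_{\mathrm{Dav}}(z)} & y_t = 1 \\ F_{\mathrm{Dav}}(z) & y_t = 2, \end{cases} \tag{127}$$

$$g(z; y_t) = -2 \ln 10 \, \big( \hat{y}_t - G_{\mathrm{Dav}}(z) \big) \tag{128}$$

$$h(z; y_t) = (\ln 10)^2 \, \frac{\kappa 10^{z} + 4 + \kappa 10^{-z}}{(10^{z} + \kappa + 10^{-z})^2}, \tag{129}$$

where $\hat{y}_t = \frac{1}{2} y_t$ may be treated as the “score” of the game, and

$$F_{\mathrm{Dav}}(z) = \frac{10^{z}}{10^{-z} + \kappa + 10^{z}}, \tag{130}$$

$$G_{\mathrm{Dav}}(z) = \frac{10^{z} + \kappa/2}{10^{-z} + \kappa + 10^{z}}. \tag{131}$$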

Note that by setting κ = 0, i.e., removing the possibility of draws, we obtain F_L(z) = G_Dav(z/2), and we recover the equations of the Bradley–Terry model with a halved scale. A simple but less obvious observation is that setting κ = 2, we obtain G_Dav(z) = F_L(z), and thus g(z; y_t) in (75) is half of (128). A direct consequence, observed in Szczecinski and Djebbi (2020), is that even if the Bradley–Terry and Davidson models are different, the algorithm which uses only the gradient (e.g., SG updates (62)) may be identical.
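As a sanity check of (127)–(131), the sketch below implements the Davidson likelihood together with g and h, so that the derivatives can be compared against finite differences of the negated log-likelihood. The function names are chosen for this example only, and the logistic function $F_L(z) = 1/(1 + 10^{-z})$ used in the test of the two special cases is the form implied by the identities quoted above, not one defined in this appendix.

```python
import math

LN10 = math.log(10.0)

def F_dav(z, kappa):
    """Equation (130)."""
    return 10.0**z / (10.0**(-z) + kappa + 10.0**z)

def G_dav(z, kappa):
    """Equation (131)."""
    return (10.0**z + kappa / 2.0) / (10.0**(-z) + kappa + 10.0**z)

def likelihood(z, y, kappa):
    """Equation (127): y = 0 (away win), 1 (draw), 2 (home win)."""
    if y == 0:
        return F_dav(-z, kappa)
    if y == 1:
        return kappa * math.sqrt(F_dav(-z, kappa) * F_dav(z, kappa))
    return F_dav(z, kappa)

def neg_log_lik(z, y, kappa):
    return -math.log(likelihood(z, y, kappa))

def g_dav(z, y, kappa):
    """Equation (128), with the score y_hat = y / 2."""
    return -2.0 * LN10 * (y / 2.0 - G_dav(z, kappa))

def h_dav(z, kappa):
    """Equation (129); it does not depend on the outcome y."""
    num = kappa * 10.0**z + 4.0 + kappa * 10.0**(-z)
    return LN10**2 * num / (10.0**z + kappa + 10.0**(-z))**2

def F_logistic(z):
    """Assumed form of F_L(z), implied by the kappa = 0 and kappa = 2 identities."""
    return 1.0 / (1.0 + 10.0**(-z))

z, kappa = 0.3, 1.0
probs = [likelihood(z, y, kappa) for y in (0, 1, 2)]
print(probs, sum(probs))
print([g_dav(z, y, kappa) for y in (0, 1, 2)], h_dav(z, kappa))
```

The three outcome probabilities should sum to one for any $\kappa > 0$, which is also a quick way to confirm the square root in the draw term of (127).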

The Skellam model studied in Karlis and Ntzoufras (2008) and Lasek and Gagolewski (2020) models the goal difference y_t ∈ {…, −2, −1, 0, 1, 2, …} using the Skellam distribution (Karlis and Ntzoufras 2008, Section 2.2):

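$$L(z; y_t) = e^{-(\mu_{\mathrm{h}} + \mu_{\mathrm{a}})} \left( \frac{\mu_{\mathrm{h}}}{\mu_{\mathrm{a}}} \right)^{y_t/2} I_{|y_t|}\big( 2 \sqrt{\mu_{\mathrm{h}} \mu_{\mathrm{a}}} \big), \tag{132}$$

where $I_v(\cdot)$ is the modified Bessel function of order $v$, and

$$\mu_{\mathrm{h}} = e^{c + z}, \quad \mu_{\mathrm{a}} = e^{c - z} \tag{133}$$

are the means of the home and away goals (functions of the skills difference $z$), and $c$ is a constant offset that should be fit to the data.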

The negated log-likelihood is then given by

$$\ell(z; y_t) = (\mu_{\mathrm{h}} + \mu_{\mathrm{a}}) - y_t z + \mathrm{Const.} \tag{134}$$

and we easily find the functions required by the Kalman rating algorithms:

$$g(z; y_t) = -\big( y_t - \bar{F}(z) \big), \tag{135}$$

$$\bar{F}(z) = e^{c} (e^{z} - e^{-z}), \tag{136}$$

$$h(z; y_t) = e^{c} (e^{z} + e^{-z}). \tag{137}$$
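The expressions (135)–(137) can be checked without evaluating the Bessel function. In the sketch below, the Skellam probability is computed as a truncated sum over pairs of independent Poisson goal counts whose difference equals $y_t$, and the derivative of its negated logarithm with respect to $z$ is compared with g by finite differences. The truncation length `kmax`, the parameter values, and the function names are assumptions made for this illustration.

```python
import math

def skellam_g(z, y, c):
    """Equations (135)-(136)."""
    return -(y - math.exp(c) * (math.exp(z) - math.exp(-z)))

def skellam_h(z, c):
    """Equation (137)."""
    return math.exp(c) * (math.exp(z) + math.exp(-z))

def poisson_pmf(k, mu):
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def skellam_pmf(y, z, c, kmax=200):
    """P(home goals - away goals = y) with means from (133); truncated sum."""
    mu_h, mu_a = math.exp(c + z), math.exp(c - z)
    total = 0.0
    for k in range(kmax):
        home = k + max(y, 0)   # home goals
        away = k + max(-y, 0)  # away goals, so that home - away = y
        total += poisson_pmf(home, mu_h) * poisson_pmf(away, mu_a)
    return total

def numeric_g(z, y, c, d=1e-5):
    """Central finite difference of -ln P(y) with respect to z."""
    return (-math.log(skellam_pmf(y, z + d, c)) + math.log(skellam_pmf(y, z - d, c))) / (2 * d)

z, c = 0.2, 0.3  # illustrative values only
for y in (-2, 0, 3):
    print(y, skellam_g(z, y, c), numeric_g(z, y, c))
```

For each goal difference, the two printed derivatives should agree to several decimal places, since the Bessel term in (132) depends only on $\mu_{\mathrm{h}} \mu_{\mathrm{a}} = e^{2c}$ and is therefore constant in $z$.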

Appendix E: Grid-search results

As complementary results, we show in Table 2 examples of the grid search that led to the choice of the parameters (v₀, ɛ) in Table 1.

Table 2: Grid search: values of $\overline{\mathrm{LS}}_{\mathrm{final}}$ obtained for different pairs (v₀, ɛ) used in the vSKF rating based on the Davidson model in (a) NHL and (b) EPL. The optimal values of the search (shown in bold) are then used in Table 1. For the EPL we use ɛ = 0.

(a) NHL; rows: v₀, columns: ɛ

v₀ \ ɛ       3.0⋅10⁻⁶   6.0⋅10⁻⁶   1.5⋅10⁻⁵   3.0⋅10⁻⁵   6.0⋅10⁻⁵   1.5⋅10⁻⁴
1.5⋅10⁻³     1.0669     1.0663     1.0652     1.0645     1.0650     1.0695
2.1⋅10⁻³     1.0658     1.0654     1.0647     1.0643     1.0651     1.0696
3.0⋅10⁻³     1.0650     1.0647     1.0643     1.0643     1.0653     1.0698
4.5⋅10⁻³     1.0646     1.0645     1.0644     1.0646     1.0657     1.0701
6.0⋅10⁻³     1.0647     1.0646     1.0647     1.0650     1.0661     1.0703

(b) EPL; rows: v₀, columns: ɛ

v₀ \ ɛ       10⁻¹¹      10⁻¹⁰      10⁻⁹       10⁻⁸       10⁻⁷       10⁻⁶
0.02         0.9761     0.9761     0.9761     0.9761     0.9761     0.9761
0.03         0.9744     0.9744     0.9744     0.9744     0.9744     0.9744
0.04         0.9738     0.9738     0.9738     0.9738     0.9738     0.9739
0.06         0.9744     0.9744     0.9744     0.9744     0.9744     0.9744
0.08         0.9754     0.9754     0.9754     0.9754     0.9754     0.9754

References

Agresti, A. 1992. “Analysis of Ordinal Paired Comparison Data.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 41: 287–97. https://rss.onlinelibrary.wiley.com/doi/abs/10.2307/2347562.

Barber, D. 2012. Bayesian Reasoning and Machine Learning. New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511804779.

Bishop, C. 2006. Pattern Recognition and Machine Learning. Singapore: Springer.

Boshnakov, G., T. Kharrat, and I. G. McHale. 2017. “A Bivariate Weibull Count Model for Forecasting Association Football Scores.” International Journal of Forecasting 33: 458–66. https://doi.org/10.1016/j.ijforecast.2016.11.006.

Bradley, R. A., and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: 1 the Method of Paired Comparisons.” Biometrika 39: 324–45. https://doi.org/10.2307/2334029.

Cattelan, M. 2012. “Models for Paired Comparison Data: A Review with Emphasis on Dependent Data.” Statistical Science 27: 412–33. https://doi.org/10.1214/12-sts396.

David, H. 1963. The Method of Paired Comparison. Frome and London: Charles Griffin & Co. Ltd.

Davidson, R. R. 1970. “On Extending the Bradley–Terry Model to Accommodate Ties in Paired Comparison Experiments.” Journal of the American Statistical Association 65: 317–28. https://doi.org/10.1080/01621459.1970.10481082.

Elo, A. E. 1978. The Rating of Chessplayers, Past and Present. New York: Arco Publishing Inc.

eloratings.net. 2020. “World Football Elo Ratings.” https://www.eloratings.net/.

Fahrmeir, L. 1992. “Posterior Mode Estimation by Extended Kalman Filtering for Multivariate Dynamic Generalized Linear Models.” Journal of the American Statistical Association 87: 501–9. https://doi.org/10.1080/01621459.1992.10475232.

Fahrmeir, L., and G. Tutz. 1994. “Dynamic Stochastic Models for Time-dependent Ordered Paired Comparison Systems.” Journal of the American Statistical Association 89: 1438–49. https://doi.org/10.1093/biomet/39.3-4.324.

FIDE. 2019. “International Chess Federation: Ratings Change Calculator.” https://ratings.fide.com/calculator_rtd.phtml.

FIFA. 2018. “Revision of the FIFA/Coca-Cola World Ranking.” https://digitalhub.fifa.com/m/f99da4f73212220/original/edbm045h0udbwkqew35a-pdf.pdf.

FIVB. 2020. “New Senior World Rankings.” https://www.fivb.com/en/volleyball/rankings.

FiveThirtyEight. 2020. “How Our NFL Predictions Work.” https://fivethirtyeight.com/methodology/how-our-nfl-predictions-work/.

Glickman, M. E. 1993. Paired Comparison Models with Time-Varying Parameters. PhD thesis. Harvard University. https://doi.org/10.21236/ADA272016.

Glickman, M. E. 1995. “Chess Rating Systems.” American Chess Journal 3: 59–102.

Glickman, M. E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 48: 377–94. https://doi.org/10.1111/1467-9876.00159.

Goddard, J. 2005. “Regression Models for Forecasting Goals and Match Results in Association Football.” International Journal of Forecasting 21: 331–40. https://doi.org/10.1016/j.ijforecast.2004.08.002.

Held, L., and R. Vollnhals. 2005. “Dynamic Rating of European Football Teams.” IMA Journal of Management Mathematics 16: 121–30. https://doi.org/10.1093/imaman/dpi004.

Herbrich, R., and T. Graepel. 2006. “TrueSkill(TM): A Bayesian Skill Rating System.” Technical report. https://www.microsoft.com/en-us/research/publication/trueskilltm-a-bayesian-skill-rating-system-2/. https://doi.org/10.7551/mitpress/7503.003.0076.

Herbrich, R., T. Minka, and T. Graepel. 2008. “TrueSkill through Time: Revisiting the History of Chess.” In Advances in Neural Information Processing Systems 20, 931–8. MIT Press. https://www.microsoft.com/en-us/research/publication/trueskill-through-time-revisiting-the-history-of-chess/.

Ingram, M. 2021. “How to Extend Elo: A Bayesian Perspective.” Journal of Quantitative Analysis in Sports 17: 203–19. https://doi.org/10.1515/jqas-2020-0066.

Karlis, D., and I. Ntzoufras. 2008. “Bayesian Modelling of Football Outcomes: Using the Skellam’s Distribution for the Goal Difference.” IMA Journal of Management Mathematics 20: 133–45. https://doi.org/10.1093/imaman/dpn026.

Király, F. J., and Z. Qian. 2017. “Modelling Competitive Sports: Bradley–Terry-Elo Models for Supervised and On-Line Learning of Paired Competition Outcomes,” arXiv e-prints, arXiv:1701.08055.

Knorr-Held, L. 2000. “Dynamic Rating of Sports Teams.” Journal of the Royal Statistical Society. Series D (The Statistician) 49: 261–76. https://doi.org/10.1111/1467-9884.00236.

Koopman, S. J., and R. Lit. 2015. “A Dynamic Bivariate Poisson Model for Analysing and Forecasting Match Results in the English Premier League.” Journal of the Royal Statistical Society A 178: 167–86. https://doi.org/10.1111/rssa.12042.

Koopman, S. J., and R. Lit. 2019. “Forecasting Football Match Results in National League Competitions Using Score-Driven Time Series Models.” International Journal of Forecasting 35: 797–809. https://doi.org/10.1016/j.ijforecast.2018.10.011.

Kuk, A. Y. C. 1995. “Modelling Paired Comparison Data with Large Numbers of Draws and Large Variability of Draw Percentages Among Players.” Journal of the Royal Statistical Society. Series D (The Statistician) 44: 523–8. https://doi.org/10.2307/2348900.

Lasek, J., and M. Gagolewski. 2020. “Interpretable Sports Team Rating Models Based on the Gradient Descent Algorithm.” International Journal of Forecasting 37: 1061–71. https://doi.org/10.1016/j.ijforecast.2020.11.008.

Ley, C., T. Van de Wiele, and H. Van Eetvelde. 2019. “Ranking Soccer Teams on the Basis of Their Current Strength: A Comparison of Maximum Likelihood Approaches.” Statistical Modelling 19: 55–73. https://doi.org/10.1177/1471082X18817650.

Maher, M. J. 1982. “Modelling Association Football Scores.” Statistica Neerlandica 36: 109–18. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9574.1982.tb00782.x.

Manderson, A. A., K. Murray, and B. A. Turlach. 2018. “Dynamic Bayesian Forecasting of AFL Match Results Using the Skellam Distribution.” Australian & New Zealand Journal of Statistics 60: 174–87. https://onlinelibrary.wiley.com/doi/abs/10.1111/anzs.12225.

Microsoft. 2005. “Trueskill Ratings System.” Technical report. https://www.microsoft.com/en-us/research/project/trueskill-ranking-system/.

Moon, T. K., and W. C. Stirling. 2000. Mathematical Methods and Algorithms for Signal Processing. New Jersey: Prentice Hall.

Paleologu, C., J. Benesty, and S. Ciochină. 2013. “Study of the General Kalman Filter for Echo Cancellation.” IEEE Transactions on Audio Speech and Language Processing 21: 1539–49. https://doi.org/10.1109/tasl.2013.2245654.

Rao, P. V., and L. L. Kupper. 1967. “Ties in Paired-Comparison Experiments: A Generalization of the Bradley–Terry Model.” Journal of the American Statistical Association 62: 194–204. https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1967.10482901.

Sayed, A. H. 2008. Adaptive Filters. Hoboken, New Jersey: John Wiley & Sons.

Szczecinski, L. 2022. “G-elo: Generalization of the Elo Algorithm by Modeling the Discretized Margin of Victory.” Journal of Quantitative Analysis in Sports 18 (1): 1–14. https://doi.org/10.1515/jqas-2020-0115.

Szczecinski, L., and A. Djebbi. 2020. “Understanding Draws in Elo Rating Algorithm.” https://www.degruyter.com/document/doi/10.1515/jqas-2019-0102/html.

Thurston, L. L. 1927. “A Law of Comparative Judgement.” Psychological Review 34: 273–86. https://doi.org/10.1037/h0070288.

Wheatcroft, E. 2020. “A Profitable Model for Predicting the Over/under Market in Football.” International Journal of Forecasting 36: 916–32. https://doi.org/10.1016/j.ijforecast.2019.11.001.

Received: 2021-07-07

Accepted: 2023-06-05

Published Online: 2023-06-26

Published in Print: 2023-12-27
