How to extend Elo: a Bayesian perspective

Martin Ingram

doi:10.1515/jqas-2020-0066


Published/Copyright: January 7, 2021


From the journal Journal of Quantitative Analysis in Sports, Volume 17, Issue 3

Abstract

The Elo rating system, originally designed for rating chess players, has since become a popular way to estimate competitors’ time-varying skills in many sports. Though the self-correcting Elo algorithm is simple and intuitive, it lacks a probabilistic justification, which can make it hard to extend. In this paper, we present a simple connection between approximate Bayesian posterior mode estimation and Elo. We provide a novel justification of the approximations made by linking Elo to steady-state Kalman filtering. Our second key contribution is to observe that the derivation suggests a straightforward procedure for extending Elo. We use the procedure to derive versions of Elo incorporating margins of victory, correlated skills across different playing surfaces, and differing skills by tournament level in tennis. Combining all these extensions results in the most complete version of Elo presented for the sport yet. We evaluate the derived models on two seasons of men’s professional tennis matches (2018 and 2019). The best-performing model predicted matches with higher accuracy than both Elo and Glicko (65.8% compared to 63.7% and 63.5%, respectively) and achieved a higher mean log-likelihood (−0.615 compared to −0.632 and −0.633, respectively), demonstrating the proposed model’s ability to improve predictions.

Corresponding author: Martin Ingram, University of Melbourne School of BioSciences, Parkville, Victoria, Australia, E-mail: martin.ingram@gmail.com

Funding source: Melbourne Research Scholarship

Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This research was supported by Melbourne Research Scholarship.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix

A Details of the Taylor series expansion

We already noted that the update is given by:

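$$x = \mu_\delta + \frac{t'(\mu_\delta)}{-t''(\mu_\delta)}, \tag{35}$$

and that

$$t'(\delta) = -\frac{\delta - \mu_\delta}{\sigma_\delta^2} + b\,(1 - \gamma(b\delta)). \tag{36}$$

The second derivative $t''(\delta)$ is given by:

$$t''(\delta) = -\frac{1}{\sigma_\delta^2} - b^2\,\gamma(b\delta)\,(1 - \gamma(b\delta)).$$

And hence the update is:

$$x = \mu_\delta + \frac{b\,(1 - \gamma(b\mu_\delta))}{\dfrac{1}{\sigma_\delta^2} + b^2\,\gamma(b\mu_\delta)\,(1 - \gamma(b\mu_\delta))} = \mu_\delta + 2k\,(1 - \gamma(b\mu_\delta)).$$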
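The algebra above can be checked numerically. The following is a minimal sketch, not the paper's implementation: it evaluates the Newton-Raphson step of Eq. (35) directly from the two derivatives and compares it with the closed form $\mu_\delta + 2k(1 - \gamma(b\mu_\delta))$. The input values and the choice $b = \ln(10)/400$ (the conventional Elo scaling) are assumptions made for illustration only.

```python
import math


def logistic(x):
    """Logistic function, written as gamma in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def t_prime(delta, mu, var, b):
    """First derivative of the log posterior, Eq. (36)."""
    return -(delta - mu) / var + b * (1.0 - logistic(b * delta))


def t_double_prime(delta, var, b):
    """Second derivative of the log posterior."""
    g = logistic(b * delta)
    return -1.0 / var - b ** 2 * g * (1.0 - g)


def newton_update(mu, var, b):
    """One Newton-Raphson step started at the prior mean, Eq. (35)."""
    return mu + t_prime(mu, mu, var, b) / (-t_double_prime(mu, var, b))


def closed_form_update(mu, var, b):
    """The same step written as mu + 2k (1 - gamma(b mu))."""
    g = logistic(b * mu)
    two_k = b / (1.0 / var + b ** 2 * g * (1.0 - g))
    return mu + two_k * (1.0 - g)


# Hypothetical inputs: prior mean skill difference of 50 points,
# prior variance 2 * 100^2, and b = ln(10) / 400 (assumed scaling).
B = math.log(10) / 400
MU_DELTA = 50.0
VAR_DELTA = 2 * 100.0 ** 2

x_newton = newton_update(MU_DELTA, VAR_DELTA, B)
x_closed = closed_form_update(MU_DELTA, VAR_DELTA, B)
print(x_newton, x_closed)
```

Because the prior term of $t'$ vanishes at $\delta = \mu_\delta$, the two functions should agree up to floating-point rounding.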

B Derivation of margin of victory update

The prior and likelihood are given, respectively, by:

$$p(\delta) = \mathcal{N}(\delta \mid \mu_\delta, \sigma_\delta^2), \tag{37}$$

$$p(y = 1, s \mid \delta) = p(y = 1 \mid \delta)\; p(s \mid y = 1, \delta) \tag{38}$$

$$= \gamma(b\delta)\; \mathcal{N}(s \mid c_1 \delta + c_2, \sigma_{obs}^2). \tag{39}$$

The log posterior is thus proportional to:

$$t(\delta) \propto \log \gamma(b\delta) - \frac{(\delta - \mu_\delta)^2}{2\sigma_\delta^2} - \frac{(s - (c_1 \delta + c_2))^2}{2\sigma_{obs}^2}. \tag{40}$$

Hence

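$$t'(\delta) = b\,(1 - \gamma(b\delta)) - \frac{\delta - \mu_\delta}{\sigma_\delta^2} + \frac{c_1}{\sigma_{obs}^2}\,(s - (c_1 \delta + c_2)), \tag{41}$$

and

$$t''(\delta) = -b^2\,\gamma(b\delta)\,(1 - \gamma(b\delta)) - \frac{1}{\sigma_\delta^2} - \frac{c_1^2}{\sigma_{obs}^2}. \tag{42}$$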

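Evaluating these at the prior mean $\mu_\delta$, where the prior term of $t'$ vanishes, and taking the step $x = \mu_\delta + t'(\mu_\delta) / (-t''(\mu_\delta))$ leads to the update:

$$x = \mu_\delta + \frac{b\,(1 - \gamma(b\mu_\delta)) + \dfrac{c_1}{\sigma_{obs}^2}\,(s - (c_1 \mu_\delta + c_2))}{b^2\,\gamma(b\mu_\delta)\,(1 - \gamma(b\mu_\delta)) + \dfrac{1}{\sigma_\delta^2} + \dfrac{c_1^2}{\sigma_{obs}^2}}, \tag{43}$$

$$= \mu_\delta + 2k_{shared}\, b\,(1 - \gamma(b\mu_\delta)) + 2k_{shared}\, \frac{c_1}{\sigma_{obs}^2}\,(s - s_{pred}), \tag{44}$$

with $k_{shared}$ and $s_{pred}$ defined as in the main text.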
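As an illustration, the margin-of-victory update can be sketched in a few lines. The main-text definitions of $k_{shared}$ and $s_{pred}$ are not reproduced in this appendix, so the sketch assumes the forms implied by Eqs. (43) and (44): $s_{pred} = c_1 \mu_\delta + c_2$ and $2k_{shared}$ equal to the reciprocal of $-t''(\mu_\delta)$. All numeric values are hypothetical.

```python
import math


def logistic(x):
    """Logistic function, written as gamma in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def margin_update(mu, var, b, c1, c2, obs_var, s):
    """Single Newton-Raphson step including the margin s, Eqs. (43)-(44).

    Assumes s_pred = c1 * mu + c2 and 2 * k_shared = 1 / (-t''(mu)),
    as implied by the appendix equations.
    """
    g = logistic(b * mu)
    s_pred = c1 * mu + c2
    two_k_shared = 1.0 / (b ** 2 * g * (1.0 - g) + 1.0 / var + c1 ** 2 / obs_var)
    win_term = two_k_shared * b * (1.0 - g)
    margin_term = two_k_shared * (c1 / obs_var) * (s - s_pred)
    return mu + win_term + margin_term


# Hypothetical parameter values, chosen only to exercise the formula.
B = math.log(10) / 400
MU, VAR = 50.0, 2 * 100.0 ** 2
C1, C2, OBS_VAR = 0.02, 1.0, 9.0
S_PRED = C1 * MU + C2

as_predicted = margin_update(MU, VAR, B, C1, C2, OBS_VAR, S_PRED)
larger_margin = margin_update(MU, VAR, B, C1, C2, OBS_VAR, S_PRED + 5.0)
print(as_predicted, larger_margin)
```

When the observed margin equals the predicted margin, only the win term moves the estimate. A margin above the prediction adds a further positive correction, and setting `c1 = 0` recovers the win-only update of Appendix A.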

C Details of correlated skills derivation

C.1 Win/loss only

As stated in the main text, the log posterior is proportional to:

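$$t_{win}(\theta) \propto -\frac{1}{2}\,(\theta - \mu_\theta)^\top \Sigma_\theta^{-1} (\theta - \mu_\theta) + \log \gamma(b\, a^\top \theta). \tag{45}$$

Hence the Jacobian and Hessian functions are given, respectively, by:

$$j_{win}(\theta) = -\Sigma_\theta^{-1} (\theta - \mu_\theta) + (1 - \gamma(b\, a^\top \theta))\, b\, a, \tag{46}$$

$$H_{win}(\theta) = -\Sigma_\theta^{-1} - \gamma(b\, a^\top \theta)\,(1 - \gamma(b\, a^\top \theta))\, b^2\, a a^\top. \tag{47}$$

Evaluating these at the prior mean $\mu_\theta$ yields:

$$j_{win}(\mu_\theta) = (1 - \gamma(b\, a^\top \mu_\theta))\, b\, a, \tag{48}$$

$$H_{win}(\mu_\theta) = -\Sigma_\theta^{-1} - \gamma(b\, a^\top \mu_\theta)\,(1 - \gamma(b\, a^\top \mu_\theta))\, b^2\, a a^\top, \tag{49}$$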

as stated in the main text.

C.2 Including margin of victory

The log posterior including the margin likelihood is proportional to:

$$t_{margin}(\theta) \propto -\frac{1}{2}\,(\theta - \mu_\theta)^\top \Sigma_\theta^{-1} (\theta - \mu_\theta) + \log \gamma(b\, a^\top \theta) - \frac{(s - (c_1 a^\top \theta + c_2))^2}{2\sigma_{obs}^2}, \tag{50}$$

$$= t_{win}(\theta) - \frac{(s - (c_1 a^\top \theta + c_2))^2}{2\sigma_{obs}^2}, \tag{51}$$

with $t_{win}$ defined as in Eq. (45).

Hence the Jacobian and Hessian functions are given by:

$$j_{margin}(\theta) = j_{win}(\theta) + \frac{c_1}{\sigma_{obs}^2}\,(s - (c_1 a^\top \theta + c_2))\, a, \tag{52}$$

$$H_{margin}(\theta) = H_{win}(\theta) - \frac{c_1^2}{\sigma_{obs}^2}\, a a^\top. \tag{53}$$

Evaluating both at the prior mean $\mu_\theta$ produces the equations stated in the main text.
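To make the multivariate step concrete, the sketch below applies one Newton-Raphson step $\mu_\theta - H_{margin}^{-1}(\mu_\theta)\, j_{margin}(\mu_\theta)$ using Eqs. (48), (49), (52) and (53). Since both $j_{margin}(\mu_\theta)$ and the non-prior part of $H_{margin}(\mu_\theta)$ are built from the single vector $a$, the Sherman-Morrison formula lets the step be computed from $\Sigma_\theta$ alone, without inverting it. The skill layout (two players, each with a hard-court and a clay-court skill), the vector $a$ and all numbers are hypothetical and not taken from the paper.

```python
import math


def logistic(x):
    """Logistic function, written as gamma in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in m]


def correlated_margin_step(mu, sigma, a, b, c1, c2, obs_var, s):
    """One Newton-Raphson step from the prior mean for correlated skills.

    j_margin(mu) = r * a and -H_margin(mu) = sigma^{-1} + w * a a^T, so by
    Sherman-Morrison the step equals r * sigma a / (1 + w * a^T sigma a).
    """
    pred = dot(a, mu)
    g = logistic(b * pred)
    r = b * (1.0 - g) + (c1 / obs_var) * (s - (c1 * pred + c2))
    w = g * (1.0 - g) * b ** 2 + c1 ** 2 / obs_var
    sigma_a = mat_vec(sigma, a)
    scale = r / (1.0 + w * dot(a, sigma_a))
    return [m + scale * sa for m, sa in zip(mu, sigma_a)]


# Hypothetical layout: [p1 hard, p1 clay, p2 hard, p2 clay].
MU = [1600.0, 1550.0, 1500.0, 1520.0]
SIGMA = [
    [10000.0, 6000.0, 0.0, 0.0],
    [6000.0, 10000.0, 0.0, 0.0],
    [0.0, 0.0, 10000.0, 6000.0],
    [0.0, 0.0, 6000.0, 10000.0],
]
A = [1.0, 0.0, -1.0, 0.0]  # player 1 beats player 2 on hard court
B = math.log(10) / 400
C1, C2, OBS_VAR, S = 0.02, 1.0, 9.0, 6.0

updated = correlated_margin_step(MU, SIGMA, A, B, C1, C2, OBS_VAR, S)
print(updated)
```

In this example the match is played on hard court, yet the clay-court estimates also move, because the prior covariance links each player's two surface skills.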

D Comparison between derived update and Glicko

In Glickman (1999), Elo is derived as a special case of a more general approximate Bayesian rating system. Here, we compare the equivalent Elo update derived there with the one in this paper.

Equation (13) in Glickman (1999) recovers Elo as a special case of Glicko under the assumption that the opponents’ skills are known exactly. Considering only a single contest in a period and matching the notation with that used in this paper, the equivalent k becomes:

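$$k = \frac{b}{1/\delta^2 + 1/\sigma^2}, \tag{54}$$

$$\text{where} \quad \delta^2 = \left[ b^2\, \gamma(b\mu_\delta)\,(1 - \gamma(b\mu_\delta)) \right]^{-1}. \tag{55}$$

Since $\sigma^2 = \sigma_\delta^2 / 2$, this k-factor is:

$$k = \frac{b/2}{1/\sigma_\delta^2 + (b^2/2)\, \gamma(b\mu_\delta)\,(1 - \gamma(b\mu_\delta))}. \tag{56}$$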

This k-factor is almost exactly the same as the one given in Eq. (17), with the exception of the factor of 1/2 in the second term in the denominator. This factor may be due to the assumption that opponents’ skills are known exactly in the derivation.
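The size of the difference can be illustrated numerically. The sketch below evaluates the Glicko-based k-factor in two ways, from Eqs. (54) and (55) with $\sigma^2 = \sigma_\delta^2/2$ and from Eq. (56), and compares it with the k-factor implied by Appendix A, where $2k = b / (1/\sigma_\delta^2 + b^2 \gamma(b\mu_\delta)(1 - \gamma(b\mu_\delta)))$. The input values are hypothetical.

```python
import math


def logistic(x):
    """Logistic function, written as gamma in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def k_derived(mu, var, b):
    """k-factor implied by the update in Appendix A."""
    g = logistic(b * mu)
    return (b / 2.0) / (1.0 / var + b ** 2 * g * (1.0 - g))


def k_glicko_from_eq54(mu, var, b):
    """Glicko-based k-factor from Eqs. (54)-(55) with sigma^2 = var / 2."""
    g = logistic(b * mu)
    delta_sq = 1.0 / (b ** 2 * g * (1.0 - g))
    sigma_sq = var / 2.0
    return b / (1.0 / delta_sq + 1.0 / sigma_sq)


def k_glicko_from_eq56(mu, var, b):
    """Glicko-based k-factor in the simplified form of Eq. (56)."""
    g = logistic(b * mu)
    return (b / 2.0) / (1.0 / var + (b ** 2 / 2.0) * g * (1.0 - g))


# Hypothetical inputs for illustration.
B = math.log(10) / 400
MU_DELTA, VAR_DELTA = 50.0, 2 * 100.0 ** 2

k_paper = k_derived(MU_DELTA, VAR_DELTA, B)
k_54 = k_glicko_from_eq54(MU_DELTA, VAR_DELTA, B)
k_56 = k_glicko_from_eq56(MU_DELTA, VAR_DELTA, B)
print(k_paper, k_54, k_56)
```

Because the Glicko-based denominator carries only half of the likelihood curvature term, its k-factor is slightly larger for any finite prior variance.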

E Derivation of approximate log marginal likelihood

The log marginal likelihood for a given match is:

$$\log p(y = 1, s) = \log p(y = 1) + \log p(s \mid y = 1) \tag{57}$$

$$= \log\left[ \int p(y = 1 \mid \delta)\, p(\delta)\, d\delta \right] + \log\left[ \int p(s \mid y = 1, \delta)\, p(\delta)\, d\delta \right] \tag{58}$$

$$= \log\left[ \int \gamma(b\delta)\, \mathcal{N}(\delta \mid \mu_\delta, \sigma_\delta^2)\, d\delta \right] + \log\left[ \int \mathcal{N}(s \mid c_1 \delta + c_2, \sigma_{obs}^2)\, \mathcal{N}(\delta \mid \mu_\delta, \sigma_\delta^2)\, d\delta \right]. \tag{59}$$

The second integral is analytically tractable and reduces to the log density of $s$ under a normal distribution with mean $c_1 \mu_\delta + c_2$ and variance $\sigma_{obs}^2 + c_1^2 \sigma_\delta^2$. The first integral is not analytically tractable but can be approximated well by $\gamma(b\mu_\delta / \alpha)$, where $\alpha = \sqrt{1 + \pi \sigma_\delta^2 b^2 / 8}$ (Crooks 2009). The log marginal likelihood can thus be approximated by:

$$\log p(y = 1, s) \approx \log \gamma(b\mu_\delta / \alpha) + \log \mathcal{N}(s \mid c_1 \mu_\delta + c_2, \sigma_{obs}^2 + c_1^2 \sigma_\delta^2), \tag{60}$$

as stated in the main text.
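A minimal sketch of Eq. (60) follows, together with a brute-force trapezoid-rule integral that serves only as a reference for the logistic-normal approximation. It assumes the scaling $\alpha = \sqrt{1 + \pi \sigma_\delta^2 b^2 / 8}$; the function names and numeric values are illustrative and not from the paper.

```python
import math


def logistic(x):
    """Logistic function, written as gamma in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def approx_win_prob(mu, var, b):
    """Logistic approximation to the logistic-normal integral."""
    alpha = math.sqrt(1.0 + math.pi * var * b ** 2 / 8.0)
    return logistic(b * mu / alpha)


def approx_log_marginal(s, mu, var, b, c1, c2, obs_var):
    """Approximate log marginal likelihood of a win with margin s, Eq. (60)."""
    log_win = math.log(approx_win_prob(mu, var, b))
    log_margin = normal_logpdf(s, c1 * mu + c2, obs_var + c1 ** 2 * var)
    return log_win + log_margin


def numeric_win_prob(mu, var, b, n=4000, width=8.0):
    """Trapezoid-rule reference value for the first integral in Eq. (59)."""
    sd = math.sqrt(var)
    lo = mu - width * sd
    h = 2.0 * width * sd / n
    total = 0.0
    for i in range(n + 1):
        d = lo + i * h
        end_weight = 0.5 if i in (0, n) else 1.0
        total += end_weight * logistic(b * d) * math.exp(normal_logpdf(d, mu, var))
    return total * h


# Hypothetical inputs for illustration.
B = math.log(10) / 400
MU, VAR = 50.0, 2 * 100.0 ** 2
C1, C2, OBS_VAR, S = 0.02, 1.0, 9.0, 4.0

print(approx_win_prob(MU, VAR, B), numeric_win_prob(MU, VAR, B))
print(approx_log_marginal(S, MU, VAR, B, C1, C2, OBS_VAR))
```

With zero prior variance, $\alpha = 1$ and the expression reduces to the exact log-likelihood evaluated at $\mu_\delta$.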

F Quadrature estimate of the posterior mean

As stated in Eq. (13), the posterior distribution is proportional to:

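$$p(\delta \mid y = 1) \propto p(\delta)\, p(y = 1 \mid \delta) = \mathcal{N}(\delta \mid \mu_\delta, \sigma_\delta^2)\, \gamma(b\delta). \tag{61}$$

Hence,

$$p(\delta \mid y = 1) = c \times \mathcal{N}(\delta \mid \mu_\delta, \sigma_\delta^2)\, \gamma(b\delta), \tag{62}$$

where $c$ is an unknown normalising constant.

We seek the posterior mean of $\delta$, that is:

$$E[\delta \mid y = 1] = \int \delta\, p(\delta \mid y = 1)\, d\delta = c \int \delta\, \mathcal{N}(\delta \mid \mu_\delta, \sigma_\delta^2)\, \gamma(b\delta)\, d\delta. \tag{63}$$

We first compute $c$ using Gauss-Hermite quadrature:

$$\frac{1}{c} = \int \mathcal{N}(\delta \mid \mu_\delta, \sigma_\delta^2)\, \gamma(b\delta)\, d\delta. \tag{64}$$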

We then evaluate the integral in Eq. (63) using Gauss-Hermite quadrature a second time and obtain the final result by multiplying its value by c.
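The two quadrature passes can be sketched as follows. The sketch uses a fixed five-point Gauss-Hermite rule with hard-coded nodes and weights for the weight function $e^{-x^2}$. The number of nodes used in the paper is not stated in this appendix, so five is an assumption, and a production version would likely use more. The change of variables $\delta = \mu_\delta + \sqrt{2}\,\sigma_\delta\, x$ maps the normal-weighted integral onto the quadrature rule.

```python
import math

# Five-point Gauss-Hermite rule for the weight function exp(-x^2).
GH_NODES = [
    -2.0201828704560856,
    -0.9585724646138185,
    0.0,
    0.9585724646138185,
    2.0201828704560856,
]
GH_WEIGHTS = [
    0.01995324205904591,
    0.3936193231522412,
    0.9453087204829419,
    0.3936193231522412,
    0.01995324205904591,
]


def logistic(x):
    """Logistic function, written as gamma in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def gauss_hermite_expectation(f, mu, var):
    """Approximate the integral of f(delta) N(delta | mu, var) d(delta)."""
    scale = math.sqrt(2.0 * var)
    total = sum(w * f(mu + scale * x) for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


def posterior_mean(mu, var, b):
    """Quadrature estimate of E[delta | y = 1], Eqs. (63)-(64)."""
    # First pass: 1 / c, the normalising integral of Eq. (64).
    inv_c = gauss_hermite_expectation(lambda d: logistic(b * d), mu, var)
    # Second pass: the unnormalised first moment in Eq. (63).
    moment = gauss_hermite_expectation(lambda d: d * logistic(b * d), mu, var)
    return moment / inv_c


# Hypothetical inputs for illustration.
B = math.log(10) / 400
MU_DELTA, VAR_DELTA = 50.0, 2 * 100.0 ** 2

print(posterior_mean(MU_DELTA, VAR_DELTA, B))
```

As a sanity check, setting `b = 0` makes the likelihood flat, in which case the estimate returns the prior mean.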

G Estimated Grand Slam additions

Table 8 shows the players with the largest and smallest Grand Slam additions, as estimated at the end of the 2019 ATP season. Rafael Nadal has the largest estimated addition, adding around 52 rating points when playing at a Grand Slam. Stan Wawrinka also ranks highly, which is consistent with his reputation as a “big match player”. On the other hand, Juan Monaco, Carlos Berlocq and Alexander Zverev are all estimated to perform worse at Grand Slams than at other events.

Table 8: Grand Slam additions estimated at the end of the 2019 season.

(a) Players with the ten largest Grand Slam additions.

Player	Slam+
Rafael Nadal	51.73
Fernando Verdasco	33.48
Novak Djokovic	33.19
Stan Wawrinka	32.76
Teimuraz Gabashvili	24.74
Dominic Thiem	22.64
John Millman	20.04
Diego Schwartzman	19.10
Kevin Anderson	18.66
Andy Murray	17.88

(b) Players with the ten smallest Grand Slam additions.

Player	Slam+
Federico Delbonis	−13.55
Potito Starace	−13.77
Florian Mayer	−14.02
Gilles Muller	−14.24
Steve Johnson	−14.56
Jack Sock	−15.88
Pablo Cuevas	−17.69
Alexander Zverev	−22.56
Carlos Berlocq	−23.05
Juan Monaco	−23.63

H Equivalence of EKF and Newton-Raphson means

In this section, we show that the approach proposed in Section 2.5 leads to the same posterior mean estimate as an extended Kalman filter using a particular choice of measurement error covariance and observation function.

We first state the equations used to update the mean estimate in extended Kalman filtering. Writing the state estimate before the update step as $\mathcal{N}(\mu_\theta, \Sigma_\theta)$, they can be written as (Särkkä 2013, p. 70):

$$\tilde{y} = z - g(\mu_\theta), \quad S = G \Sigma_\theta G^\top + R, \quad K = \Sigma_\theta G^\top S^{-1}, \quad \mu_\theta' = \mu_\theta + K \tilde{y},$$

where $z$ is the vector of observations, $g(x)$ is a non-linear function from the state vector to the observation vector, $R$ is the measurement error covariance, and $G$ is the Jacobian matrix of $g$ evaluated at $x = \mu_\theta$, the prior mean. Note that the equations above are typically written with subscripts, e.g. $\tilde{y}_k$, to denote the update at time step $k$, but we drop these since we consider only a single updating step.

To show the equivalence with our derived approach, we now apply these general equations to the win-only update. We set

$$g(x) = \gamma(b\, a^\top x),$$

which implies

$$G = \gamma(b\, a^\top \mu_\theta)\,(1 - \gamma(b\, a^\top \mu_\theta))\, b\, a^\top = \eta\, b\, a^\top,$$

where we have defined the scalar quantity $\eta = \gamma(b\, a^\top \mu_\theta)\,(1 - \gamma(b\, a^\top \mu_\theta))$ to simplify notation in the following. Please note that here, to be consistent with the Kalman filtering literature, the Jacobian is written as a row vector, while other Jacobians in this paper are written as column vectors.

The other equations thus become:

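$$S = \eta^2 b^2\, a^\top \Sigma_\theta a + R$$

and

$$K = \eta\, b\, \Sigma_\theta a\, (\eta^2 b^2\, a^\top \Sigma_\theta a + R)^{-1},$$

implying the following update to the mean:

$$\mu_\theta' = \mu_\theta + \eta\, b\, \Sigma_\theta a\, (\eta^2 b^2\, a^\top \Sigma_\theta a + R)^{-1}\, (1 - \gamma(b\, a^\top \mu_\theta)), \tag{65}$$

where $z$ has been set to 1, reflecting the winning outcome.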

The approach derived in this paper in Section 2.5 has

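$$j_{win}(\mu_\theta) = (1 - \gamma(b\, a^\top \mu_\theta))\, b\, a, \qquad H_{win}(\mu_\theta) = -\Sigma_\theta^{-1} - \gamma(b\, a^\top \mu_\theta)\,(1 - \gamma(b\, a^\top \mu_\theta))\, b^2\, a a^\top,$$

which are used in a single Newton-Raphson step:

$$\mu_\theta' = \mu_\theta - H^{-1}(\mu_\theta)\, j(\mu_\theta),$$

yielding

$$\mu_\theta' = \mu_\theta + (\Sigma_\theta^{-1} + \eta\, b^2\, a a^\top)^{-1}\, (1 - \gamma(b\, a^\top \mu_\theta))\, b\, a. \tag{66}$$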

We thus see that for the two updates in Eqs. (65) and (66) to be equivalent, we must have

$$(\Sigma_\theta^{-1} + \eta\, b^2\, a a^\top)^{-1} a = \eta\, \Sigma_\theta a\, (\eta^2 b^2\, a^\top \Sigma_\theta a + R)^{-1}.$$

We rewrite the matrix inverse term on the left hand side using the Sherman-Morrison formula, yielding:

$$(\Sigma_\theta^{-1} + \eta\, b^2\, a a^\top)^{-1} = \frac{\Sigma_\theta + \eta\, b^2\, \Sigma_\theta\, (a^\top \Sigma_\theta a) - \eta\, b^2\, \Sigma_\theta a a^\top \Sigma_\theta}{\eta\, b^2\, a^\top \Sigma_\theta a + 1}.$$

Multiplying this expression by $a$ and making use of the associativity of matrix multiplication, the second and third terms in the numerator cancel, leaving

$$\Sigma_\theta a\, (\eta\, b^2\, a^\top \Sigma_\theta a + 1)^{-1} = \eta\, \Sigma_\theta a\, (\eta^2 b^2\, a^\top \Sigma_\theta a + \eta)^{-1},$$

which is identical to the EKF update if

$$\eta = R = \gamma(b\, a^\top \mu_\theta)\,(1 - \gamma(b\, a^\top \mu_\theta)),$$

which is the variance of a Bernoulli random variable with success probability $\gamma(b\, a^\top \mu_\theta)$ and thus is a reasonable choice for the measurement error variance.
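The equivalence can also be checked numerically on a small example. The sketch below computes the EKF mean update of Eq. (65) with $R = \eta$ and the Newton-Raphson update of Eq. (66) with an explicit $2 \times 2$ matrix inverse, for a hypothetical state holding one skill for the winner and one for the loser. The state layout, the vector $a$ and all numbers are assumptions made for illustration.

```python
import math


def logistic(x):
    """Logistic function, written as gamma in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in m]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def ekf_mean(mu, sigma, a, b):
    """EKF mean update, Eq. (65), with z = 1 and R = eta."""
    g = logistic(b * dot(a, mu))
    eta = g * (1.0 - g)
    sigma_a = mat_vec(sigma, a)
    s_innov = eta ** 2 * b ** 2 * dot(a, sigma_a) + eta  # S with R = eta
    gain = [eta * b * sa / s_innov for sa in sigma_a]
    return [m + k * (1.0 - g) for m, k in zip(mu, gain)]


def newton_mean(mu, sigma, a, b):
    """Newton-Raphson mean update, Eq. (66), with an explicit inverse."""
    g = logistic(b * dot(a, mu))
    eta = g * (1.0 - g)
    precision = inv2(sigma)
    neg_hessian = [
        [precision[i][j] + eta * b ** 2 * a[i] * a[j] for j in range(2)]
        for i in range(2)
    ]
    jacobian = [(1.0 - g) * b * ai for ai in a]
    step = mat_vec(inv2(neg_hessian), jacobian)
    return [m + st for m, st in zip(mu, step)]


# Hypothetical two-dimensional state: [winner skill, loser skill].
MU = [1550.0, 1500.0]
SIGMA = [[10000.0, 3000.0], [3000.0, 8000.0]]
A = [1.0, -1.0]
B = math.log(10) / 400

ekf_result = ekf_mean(MU, SIGMA, A, B)
newton_result = newton_mean(MU, SIGMA, A, B)
print(ekf_result, newton_result)
```

The two results should agree up to floating-point rounding. Choosing a value of `R` other than `eta` in `ekf_mean` would generally break the agreement.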

References

Assimakis, N., and M. Adam. 2014. “Iterative and Algebraic Algorithms for the Computation of the Steady State Kalman Filter Gain.” International Scholarly Research Notices 2014: 417623, https://doi.org/10.1155/2014/417623.

Banfield, D., A. P. Ingersoll, and C. L. Keppenne. 1996. “A Steady-State Kalman Filter for Assimilating Data from a Single Polar Orbiting Satellite.” Journal of the Atmospheric Sciences 52: 737–53, https://doi.org/10.1175/1520-0469(1995)052<0737:ASSKFF>2.0.CO;2.

Boice, J. 2019. How Our MLB Predictions Work. Also available at https://fivethirtyeight.com/methodology/how-our-mlb-predictions-work/.

Bradbury, J., R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, and S. Wanderman-Milne. 2018. JAX: Composable Transformations of Python + NumPy Programs. Also available at http://github.com/google/jax.

Bradley, R. A., and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: The Method of Paired Comparisons.” Biometrika 39: 324–45, https://doi.org/10.1093/biomet/39.3-4.324.

Carbone, J., T. Corke, and F. Moisiadis. 2016. “The Rugby League Prediction Model: Using an Elo-Based Approach to Predict the Outcome of National Rugby League (NRL) Matches.” International Educational Scientific Research Journal 2: 26–30, https://doi.org/10.21276/2455-295X.

Crooks, G. E. 2009. Logistic Approximation to the Logistic-Normal Integral. Technical note. Available at https://threeplusone.com/pubs/on_logistic_normal.pdf.

Dangauthier, P., R. Herbrich, T. Minka, and T. Graepel. 2008. “Trueskill through Time: Revisiting the History of Chess.” In Advances in Neural Information Processing Systems, Vol. 20, 337–44. Red Hook, NY: Curran Associates, Inc. Also available at https://papers.nips.cc/paper/3331-trueskill-through-time-revisiting-the-history-of-chess.

Elo, A. E. 1978. The Rating of Chess Players, Past and Present. Arco Pub.

Fahrmeir, L., and G. Tutz. 1994. “Dynamic Stochastic Models for Time-Dependent Ordered Paired Comparison Systems.” Journal of the American Statistical Association 89: 1438–49, https://doi.org/10.1080/01621459.1994.10476882.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2013. Bayesian Data Analysis, 3rd ed. Boca Raton, FL, USA: CRC Press. Also available at http://www.stat.columbia.edu/~gelman/book/, https://doi.org/10.1201/b16018.

Glickman, M. E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 48: 377–94, https://doi.org/10.1111/1467-9876.00159.

Gneiting, T., and A. E. Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association 102: 359–78, https://doi.org/10.1198/016214506000001437.

Humpherys, J., P. Redd, and J. West. 2012. “A Fresh Look at the Kalman Filter.” SIAM Review 54: 801–23, https://doi.org/10.1137/100799666.

Hvattum, L. M., and H. Arntzen. 2010. “Using ELO Ratings for Match Result Prediction in Association Football.” International Journal of Forecasting 26: 460–70, https://doi.org/10.1016/j.ijforecast.2009.10.002.

Ingram, M. 2019. “A Point-Based Bayesian Hierarchical Model to Predict the Outcome of Tennis Matches.” Journal of Quantitative Analysis in Sports 15: 313–25, https://doi.org/10.1515/jqas-2018-0008.

Karlis, D., and I. Ntzoufras. 2008. “Bayesian Modelling of Football Outcomes: Using the Skellam’s Distribution for the Goal Difference.” IMA Journal of Management Mathematics 20: 133–45, https://doi.org/10.1093/imaman/dpn026.

Kovalchik, S. A. 2016. “Searching for the GOAT of Tennis Win Prediction.” Journal of Quantitative Analysis in Sports 12: 127–38, https://doi.org/10.1515/jqas-2015-0059.

Kovalchik, S. 2020. “Extension of the Elo Rating System to Margin of Victory.” International Journal of Forecasting 36: 1329–41, https://doi.org/10.1016/j.ijforecast.2020.01.006.

Kovalchik, S. A., and M. Ingram. 2018. “Estimating the Duration of Professional Tennis Matches for Varying Formats.” Journal of Quantitative Analysis in Sports 14: 13–23, https://doi.org/10.1515/jqas-2017-0077.

Mangan, S., and K. Collins. 2016. “A Rating System for Gaelic Football Teams: Factors that Influence Success.” International Journal of Computer Science in Sport 15: 78–90, https://doi.org/10.1515/ijcss-2016-0006.

Minka, T. P. 2001. “Expectation Propagation for Approximate Bayesian Inference.” In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, UAI’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 362–9.

Morris, B., C. Bialik, and J. Boice. 2016. How We’re Forecasting the 2016 U.S. Open. Also available at https://fivethirtyeight.com/features/how-were-forecasting-the-2016-us-open/.

Neumann, C., J. Duboscq, C. Dubuc, A. Ginting, A. M. Irwan, M. Agil, A. Widdig, and A. Engelhardt. 2011. “Assessing Dominance Hierarchies: Validation and Advantages of Progressive Evaluation with Elo-Rating.” Animal Behaviour 82: 911–21, https://doi.org/10.1016/j.anbehav.2011.07.016.

Särkkä, S. 2013. Bayesian Filtering and Smoothing. Cambridge, UK: Institute of Mathematical Statistics Textbooks, Cambridge University Press, https://doi.org/10.1017/CBO9781139344203.

Silver, N., J. Boice, and N. Paine. 2019. How Our NFL Predictions Work. Also available at https://fivethirtyeight.com/methodology/how-our-nfl-predictions-work/.

Sipko, M., and W. Knottenbelt. 2015. Machine Learning for the Prediction of Professional Tennis Matches. MEng Computing Final Year Project, Imperial College London, London, UK. Also available at https://www.doc.ic.ac.uk/teaching/distinguished-projects/2015/m.sipko.pdf.

Stefani, R. 2011. “The Methodology of Officially Recognized International Sports Rating Systems.” Journal of Quantitative Analysis in Sports 7: 10, https://doi.org/10.2202/1559-0410.1347.

Weng, R. C., and C.-J. Lin. 2011. “A Bayesian Approximation Method for Online Ranking.” Journal of Machine Learning Research 12: 267–300. Also available at http://jmlr.org/papers/v12/weng11a.html.

Wilson, K. C. 1972. “An Optimal Control Approach to Designing Constant Gain Filters.” IEEE Transactions on Aerospace and Electronic Systems AES-8: 836–42, https://doi.org/10.1109/taes.1972.309615.

Received: 2020-06-05

Accepted: 2020-12-09

Published Online: 2021-01-07

Published in Print: 2021-09-27


Keywords

correlated skills; margin of victory; paired comparison modelling; tennis