Abstract
The Elo rating system, originally designed for rating chess players, has since become a popular way to estimate competitors’ time-varying skills in many sports. Though the self-correcting Elo algorithm is simple and intuitive, it lacks a probabilistic justification, which can make it hard to extend. In this paper, we present a simple connection between approximate Bayesian posterior mode estimation and Elo. We provide a novel justification of the approximations made by linking Elo to steady-state Kalman filtering. Our second key contribution is to observe that the derivation suggests a straightforward procedure for extending Elo. We use the procedure to derive versions of Elo incorporating margins of victory, correlated skills across different playing surfaces, and differing skills by tournament level in tennis. Combining all these extensions results in the most complete version of Elo yet presented for the sport. We evaluate the derived models on two seasons of men’s professional tennis matches (2018 and 2019). The best-performing model predicted matches with higher accuracy than both Elo and Glicko (65.8% compared to 63.7% and 63.5%, respectively) and achieved a higher mean log-likelihood (−0.615 compared to −0.632 and −0.633, respectively), demonstrating the proposed model’s ability to improve predictions.
Funding source: Melbourne Research Scholarship
Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This research was supported by Melbourne Research Scholarship.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
A Details of the Taylor series expansion
We already noted that the update is given by:
and that
The second derivative is:
Hence, the update is:
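As a rough numerical illustration of this step, the sketch below assumes a scalar skill difference $\delta$ with prior $\mathcal{N}(\mu, s^2)$ and a logistic win probability $\sigma(\delta)$; the main text's exact parameterisation (for example any scale factor on $\delta$) may differ, and all function names are hypothetical. Maximising the second-order Taylor expansion of the log posterior around $\mu$ is the same as taking one Newton-Raphson step from $\mu$, which can be rewritten in Elo form $\mu + k(y - p)$. For comparison, the sketch also iterates Newton's method to the exact posterior mode.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def log_post_derivs(delta: float, mu: float, s2: float, won: bool):
    """First and second derivatives of log sigmoid-likelihood + log N(delta | mu, s2)."""
    y = 1.0 if won else 0.0
    p = sigmoid(delta)
    grad = (y - p) - (delta - mu) / s2
    hess = -p * (1.0 - p) - 1.0 / s2
    return grad, hess


def taylor_update(mu: float, s2: float, won: bool) -> float:
    """Maximise the quadratic Taylor expansion around the prior mean (one Newton step)."""
    grad, hess = log_post_derivs(mu, mu, s2, won)
    return mu - grad / hess


def closed_form_update(mu: float, s2: float, won: bool) -> float:
    """The same step in Elo form: mu + k * (y - p), with k = s2 / (1 + s2 * p * (1 - p))."""
    y = 1.0 if won else 0.0
    p = sigmoid(mu)
    k = s2 / (1.0 + s2 * p * (1.0 - p))
    return mu + k * (y - p)


def exact_mode(mu: float, s2: float, won: bool, iters: int = 50) -> float:
    """Iterate Newton's method to convergence to locate the exact posterior mode."""
    delta = mu
    for _ in range(iters):
        grad, hess = log_post_derivs(delta, mu, s2, won)
        delta -= grad / hess
    return delta
```

Comparing `taylor_update` with `exact_mode` for a few prior variances gives a quick check of how much the single step loses relative to the exact mode.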
B Derivation of margin of victory update
The likelihood and prior are given, respectively, by:
The log posterior is thus proportional to:
Hence
and
Evaluating these at the prior mean $\mu_\delta$ leads to the update:
with
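The same recipe (gradient and Hessian of the log posterior evaluated at the prior mean, followed by one Newton-Raphson step) applies to any extension. The sketch below makes it explicit with central finite differences instead of analytic derivatives, so that it runs with plain Python. The margin model it uses, $s \mid \delta \sim \mathcal{N}(b\delta, \sigma_s^2)$ with prior $\delta \sim \mathcal{N}(\mu, \sigma_d^2)$, is a hypothetical stand-in and not the likelihood given above; because that stand-in is linear-Gaussian, a single Newton step recovers the exact conjugate posterior mean, which provides an easy check. All function names are illustrative.

```python
def newton_step_from_prior(log_post, mu: float, h: float = 1e-3) -> float:
    """One Newton-Raphson step from the prior mean, with central finite differences."""
    f_plus, f_0, f_minus = log_post(mu + h), log_post(mu), log_post(mu - h)
    grad = (f_plus - f_minus) / (2.0 * h)
    hess = (f_plus - 2.0 * f_0 + f_minus) / h**2
    return mu - grad / hess


def make_gaussian_margin_log_post(s: float, b: float, sigma_s: float, mu: float, sigma_d: float):
    """Hypothetical model: s | delta ~ N(b * delta, sigma_s^2), delta ~ N(mu, sigma_d^2).

    Returns the log posterior of delta up to an additive constant.
    """
    def log_post(delta: float) -> float:
        return (-0.5 * (s - b * delta) ** 2 / sigma_s**2
                - 0.5 * (delta - mu) ** 2 / sigma_d**2)
    return log_post


def conjugate_mean(s: float, b: float, sigma_s: float, mu: float, sigma_d: float) -> float:
    """Exact posterior mean for the linear-Gaussian model above."""
    precision = 1.0 / sigma_d**2 + b**2 / sigma_s**2
    return (mu / sigma_d**2 + b * s / sigma_s**2) / precision
```

Swapping in a different log-likelihood, for example one that also includes the win/loss term, only changes the function passed as `log_post`; for non-Gaussian likelihoods the single step is then an approximation rather than exact.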
C Details of correlated skills derivation
C.1 Win/loss only
As stated in the main text, the log posterior is proportional to:
Hence the Jacobian and Hessian functions are given, respectively, by:
Evaluating these at the prior mean $\boldsymbol{\mu}_\theta$ yields:
as stated in the main text.
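A minimal sketch of a Newton step of this kind, under assumptions that are this sketch's own: each player has a vector of surface-specific skills, both players share the prior surface covariance `cov` and are independent of each other, and the win probability is $\sigma(\theta_{i,s} - \theta_{j,s})$ on the surface $s$ played. With $\mathbf{a} = \mathbf{e}_{i,s} - \mathbf{e}_{j,s}$, the Sherman-Morrison formula gives the step $\boldsymbol{\Sigma}\mathbf{a}(y - p)/(1 + p(1-p)\,\mathbf{a}^\top\boldsymbol{\Sigma}\mathbf{a})$, so a result on one surface moves the ratings on the other surfaces in proportion to their prior covariance with it. The function name is hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def correlated_surface_update(mu_i, mu_j, cov, surface, i_won):
    """One Newton step from the prior mean for surface-specific skill vectors.

    mu_i, mu_j: prior mean skills of players i and j on each surface (lists).
    cov: prior covariance between surfaces (symmetric list of lists), assumed
         shared by both players, with the two players independent.
    surface: index of the surface the match is played on.
    """
    y = 1.0 if i_won else 0.0
    p = sigmoid(mu_i[surface] - mu_j[surface])
    v = p * (1.0 - p)
    # a^T Sigma a: the surface variance contributes once for each player.
    a_sigma_a = 2.0 * cov[surface][surface]
    # Sherman-Morrison: (Sigma^-1 + v a a^T)^-1 a = Sigma a / (1 + v a^T Sigma a).
    scale = (y - p) / (1.0 + v * a_sigma_a)
    column = [row[surface] for row in cov]  # player i's block of Sigma a
    new_i = [m + c * scale for m, c in zip(mu_i, column)]
    new_j = [m - c * scale for m, c in zip(mu_j, column)]
    return new_i, new_j
```

With an identity covariance the ratings on surfaces not played are left unchanged, which recovers independent per-surface Elo updates.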
C.2 Including margin of victory
The log posterior including the margin likelihood is proportional to:
with $t_{\text{win}}$ defined as in Eq. (45).
Hence the Jacobian and Hessian functions are given by:
Evaluating both at the prior mean $\boldsymbol{\mu}_\theta$ produces the equations stated in the main text.
D Comparison between derived update and Glicko
In Glickman (1999), Elo is derived as a special case of a more general approximate Bayesian rating system. Here, we compare the equivalent Elo update derived there with the one in this paper.
Equation (13) in Glickman (1999) recovers Elo as a special case of Glicko under the assumption that the opponents’ skills are known exactly. Considering only a single contest in a period and matching the notation to that used in this paper, the equivalent k becomes:
Since
This k-factor is almost identical to the one given in Eq. (17), except for a factor of 1/2 in the second term of the denominator. This factor may arise from the assumption, made in Glickman’s derivation, that opponents’ skills are known exactly.
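To make the last point concrete, consider a simplified setting that is this sketch's assumption rather than the exact parameterisation of Eq. (17) or of Glickman's Eq. (13): independent priors, a logistic link with no scale factor, and win probability $p$. One Newton step from the prior mean then has gain $k = s^2 / (1 + p(1-p)(s^2 + s_{\text{opp}}^2))$. With equal uncertainty for both players the second term of the denominator is $2s^2p(1-p)$; treating the opponent's skill as known ($s_{\text{opp}}^2 = 0$) halves it. The function name is hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def newton_gain(var_player: float, var_opponent: float, p: float) -> float:
    """Gain k in mu_new = mu + k * (y - p) for one Newton step from the prior mean.

    Assumes independent priors and p = sigmoid(theta_i - theta_j), so that
    a^T Sigma a = var_player + var_opponent.
    """
    v = p * (1.0 - p)
    return var_player / (1.0 + v * (var_player + var_opponent))


s2 = 0.5
p = sigmoid(0.3)
k_uncertain = newton_gain(s2, s2, p)  # denominator 1 + 2 * s2 * p * (1 - p)
k_known = newton_gain(s2, 0.0, p)     # opponent known exactly: 1 + s2 * p * (1 - p)
```

Ignoring the opponent's uncertainty therefore yields a slightly larger k, i.e. a more aggressive update.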
E Derivation of approximate log marginal likelihood
The log marginal likelihood for a given match is:
The second integral is analytically tractable and reduces to the log density of s given a normal distribution with mean
as stated in the main text.
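As an illustration, suppose (an assumption of this sketch) that the win/loss part of the marginal likelihood is the logistic-normal integral $\int \sigma(\delta)\,\mathcal{N}(\delta \mid \mu, s^2)\,d\delta$. A widely used closed-form approximation is $\sigma(\mu / \sqrt{1 + \pi s^2 / 8})$; the reference list includes a technical note on approximating this integral (Crooks 2009), but whether the main text uses exactly this form is not established here. The sketch compares the approximation with a midpoint-rule reference value. It also gives the closed-form marginal for a hypothetical linear-Gaussian margin model $s \mid \delta \sim \mathcal{N}(b\delta, \sigma_s^2)$, which illustrates why a Gaussian margin term integrates to a normal log density. Function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def approx_win_prob(mu: float, var: float) -> float:
    """Approximate E[sigmoid(delta)] for delta ~ N(mu, var)."""
    return sigmoid(mu / math.sqrt(1.0 + math.pi * var / 8.0))


def numeric_win_prob(mu: float, var: float, n: int = 20000) -> float:
    """Reference value of E[sigmoid(delta)] by the midpoint rule over +/- 10 sd."""
    sd = math.sqrt(var)
    lo, hi = mu - 10.0 * sd, mu + 10.0 * sd
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        dens = math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        total += sigmoid(x) * dens * step
    return total


def log_margin_marginal(s: float, mu: float, var: float, b: float, var_s: float) -> float:
    """log N(s | b * mu, b^2 * var + var_s): exact marginal of the hypothetical
    margin model s | delta ~ N(b * delta, var_s) under delta ~ N(mu, var)."""
    total_var = b**2 * var + var_s
    return -0.5 * math.log(2.0 * math.pi * total_var) - 0.5 * (s - b * mu) ** 2 / total_var
```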
F Quadrature estimate of the posterior mean
As stated in Eq. (13), the posterior distribution is proportional to:
Hence,
where c is an unknown normalising constant.
We seek the posterior mean of δ, that is:
We first compute c using Gauss-Hermite quadrature:
We then evaluate the integral in Eq. (63) using Gauss-Hermite quadrature a second time and obtain the final result by multiplying its value by c.
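A minimal sketch of this two-pass procedure, assuming for illustration only the win-only posterior $p(\delta \mid \text{win}) \propto \sigma(\delta)\,\mathcal{N}(\delta \mid \mu, s^2)$; Eq. (13) may contain further terms. After the change of variables $\delta = \mu + \sqrt{2}\,s\,x$, a prior expectation becomes $\pi^{-1/2}\sum_i w_i f(\mu + \sqrt{2}\,s\,x_i)$. The Python standard library has no Gauss-Hermite routine, so the nodes are found by bracketing and bisection on the Hermite recurrence, with the standard weights $w_i = 2^{n-1} n!\sqrt{\pi} / (n^2 H_{n-1}(x_i)^2)$; in practice `numpy.polynomial.hermite.hermgauss` returns the same nodes and weights. Function names are hypothetical.

```python
import math


def hermite_pair(n: int, x: float):
    """Return (H_n(x), H_{n-1}(x)) for the physicists' Hermite polynomials, n >= 1."""
    h_prev, h = 1.0, 2.0 * x  # H_0, H_1
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h, h_prev


def gauss_hermite(n: int, grid: int = 20000):
    """Nodes and weights for integrals of f(x) * exp(-x^2)."""
    bound = math.sqrt(2.0 * n + 1.0)  # all roots of H_n lie in (-bound, bound)
    step = 2.0 * bound / grid
    nodes = []
    a, fa = -bound, hermite_pair(n, -bound)[0]
    for k in range(1, grid + 1):
        b = -bound + k * step
        fb = hermite_pair(n, b)[0]
        if fb == 0.0:
            nodes.append(b)  # grid point is exactly a root
        elif fa * fb < 0.0:
            lo, hi, flo = a, b, fa  # bisect on the sign change
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                fmid = hermite_pair(n, mid)[0]
                if flo * fmid <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            nodes.append(0.5 * (lo + hi))
        a, fa = b, fb
    const = 2.0 ** (n - 1) * math.factorial(n) * math.sqrt(math.pi) / n**2
    weights = [const / hermite_pair(n, x)[1] ** 2 for x in nodes]
    return nodes, weights


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def posterior_mean_gh(mu: float, sd: float, n: int = 30) -> float:
    """Posterior mean of delta after a win, prior N(mu, sd^2), likelihood sigmoid(delta)."""
    nodes, weights = gauss_hermite(n)
    deltas = [mu + math.sqrt(2.0) * sd * x for x in nodes]
    # First pass: the evidence, whose reciprocal is the normalising constant c.
    evidence = sum(w * sigmoid(d) for w, d in zip(weights, deltas)) / math.sqrt(math.pi)
    c = 1.0 / evidence
    # Second pass: the unnormalised first moment, then scale by c.
    first_moment = sum(w * d * sigmoid(d) for w, d in zip(weights, deltas)) / math.sqrt(math.pi)
    return c * first_moment
```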
G Estimated Grand Slam additions
Table 8 shows the players with the largest and smallest Grand Slam additions, as estimated at the end of the 2019 ATP season. Rafael Nadal has the largest estimated addition, gaining around 52 rating points when playing at a Grand Slam. Stan Wawrinka also ranks highly, which is consistent with his reputation as a “big match player”.[7] By contrast, Juan Monaco, Carlos Berlocq and Alexander Zverev are all estimated to perform worse at Grand Slams than at other events.
Table 8: Grand Slam additions (Slam+, in rating points) estimated at the end of the 2019 season.

(a) Players with the ten largest Grand Slam additions.

| Player | Slam+ |
|---|---|
| Rafael Nadal | 51.73 |
| Fernando Verdasco | 33.48 |
| Novak Djokovic | 33.19 |
| Stan Wawrinka | 32.76 |
| Teimuraz Gabashvili | 24.74 |
| Dominic Thiem | 22.64 |
| John Millman | 20.04 |
| Diego Schwartzman | 19.10 |
| Kevin Anderson | 18.66 |
| Andy Murray | 17.88 |

(b) Players with the ten smallest Grand Slam additions.

| Player | Slam+ |
|---|---|
| Federico Delbonis | −13.55 |
| Potito Starace | −13.77 |
| Florian Mayer | −14.02 |
| Gilles Muller | −14.24 |
| Steve Johnson | −14.56 |
| Jack Sock | −15.88 |
| Pablo Cuevas | −17.69 |
| Alexander Zverev | −22.56 |
| Carlos Berlocq | −23.05 |
| Juan Monaco | −23.63 |
H Equivalence of EKF and Newton-Raphson means
In this section, we show that the approach proposed in Section 2.5 leads to the same posterior mean estimate as an extended Kalman filter using a particular choice of measurement error covariance and observation function.
We first state the equations used to update the mean estimate in extended Kalman filtering. Writing the state estimate before the update step as
where $\mathbf{z}$ is the vector of observations, $g(\mathbf{x})$ is a non-linear function from the state vector to the observation vector, $\mathbf{R}$ is the measurement error covariance, and $\mathbf{G}$ is the Jacobian matrix of $g$ evaluated at $\mathbf{x} = \boldsymbol{\mu}_\theta$, the prior mean. Note that the equations above are typically written with subscripts, e.g.
To show the equivalence with our derived approach, we now apply these general equations to the win-only update. We set
which implies
where we have defined the scalar quantity
The other equations thus become:
and
implying the following update to the mean:
where z has been set to 1, reflecting the winning outcome.
The approach derived in this paper in Section 2.5 has
which are used in a single Newton-Raphson step:
yielding
We thus see that for the two updates in Eqs. (65) and (66) to be equivalent, we must have
We rewrite the matrix inverse term on the left-hand side using the Sherman-Morrison formula, yielding:
Multiplying this expression by a and making use of the associativity of matrix multiplication, the second and third terms in the numerator cancel, leaving
which is identical to the EKF update if
which is the variance of a Bernoulli random variable with success probability
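The equivalence can be checked numerically. The sketch below assumes a two-player state $\boldsymbol{\theta} = (\theta_1, \theta_2)$, the observation function $g(\boldsymbol{\theta}) = \sigma(\theta_1 - \theta_2)$ with no scale factor, a scalar observation $z \in \{0, 1\}$, and no process noise within the single update; the notation of Section 2.5 may differ, and the function names are hypothetical. With $R = p(1-p)$ the single Newton-Raphson step and the EKF mean update coincide; any other $R$ gives a different update.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


A = [1.0, -1.0]  # theta_1 - theta_2 = A . theta


def newton_update(mu, cov, z):
    """mu + (cov^-1 + p(1-p) A A^T)^-1 A (z - p): one Newton step from the prior mean."""
    p = sigmoid(mu[0] - mu[1])
    v = p * (1.0 - p)
    prec = inv2(cov)
    neg_hess = [[prec[r][c] + v * A[r] * A[c] for c in range(2)] for r in range(2)]
    grad = [a * (z - p) for a in A]  # only the likelihood contributes at the prior mean
    step = matvec(inv2(neg_hess), grad)
    return [m + s for m, s in zip(mu, step)]


def ekf_update(mu, cov, z, r):
    """EKF mean update mu + K (z - g(mu)), with K = P G^T / (G P G^T + R)."""
    p = sigmoid(mu[0] - mu[1])
    jac = [p * (1.0 - p) * a for a in A]  # G: Jacobian of g at mu
    cov_jt = matvec(cov, jac)  # P G^T
    innov_var = jac[0] * cov_jt[0] + jac[1] * cov_jt[1] + r
    gain = [c / innov_var for c in cov_jt]
    return [m + g * (z - p) for m, g in zip(mu, gain)]
```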
References
Assimakis, N., and M. Adam. 2014. “Iterative and Algebraic Algorithms for the Computation of the Steady State Kalman Filter Gain.” International Scholarly Research Notices 2014: 417623, https://doi.org/10.1155/2014/417623.
Banfield, D., A. P. Ingersoll, and C. L. Keppenne. 1996. “A Steady-State Kalman Filter for Assimilating Data from a Single Polar Orbiting Satellite.” Journal of the Atmospheric Sciences 52: 737–53, https://doi.org/10.1175/1520-0469(1995)052<0737:ASSKFF>2.0.CO;2.
Boice, J. 2019. How Our MLB Predictions Work. Also available at https://fivethirtyeight.com/methodology/how-our-mlb-predictions-work/.
Bradbury, J., R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, and S. Wanderman-Milne. 2018. JAX: Composable Transformations of Python + NumPy Programs. Also available at http://github.com/google/jax.
Bradley, R. A., and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: The Method of Paired Comparisons.” Biometrika 39: 324–45, https://doi.org/10.1093/biomet/39.3-4.324.
Carbone, J., T. Corke, and F. Moisiadis. 2016. “The Rugby League Prediction Model: Using an Elo-Based Approach to Predict the Outcome of National Rugby League (NRL) Matches.” International Educational Scientific Research Journal 2: 26–30, https://doi.org/10.21276/2455-295X.
Crooks, G. E. 2009. Logistic Approximation to the Logistic-Normal Integral. Technical note. Also available at https://threeplusone.com/pubs/on_logistic_normal.pdf.
Dangauthier, P., R. Herbrich, T. Minka, and T. Graepel. 2008. “Trueskill through Time: Revisiting the History of Chess.” In Advances in Neural Information Processing Systems, Vol. 20, 337–44. Red Hook, NY: Curran Associates, Inc. Also available at https://papers.nips.cc/paper/3331-trueskill-through-time-revisiting-the-history-of-chess.
Elo, A. E. 1978. The Rating of Chess Players, Past and Present. Arco Pub.
Fahrmeir, L., and G. Tutz. 1994. “Dynamic Stochastic Models for Time-Dependent Ordered Paired Comparison Systems.” Journal of the American Statistical Association 89: 1438–49, https://doi.org/10.1080/01621459.1994.10476882.
Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2013. Bayesian Data Analysis, 3rd ed. Boca Raton, FL, USA: CRC Press, https://doi.org/10.1201/b16018. Also available at http://www.stat.columbia.edu/~gelman/book/.
Glickman, M. E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 48: 377–94, https://doi.org/10.1111/1467-9876.00159.
Gneiting, T., and A. E. Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association 102: 359–78, https://doi.org/10.1198/016214506000001437.
Humpherys, J., P. Redd, and J. West. 2012. “A Fresh Look at the Kalman Filter.” SIAM Review 54: 801–23, https://doi.org/10.1137/100799666.
Hvattum, L. M., and H. Arntzen. 2010. “Using ELO Ratings for Match Result Prediction in Association Football.” International Journal of Forecasting 26: 460–70, https://doi.org/10.1016/j.ijforecast.2009.10.002.
Ingram, M. 2019. “A Point-Based Bayesian Hierarchical Model to Predict the Outcome of Tennis Matches.” Journal of Quantitative Analysis in Sports 15: 313–25, https://doi.org/10.1515/jqas-2018-0008.
Karlis, D., and I. Ntzoufras. 2008. “Bayesian Modelling of Football Outcomes: Using the Skellam’s Distribution for the Goal Difference.” IMA Journal of Management Mathematics 20: 133–45, https://doi.org/10.1093/imaman/dpn026.
Kovalchik, S. A. 2016. “Searching for the GOAT of Tennis Win Prediction.” Journal of Quantitative Analysis in Sports 12: 127–38, https://doi.org/10.1515/jqas-2015-0059.
Kovalchik, S. 2020. “Extension of the Elo Rating System to Margin of Victory.” International Journal of Forecasting 36: 1329–41, https://doi.org/10.1016/j.ijforecast.2020.01.006.
Kovalchik, S. A., and M. Ingram. 2018. “Estimating the Duration of Professional Tennis Matches for Varying Formats.” Journal of Quantitative Analysis in Sports 14: 13–23, https://doi.org/10.1515/jqas-2017-0077.
Mangan, S., and K. Collins. 2016. “A Rating System for Gaelic Football Teams: Factors that Influence Success.” International Journal of Computer Science in Sport 15: 78–90, https://doi.org/10.1515/ijcss-2016-0006.
Minka, T. P. 2001. “Expectation Propagation for Approximate Bayesian Inference.” In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, UAI’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 362–9.
Morris, B., C. Bialik, and J. Boice. 2016. How We’re Forecasting the 2016 U.S. Open. Also available at https://fivethirtyeight.com/features/how-were-forecasting-the-2016-us-open/.
Neumann, C., J. Duboscq, C. Dubuc, A. Ginting, A. M. Irwan, M. Agil, A. Widdig, and A. Engelhardt. 2011. “Assessing Dominance Hierarchies: Validation and Advantages of Progressive Evaluation with Elo-Rating.” Animal Behaviour 82: 911–21, https://doi.org/10.1016/j.anbehav.2011.07.016.
Särkkä, S. 2013. Bayesian Filtering and Smoothing. Cambridge, UK: Institute of Mathematical Statistics Textbooks, Cambridge University Press, https://doi.org/10.1017/CBO9781139344203.
Silver, N., J. Boice, and N. Paine. 2019. How Our NFL Predictions Work. Also available at https://fivethirtyeight.com/methodology/how-our-nfl-predictions-work/.
Sipko, M., and W. Knottenbelt. 2015. Machine Learning for the Prediction of Professional Tennis Matches. MEng Computing Final Year Project, Imperial College London, London, UK. Also available at https://www.doc.ic.ac.uk/teaching/distinguished-projects/2015/m.sipko.pdf.
Stefani, R. 2011. “The Methodology of Officially Recognized International Sports Rating Systems.” Journal of Quantitative Analysis in Sports 7: 10, https://doi.org/10.2202/1559-0410.1347.
Weng, R. C., and C.-J. Lin. 2011. “A Bayesian Approximation Method for Online Ranking.” Journal of Machine Learning Research 12: 267–300. Also available at http://jmlr.org/papers/v12/weng11a.html.
Wilson, K. C. 1972. “An Optimal Control Approach to Designing Constant Gain Filters.” IEEE Transactions on Aerospace and Electronic Systems AES-8: 836–42, https://doi.org/10.1109/taes.1972.309615.
© 2020 Walter de Gruyter GmbH, Berlin/Boston