
Expected goals under a Bayesian viewpoint: uncertainty quantification and online learning

Bernardo Nipoti and Lorenzo Schiavon
Published/Copyright: November 4, 2024

Abstract

While the use of expected goals (xG) as a metric for assessing soccer performance is increasingly prevalent, the uncertainty associated with their estimates is often overlooked. This work bridges this gap by providing easy-to-implement methods for uncertainty quantification in xG estimates derived from Bayesian models. Based on a convenient posterior approximation, we devise an online prior-to-posterior update scheme, aligning with the typical in-season model training in soccer. Additionally, we present a novel framework to assess and compare the performance dynamics of two teams during a match, while accounting for evolving match scores. Our approach is well-suited for graphical representation and improves interpretability. We validate the accuracy of our methods through simulations, and provide a real-world illustration using data from the Italian Serie A league.


Corresponding author: Lorenzo Schiavon, Department of Economics, Ca’ Foscari University of Venice, Venice, Italy, E-mail: 

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Use of Large Language Models, AI and Machine Learning Tools: None declared.

  5. Conflict of interest: The authors state no conflict of interest.

  6. Research funding: Bernardo Nipoti acknowledges support of MUR - Prin 2022 - Grant no. 2022CLTYP4, funded by the European Union – Next Generation EU.

  7. Data availability: The simulated datasets used in this study are available from the corresponding author upon reasonable request.

Appendix A

A closed-form approximation of the credible intervals for $\tilde{p}_l(t)$ can be recovered by exploiting a multivariate Gaussian approximation as in (4) and a suitable approximating function $f_{l,t_{l,r-1}}(t)$ for $p_{l,t_{l,r-1}}(t)$, which depends on a linear transformation $\tilde{\eta}_{l,r}$ of the elements $\eta_{l,i} = \tilde{x}_{l,i}^\top \beta$, for $i$ such that $t_{l,i} \in (t_{l,r-1}, t_{l,r})$.

In logistic regression, for instance, one may approximate the function $p_{l,t_{l,r-1}}(t)$ from below by using

$$
f_{l,t_{l,r-1}}(t) = \max_{t^* \in (t_{l,r-1},\, t)} \; 1 - \left\{ \sum_{k=0}^{n_l(t_{l,r-1},\, t^*)} \binom{n_l(t_{l,r-1},\, t^*)}{k} \exp\left\{ \frac{k\, \tilde{\eta}_{l,r}(t^*)}{n_l(t_{l,r-1},\, t^*)} \right\} \right\}^{-1},
$$

with $\tilde{\eta}_{l,r}(t^*) = \sum_{i:\, t_{l,i} \in (t_{l,r-1},\, t^*)} \tilde{x}_{l,i}^\top \beta$, and where $n_l(t_{l,r-1}, t^*)$ is the number of shots of the $l$th team in the interval $(t_{l,r-1}, t^*)$. The precision of this approximation improves as the shots in the interval $(t_{l,r-1}, t)$ become more homogeneous with respect to the values of $\eta_{l,i}$. Such homogeneity is often observed in soccer matches, where the number of shots in an interval is relatively small and the corresponding linear predictors take similar, low values.
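The lower-bound property can be checked numerically. The following is a minimal sketch, assuming a standard logistic link and independent shots, so that the scoring probability over an interval is one minus the product of the individual miss probabilities. The function names and the example values of the linear predictors are hypothetical and are not taken from the paper.

```python
import math


def expit(x):
    """Standard logistic function (assumed link for this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))


def exact_scoring_prob(etas):
    """P(at least one goal) = 1 - prod_i (1 - expit(eta_i)), assuming independent shots."""
    no_goal = 1.0
    for eta in etas:
        no_goal *= 1.0 - expit(eta)
    return 1.0 - no_goal


def homogeneous_bound(etas):
    """1 - [sum_k C(n, k) exp(k * eta_tilde / n)]^(-1), with eta_tilde = sum(etas)."""
    n = len(etas)
    if n == 0:
        return 0.0  # no shots in the interval: scoring probability is zero
    eta_tilde = sum(etas)
    binom_sum = sum(math.comb(n, k) * math.exp(k * eta_tilde / n) for k in range(n + 1))
    return 1.0 - 1.0 / binom_sum


def f_lower(etas_in_time_order):
    """Maximise the bound over t*, i.e. over all prefixes of the time-ordered shots."""
    n = len(etas_in_time_order)
    return max(homogeneous_bound(etas_in_time_order[:m]) for m in range(n + 1))


# Hypothetical linear predictors for four shots in one interval.
etas = [-2.5, -1.0, -3.0, -2.0]
print(f_lower(etas), exact_scoring_prob(etas))
```

When all shots share the same linear predictor, the binomial sum equals the product of the terms $1 + \exp(\eta_{l,i})$ and the bound coincides with the exact probability, which is the homogeneity effect described above.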

We rely on the multivariate Gaussian approximation for the posterior of the vector $\eta$ resulting at the end of the online learning procedure, specifying $\hat{\eta}_{l,i} = \tilde{x}_{l,i}^\top \hat{\beta}$ and $s_{l,i}^2 = \tilde{x}_{l,i}^\top S_\beta \tilde{x}_{l,i}$, with $\hat{\beta}$ and $S_\beta$ being the estimated posterior mean and variance of $\beta$ after $T$ iterations. Then, the posterior distribution of $\tilde{\eta}_{l,r}$ is approximated by the Gaussian distribution

$$
\tilde{\eta}_{l,r} \sim N\left( \sum_{i:\, t_{l,i} \in (t_{l,r-1},\, t_{l,r})} \hat{\eta}_{l,i},\;\; \sum_{i:\, t_{l,i} \in (t_{l,r-1},\, t_{l,r})} s_{l,i}^2 + \sum_{i<j:\, t_{l,i} \in (t_{l,r-1},\, t_{l,r}),\; t_{l,j} \in (t_{l,r-1},\, t_{l,r})} 2\, c_{i,j} \right),
$$

where $c_{i,j} = \tilde{x}_{l,i}^\top S_\beta \tilde{x}_{l,j}$ is the approximate posterior covariance between the linear predictors of shots $i$ and $j$ of the $l$th team. Since $f_{l,t_{l,r-1}}(t)$ is a monotone function of $\tilde{\eta}_{l,r}$, approximate $(1-\alpha)100\,\%$ credible intervals for $p_{l,t_{l,r-1}}(t)$ at time $t_{l,r}$ can be obtained by an element-wise transformation of the boundaries of the $(1-\alpha)100\,\%$ approximate credible interval for $\tilde{\eta}_l(t_{l,r-1}, t_{l,r})$. Then, the derivation of credible intervals for $\tilde{p}_l(t)$ is straightforward. The comparison displayed in Figure 7 is replicated in Figure 8, where the approximate shifted cumulative scoring probability is computed with the closed-form approximation discussed in this Appendix.
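The interval construction can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a standard logistic link, evaluates the approximating function only at $t^* = t_{l,r}$ (it skips the maximisation over $t^*$), and uses the identity $\sum_k \binom{n}{k} e^{k\eta/n} = (1 + e^{\eta/n})^n$. Under these assumptions the function is increasing in $\tilde{\eta}_{l,r}$, so the lower quantile maps to the lower bound. All names and numerical inputs (`X`, `beta_hat`, `S_beta`) are hypothetical.

```python
import math
from statistics import NormalDist


def quad_form(x, S, y):
    """Bilinear form x' S y for plain Python lists."""
    return sum(x[a] * S[a][b] * y[b] for a in range(len(x)) for b in range(len(y)))


def eta_tilde_gaussian(X, beta_hat, S_beta):
    """Mean and variance of sum_i x_i' beta when beta ~ N(beta_hat, S_beta)."""
    mean = sum(sum(xa * ba for xa, ba in zip(x, beta_hat)) for x in X)
    var = sum(quad_form(x, S_beta, x) for x in X)  # sum of s_i^2
    var += sum(
        2.0 * quad_form(X[i], S_beta, X[j])  # 2 * c_ij for each pair i < j
        for i in range(len(X))
        for j in range(i + 1, len(X))
    )
    return mean, var


def f_bound(eta_tilde, n):
    """1 - (1 + exp(eta_tilde / n))^(-n), the bound evaluated at t* = t_{l,r}."""
    return 1.0 - (1.0 + math.exp(eta_tilde / n)) ** (-n)


def credible_interval(X, beta_hat, S_beta, alpha=0.05):
    """Transform the Gaussian quantiles of eta_tilde element-wise through f_bound."""
    mean, var = eta_tilde_gaussian(X, beta_hat, S_beta)
    dist = NormalDist(mean, math.sqrt(var))
    lo, hi = dist.inv_cdf(alpha / 2.0), dist.inv_cdf(1.0 - alpha / 2.0)
    n = len(X)
    return f_bound(lo, n), f_bound(hi, n)


# Hypothetical design rows for three shots and a hypothetical Gaussian posterior for beta.
X = [[1.0, 0.5], [1.0, -0.3], [1.0, 0.1]]
beta_hat = [-2.0, 0.8]
S_beta = [[0.04, 0.01], [0.01, 0.09]]
print(credible_interval(X, beta_hat, S_beta))
```

The variance returned by `eta_tilde_gaussian` equals the quadratic form of $S_\beta$ in the column sums of `X`, which gives a quick consistency check on the pairwise covariance terms.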

Figure 8: Serie A league dataset. Comparison between the shifted cumulative scoring probability functions for Juventus in the match Genoa-Juventus, played on the 27th of November 2016, computed via Gibbs sampler and by applying a closed-form approximation. Shaded areas denote 95 % credible intervals.


Received: 2024-05-24
Accepted: 2024-10-09
Published Online: 2024-11-04
Published in Print: 2025-03-26

© 2024 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on October 1, 2025 from https://www.degruyterbrill.com/document/doi/10.1515/jqas-2024-0081/html?lang=de