Leveraging minute-by-minute soccer match event data to adjust Team’s offensive production for game context

Andrey V. Skripnikov; Ahmet Cemek; David Gillman

doi:10.1515/jqas-2024-0162

Published/Copyright: August 25, 2025

From the journal Journal of Quantitative Analysis in Sports

Abstract

In soccer, game context can skew offensive statistics in ways that misrepresent how well a team has played. For instance, in England’s 1–2 loss to France in the 2022 FIFA World Cup quarterfinal, England attempted considerably more shots (16 to France’s 8) and won more corners (5 to 2), which might suggest that they played better despite the loss. However, these statistics were largely accumulated while France was ahead and more willing to concede the offensive initiative to England. To explore how game context influences offensive performance, we analyze minute-by-minute event-sequenced match data from 15 seasons across five major European leagues. Using count-response generalized additive modeling, we consider features such as score differential, red card differential, home/away status, pre-match win probabilities, and game minute. Moreover, we use interaction terms to test several intuitive hypotheses about how these features jointly explain offensive production. The selected model is then applied to project offensive statistics onto a standardized “common denominator” scenario: a tied home game with equal numbers of players on both sides. The adjusted numbers, in contrast to regular game totals that disregard game context, offer a more contextualized comparison, reducing the likelihood of misrepresenting the relative quality of play.

Keywords: generalized additive models; model selection; Negative Binomial; sports analytics
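
The projection onto a tied home game at even strength, as described in the abstract, can be illustrated with a minimal sketch. It assumes a log-link count model in which context enters additively, so that each minute's observed count can be rescaled by the ratio of the expected count in the baseline scenario to the expected count in the actual context. The coefficient values, the names `COEF`, `MinuteRecord`, and `adjusted_total`, and the omission of smooth and interaction terms are all illustrative assumptions; none of them is taken from the paper's fitted model.

```python
import math
from dataclasses import dataclass

# Hypothetical log-scale effects, invented for illustration only.
# They are NOT estimates from the paper.
COEF = {
    "score_diff": -0.10,     # per goal of lead held by the team
    "red_card_diff": -0.30,  # per extra red card the team has received
    "home": 0.15,            # playing at home (1) versus away (0)
}


@dataclass
class MinuteRecord:
    shots: int          # shots by the team in this minute
    score_diff: int     # team goals minus opponent goals
    red_card_diff: int  # team red cards minus opponent red cards
    home: int           # 1 if the team is at home, else 0


def context_effect(score_diff: int, red_card_diff: int, home: int) -> float:
    """Context part of the linear predictor on the log scale."""
    return (COEF["score_diff"] * score_diff
            + COEF["red_card_diff"] * red_card_diff
            + COEF["home"] * home)


def adjusted_total(minutes: list[MinuteRecord]) -> float:
    """Rescale each minute's shots to a tied, even-strength home game."""
    baseline = context_effect(score_diff=0, red_card_diff=0, home=1)
    total = 0.0
    for m in minutes:
        actual = context_effect(m.score_diff, m.red_card_diff, m.home)
        # Under a log link, terms shared by both scenarios (intercept,
        # game minute, and so on) cancel in this ratio of expected counts.
        total += m.shots * math.exp(baseline - actual)
    return total


# A home team trailing by two goals: raw shots are scaled down, because
# trailing is assumed here to inflate shot volume.
trailing = [MinuteRecord(shots=1, score_diff=-2, red_card_diff=0, home=1)] * 4
print(adjusted_total(trailing))
```

In this sketch, minutes already played in the baseline context are left unchanged, while shots taken when trailing are discounted. With interaction terms, as the abstract describes, the context effect would no longer be a simple sum and the ratio would have to be computed from full model predictions.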

Corresponding author: Andrey V. Skripnikov, Department of Natural Science, New College of Florida, Sarasota, FL, USA, E-mail: askripnikov@ncf.edu

Acknowledgments

The authors are grateful to the host institution for providing summer research funding.

Research ethics: Not applicable.
Informed consent: Not applicable.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: The authors used ChatGPT to edit the text for clarity, grammar, syntax and flow, but made sure to subsequently review the text themselves and confirm the actual meaning was preserved.
Conflict of interest: Authors of this work confirm that there are no known conflicts of interest to disclose.
Research funding: None declared.
Data availability: The data will be made publicly available via GitHub.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/jqas-2024-0162).

Received: 2024-11-14

Accepted: 2025-07-28

Published Online: 2025-08-25
