Abstract
Soccer is undeniably the most popular sport worldwide, and everyone from general managers and coaching staff to fans and media is interested in evaluating players’ performance. Metrics applied successfully in other sports, such as the (adjusted) +/− that divides credit among a basketball team’s players, face several challenges when applied to soccer because of severe collinearities. Recently, a number of player evaluation metrics have been developed using optical tracking data, but they rely on proprietary data. In this work, our objective is to develop an open framework that can estimate the expected contribution of a soccer player to his team’s winning chances using publicly available data. In particular, using (i) data from approximately 20,000 games from 11 European leagues over eight seasons and (ii) player ratings from the FIFA video game, we estimate through a Skellam regression model the importance of every line (attackers, midfielders, defenders and goalkeeping) in winning a soccer game. We subsequently translate the model to expected league points added above a replacement player (eLPAR). This model can further be used as a guide for allocating a team’s salary budget to players based on their expected contributions on the pitch. We showcase such applications using annual salary data from the English Premier League and identify evidence that in our dataset the market appears to under-value defensive line players relative to goalkeepers.
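As a rough illustration of how a Skellam-distributed goal differential can be turned into expected league points, the sketch below assumes that each side's goals are independent Poisson counts, so that their difference follows a Skellam distribution, and that a win is worth 3 points and a draw 1 point. The scoring rates, the function names and the truncation at 25 goals are hypothetical choices made for this example; the regression that links player ratings to the goal differential is not reproduced here.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(home_rate, away_rate, max_goals=25):
    """Return (P(win), P(draw), P(loss)) for the home side.

    Assumption: goals are independent Poisson counts, so the goal
    differential is Skellam distributed. Scores above max_goals are
    ignored, which is negligible for realistic scoring rates.
    """
    home = [poisson_pmf(k, home_rate) for k in range(max_goals + 1)]
    away = [poisson_pmf(k, away_rate) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for h, p_home in enumerate(home):
        for a, p_away in enumerate(away):
            joint = p_home * p_away
            if h > a:
                win += joint
            elif h == a:
                draw += joint
            else:
                loss += joint
    return win, draw, loss


def expected_league_points(home_rate, away_rate):
    """Expected points for the home side: 3 per win, 1 per draw."""
    win, draw, _ = outcome_probabilities(home_rate, away_rate)
    return 3.0 * win + 1.0 * draw


# Hypothetical scoring rates, chosen only for illustration.
baseline = expected_league_points(1.4, 1.4)
improved = expected_league_points(1.6, 1.4)
print(f"points added by the higher scoring rate: {improved - baseline:.3f}")
```

Comparing the expected points of a lineup against those of the same lineup with a replacement-level player in one position gives a quantity in the spirit of eLPAR, under the assumptions stated above.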
- Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: None declared.
- Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
© 2020 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research articles
- Winning and losing streaks in the National Hockey League: are teams experiencing momentum or are games a sequence of random events?
- The middle-seed anomaly: why does it occur in some sports tournaments but not others?
- A Skellam regression model for quantifying positional value in soccer
- How to extend Elo: a Bayesian perspective
- A mixed effects multinomial logistic-normal model for forecasting baseball performance
- Distributed lag models to identify the cumulative effects of training and recovery in athletes using multivariate ordinal wellness data