
Miss it like Messi: Extracting value from off-target shots in soccer

Ethan Baron, Nathan Sandholtz, Devin Pleuler and Timothy C. Y. Chan
Published/Copyright: January 1, 2024

Abstract

Measuring soccer shooting skill is a challenging analytics problem due to the scarcity and highly contextual nature of scoring events. The introduction of more advanced data on soccer shots has given rise to model-based metrics that better cope with these challenges. Specifically, metrics such as expected goals added, goals above expectation, and post-shot expected goals all use advanced data to improve on the classical conversion rate. However, because off-target shots have no probability of scoring, all metrics developed to date assign them a value of zero, even though they account for almost two-thirds of all shots. We posit that the trajectories of off-target shots contain a non-negligible shooting skill signal and propose two shooting skill metrics that incorporate this signal. Specifically, we develop a player-specific generative model for shot trajectories based on a mixture of truncated bivariate Gaussian distributions. We use this generative model to compute metrics that attach non-zero value to off-target shots. We demonstrate that our proposed metrics are more stable than current state-of-the-art metrics and have greater predictive power.
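To make the idea of a generative model over shot end locations concrete, the sketch below samples from a mixture of truncated bivariate Gaussians and estimates how often a simulated shot lands inside the goal frame. It is a minimal illustration under stated assumptions, not the authors' implementation: the mixture weights, means, covariances and truncation rectangle are hypothetical inputs, coordinates are assumed to be in yards with y measured from the goal centre and z from the ground, and truncation is handled by plain rejection sampling.

```python
import random

# Assumed goal-mouth coordinates (yards): y from the goal centre, z from the ground.
HALF_GOAL_WIDTH = 4.0        # the goal is 8 yards wide
CROSSBAR_HEIGHT = 8.0 / 3.0  # the crossbar is 8 feet high
INF = float("inf")


def sample_truncated_normal_2d(mean, cov, bounds, rng):
    """Rejection-sample (y, z) from N(mean, cov) restricted to a rectangle.

    bounds = ((y_lo, y_hi), (z_lo, z_hi)); cov must be positive definite.
    """
    (s_yy, s_yz), (_, s_zz) = cov
    # Cholesky factor L of the 2x2 covariance, so that L @ L.T == cov.
    l11 = s_yy ** 0.5
    l21 = s_yz / l11
    l22 = (s_zz - l21 ** 2) ** 0.5
    (y_lo, y_hi), (z_lo, z_hi) = bounds
    while True:
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = mean[0] + l11 * u
        z = mean[1] + l21 * u + l22 * v
        if y_lo <= y <= y_hi and z_lo <= z <= z_hi:
            return y, z


def sample_shot(weights, means, covs, bounds, rng):
    """Pick a mixture component by weight, then draw from that truncated component."""
    j = rng.choices(range(len(weights)), weights=weights)[0]
    return sample_truncated_normal_2d(means[j], covs[j], bounds, rng)


def on_target_fraction(weights, means, covs, bounds, n=10_000, seed=0):
    """Monte Carlo estimate of the share of simulated shots inside the goal frame."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y, z = sample_shot(weights, means, covs, bounds, rng)
        if abs(y) <= HALF_GOAL_WIDTH and 0.0 <= z <= CROSSBAR_HEIGHT:
            hits += 1
    return hits / n
```

Because each component is truncated and renormalised on its own before mixing, this is a mixture of truncated Gaussians rather than a truncated mixture; the two differ whenever components lose different amounts of mass to the truncation. Rejection sampling is simple but slows down when a component places little mass inside the bounds.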


Corresponding author: Ethan Baron, University of Toronto, Toronto, ON, Canada, E-mail:

Acknowledgments

The authors thank Farah Bastien and MLSE digital labs for their support.

  1. Research ethics: Not applicable.

  2. Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Competing interests: The authors state no conflict of interest.

  4. Research funding: This work was supported in part by the Natural Sciences and Engineering Research Council of Canada.

  5. Data availability: The raw data is proprietary and was accessed via an academic partnership for the purposes of this research study. However, the software used for our analysis is provided open-source at https://github.com/baronet2/shotmissr.

Appendix A

A.1 Adjusting shot end coordinates

In our data pre-processing, we first applied a small correction to fix an apparent bias in the data collection process affecting shot end coordinates near the goal frame. For a few seasons of data (MLS 2018, Ligue 2 2018–19, 2. Bundesliga 2018–19, Eredivisie 2018–19), the density of shots with end coordinates just outside the posts was low relative to the surrounding regions (see Figure 7). We hypothesize that this effect was caused by StatsBomb’s data collection system, which enlarges the posts so that data collectors can click on them more easily. To correct for this effect, we shifted shot end coordinates so that the posts retain their actual width, which yields a more continuous density of shot end locations. Figure 1 shows the distribution of shot end coordinates for the 2018 MLS season after this correction.
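The exact form of the shift is not given here, so the sketch below shows one hypothetical way such a correction could be implemented. It assumes StatsBomb-style pitch coordinates in which the goal spans y = 36 to y = 44 yards, and it takes the amount by which the on-screen posts are enlarged as an input, `excess`, a hypothetical parameter that would have to be estimated from the data. End coordinates lying more than `excess` yards outside a post are moved `excess` yards towards the goal; everything else is left unchanged.

```python
# Assumed StatsBomb-style coordinates (yards): posts at y = 36 and y = 44.
GOAL_CENTRE_Y = 40.0
HALF_GOAL_WIDTH = 4.0


def shift_end_y(y, excess):
    """Hypothetical correction for enlarged on-screen posts.

    Shots recorded more than `excess` yards outside a post are moved `excess`
    yards towards the goal, closing the empty band next to each post.
    Coordinates inside the goal or inside that band are returned unchanged.
    """
    offset = y - GOAL_CENTRE_Y
    distance_outside = abs(offset) - HALF_GOAL_WIDTH
    if distance_outside <= excess:
        return y
    direction = 1.0 if offset > 0 else -1.0
    return y - direction * excess
```

With `excess = 0.5`, a shot recorded at 44.5 + d yards is mapped to 44 + d, so shots recorded just beyond the enlarged post end up just beyond the real one.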

Figure 7: Original shot end coordinates for 2018 MLS data. Shots are coloured by their outcome. Note the markedly lower density of shots in the regions just outside the goal frame, shown in red.

A.2 Estimating execution error

We fix the component covariances in our saturated mixture model for computational tractability. To inform our choice of $S_j$ for each location $m_j$, we fit two bivariate Gaussian distributions to the empirical data in Hunter et al. (2018), one for shots aimed 1.75 yards above the ground and another for shots aimed 0.14 yards above the ground. The end y-coordinates of left-footed shots were reflected prior to fitting, since the principal axis of the error distribution for left-footed shots is the mirror image of that for right-footed shots. The fitted covariance matrices for these two intended shot locations are:

$$S(0.14\ \text{yd}) = \begin{pmatrix} 0.704 & 0.157 \\ 0.157 & 0.297 \end{pmatrix}, \qquad S(1.75\ \text{yd}) = \begin{pmatrix} 0.782 & 0.442 \\ 0.442 & 0.742 \end{pmatrix}$$

Figure 8 shows the resulting bivariate Gaussian distributions.
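A moment-based fit of this kind can be sketched in a few lines. The sketch assumes each end location is given as (y, z) in yards relative to the intended target, so that reflecting a left-footed shot amounts to negating its y offset; the function name and input layout are hypothetical. It uses the ordinary sample mean and covariance and ignores the 0.2 cut-off on the end z-coordinates noted in the Figure 8 caption, which a truncation-aware fit would account for.

```python
def fit_bivariate_gaussian(end_points, left_footed):
    """Fit a bivariate Gaussian to shot end locations by sample moments.

    end_points: list of (y, z) offsets from the intended target (assumed layout).
    left_footed: list of bools; y offsets of left-footed shots are reflected.
    Returns (mean, covariance), with the covariance as a 2x2 nested tuple
    ordered (y, z).
    """
    if len(end_points) != len(left_footed) or len(end_points) < 2:
        raise ValueError("need at least two shots and one footedness flag per shot")
    # Reflect left-footed shots so all shots share one principal-axis orientation.
    ys = [-y if lf else y for (y, _), lf in zip(end_points, left_footed)]
    zs = [z for _, z in end_points]
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    # Unbiased (n - 1) sample covariance entries.
    s_yy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    s_zz = sum((z - mean_z) ** 2 for z in zs) / (n - 1)
    s_yz = sum((y - mean_y) * (z - mean_z) for y, z in zip(ys, zs)) / (n - 1)
    return (mean_y, mean_z), ((s_yy, s_yz), (s_yz, s_zz))
```

Without the reflection step, mirrored left- and right-footed error patterns partly cancel in the off-diagonal term, understating the tilt of the principal axis.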

Figure 8: Bivariate Gaussian fits for shots aimed at the high (top) and low (bottom) intended locations, marked in red, based on data from Hunter et al. (2018). The cut-off at 0.2 for the end z-coordinates results from the computer vision process used by Hunter et al. (2018) to compute shot end coordinates.

We assume that the execution-error covariance $S$ changes linearly with the intended shot height but does not change as the intended location moves laterally.
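One reading of this assumption is linear interpolation between the two fitted matrices: writing $h$ for the intended height in yards, $S(h) = S(0.14) + \frac{h - 0.14}{1.75 - 0.14}\bigl(S(1.75) - S(0.14)\bigr)$, with no dependence on the lateral coordinate. The sketch below implements that reading; the function name is hypothetical. For heights between 0.14 and 1.75 yards the result is a convex combination of two positive-definite matrices and is therefore positive definite. Outside that range it is an extrapolation, so the sketch checks positive definiteness explicitly.

```python
# Fitted execution-error covariances from this appendix, ordered (y, z),
# indexed by intended height in yards.
H_LOW, H_HIGH = 0.14, 1.75
S_LOW = ((0.704, 0.157), (0.157, 0.297))
S_HIGH = ((0.782, 0.442), (0.442, 0.742))


def execution_covariance(height):
    """Covariance S for a shot aimed `height` yards above the ground.

    S varies linearly with height and, by assumption, not at all with the
    lateral position of the intended location.
    """
    t = (height - H_LOW) / (H_HIGH - H_LOW)
    s = tuple(
        tuple(lo + t * (hi - lo) for lo, hi in zip(row_lo, row_hi))
        for row_lo, row_hi in zip(S_LOW, S_HIGH)
    )
    # A symmetric 2x2 matrix is positive definite iff s11 > 0 and det > 0.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if s[0][0] <= 0 or det <= 0:
        raise ValueError(f"interpolated covariance is not positive definite at h={height}")
    return s
```

For example, extrapolating to the crossbar height of 8/3 yards still gives a valid covariance, whereas strongly negative heights, which are not physically meaningful, trigger the error.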

A.3 Sensitivity analysis with respect to data preprocessing choices

In our data pre-processing, we keep only shots taken from a distance of at least 6 yards. Additionally, we reflect the end y-coordinates of left-footed shots in an attempt to exploit symmetry in shooting patterns.
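These two preprocessing steps can be sketched as a single filtering pass. The sketch assumes StatsBomb-style pitch coordinates in yards (goal centre at x = 120, y = 40), measures shot distance as the straight-line distance from the shot location to the goal centre, and reflects end y-coordinates about the goal's centre line. The `Shot` record and the `preprocess` function are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass, replace

# Assumed StatsBomb-style coordinates (yards): goal centre at (120, 40).
GOAL_X = 120.0
GOAL_CENTRE_Y = 40.0


@dataclass(frozen=True)
class Shot:
    start_x: float
    start_y: float
    end_y: float
    end_z: float
    left_footed: bool


def preprocess(shots, min_distance=6.0, reflect_left=True):
    """Drop close-range shots and mirror left-footed end y-coordinates."""
    kept = []
    for shot in shots:
        distance = math.hypot(GOAL_X - shot.start_x, GOAL_CENTRE_Y - shot.start_y)
        if distance < min_distance:
            continue  # shot taken from closer than the threshold
        if reflect_left and shot.left_footed:
            # Mirror the end location about the goal's centre line (y = 40).
            shot = replace(shot, end_y=2 * GOAL_CENTRE_Y - shot.end_y)
        kept.append(shot)
    return kept
```

Re-running such a function over a grid of `min_distance` values, with `reflect_left` set to both `True` and `False`, corresponds to the kind of sensitivity check described in this section.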

To show that our results are robust to the choice of filtering threshold and to the decision to reflect left-footed shots, we repeat our analysis under various data processing settings. Specifically, we generate simplified versions of Figure 5 for nine combinations of filtering thresholds, with or without reflecting left-footed shots. The results are presented in Figure 9.

Figure 9: Comparison of results under various data processing settings.

These results show that our proposed metrics outperform the benchmark metrics for nearly all thresholds and conditions. Our metrics also tend to outperform the benchmark metrics by a wider margin when more shots are included in the analysis. This figure confirms that our findings are generally robust to the data processing choices made in our analysis.

References

11tegen11 (2014). How to scout a striker? Available at: https://web.archive.org/web/20140707121313/http://11tegen11.net/2014/02/15/how-to-scout-a-striker/ (Accessed 16 June 2022).

Ackerson, K. (2022). Football league rankings. Available at: https://www.globalfootballrankings.com/ (Accessed 23 January 2022).

Anzer, G. and Bauer, P. (2021). A goal scoring probability model for shots based on synchronized positional and event data in football (soccer). Front. Sports Act. Living 3, https://doi.org/10.3389/fspor.2021.624475.

Brechot, M. and Flepp, R. (2018). Dealing with randomness in match outcomes: how to rethink performance evaluation in European club football using expected goals. J. Sports Econ. 21: 335–362, https://doi.org/10.1177/1527002519897962.

Chan, T.C.Y., Fearing, D.S., Fernandes, C., and Kovalchik, S. (2022). A Markov process approach to untangling intention versus execution in tennis. J. Quant. Anal. Sports 18: 127–145, https://doi.org/10.1515/jqas-2021-0077.

Daly-Grafstein, D. and Bornn, L. (2019). Rao-Blackwellizing field goal percentage. J. Quant. Anal. Sports 15: 85–95, https://doi.org/10.1515/jqas-2018-0064.

Franks, A., D’Amour, A., Cervone, D., and Bornn, L. (2016). Meta-analytics: tools for understanding the statistical properties of sports metrics. J. Quant. Anal. Sports 12: 151–165, https://doi.org/10.1515/jqas-2016-0098.

Goodman, M. (2018). A new way to measure keepers’ shot stopping: post-shot expected goals. StatsBomb. Available at: https://statsbomb.com/2018/11/a-new-way-to-measure-keepers-shot-stopping-post-shot-expected-goals/ (Accessed 16 June 2022).

Grazian, C. and Robert, C.P. (2018). Jeffreys priors for mixture estimation: properties and alternatives. Comput. Stat. Data Anal. 121: 149–163, https://doi.org/10.1016/j.csda.2017.12.005.

Haugh, M.B. and Wang, C. (2022). Play like the pros? Solving the game of darts as a dynamic zero-sum game. Inf. J. Comput. 34: 2540–2551, https://doi.org/10.1287/ijoc.2022.1197.

Hunter, A.H., Angilletta, M.J. Jr., Pavlic, T., Lichtwark, G., and Wilson, R.S. (2018). Modeling the two-dimensional accuracy of soccer kicks. J. Biomech. 72: 159–166, https://doi.org/10.1016/j.jbiomech.2018.03.003.

Lee, G. and Scott, C. (2012). EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Comput. Stat. Data Anal. 56: 2816–2829, https://doi.org/10.1016/j.csda.2012.03.003.

Lucey, P., Bialkowski, A., Monfort, M., Carr, P., and Matthews, I. (2015). Quality vs quantity: improved shot prediction in soccer using strategic features from spatiotemporal data. In: 9th Annual MIT Sloan Sports Analytics Conference. Available at: https://global-uploads.webflow.com/5f1af76ed86d6771ad48324b/5fee09c092fcdb0989d51ecf_1034_rppaper_SoccerPaper5.pdf.

Mao, L., Peng, Z., Liu, H., and Gómez, M.-A. (2016). Identifying keys to win in the Chinese professional soccer league. Int. J. Perform. Anal. Sport 16: 935–947, https://doi.org/10.1080/24748668.2016.11868940.

McHale, I.G. and Szczepański, Ł. (2014). A mixed effects model for identifying goal scoring ability of footballers. J. Roy. Stat. Soc. A Stat. Soc. 177: 397–417, https://doi.org/10.1111/rssa.12015.

Pleuler, D. (2014a). Augmenting free-kick data for more meaningful results. OptaPro. Available at: https://web.archive.org/web/20140326074435/http://www.optasportspro.com/en/about/optapro-blog/posts/2013/augmenting-free-kick-data-for-more-meaningful-results.aspx (Accessed 16 June 2022).

Pleuler, D. (2014b). On the topic of expected goals and the repeatability of finishing skill. OptaPro. Available at: https://web.archive.org/web/20140706142343/http://www.optasportspro.com/en/about/optapro-blog/posts/2014/on-the-topic-of-expected-goals-and-the-repeatability-of-finishing-skill.aspx (Accessed 16 June 2022).

Rathke, A.A.T. (2017). An examination of expected goals and shot efficiency in soccer. J. Hum. Sport Exerc. 12: 514–529, https://doi.org/10.14198/jhse.2017.12.Proc2.05.

Rousseau, J. and Mengersen, K. (2011). Asymptotic behaviour of the posterior distribution in overfitted mixture models. J. Roy. Stat. Soc. B Stat. Methodol. 73: 689–710, https://doi.org/10.1111/j.1467-9868.2011.00781.x.

Rowlinson, A. (2020). Football shot quality: visualizing the quality of soccer/football shots. Master’s thesis, Aalto University School of Business. Available at: http://urn.fi/URN:NBN:fi:aalto-202008234885.

Stan Development Team (2021). RStan: the R interface to Stan. R package version 2.21.3. Available at: https://mc-stan.org/ (Accessed 5 April 2022).

Tibshirani, R.J., Price, A., and Taylor, J. (2011). A statistician plays darts. J. Roy. Stat. Soc. A Stat. Soc. 174: 213–226, https://doi.org/10.1111/j.1467-985X.2010.00651.x.

Wilhelm, S. and Manjunath, B.G. (2010). tmvtnorm: a package for the truncated multivariate normal distribution. R J. 2: 25–29, https://doi.org/10.32614/RJ-2010-005.

Zhou, C., Zhang, S., Calvo, A.L., and Cui, Y. (2018). Chinese Soccer Association Super League, 2012–2017: key performance indicators in balance games. Int. J. Perform. Anal. Sport 18: 645–656, https://doi.org/10.1080/24748668.2018.1509254.

Received: 2022-06-22
Accepted: 2023-11-08
Published Online: 2024-01-01
Published in Print: 2024-03-25

© 2023 Walter de Gruyter GmbH, Berlin/Boston

DOI: https://doi.org/10.1515/jqas-2022-0107