Abstract
Measuring soccer shooting skill is a challenging analytics problem due to the scarcity and highly contextual nature of scoring events. The introduction of more advanced data surrounding soccer shots has given rise to model-based metrics which better cope with these challenges. Specifically, metrics such as expected goals added, goals above expectation, and post-shot expected goals all use advanced data to offer an improvement over the classical conversion rate. However, all metrics developed to date assign a value of zero to off-target shots, which account for almost two-thirds of all shots, since these shots have no probability of scoring. We posit that there is non-negligible shooting skill signal contained in the trajectories of off-target shots and propose two shooting skill metrics that incorporate the signal contained in off-target shots. Specifically, we develop a player-specific generative model for shot trajectories based on a mixture of truncated bivariate Gaussian distributions. We use this generative model to compute metrics that allow us to attach non-zero value to off-target shots. We demonstrate that our proposed metrics are more stable than current state-of-the-art metrics and have increased predictive power.
Funding source: Natural Sciences and Engineering Research Council of Canada
Acknowledgments
The authors thank Farah Bastien and MLSE digital labs for their support.
- Research ethics: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The authors state no conflict of interest.
- Research funding: This work was supported in part by the Natural Sciences and Engineering Research Council of Canada.
- Data availability: The raw data is proprietary and was accessed via an academic partnership for the purposes of this research study. However, the software used for our analysis is provided open-source at https://github.com/baronet2/shotmissr.
A.1 Adjusting shot end coordinates
In our data pre-processing, we first applied a small correction to fix an apparent bias in the data collection process affecting shot end coordinates near the goal frame. For a few seasons of data (MLS 2018, Ligue 2 2018–19, 2. Bundesliga 2018–19, Eredivisie 2018–19), the density of shots with end coordinates just outside the posts was low relative to the surrounding regions (see Figure 7). We hypothesize that this effect was caused by StatsBomb’s data collection system, which amplifies the size of the posts so that data collectors can click on them more easily. To correct for this effect, shot end coordinates were shifted to preserve the actual width of the posts, thereby resulting in a more continuous density of shot end locations. Figure 1 shows the distribution of shot end coordinates for the 2018 MLS season after this correction.

Figure 7: Original shot end coordinates for 2018 MLS data. Shots are coloured by their outcome. Note the significantly lower density of shots in the regions just outside the goal frame shown in red.
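One possible reading of the correction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the idea of rescaling points recorded on the inflated post, and all numeric values in the usage note are assumptions, since the text only states that coordinates were shifted to preserve the actual post width.

```python
def correct_end_y(y, goal_center, half_width, recorded_post, true_post):
    """Map a recorded lateral end coordinate onto a frame with true-width posts.

    All parameters are hypothetical: `half_width` is the distance from the goal
    centre to the inner edge of a post, `recorded_post` is the inflated post
    width used during data collection, and `true_post` is the real post width.
    """
    offset = y - goal_center
    d = abs(offset)
    if d <= half_width:
        # Inside the goal mouth: left unchanged.
        return y
    if d <= half_width + recorded_post:
        # Recorded on the inflated post: rescale onto the true post width.
        d_new = half_width + (d - half_width) * true_post / recorded_post
    else:
        # Recorded wide of the inflated post: shift inward by the excess width.
        d_new = d - (recorded_post - true_post)
    return goal_center + (d_new if offset > 0 else -d_new)
```

For example, with a hypothetical inflated post width of 0.5 yards and a true width of 0.25 yards, a shot recorded 5 yards from the goal centre would be moved to 4.75 yards, which closes the artificial gap just outside the post. The same idea could be applied vertically for the crossbar.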
A.2 Estimating execution error
We fix the component covariances in our saturated mixture model for computational tractability. To inform our choice of S_j for each location m_j, we fit two bivariate Gaussian distributions to the empirical data in Hunter et al. (2018), one for shots aimed 1.75 yards above the ground and another for shots aimed 0.14 yards above the ground. The end y-coordinates of left-footed shots were reflected prior to fitting, since left-footed shots display a symmetric pattern to right-footed shots with respect to the orientation of the principal axis. The fitted covariance matrices for these two intended shot locations are:
Figure 8 shows the resulting bivariate Gaussian distributions.

Figure 8: Bivariate Gaussian fits for shots aimed at the high (top) and low (bottom) red intentions, based on data from Hunter et al. (2018). The cut-off at 0.2 for the end z-coordinates is a result of the computer vision process used by Hunter et al. (2018) to compute shot end coordinates.
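As an illustration of what fitting a bivariate Gaussian to shot end coordinates involves, the sketch below computes the sample mean and sample covariance of a set of (y, z) points. It assumes a plain moment-based fit and that left-footed shots have already been reflected; the source does not state which estimator was used, and this simple version does not account for the cut-off in the z-coordinates noted above.

```python
def fit_bivariate_gaussian(points):
    """Return (mean, covariance) for a list of (y, z) end coordinates.

    Uses the sample mean and the unbiased sample covariance. This is an
    assumed estimator for illustration, not necessarily the one in the paper.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to estimate a covariance")
    mean_y = sum(y for y, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    # Second central moments, divided by n - 1 for the unbiased estimate.
    s_yy = sum((y - mean_y) ** 2 for y, _ in points) / (n - 1)
    s_zz = sum((z - mean_z) ** 2 for _, z in points) / (n - 1)
    s_yz = sum((y - mean_y) * (z - mean_z) for y, z in points) / (n - 1)
    return (mean_y, mean_z), [[s_yy, s_yz], [s_yz, s_zz]]
```

The off-diagonal entry captures the tilt of the error ellipse, which is the feature that motivates reflecting left-footed shots before pooling them with right-footed ones.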
We assume that the shape of the execution error S changes linearly with respect to the intended shot height, but is unchanged as the intended location moves laterally.
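The linear assumption above can be sketched as an entry-wise interpolation between the two fitted covariance matrices. The two reference heights come from the text; the function name and the matrices used in the usage note are placeholders, not the fitted values from the paper.

```python
LOW_HEIGHT = 0.14   # yards, intended height of the low target (from the text)
HIGH_HEIGHT = 1.75  # yards, intended height of the high target (from the text)


def interpolate_covariance(height, cov_low, cov_high):
    """Linearly interpolate a 2x2 covariance matrix in the intended height.

    The lateral position of the intended location is ignored, matching the
    assumption that the error shape does not change as the target moves
    sideways. `cov_low` and `cov_high` are the matrices fitted at LOW_HEIGHT
    and HIGH_HEIGHT respectively.
    """
    t = (height - LOW_HEIGHT) / (HIGH_HEIGHT - LOW_HEIGHT)
    return [
        [(1 - t) * cov_low[i][j] + t * cov_high[i][j] for j in range(2)]
        for i in range(2)
    ]
```

For instance, `interpolate_covariance(0.945, cov_low, cov_high)` returns approximately the entry-wise average of the two matrices, since 0.945 yards is halfway between the two reference heights. Between the reference heights the result is a convex combination of two covariance matrices and therefore remains a valid covariance matrix; outside that range the formula extrapolates, and positive semi-definiteness is not guaranteed in general and would need to be checked.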
A.3 Sensitivity analysis with respect to data preprocessing choices
In our data pre-processing, we filter the data to only include shots taken from a distance of at least 6 yards. Additionally, we reflect the end y coordinates of left-footed shots in an attempt to exploit symmetry in shooting patterns.
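The two pre-processing choices above can be sketched as a single function with the threshold and the reflection exposed as parameters, which is what makes the sensitivity analysis straightforward to run. The `Shot` record, its field names, and the `goal_center_y` parameter are hypothetical and do not reflect the schema of the underlying data; reflecting about the goal centre is also an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Shot:
    distance: float     # yards from the shot location to the goal
    end_y: float        # lateral end coordinate
    end_z: float        # vertical end coordinate
    left_footed: bool


def preprocess(shots, min_distance=6.0, reflect_left=True, goal_center_y=0.0):
    """Drop shots closer than `min_distance` and optionally mirror left-footed shots."""
    kept = []
    for shot in shots:
        if shot.distance < min_distance:
            continue  # too close to goal: excluded from the analysis
        if reflect_left and shot.left_footed:
            # Mirror the lateral end coordinate about the (assumed) goal centre.
            shot = replace(shot, end_y=2 * goal_center_y - shot.end_y)
        kept.append(shot)
    return kept
```

Re-running the downstream analysis over a grid of `min_distance` values, with `reflect_left` set to both `True` and `False`, would reproduce the kind of comparison described in this section.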
To show that our results are robust to the choice of filtering thresholds and choice to reflect left-footed shots, we repeat our analysis under various data processing settings. Specifically, we generate simplified versions of Figure 5 for nine combinations of filtering thresholds, with or without reflecting left-footed shots. The results are presented in Figure 9.

Figure 9: Comparison of results under various data processing settings.
Based on these results, we see that our proposed metrics outperform the benchmark metrics for nearly all thresholds and conditions. Additionally, we see that our metrics tend to outperform the benchmark metrics more significantly when a larger set of shots is included in the analysis. This figure confirms that our findings are generally robust to the data processing choices made in our analysis.
References
11tegen11 (2014). How to scout a striker? Available at: https://web.archive.org/web/20140707121313/http://11tegen11.net/2014/02/15/how-to-scout-a-striker/ (Accessed 16 June 2022).
Ackerson, K. (2022). Football league rankings, Available at: https://www.globalfootballrankings.com/ (Accessed 23 January 2022).
Anzer, G. and Bauer, P. (2021). A goal scoring probability model for shots based on synchronized positional and event data in football (soccer). Front. Sports Act. Living 3, https://doi.org/10.3389/fspor.2021.624475.
Brechot, M. and Flepp, R. (2018). Dealing with randomness in match outcomes: how to rethink performance evaluation in European club football using expected goals. J. Sports Econ. 21: 335–362. https://doi.org/10.1177/1527002519897962.
Chan, T.C.Y., Fearing, D.S., Fernandes, C., and Kovalchik, S. (2022). A Markov process approach to untangling intention versus execution in tennis. J. Quant. Anal. Sports 18: 127–145, https://doi.org/10.1515/jqas-2021-0077.
Daly-Grafstein, D. and Bornn, L. (2019). Rao-Blackwellizing field goal percentage. J. Quant. Anal. Sports 15: 85–95, https://doi.org/10.1515/jqas-2018-0064.
Franks, A., D’Amour, A., Cervone, D., and Bornn, L. (2016). Meta-analytics: tools for understanding the statistical properties of sports metrics. J. Quant. Anal. Sports 12: 151–165, https://doi.org/10.1515/jqas-2016-0098.
Goodman, M. (2018). A new way to measure keepers’ shot stopping: post-shot expected goals. StatsBomb, Available at: https://statsbomb.com/2018/11/a-new-way-to-measure-keepers-shot-stopping-post-shot-expected-goals/ (Accessed 16 June 2022).
Grazian, C. and Robert, C.P. (2018). Jeffreys priors for mixture estimation: properties and alternatives. Comput. Stat. Data Anal. 121: 149–163. https://doi.org/10.1016/j.csda.2017.12.005.
Haugh, M.B. and Wang, C. (2022). Play like the pros? Solving the game of darts as a dynamic zero-sum game. INFORMS J. Comput. 34: 2540–2551, https://doi.org/10.1287/ijoc.2022.1197.
Hunter, A.H., Angilletta, M.J.Jr., Pavlic, T., Lichtwark, G., and Wilson, R.S. (2018). Modeling the two-dimensional accuracy of soccer kicks. J. Biomech. 72: 159–166, https://doi.org/10.1016/j.jbiomech.2018.03.003.
Lee, G. and Scott, C. (2012). EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Comput. Stat. Data Anal. 56: 2816–2829. https://doi.org/10.1016/j.csda.2012.03.003.
Lucey, P., Bialkowski, A., Monfort, M., Carr, P., and Matthews, I. (2015). Quality vs quantity: Improved shot prediction in soccer using strategic features from spatiotemporal data. In: 9th Annual MIT Sloan Sports Analytics Conference, Available at: https://global-uploads.webflow.com/5f1af76ed86d6771ad48324b/5fee09c092fcdb0989d51ecf%5C%5F1034%5C%5Frppaper%5C%5FSoccerPaper5.pdf.
Mao, L., Peng, Z., Liu, H., and Gómez, M.-A. (2016). Identifying keys to win in the Chinese professional soccer league. Int. J. Perform. Anal. Sport 16: 935–947. https://doi.org/10.1080/24748668.2016.11868940.
McHale, I.G. and Szczepański, Ł. (2014). A mixed effects model for identifying goal scoring ability of footballers. J. Roy. Stat. Soc. A Stat. Soc. 177: 397–417. https://doi.org/10.1111/rssa.12015.
Pleuler, D. (2014a). Augmenting free-kick data for more meaningful results. OptaPro, Available at: https://web.archive.org/web/20140326074435/http://www.optasportspro.com/en/about/optapro-blog/posts/2013/augmenting-free-kick-data-for-more-meaningful-results.aspx (Accessed 16 June 2022).
Pleuler, D. (2014b). On the topic of expected goals and the repeatability of finishing skill. OptaPro, Available at: https://web.archive.org/web/20140706142343/http://www.optasportspro.com/en/about/optapro-blog/posts/2014/on-the-topic-of-expected-goals-and-the-repeatability-of-finishing-skill.aspx (Accessed 16 June 2022).
Rathke, A.A.T. (2017). An examination of expected goals and shot efficiency in soccer. J. Hum. Sport Exerc. 12: 514–529. https://doi.org/10.14198/jhse.2017.12.Proc2.05.
Rousseau, J. and Mengersen, K. (2011). Asymptotic behaviour of the posterior distribution in overfitted mixture models. J. Roy. Stat. Soc. B Stat. Methodol. 73: 689–710. https://doi.org/10.1111/j.1467-9868.2011.00781.x.
Rowlinson, A. (2020). Football shot quality: visualizing the quality of soccer/football shots, Master’s thesis. Aalto University School of Business, Available at: http://urn.fi/URN:NBN:fi:aalto-202008234885.
Stan Development Team (2021). RStan: the R interface to stan, R package version 2.21.3, Available at: https://mc-stan.org/ (Accessed 5 April 2022).
Tibshirani, R.J., Price, A., and Taylor, J. (2011). A statistician plays darts. J. Roy. Stat. Soc. A Stat. Soc. 174: 213–226. https://doi.org/10.1111/j.1467-985X.2010.00651.x.
Wilhelm, S. and Manjunath, B.G. (2010). tmvtnorm: a package for the truncated multivariate normal distribution. R J. 2: 25–29, https://doi.org/10.32614/RJ-2010-005.
Zhou, C., Zhang, S., Calvo, A.L., and Cui, Y. (2018). Chinese Soccer Association Super League, 2012–2017: key performance indicators in balance games. Int. J. Perform. Anal. Sport 18: 645–656, https://doi.org/10.1080/24748668.2018.1509254.
© 2023 Walter de Gruyter GmbH, Berlin/Boston