Abstract
Count data play a crucial role in sports analytics, providing valuable insights into various aspects of the game. Models that accurately capture the characteristics of count data are essential for making reliable inferences. In this paper, we propose the use of the Conway–Maxwell–Poisson (CMP) model for analyzing count data in sports. The CMP model offers flexibility in modeling data with different levels of dispersion. Here we consider a bivariate CMP model that models the potential correlation between home and away scores by incorporating a random effect specification. We illustrate the advantages of the CMP model through simulations. We then analyze data from baseball and soccer games before, during, and after the COVID-19 pandemic. The performance of our proposed CMP model matches or outperforms standard Poisson and Negative Binomial models, providing a good fit and an accurate estimation of the observed effects in count data with any level of dispersion. The results highlight the robustness and flexibility of the CMP model in analyzing count data in sports, making it a suitable default choice for modeling a diverse range of count data types in sports, where the data dispersion may vary.
Acknowledgments
We would like to express our gratitude to Scott Powers for his invaluable assistance and suggestions. We also extend our thanks to the two anonymous referees and the editors, for their insightful comments and recommendations, which have greatly improved the quality of this paper.
- Research ethics: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The authors state no conflict of interest.
- Research funding: None declared.
- Data availability: Data available at: https://www.football-data.co.uk/ and https://www.retrosheet.org/.
Appendix A: Proof of Equation (3)
Proof.
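Let $\delta_i = (\delta_{i1}, \delta_{i2})$, where $\delta_{ij} = \exp(b_{ij})$. Then $\delta_i = \exp(b_i) \sim \mathrm{LN}_2(\mu, \Sigma)$, with $\mu = \exp(0.5 \cdot \mathrm{diag}(D))$ and $\Sigma = \mathrm{diag}(\mu)\,(\exp(D) - \mathbf{1}\mathbf{1}^{T})\,\mathrm{diag}(\mu)$. Also, assume that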
Clearly, (3) can be positive or negative depending on the sign of d12, i.e., the non-diagonal element of D.
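The role of d12 can be checked numerically. The following is a minimal sketch, assuming the random effects b_i are bivariate normal with mean zero and covariance D (the zero mean is inferred from the form of μ in the proof); the function name `lognormal_moments` and the example matrices are hypothetical. It evaluates μ and Σ as written in the proof and shows that the off-diagonal element of Σ takes the sign of d12.

```python
import numpy as np


def lognormal_moments(D):
    """Mean vector and covariance matrix of exp(b) when b ~ N(0, D).

    Assumption (not stated explicitly in the text): b has mean zero.
    mu    = exp(0.5 * diag(D))
    Sigma = diag(mu) (exp(D) - 1 1^T) diag(mu), with exp applied elementwise.
    """
    D = np.asarray(D, dtype=float)
    mu = np.exp(0.5 * np.diag(D))
    # np.exp(D) - 1.0 is the elementwise exponential minus the all-ones matrix
    Sigma = np.diag(mu) @ (np.exp(D) - 1.0) @ np.diag(mu)
    return mu, Sigma


# Hypothetical covariance matrices that differ only in the sign of d12
D_pos = [[0.5, 0.2], [0.2, 0.3]]
D_neg = [[0.5, -0.2], [-0.2, 0.3]]

for label, D in (("d12 > 0", D_pos), ("d12 < 0", D_neg)):
    mu, Sigma = lognormal_moments(D)
    print(label, "-> off-diagonal of Sigma:", Sigma[0, 1])
```

Because exp(d12) − 1 has the same sign as d12 and the entries of μ are positive, the covariance between δi1 and δi2 is positive when d12 > 0 and negative when d12 < 0.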
Appendix B: Prior sensitivity analysis
We assess the performance of the bivariate Conway–Maxwell–Poisson (CMP) model under various prior specifications. Specifically, we focus on the Over-dispersed scenario of 1 Season discussed in Section 3 and examine different combinations of values for B0, G0, ν0, and R0. These combinations are summarized in Table 5. To monitor convergence, we calculate the multivariate potential scale reduction factor
Prior sensitivity: values considered in each scenario to assess the sensitivity of the model’s performance to the specification of prior hyperparameters.
| Hyperparameter | Scenario A | Scenario B | Scenario C | Scenario D |
|---|---|---|---|---|
| B0 | 0.1I | I | 3I | 10I |
| G0 | 0.1I | I | 3I | 10I |
| ν0 | 30 | 10 | 10 | 5 |
| R0 | I | I | 0.1I | 0.1I |
Table 6 reports the mean squared error (MSE) for μj across all scenarios. As expected, scenario A gives the best overall recovery of the observed values, with the lowest MSE for μ1 (0.71) and an MSE for μ2 (0.25) close to the smallest value in the table. Again, this can be attributed to the small prior variance used in this scenario. Conversely, scenarios C and D recover μ1 worst, and scenario D is also the worst for μ2.
Mean squared error (MSE) of μ – prior sensitivity analysis.
| Parameter | Scenario A | Scenario B | Scenario C | Scenario D |
|---|---|---|---|---|
| μ1 | 0.71 | 1.01 | 1.19 | 1.2 |
| μ2 | 0.25 | 0.25 | 0.22 | 0.545 |
Prior sensitivity analysis: multivariate potential scale reduction factor.
| Parameter | A: y1 | A: y2 | B: y1 | B: y2 | C: y1 | C: y2 | D: y1 | D: y2 |
|---|---|---|---|---|---|---|---|---|
| β | 1.07 | 1.06 | 1.15 | 1.16 | 1.2 | 1.21 | 4.22 | 1.82 |
| γ | 1.05 | 1.05 | 1.19 | 1.17 | 1.22 | 1.24 | 4.74 | 5.04 |
| b | 1.02 | 1.02 | 1.03 | 1.03 | 1.22 | 1.24 | 4.22 | 3.91 |
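For readers who want to reproduce a diagnostic of this kind, the following is a minimal sketch of a multivariate potential scale reduction factor, assuming the usual construction from within-chain and between-chain covariance matrices. The function name `multivariate_psrf`, the array layout, and the simulated chains are hypothetical, and the sketch returns the statistic without a square root, whereas some software reports its square root. It is not the code used to produce the table.

```python
import numpy as np


def multivariate_psrf(chains):
    """Multivariate potential scale reduction factor.

    chains: array of shape (m, n, p) holding m chains, n draws, p parameters.
    Returns (n - 1)/n + (m + 1)/m * lambda_1, where lambda_1 is the largest
    eigenvalue of W^{-1} (B/n), W is the mean within-chain covariance and
    B/n is the covariance of the chain means.
    """
    chains = np.asarray(chains, dtype=float)
    m, n, p = chains.shape
    # W: average of the within-chain sample covariance matrices
    W = np.atleast_2d(np.mean([np.cov(c, rowvar=False) for c in chains], axis=0))
    # B/n: sample covariance of the m chain-mean vectors
    chain_means = chains.mean(axis=1)
    B_over_n = np.atleast_2d(np.cov(chain_means, rowvar=False))
    eigvals = np.linalg.eigvals(np.linalg.solve(W, B_over_n))
    lam1 = float(np.max(eigvals.real))
    return (n - 1) / n + (m + 1) / m * lam1


# Simulated example (hypothetical data): four chains, three parameters
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000, 3))                      # same target in every chain
stuck = mixed + 2.0 * np.arange(4).reshape(4, 1, 1)        # chains centred at different values
print("well-mixed chains:", multivariate_psrf(mixed))
print("separated chains: ", multivariate_psrf(stuck))
```

Values close to 1 indicate that the chains agree, while values well above 1, such as those in scenario D, indicate that the chains have not converged to a common distribution.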
Prior sensitivity analysis: effective sample size (ESS).
| Parameter | A: y1 | A: y2 | B: y1 | B: y2 | C: y1 | C: y2 | D: y1 | D: y2 |
|---|---|---|---|---|---|---|---|---|
| β | 350.44 | 408.54 | 123.25 | 131.13 | 98.48 | 97.13 | 7.26 | 10.71 |
| γ | 506.3 | 511.11 | 195.37 | 192.4 | 106.45 | 111.09 | 47.51 | 43.48 |
| b | 9,366 | 9,555.62 | 9,175.4 | 9,286.9 | 2,324.947 | 2,351.463 | 2,306.58 | 2,329.84 |
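As an illustration of what an effective sample size measures, the following is a minimal sketch of a univariate ESS estimator based on the autocorrelation of a single chain. The function name `effective_sample_size`, the rule of truncating the sum at the first non-positive autocorrelation, and the simulated series are assumptions made for this example; how the ESS values in the table were computed and aggregated over each parameter block is not stated here.

```python
import numpy as np


def effective_sample_size(x):
    """Univariate ESS: n / (1 + 2 * sum of positive-lag autocorrelations).

    Assumption: the sum is truncated at the first lag whose estimated
    autocorrelation is not positive (one simple truncation rule among several).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    # Autocovariance at lags 0..n-1 (biased estimator, divided by n)
    acov = np.correlate(xc, xc, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0:
            break
        tau += 2.0 * rho[k]
    return n / tau


# Simulated example (hypothetical data): independent draws vs. an AR(1) chain
rng = np.random.default_rng(1)
n = 2000
iid = rng.normal(size=n)
ar1 = np.zeros(n)
noise = rng.normal(size=n)
for t in range(1, n):
    ar1[t] = 0.9 * ar1[t - 1] + noise[t]
print("ESS, independent draws:", effective_sample_size(iid))
print("ESS, AR(1) with 0.9:   ", effective_sample_size(ar1))
```

A strongly autocorrelated chain carries far fewer effective draws than its nominal length, which is the pattern seen for β and γ in scenario D.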
Scenarios A and B exhibit more stable estimations, with all
Appendix C: Shape parameters γ
Figure 18 shows the estimated shape parameters for the Premier League analysis. The left panel shows the estimates for the parameters associated with Home Goals. Strong teams, such as Chelsea, Liverpool, and Manchester City, lie below 0, indicating that when other teams play against them the dispersion parameter is lower, which implies a larger variance in their scored goals (over-dispersion). A similar effect appears for Away Goals. Conversely, weaker teams such as Norwich or Brighton have shape parameters above 0, indicating that goals are more under-dispersed when teams play against them. On the x-axis, Manchester United and Manchester City are below 0, which means that the number of goals they score tends to be more over-dispersed than for the other teams. Liverpool shows the opposite trend: its goals scored tend to be more under-dispersed. These effects can be read directly as over- or under-dispersion because the intercept effect was 0. The effects clearly differ between Home Goals and Away Goals, which supports the modeling approach we assumed.

Analysis of Premier League data: shape parameters.
Similarly, Figure 19 shows the estimates for the MLB analysis. One notable case is the Miami Marlins (MIA), with a negative effect on the x-axis for Home Points and a positive effect for Away Points. This means that at home MIA's scores tend to be more dispersed, while away their scores are less dispersed than those of the other teams. This highlights the ability of our model to adapt to different types of data and to accommodate the different phenomena and mechanics observed across teams and sports.

Analysis of MLB data: shape parameters.
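The reading of the shape parameters in this appendix rests on how the CMP dispersion parameter controls the variance relative to the mean. The following is a minimal sketch, assuming the standard CMP form P(Y = y) ∝ λ^y / (y!)^ν; the regression parameterization used in the paper may differ, and the function names, the truncation point `max_y`, and the values of λ and ν are illustrative assumptions. It computes the variance-to-mean ratio for a few values of ν.

```python
import numpy as np
from math import lgamma


def cmp_pmf(lam, nu, max_y=200):
    """CMP probabilities on 0..max_y, assuming P(Y = y) ∝ lam**y / (y!)**nu.

    The infinite normalizing sum is truncated at max_y (an assumption that is
    adequate for the small rates used below).
    """
    y = np.arange(max_y + 1)
    log_fact = np.array([lgamma(k + 1.0) for k in y])
    log_w = y * np.log(lam) - nu * log_fact
    log_w -= log_w.max()          # stabilise before exponentiating
    w = np.exp(log_w)
    return y, w / w.sum()


def dispersion_index(lam, nu):
    """Variance-to-mean ratio of the (truncated) CMP distribution."""
    y, p = cmp_pmf(lam, nu)
    mean = float(np.sum(y * p))
    var = float(np.sum((y - mean) ** 2 * p))
    return var / mean


# Illustrative values only: a rate of 3 with three dispersion settings
for nu in (0.5, 1.0, 2.0):
    print(f"nu = {nu}: variance / mean = {dispersion_index(3.0, nu):.3f}")
```

Under this form, ν = 1 recovers the Poisson distribution, ν < 1 gives over-dispersion, and ν > 1 gives under-dispersion, which matches the interpretation above that a lower dispersion parameter implies a larger variance.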
© 2024 Walter de Gruyter GmbH, Berlin/Boston