Abstract
In this study, we investigate data from the Indian Premier League (IPL), from its inception in 2008 through the 2024 season, to identify and analyze key factors influencing cricket scores. Using the H2O AutoML framework, we develop a predictive model that identifies low first-innings scores and distinguishes them from par or high scores, using data on location, weather conditions, teams, and players. Explainable AI (XAI) tools are employed to quantify the influence of individual match features on score predictions, making the model’s decision-making process transparent. To further improve classification performance, we introduce pre-match pitch report descriptions generated by a Large Language Model (LLM). For a subset of matches, we use multimodal LLM capabilities to analyze pitch report videos and compare their predictive value against that of the textual descriptions. Our findings underscore the potential of AI and machine learning in sports analytics, specifically for predicting cricket scores from pitch conditions and other influential factors. This research provides insights for teams, coaches, fantasy sports enthusiasts, IPL administrators, and analysts, helping them optimize strategies based on available pre-match information. As part of this work, we share a pitch report dataset, Python source code for the predictive model with explainability, and a Most Valuable Player (MVP) implementation framework to enhance reproducibility and support further research in cricket analytics.
- Research ethics: Not applicable.
- Informed consent: Not applicable.
- Author contributions: Mohit: Contributed to all aspects of the paper, including conceptualization, study design, data collection, data analysis, model development, and manuscript drafting. Manya: Responsible for data collation, feature engineering, and manuscript review. All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Use of Large Language Models, AI and Machine Learning Tools: We used ChatGPT to improve the writing and to generate template Python code.
- Conflict of interest: The authors state no conflict of interest.
- Research funding: None declared.
- Data availability: Not applicable.
© 2025 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- NHL aging curves using functional principal component analysis
- Aerodynamics, technology or pit strategy: why did overtaking in Formula 1 decline during the 1980s and 1990s? A micro-level analysis
- Analyzing and forecasting success in the Men’s Ice Hockey World (Junior) Championships using a dynamic ranking model
- FIVB ranking: misstep in the right direction
- Analyzing key factors influencing IPL cricket scores using explainability and multimodal data