Analyzing key factors influencing IPL cricket scores using explainability and multimodal data

  • Mohit Bhatnagar and Manya Bhatnagar
Published/Copyright: June 11, 2025

Abstract

In this study, we investigate data from the Indian Premier League (IPL), spanning its inception in 2008 to the 2024 season, to identify and analyze key factors influencing cricket scores. Using the H2O AutoML framework, we develop a predictive model that distinguishes matches with low first-innings scores from those with par or high scores, incorporating data on location, weather conditions, teams, and players. Explainable AI (XAI) tools are employed to quantify the influence of various match features on score predictions, ensuring transparency in the model’s decision-making process. To further enhance classification performance, we introduce pre-match pitch report descriptions generated by a Large Language Model (LLM). For a subset of matches, we leverage multimodal LLM capabilities to analyze pitch report videos, comparing their predictive value against that of textual descriptions. Our findings underscore the potential of AI and machine learning in sports analytics, specifically in predicting cricket scores based on pitch conditions and other influential factors. This research provides valuable insights for teams, coaches, fantasy sports enthusiasts, IPL administrators, and analysts, helping them optimize strategies based on available pre-match information. As part of our work, we are sharing a pitch report dataset, Python source code for the predictive model with explainability, and a Most Valuable Player (MVP) implementation framework to enhance reproducibility and support further research in cricket analytics.
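The classification target described above (low first-innings scores versus par or high scores) could be constructed along the following lines. This is a minimal sketch, not the authors' code: the function name `label_low_scores`, the 25th-percentile cutoff, and the sample totals are all assumptions made for illustration, since the abstract does not state the threshold used.

```python
from statistics import quantiles


def label_low_scores(scores, low_fraction=0.25):
    """Return (cutoff, labels), where labels[i] is 1 for a 'low' score.

    low_fraction is a hypothetical choice; the abstract does not say how
    low scores are separated from par or high scores.
    """
    # quantiles(n=100) returns the 99 percentile cut points of the sample.
    percentiles = quantiles(scores, n=100, method="inclusive")
    cutoff = percentiles[int(low_fraction * 100) - 1]
    # Scores strictly below the cutoff form the positive ('low') class.
    labels = [1 if s < cutoff else 0 for s in scores]
    return cutoff, labels


# Made-up first-innings totals, for illustration only.
first_innings = [142, 187, 165, 121, 201, 158, 176, 133, 190, 169]
cutoff, labels = label_low_scores(first_innings)
```

Labels of this kind would then serve as the response column for a binary classifier, with location, weather, team, and player features as predictors.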


Corresponding author: Mohit Bhatnagar, Jindal Global Business School, Sonipat, India

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: Mohit: Contributed to all aspects of the paper, including conceptualization, study design, data collection, data analysis, model development, and manuscript drafting. Manya: Responsible for data collation, feature engineering and manuscript review. All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Use of Large Language Models, AI and Machine Learning Tools: We used ChatGPT to improve the writing and to generate template Python code.

  5. Conflict of interest: The authors state no conflict of interest.

  6. Research funding: None declared.

  7. Data availability: Not applicable.

Received: 2024-09-10
Accepted: 2025-05-29
Published Online: 2025-06-11
Published in Print: 2025-09-25

© 2025 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 19.11.2025 from https://www.degruyterbrill.com/document/doi/10.1515/jqas-2025-0006/pdf