Abstract
We leverage Large Language Models (LLMs) to extract information from scouting report texts and improve predictions of National Hockey League (NHL) draft outcomes. In parallel, we derive statistical features based on a player’s on-ice performance leading up to the draft. These two datasets are then combined using ensemble machine learning models. We find that both on-ice statistics and scouting reports have predictive value on their own; however, combining them leads to the strongest results.
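As a minimal sketch of the combination step, assuming both feature sets are keyed by a shared player identifier, the two sources can be joined into one row per player before being passed to an ensemble model (not shown). The function name `combine_features`, the inner-join policy, and the example feature names (`points_per_game`, `likelihood`) are hypothetical and not taken from the paper's code.

```python
from typing import Dict, List, Tuple

# Hypothetical feature-row type: feature name -> numeric value.
Features = Dict[str, float]


def combine_features(
    stats: Dict[str, Features],
    text: Dict[str, Features],
) -> List[Tuple[str, Features]]:
    """Join on-ice statistics and scouting-report features on player id.

    stats: on-ice statistical features, keyed by player id.
    text:  LLM-derived scouting-report features, keyed by player id.

    Assumption: an inner join, so players missing from either source are
    dropped. Each feature name is prefixed with its source ("stat_" or
    "text_") so columns from the two sources can never collide.
    """
    combined = []
    # Dict key views support set intersection; sort for a stable row order.
    for player_id in sorted(stats.keys() & text.keys()):
        row = {f"stat_{name}": value for name, value in stats[player_id].items()}
        row.update({f"text_{name}": value for name, value in text[player_id].items()})
        combined.append((player_id, row))
    return combined


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    stats = {"p1": {"points_per_game": 1.2}, "p2": {"points_per_game": 0.8}}
    text = {"p1": {"likelihood": 0.7}}
    print(combine_features(stats, text))
    # [('p1', {'stat_points_per_game': 1.2, 'text_likelihood': 0.7})]
```

An inner join is only one possible policy; a model that tolerates missing values could instead keep players without a scouting report and leave their text features empty.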
Acknowledgments
Thanks to Amanda Glazer for advice throughout this process, and thanks to Yayu Xu, Fresa Luo, Steve Cao, Kevin Lin, Arvind Kumar, Junsheng (Allen) Shi, Anisha Jahagirdar, Jack Han, and David Wang for helpful discussions and feedback.
- Research ethics: Not applicable.
- Author contributions: The author has accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The author states no conflict of interest.
- Research funding: None declared.
- Data availability: Not applicable.
A. Appendix
A.1 Forward and defenceman strengths/weaknesses
The following are the lists of strengths and weaknesses for forwards and defencemen, obtained by human adjustment of the original classes generated by an LLM.
Forward Strengths
Skating: Strong skating ability with good speed, agility, and balance
Playmaking: Able to create scoring chances, make great passes, and has strong vision
Shooting: Impressive shot, quick release, and goal-scoring ability
Puckhandling: Quick hands and puckhandling ability to beat opponents easily
Hockey IQ: Has smart positioning, able to anticipate plays and make quick decisions on the ice
Competitiveness: Able to win battles, competitive nature, and strong work ethic
Physical Game: Strong and physical play on the ice
Size: Large player who uses it effectively on the ice
Versatility: Able to play a variety of roles and excel in all situations
Defensive Abilities: Responsible defensive player and able to disrupt opponent plays
Leadership: Good leadership qualities
Defenceman Strengths
Skating: Strong skating ability with good speed, agility, and balance
Defensive Abilities: Strong defensive play and able to disrupt opponent plays
Transition Game: Able to transition the puck up ice effectively, quickly, and cleanly
Physical Game: Strong and physical play on the ice
Size: Large player who uses it effectively on the ice
Competitiveness: Able to win battles, competitive nature, and strong work ethic
Hockey IQ: Has smart positioning, able to anticipate plays and make quick decisions on the ice
Poise and Patience: Poised under pressure and patient in making plays
Playmaking: Able to create scoring chances, make great passes, and has strong vision
Puckhandling: Quick hands and puckhandling ability to beat opponents easily
PowerPlay Quarterbacking: Able to quarterback the power play effectively
Leadership: Good leadership qualities
Forward Weaknesses
Skating: Concerns about speed, quickness, and stride technique
Offensive Ability: Questioned in terms of playmaking, finishing, and overall skill level
Hockey IQ: Poor decision-making, reads, and understanding of the game
Defensive Play: Concerns about consistency, defensive engagement, and battles
Consistency: Inconsistent effort and weak play away from the puck
Puck Management: Tendency to force plays, make risky decisions, and have issues with turnovers
Size: Undersized and lacks physicality
Physical Game: Lack of strength and physical play on the ice
Inexperience: Concerns about facing more experienced players at the next level
Injury History: Significant injury history that might impact his play on the ice in the future
Defenceman Weaknesses
Skating: Concerns about speed, quickness, and stride technique
Defensive Play: Issues with positioning, decision-making, and battles
Offensive Upside: Lack of creativity, puck skills, and scoring production
Size: Undersized and lacks physicality
Hockey IQ: Poor decision-making, reads, and understanding of the game
Consistency: Inconsistent effort and weak play away from the puck
Transition: Unable to move the puck up the ice
Puck Management: Tendency to force plays, make risky decisions, and have issues with turnovers
Physical Game: Lack of strength and physical play on the ice
Inexperience: Concerns about facing more experienced players at the next level
Injury History: Significant injury history that might impact his play on the ice in the future
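The fixed category lists above make it straightforward to turn a report's assigned strengths and weaknesses into model-ready features. The following is a minimal sketch, not the code used in the paper: it assumes each forward's report has already been mapped to a set of category names from the forward lists above, and encodes them as 0/1 indicators. The function names `encode_forward` and `slug` and the `strength_`/`weakness_` naming scheme are hypothetical; defencemen would be handled the same way with their own two lists.

```python
from typing import Dict, Iterable

# Category names copied from the forward lists in A.1.
FORWARD_STRENGTHS = [
    "Skating", "Playmaking", "Shooting", "Puckhandling", "Hockey IQ",
    "Competitiveness", "Physical Game", "Size", "Versatility",
    "Defensive Abilities", "Leadership",
]
FORWARD_WEAKNESSES = [
    "Skating", "Offensive Ability", "Hockey IQ", "Defensive Play",
    "Consistency", "Puck Management", "Size", "Physical Game",
    "Inexperience", "Injury History",
]


def slug(name: str) -> str:
    """Turn a category name such as 'Hockey IQ' into 'hockey_iq'."""
    return name.lower().replace(" ", "_")


def encode_forward(strengths: Iterable[str], weaknesses: Iterable[str]) -> Dict[str, int]:
    """Map a forward's assigned categories to fixed 0/1 indicator features.

    Strength and weakness columns are prefixed separately, so 'Skating'
    as a strength and 'Skating' as a weakness stay distinct features.
    Labels outside the A.1 lists raise ValueError instead of being dropped.
    """
    strengths, weaknesses = set(strengths), set(weaknesses)
    unknown = (strengths - set(FORWARD_STRENGTHS)) | (weaknesses - set(FORWARD_WEAKNESSES))
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    features = {f"strength_{slug(c)}": int(c in strengths) for c in FORWARD_STRENGTHS}
    features.update({f"weakness_{slug(c)}": int(c in weaknesses) for c in FORWARD_WEAKNESSES})
    return features


if __name__ == "__main__":
    f = encode_forward(["Skating", "Shooting"], ["Size"])
    print(len(f), sum(f.values()))  # 21 3
```

Keeping separate prefixes matters because four categories (Skating, Hockey IQ, Size, Physical Game) appear on both forward lists; without them, a strength and a weakness would overwrite each other's column.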
A.2 LLM code: likelihood scores + generating player strengths/weaknesses
A.3 LLM code: generating topics
A.4 LLM code: classifying topics
References
Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019). Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining. https://doi.org/10.1145/3292500.3330701.
Berri, D.J., Brook, S.L., and Fenn, A.J. (2011). From college to the pros: predicting the NBA amateur player draft. J. Prod. Anal. 35: 25–35. https://doi.org/10.1007/s11123-010-0187-x.
Chann, S. (2023). Non-determinism in GPT-4 is caused by sparse MoE. https://152334h.github.io/blog/non-determinism-in-gpt-4/.
Chen, B., Zhang, Z., Langrené, N., and Zhu, S. (2023). Unleashing the potential of prompt engineering in large language models: a comprehensive review. arXiv, Guangdong, China.
Deaner, R.O., Lowen, A., and Cobley, S. (2013). Born at the wrong time: selection bias in the NHL draft. PLoS ONE 8: 1–7. https://doi.org/10.1371/journal.pone.0057753.
Desjardins, G. (2005). Projecting junior hockey players and translating performance to the NHL. Behind the Net.
Liu, Y., Schulte, O., and Li, C. (2019). Model trees for identifying exceptional players in the NHL and NBA drafts. In: Machine learning and data mining for sports analytics. Springer International Publishing, pp. 93–105. https://doi.org/10.1007/978-3-030-17274-9_8.
Lopez-Lira, A. and Tang, Y. (2023). Can ChatGPT forecast stock price movements? Return predictability and large language models. https://arxiv.org/abs/2304.07619. https://doi.org/10.2139/ssrn.4412788.
Luszczyszyn, D. (2023). Introducing the ‘new’ NHL stats fans should know: offensive and defensive rating. The Athletic.
Manning, C.D., Raghavan, P., and Schütze, H. (2008). Stemming and lemmatization. In: Introduction to information retrieval. https://doi.org/10.1017/CBO9780511809071.
Nandakumar, N. and Jensen, S.T. (2018). Historical perspectives and current directions in hockey analytics. Annu. Rev. Stat. Appl. 6: 19–36. https://doi.org/10.1146/annurev-statistics-030718-105202.
Schuckers, M. (2011a). An alternative to the NFL draft pick value chart based upon player performance. J. Quant. Anal. Sports 7: 10. https://doi.org/10.2202/1559-0410.1329.
Schuckers, M. (2011b). What’s an NHL draft pick worth? A value pick chart for the National Hockey League. St. Lawrence University, Canton, USA.
Schuckers, M. (2016). Draft by numbers: using data and analytics to improve National Hockey League player selection. In: MIT Sloan Sports Analytics Conference.
Seppa, T., Schuckers, M.E., and Rovito, M. (2017). Text mining of scouting reports as a novel data source for improving NHL draft analytics. In: Ottawa Hockey Analytics Conference.
Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., Radford, A., Amodei, D., and Christiano, P.F. (2020). Learning to summarize with human feedback. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (Eds.). Advances in neural information processing systems, Vol. 33. Curran Associates, Inc., pp. 3008–3021.
Tang, Y., Bi, J., Xu, S., Song, L., Liang, S., Wang, T., Zhang, D., et al. (2024). Video understanding with large language models: a survey. arXiv, Rochester, USA.
Tu, T., Loreaux, E., Chesley, E., Lelkes, A.D., Gamble, P., Bellaiche, M., Seneviratne, M., and Chen, M.-J. (2022). Automated LOINC standardization using pre-trained large language models. In: Parziale, A., Agrawal, M., Joshi, S., Chen, I.Y., Tang, S., Oala, L., and Subbaswamy, A. (Eds.). Proceedings of the 2nd Machine Learning for Health Symposium, volume 193 of Proceedings of Machine Learning Research. PMLR, pp. 343–355.
Turtoro, C. (2020). Network NHL equivalences (NNHLe).
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2023). Attention is all you need. In: 31st Conference on Neural Information Processing Systems (NIPS 2017). Long Beach, CA, USA.
Wheeler, S. (2023a). 2023 NHL draft ranking. The Athletic.
Wheeler, S. (2023b). What is the scouting process for NHL draft prospects? Everything you need to know in 2023. The Athletic.
Wolfson, J., Addona, V., and Schmicker, R.H. (2011). The quarterback prediction problem: forecasting the performance of college quarterbacks selected in the NFL draft. J. Quant. Anal. Sports 7(3). https://doi.org/10.2202/1559-0410.1302.
© 2024 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- A comprehensive survey of the home advantage in American football
- Introducing Grid WAR: rethinking WAR for starting pitchers
- Improving NHL draft outcome predictions using scouting reports
- Comparison of individual playing styles in football
- A generative approach to frame-level multi-competitor races
- An empirical Bayes approach for estimating skill models for professional darts players