Abstract
NBA team managers and owners try to acquire high-performing players. An important consideration in these decisions is how well the new players will perform in combination with their teammates. Our objective is to identify elite five-person lineups, which we define as those having a positive plus-minus per minute (PMM). Using individual player order statistics, our model can identify an elite lineup even if the five players in the lineup have never played together, which can inform player acquisition decisions, salary negotiations, and real-time coaching decisions. We combine seven classification tools into a unanimous consent classifier (all-or-nothing classifier, or ANC) in which a lineup is predicted to be elite only if all seven classifiers predict it to be elite. In this way, we achieve high positive predictive value (i.e., precision), the likelihood that a lineup classified as elite will indeed have a positive PMM. We train and test the model on individual player and lineup data from the 2017–18 season and use the model to predict the performance of lineups drawn from all 30 NBA teams’ 2018–19 regular season rosters. Although the ANC is conservative and misses some high-performing lineups, it achieves high precision and recommends positionally balanced lineups.
Funding source: National Science Foundation
Award Identifier / Grant number: DMS-1757952
Funding source: Harvey Mudd College
Acknowledgments
The authors thank Isys Johnson, Lucius Bynum, and Robert Gonzalez for their contributions to earlier phases of this work and to the code base, portions of which were adapted and used in this paper. Lastly, the authors thank the anonymous reviewers and editors whose feedback greatly improved the analysis.
- Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: This material is based upon work supported by the National Science Foundation under Grant No. DMS-1757952. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation. The authors would also like to acknowledge financial support from Harvey Mudd College.
- Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Appendix A: Individual player statistics used as predictors in ANC
Individual player statistics used as predictors in ANC.
Statistic | Description |
---|---|
FGM | Field goals made per minute |
FGA | Field goals attempted per minute |
FGPCT | Field goal percentage |
FG3M | Three-point field goals made per minute |
FG3A | Three-point field goals attempted per minute |
FG3PCT | Three-point field goal percentage |
FTM | Free throws made per minute |
FTA | Free throws attempted per minute |
FTPCT | Free throw percentage |
OREB | Offensive rebounds per minute |
DREB | Defensive rebounds per minute |
AST | Assists per minute |
TOV | Turnovers per minute |
STL | Steals per minute |
BLK | Blocks per minute |
BLKA | Field goal attempts blocked by opponents per minute |
PF | Personal fouls per minute |
PTS | Points earned per minute |
PFD | Personal fouls drawn per minute |
PMM | Plus-minus per minute |
CONTESTEDSHOTS | Shots contested per minute |
CONTESTEDSHOTS2PT | Two-point shots contested per minute |
CONTESTEDSHOTS3PT | Three-point shots contested per minute |
CHARGESDRAWN | Charges drawn per minute |
DEFLECTIONS | Passes deflected per minute |
LOOSEBALLSRECOVERED | Loose balls recovered per minute |
SCREENASSISTS | Screens that led to baskets per minute |
BOXOUTS | Box outs per minute |
Appendix B: Comparison of ANC to simpler model
One might also wonder whether the complete set of five-player order statistics is required by the ANC to achieve high precision. In this section, we analyze a simpler model that uses only the first order statistics (i.e., the lineup’s minimum) of each individual player metric used by the ANC.
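The difference between the two feature sets can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the per-minute values are hypothetical, and the ascending, metric-by-metric layout of the features is an assumption.

```python
import numpy as np


def order_statistic_features(player_stats: np.ndarray) -> np.ndarray:
    """Full feature set: all five order statistics of every metric.

    player_stats has shape (5, n_metrics): one row per player in the lineup.
    The result has length 5 * n_metrics, grouped by metric, with each group
    sorted from the lineup minimum to the lineup maximum.
    """
    if player_stats.shape[0] != 5:
        raise ValueError("a lineup must contain exactly five players")
    sorted_stats = np.sort(player_stats, axis=0)  # sort each metric column
    return sorted_stats.T.reshape(-1)


def first_order_features(player_stats: np.ndarray) -> np.ndarray:
    """Simpler feature set: only the lineup minimum of every metric."""
    return player_stats.min(axis=0)


# Hypothetical per-minute values for two metrics (say AST and STL).
lineup = np.array([
    [0.10, 0.02],
    [0.25, 0.05],
    [0.05, 0.03],
    [0.18, 0.01],
    [0.12, 0.04],
])

full_features = order_statistic_features(lineup)
simple_features = first_order_features(lineup)
```

With 28 metrics, the full representation would have 140 features per lineup under this layout, while the simpler model would have 28.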
We tune the simple model parameters as in Section 4.1, using ten-fold cross-validation. The parameter combination that lies on the efficient frontier of average precision and worst-case precision over the folds on the training data is given in Table 17. This combination achieved an average precision of 86.5 %, minimum precision of 57.1 % and average accuracy of 51.8 % on the training data. When the performance was insensitive to a parameter value, the value was chosen to match that used in the ANC.
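One way to read the efficient-frontier criterion is as a Pareto filter over the two objectives, average precision and worst-case precision across folds. The sketch below assumes that reading; the candidate names and fold precisions are hypothetical, and only three folds are shown for brevity where the paper uses ten.

```python
from statistics import mean
from typing import Dict, List


def efficient_frontier(fold_precisions: Dict[str, List[float]]) -> List[str]:
    """Return the candidates not dominated in (average, worst-case) precision.

    A candidate is dominated if another candidate is at least as good on both
    objectives and strictly better on at least one of them.
    """
    scores = {name: (mean(p), min(p)) for name, p in fold_precisions.items()}
    frontier = []
    for name, (avg, worst) in scores.items():
        dominated = any(
            o_avg >= avg and o_worst >= worst and (o_avg > avg or o_worst > worst)
            for other, (o_avg, o_worst) in scores.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical per-fold precisions for three parameter combinations.
candidates = {
    "A": [0.90, 0.80, 0.60],  # highest average, worst case 0.60
    "B": [0.80, 0.75, 0.70],  # lower average, best worst case
    "C": [0.70, 0.70, 0.60],  # dominated by A
}
frontier = efficient_frontier(candidates)
```

When several combinations survive the filter, a further tie-breaking rule is needed; the text above notes that insensitive parameters were set to match the values used in the ANC.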
Table 17: Tuned parameter values used in simple model based on first order statistics.
Subclassifier | Parameter | Chosen value |
---|---|---|
Decision tree | cp (cost complexity) | −1 |
Decision tree | loss (misclassification penalty) | 1 |
Random forest | c (cutoff) | 0.7 |
Random forest | ntree (number of trees) | 100 |
Boosting | mfinal (number of trees) | 500 |
Boosting | maxdepth (depth of each tree) | 3 |
Boosting | cp (cost complexity) | 0.01 |
Support vector machine | cost (misclassification penalty) | 0.1 |
Support vector machine | gamma (influence decay) | 0.01 |
K-nearest neighbors | k (number of neighbors) | 5 |
Logistic regression | thresh (1 − probability threshold) | 0.25 |
All-or-nothing classifier (ANC) | numVotes (agreement required) | 7 |
Having tuned the parameters, we fit the first order statistic model to the full, standardized training set, as described earlier, and apply the trained model to the testing data. The confusion matrix is given in Table 18.
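The all-or-nothing vote that combines the subclassifiers can be sketched as below. The sketch takes the subclassifier predictions as given. The function name and the vote matrix are hypothetical, and `num_votes` mirrors the numVotes parameter of Table 17.

```python
import numpy as np


def unanimous_vote(votes: np.ndarray, num_votes: int) -> np.ndarray:
    """Combine subclassifier votes into an all-or-nothing prediction.

    votes has shape (n_lineups, n_classifiers); True means that subclassifier
    predicts "elite". A lineup is predicted elite only if at least num_votes
    subclassifiers agree. With seven subclassifiers and num_votes = 7, this
    requires unanimous consent.
    """
    return votes.sum(axis=1) >= num_votes


# Hypothetical votes from seven subclassifiers for three lineups.
votes = np.array([
    [1, 1, 1, 1, 1, 1, 1],  # unanimous: elite
    [1, 1, 1, 1, 1, 1, 0],  # one dissent: not elite
    [0, 0, 0, 0, 0, 0, 0],  # no support: not elite
], dtype=bool)

anc_prediction = unanimous_vote(votes, num_votes=7)
```

Requiring every vote trades recall for precision: a single dissenting subclassifier is enough to withhold the elite label.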
Table 18: Confusion matrix for the simple model based on first order statistics applied to the test data set. Of the 12 lineups predicted to be elite, nine have a true label of elite, corresponding to a precision of 75.0 %.
True class | Predicted elite | Predicted not elite |
---|---|---|
Elite | 9 | 86 |
Not elite | 3 | 78 |
Of the twelve lineups predicted to be elite, nine have a true label of elite, meaning a strictly positive PMM. The simpler model therefore achieves a testing precision of only 75.0 %, compared with the ANC’s testing precision of 86.7 %.
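The precision figure follows directly from the confusion matrix counts. The short sketch below reproduces the arithmetic; the function name is hypothetical, and the accuracy and recall values it also returns are derived here from the same counts rather than reported in the text.

```python
def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute precision, recall, and accuracy from confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "predicted_elite": tp + fp,
        "precision": tp / (tp + fp),  # share of predicted-elite lineups that are elite
        "recall": tp / (tp + fn),     # share of elite lineups that are found
        "accuracy": (tp + tn) / total,
    }


# Counts from the confusion matrix of the simple model on the test data.
metrics = confusion_metrics(tp=9, fn=86, fp=3, tn=78)
print(f"precision = {metrics['precision']:.3f}")  # 9 / 12
```

The low recall (9 of 95 elite lineups) illustrates the conservatism of a unanimous vote: most elite lineups are missed in exchange for few false positives.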
Appendix C: Actual lineup performance for LAL and GSW case study
Actual lineup performance compared to ANC predictions for the Los Angeles Lakers during the 2018–19 season, for all lineups having at least 25 min of playing time. ‘−’ denotes lineups for which no ANC prediction is given.
Lineup | Minutes played | Actual PMM | ANC prediction |
---|---|---|---|
R. Rondo, K. Caldwell-Pope, B. Ingram, I. Zubac^a, J. Hart | 25 | 0.68 | − |
L. James, B. Ingram, I. Zubac, L. Ball, K. Kuzma | 55 | 0.36 | − |
T. Chandler, L. James, K. Caldwell-Pope, L. Ball, K. Kuzma | 39 | 0.36 | Not elite |
L. James, J. McGee, L. Ball, K. Kuzma, J. Hart | 133 | 0.31 | Not elite |
L. James, R. Rondo, J. McGee, K. Caldwell-Pope, K. Kuzma | 47 | 0.23 | Not elite |
T. Chandler, L. Stephenson, K. Caldwell-Pope, B. Ingram, J. Hart | 37 | 0.21 | Not elite |
T. Chandler, B. Ingram, L. Ball, K. Kuzma, J. Hart | 36 | 0.14 | Not elite |
T. Chandler, L. James, B. Ingram, L. Ball, K. Kuzma | 61 | 0.13 | Not elite |
L. James, R. Rondo, J. McGee, K. Caldwell-Pope, B. Ingram | 31 | 0.13 | Not elite |
B. Ingram, I. Zubac, L. Ball, K. Kuzma, J. Hart | 39 | 0.13 | − |
L. James, J. McGee, K. Caldwell-Pope, L. Ball, K. Kuzma | 34 | 0.12 | Not elite |
L. James, J. McGee, R. Bullock, B. Ingram, K. Kuzma | 73 | 0.11 | Not elite |
T. Chandler, K. Caldwell-Pope, B. Ingram, K. Kuzma, J. Hart | 31 | 0.10 | Not elite |
T. Chandler, K. Caldwell-Pope, B. Ingram, L. Ball, K. Kuzma | 45 | 0.04 | Not elite |
T. Chandler, L. James, L. Ball, K. Kuzma, J. Hart | 66 | 0.02 | Not elite |
L. James, R. Rondo, J. McGee, B. Ingram, K. Kuzma | 43 | 0.00 | Not elite |
L. James, J. McGee, B. Ingram, L. Ball, K. Kuzma | 234 | 0.00 | Not elite |
L. James, R. Rondo, R. Bullock, B. Ingram, K. Kuzma | 62 | −0.05 | Not elite |
J. McGee, K. Caldwell-Pope, M. Muscala, A. Caruso, J. Jones^b | 31 | −0.06 | − |
L. James, K. Caldwell-Pope, L. Ball, K. Kuzma, J. Hart | 25 | −0.16 | Not elite |
L. James, R. Rondo, J. McGee, R. Bullock, K. Kuzma | 62 | −0.21 | Not elite |
R. Rondo, K. Caldwell-Pope, B. Ingram, I. Zubac, K. Kuzma | 29 | −0.24 | − |
L. James, L. Stephenson, L. Ball, K. Kuzma, J. Hart | 31 | −0.25 | Not elite |
R. Rondo, M. Beasley, K. Caldwell-Pope, B. Ingram, I. Zubac | 25 | −0.28 | − |
J. McGee, K. Caldwell-Pope, B. Ingram, L. Ball, J. Hart | 25 | −0.32 | Not elite |
L. James, R. Rondo, B. Ingram, I. Zubac, K. Kuzma | 33 | −0.43 | Not elite |
J. McGee, B. Ingram, L. Ball, K. Kuzma, J. Hart | 83 | −0.47 | Not elite |
R. Rondo, J. McGee, K. Caldwell-Pope, A. Caruso, M. Wagner^c | 27 | −1.31 | − |
^a Ivica Zubac was traded to the Los Angeles Clippers and was not included in ANC predictions for the Lakers.
^b Jemerrio Jones did not have data from the 2017–18 NBA regular season.
^c Moritz Wagner did not have data from the 2017–18 NBA regular season.
Actual lineup performance compared to ANC predictions for the Golden State Warriors during the 2018–19 season, for all lineups having at least 25 min of playing time. ‘−’ denotes lineups for which no ANC prediction is given.
Lineup | Minutes played | Actual PMM | ANC prediction |
---|---|---|---|
A. McKinnie, D. Green, K. Looney, S. Livingston, S. Curry | 28 | 1.01 | Not elite |
A. Iguodala, D. Green, K. Durant, K. Looney, S. Curry | 25 | 0.80 | Elite |
A. Iguodala, D. Cousins, D. Green, K. Thompson, S. Curry | 29 | 0.77 | Elite |
A. Iguodala, J. Bell, K. Durant, K. Thompson, S. Curry | 36 | 0.73 | Elite |
A. Iguodala, D. Green, K. Durant, K. Thompson, S. Curry | 178 | 0.69 | Elite |
A. Iguodala, K. Durant, K. Looney, K. Thompson, Q. Cook | 35 | 0.63 | Not elite |
A. McKinnie, J. Jerebko, K. Durant, K. Looney, S. Curry | 26 | 0.61 | Not elite |
D. Green, K. Durant, K. Looney, K. Thompson, S. Curry | 313 | 0.39 | Elite |
D. Cousins, D. Green, K. Durant, K. Thompson, S. Curry | 268 | 0.29 | Elite |
A. Bogut, D. Green, K. Durant, K. Thompson, S. Curry | 83 | 0.27 | Not elite |
A. Iguodala, D. Cousins, D. Green, K. Thompson, S. Livingston | 67 | 0.24 | Not elite |
D. Jones, J. Jerebko, K. Durant, K. Thompson, Q. Cook | 29 | 0.20 | Not elite |
A. McKinnie, A. Iguodala, K. Durant, K. Looney, S. Curry | 48 | 0.19 | Not elite |
A. Iguodala, K. Durant, K. Looney, K. Thompson, S. Curry | 141 | 0.17 | Elite |
A. Iguodala, J. Jerebko, K. Durant, K. Looney, K. Thompson | 47 | 0.13 | Not elite |
A. McKinnie, A. Iguodala, J. Jerebko, K. Looney, S. Curry | 27 | 0.11 | Not elite |
D. Jones, D. Green, K. Durant, K. Thompson, S. Curry | 142 | 0.11 | Not elite |
A. Iguodala, D. Green, J. Jerebko, S. Livingston, S. Curry | 54 | 0.07 | Not elite |
A. Iguodala, D. Cousins, K. Thompson, Q. Cook, S. Livingston | 39 | 0.05 | Not elite |
A. Iguodala, D. Green, J. Jerebko, K. Thompson, S. Livingston | 26 | 0.00 | Not elite |
D. Lee, J. Jerebko, K. Looney, K. Thompson, S. Livingston | 30 | 0.00 | Not elite |
D. Green, J. Jerebko, K. Durant, K. Thompson, S. Curry | 45 | −0.22 | Elite |
A. Iguodala, D. Jones, K. Durant, K. Thompson, Q. Cook | 77 | −0.33 | Not elite |
D. Green, J. Bell, K. Durant, K. Thompson, S. Curry | 26 | −0.38 | Elite |
A. McKinnie, D. Cousins, D. Green, K. Durant, S. Curry | 32 | −0.56 | Not elite |
A. McKinnie, J. Evans^a, J. Jerebko, J. Bell, Q. Cook | 37 | −0.57 | − |
^a Jacob Evans did not have data from the 2017–18 NBA regular season.
© 2023 Walter de Gruyter GmbH, Berlin/Boston