Predicting the NCAA basketball tournament using isotonic least squares pairwise comparison model

Ayala Neudorfer; Saharon Rosset

doi:10.1515/jqas-2018-0039

Published/Copyright: September 28, 2018

Published by De Gruyter Brill

Journal of Quantitative Analysis in Sports, Volume 14, Issue 4

Abstract

Each year, millions of people fill out a bracket to predict the outcome of the popular NCAA men’s college basketball tournament, known as March Madness. In this paper we present a new methodology for team ranking and use it to predict the NCAA basketball tournament. We evaluate our model in Kaggle’s March Machine Learning Mania competition, in which contestants were required to predict the outcome of every possible game in the tournament. Our model combines two methods: the least squares pairwise comparison model and isotonic regression. We use existing team rankings (such as seeds and Sagarin and Pomeroy ratings) and look for a monotonic, non-linear relationship between the differences in ranks and the probability of winning a game. We use the isotonic property to obtain new rankings that are consistent with both the observed outcomes of past tournaments and prior knowledge about the order of the teams. In the 2016 and 2017 competitions, submissions based on our methodology consistently placed in the top 5% among more than 800 other submissions. Using simulations, we show that the suggested model usually outperforms commonly used linear and logistic models that use the same variables.

Keywords: least squares pairwise comparison; multivariate isotonic regression; ranking

Appendix A

An extension of the pairwise comparison model for the Bernoulli log loss function

Our discussion so far has focused on fitting the isotonic pairwise comparison model with the standard L₂ loss function, a problem that can be solved easily by quadratic programming. However, the natural choice for this problem is the Bernoulli log likelihood loss, which matches the log loss used to score Kaggle’s competition. Such a model can be viewed as an isotonic Bradley-Terry model (Bradley and Terry 1952) as follows:

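$$\min_{r}\; -\sum_{g=1}^{n}\left\{ y_g \log\left(\frac{1}{1+e^{-B_g r}}\right) + (1-y_g)\log\left(\frac{1}{1+e^{B_g r}}\right) \right\} \quad \text{subject to: } r(x_i) \ge r(x_j)\ \ \forall (i,j)\in\mathcal{I}$$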

An advantage of the Bernoulli log likelihood loss is that the predictions are restricted to [0, 1], so win probabilities can be obtained directly from the rank estimates. The disadvantage is that this model uses only win/loss data and ignores the margin of victory.
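As a concrete illustration of the objective above, the following Python sketch evaluates the negative Bernoulli log likelihood for a handful of games and checks the order constraints separately. It is a minimal sketch under stated assumptions, not the authors’ code: we assume, as in standard pairwise comparison models, that row B_g has +1 in the column of the first-listed team, −1 in the column of its opponent and zeros elsewhere, and that y_g = 1 when the first-listed team wins. We also simplify r(x_i) to one free rating per team; the function names, the data and the index set I are hypothetical.

```python
import math

def win_prob(B_row, r):
    """P(first-listed team wins) = 1 / (1 + exp(-B_g r)) under the logistic link."""
    margin = sum(b * ri for b, ri in zip(B_row, r))  # B_g r, a rating difference
    return 1.0 / (1.0 + math.exp(-margin))

def neg_log_likelihood(B, r, y):
    """Objective of the isotonic Bradley-Terry problem (constraints checked separately)."""
    total = 0.0
    for B_row, yg in zip(B, y):
        p = win_prob(B_row, r)
        total -= yg * math.log(p) + (1 - yg) * math.log(1.0 - p)
    return total

def is_feasible(r, index_set):
    """True if r[i] >= r[j] for every (i, j) in the (hypothetical) index set."""
    return all(r[i] >= r[j] for i, j in index_set)

# Hypothetical toy data: 3 teams; games 0-vs-1 (0 won), 1-vs-2 (2 won), 0-vs-2 (0 won).
B = [[1, -1, 0], [0, 1, -1], [1, 0, -1]]
y = [1, 0, 1]
r = [1.0, 0.5, 0.0]
I = [(0, 1), (1, 2)]  # prior knowledge: team 0 ranked above team 1, team 1 above team 2

print([round(win_prob(row, r), 4) for row in B])
print(round(neg_log_likelihood(B, r, y), 4), is_feasible(r, I))
```

Every probability returned by win_prob lies strictly in (0, 1), which is the advantage noted above. For very large rating differences `1.0 - p` loses precision; a more robust version would compute the log terms with `math.log1p`.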

A well-known result of Barlow and Brunk (1972, Theorem 3.1) implies that the solution to a whole variety of loss functions subject to isotonicity constraints can be obtained by solving the standard (L₂) isotonic regression problem $\min_z \sum_{i=1}^{n}(g_i - z_i)^2$, as long as the loss function can be written as minimizing:

$$\min_{z \in \mathbb{R}^n} \sum_{i=1}^{n}\left(\Phi(z_i) - g_i z_i\right) \qquad (1)$$

for some convex, differentiable function Φ and some data-dependent values $g_i$. In particular, isotonic regression under the L₂ loss and isotonic regression under the Bernoulli log likelihood loss yield the same fitted probabilities.
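The equivalence can be checked numerically. Writing the Bernoulli log likelihood in terms of the log-odds z_i, the per-observation loss is log(1 + e^{z_i}) − g_i z_i, which has the form (1) with Φ(z) = log(1 + e^z). The Python sketch below is our own illustration, not code from the paper: it minimizes this loss under a simple chain constraint z_1 ≤ z_2 ≤ ... ≤ z_n by projected gradient descent, using the pool-adjacent-violators algorithm (PAVA; see Mair, Hornik, and de Leeuw 2009) for the projection, and compares the resulting probabilities with the plain L₂ isotonic regression of g. The values g are hypothetical empirical win rates, chosen strictly inside (0, 1) so that the optimal log-odds are finite.

```python
import math

def pava(values):
    """L2 isotonic regression onto nondecreasing sequences (pool-adjacent-violators)."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # pool while the previous block's mean exceeds the last block's mean
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def isotonic_bernoulli(g, step=1.0, iters=5000):
    """Minimize sum(log(1 + e^z) - g*z) over nondecreasing z by projected gradient descent."""
    z = [0.0] * len(g)
    for _ in range(iters):
        grad = [sigmoid(zi) - gi for zi, gi in zip(z, g)]      # derivative of Phi(z) - g*z
        z = pava([zi - step * di for zi, di in zip(z, grad)])  # project back onto the cone
    return z

g = [0.3, 0.6, 0.2, 0.5, 0.8, 0.7, 0.4, 0.9]  # hypothetical win rates
p_l2 = pava(g)                                 # L2 isotonic regression of g
p_bern = [sigmoid(zi) for zi in isotonic_bernoulli(g)]
print([round(p, 4) for p in p_l2])
print([round(p, 4) for p in p_bern])
```

Both lines print approximately [0.3, 0.4, 0.4, 0.5, 0.6333, 0.6333, 0.6333, 0.9]. Because the logistic link is strictly increasing, the Bernoulli solution in log-odds pools exactly the same blocks as the L₂ fit, which is why the fitted probabilities coincide.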

While this equivalence holds for ordinary isotonic regression, it no longer holds in the pairwise comparison model: each term of the loss depends on a difference of ranks $B_g r$ rather than on a single coordinate, so the loss cannot be expressed as a generalized isotonic regression problem of the form (1) considered by Barlow and Brunk. In the pairwise comparison model, comparing the two loss functions amounts to comparing a constrained linear regression with a constrained logistic regression, both with design matrix B. It is well known that linear regression and logistic regression are not equivalent without constraints. Because a constrained fit coincides with the unconstrained fit whenever the latter already satisfies the constraints, the constrained problems are not equivalent either. It follows that the pairwise comparison problem with a Bernoulli log likelihood loss cannot be solved by quadratic programming followed by a simple transformation; it is a considerably harder problem, which we do not pursue further.
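The non-equivalence is easy to see on a toy example. The Python sketch below fits the same win/loss outcomes with both losses under a chain constraint r_0 ≥ r_1 ≥ r_2, using projected gradient descent with a PAVA projection as a stand-in for a quadratic programming solver, chosen only to keep the example self-contained. For the L₂ fit we assume, purely for illustration, that the predicted win probability is 0.5 + B_g r, with B_g coded +1/−1 for the two teams; this parameterization, the data and all function names are hypothetical and not taken from the paper.

```python
import math

def pava_increasing(values):
    """L2 projection onto nondecreasing sequences (pool-adjacent-violators)."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]

def project_chain(r):
    """Project onto the hypothetical chain constraint r[0] >= r[1] >= ... >= r[-1]."""
    return [-v for v in pava_increasing([-x for x in r])]

def sigmoid(m):
    return 1.0 / (1.0 + math.exp(-m))

def fit(B, y, loss, step=0.05, iters=3000):
    """Projected gradient descent for the chain-constrained pairwise model."""
    r = [0.0] * len(B[0])
    for _ in range(iters):
        grad = [0.0] * len(r)
        for row, yg in zip(B, y):
            m = sum(b * ri for b, ri in zip(row, r))  # B_g r
            if loss == "l2":
                resid = 0.5 + m - yg      # half the derivative of (y_g - 0.5 - B_g r)^2
            elif loss == "bernoulli":
                resid = sigmoid(m) - yg   # derivative of the Bernoulli log loss
            else:
                raise ValueError(loss)
            for k, b in enumerate(row):
                grad[k] += b * resid
        r = project_chain([ri - step * gk for ri, gk in zip(r, grad)])
    return r

# Hypothetical data: 3 teams, 4 games per pair; the first-listed team wins 3 of 4 each time.
B, y = [], []
for row in ([1, -1, 0], [0, 1, -1], [1, 0, -1]):
    B += [row] * 4
    y += [1, 1, 1, 0]

r_l2, r_bern = fit(B, y, "l2"), fit(B, y, "bernoulli")
pairs = [(0, 1), (0, 2)]
p_l2 = {(i, j): 0.5 + r_l2[i] - r_l2[j] for i, j in pairs}
p_bern = {(i, j): sigmoid(r_bern[i] - r_bern[j]) for i, j in pairs}
print(p_l2, p_bern)
```

With this data the constraints are inactive, so both fits equal their unconstrained counterparts. The L₂ fit gives team 0 a win probability of 2/3 against team 1 and 5/6 against team 2, while the logistic fit gives roughly 0.681 and 0.819: the linear model adds rank differences on the probability scale and the logistic model on the log-odds scale, so the fitted probabilities differ even though both use the same outcomes and the same constraints.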

References

Barlow, R. E., and H. D. Brunk. 1972. “The Isotonic Regression Problem and its Dual.” Journal of the American Statistical Association 67(337):140–147. doi:10.1080/01621459.1972.10481216

Barrow, Daniel, Ian Drayer, Peter Elliott, Garren Gaut, and Braxton Osting. 2013. “Ranking Rankings: an Empirical Comparison of the Predictive Power of Sports Ranking Methods.” Journal of Quantitative Analysis in Sports 9(2):187–202. doi:10.1515/jqas-2013-0013

Block, H., S. Qian, and A. Sampson. 1994. “Structure Algorithms for Partially Ordered Isotonic Regression.” Journal of Computational and Graphical Statistics 3(3):285–300. doi:10.1080/10618600.1994.10474646

Boulier, Bryan, and Herman Stekler. 1999. “Are Sports Seedings Good Predictors? An Evaluation.” International Journal of Forecasting 15(1):83–91. doi:10.1016/S0169-2070(98)00067-3

Bradley, Ralph Allan, and Milton E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39(3/4):324–345. doi:10.1093/biomet/39.3-4.324

Carlin, Bradley P. 1996. “Improved NCAA Basketball Tournament Modeling via Point Spread and Team Strength Information.” The American Statistician 50(1):39–43. doi:10.1137/1.9780898718386.ch18

Chartier, Timothy P., Erich Kreutzer, Amy N. Langville, and Kathryn E. Pedings. 2011. “Sports Ranking with Nonuniform Weighting.” Journal of Quantitative Analysis in Sports 7(3):Article 6. doi:10.2202/1559-0410.1290

Dykstra, Richard L., and Tim Robertson. 1982. “An Algorithm for Isotonic Regression for Two or more Independent Variables.” The Annals of Statistics 10(3):708–716. doi:10.1214/aos/1176345866

Gupta, Ajay Andrew. 2015. “A New Approach to Bracket Prediction in the NCAA Men’s Basketball Tournament Based on a Dual-Proportion Likelihood.” Journal of Quantitative Analysis in Sports 11(1):53–67. doi:10.1515/jqas-2014-0047

Harville, David. 1980. “Predictions for National Football League Games Via Linear-Model Methodology.” Journal of the American Statistical Association 75(371):516–524. doi:10.1137/1.9780898718386.ch6

Hoegh, Andrew, Marcos Carzolio, Ian Crandell, Xinran Hu, Lucas Roberts, Yuhyun Song, and Scotland C. Leman. 2015. “Nearest-Neighbor Matchup Effects: Accounting for Team Matchups for Predicting March Madness.” Journal of Quantitative Analysis in Sports 11(1):29–37. doi:10.1515/jqas-2014-0054

Jacobson, Sheldon H., and Douglas M. King. 2009. “Seeding in the NCAA Men’s Basketball Tournament: When is a Higher Seed Better?” Journal of Gambling Business and Economics 3(2):63–87. doi:10.5750/jgbe.v3i2.546

Lee, Chu In Charles. 1983. “The Min-Max Algorithm and Isotonic Regression.” The Annals of Statistics 11:467–477. doi:10.1214/aos/1176346153

Lopez, Michael J., and Gregory J. Matthews. 2015. “Building an NCAA Men’s Basketball Predictive Model and Quantifying its Success.” Journal of Quantitative Analysis in Sports 11(1):5–12. doi:10.1515/jqas-2014-0058

Luss, Ronny, Saharon Rosset, and Moni Shahar. 2012. “Efficient Regularized Isotonic Regression with Application to Gene-Gene Interaction Search.” The Annals of Applied Statistics 6(1):253–283. doi:10.1214/11-AOAS504

Mair, Patrick, Kurt Hornik, and Jan de Leeuw. 2009. “Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods.” Journal of Statistical Software 32(5):1–24. doi:10.18637/jss.v032.i05

Massey, Kenneth. 1997. “Statistical Models Applied to the Rating of Sports Teams.” Master’s thesis, Bluefield College.

Osting, Braxton, Christoph Brune, and Stanley Osher. 2013. “Enhanced Statistical Rankings Via Targeted Data Collection.” In International Conference on Machine Learning, Atlanta, GA, USA, 489–497.

Ruiz, Francisco J. R., and Fernando Perez-Cruz. 2015. “A Generative Model for Predicting Outcomes in College Basketball.” Journal of Quantitative Analysis in Sports 11(1):39–52.

Schwertman, Neil C., Kathryn L. Schenk, and Brett C. Holbrook. 1996. “More Probability Models for the NCAA Regional Basketball Tournaments.” The American Statistician 50(1):34–38. doi:10.1080/00031305.1996.10473539

Singh, Akshay Kumar, and Shubhabratha Das. 2014. “Rank Consistent Bradley-Terry Models for Repeated Tournaments.” Technical Report, IIM Bangalore Research Paper No. 466. doi:10.2139/ssrn.2464964

Stefani, Raymond T. 1977. “Football and Basketball Predictions Using Least Squares.” IEEE Transactions on Systems, Man, and Cybernetics 7:117–121. doi:10.1109/TSMC.1977.4309667

Stern, Hal. 1991. “On the Probability of Winning a Football Game.” The American Statistician 45(3):179–183. doi:10.1137/1.9780898718386.ch8

Tiwisina, Johannes, and Philipp Külpmann. 2016. “Probabilistic Transitivity in Sports.” Technical report, Institute of Mathematical Economics Working Paper No. 520.

Yuan, Lo-Hua, Anthony Liu, Alec Yeh, Aaron Kaufman, Andrew Reece, Peter Bull, Alex Franks, Sherrie Wang, Dmitri Illushin, and Luke Bornn. 2015. “A Mixture-of-Modelers Approach to Forecasting NCAA Tournament Outcomes.” Journal of Quantitative Analysis in Sports 11(1):13–27. doi:10.1515/jqas-2014-0056

Published in Print: 2018-11-27