
A new approach to bracket prediction in the NCAA Men’s Basketball Tournament based on a dual-proportion likelihood

Ajay Andrew Gupta
Published/Copyright: October 11, 2014

Abstract

The widespread proliferation of and interest in bracket pools that accompany the National Collegiate Athletic Association Division I Men’s Basketball Tournament have created a need for people without expert knowledge of college basketball to produce a set of predicted winners for each tournament game. Previous research has addressed bracket prediction to some degree, but not nearly on the level of the popular interest in the topic. This paper reviews relevant previous research, and then introduces a rating system for teams using game data from that season prior to the tournament. The ratings from this system are used within a novel, four-predictor probability model to produce sets of bracket predictions for each tournament from 2009 to 2014. This dual-proportion probability model is built around the constraint that the two teams in a given game have a combined 100% probability of winning it. This paper also performs Monte Carlo simulation to investigate whether modifications to an expected-value-based prediction system, such as the one introduced in the paper, are necessary in order to have the maximum bracket score within a defined group. The findings are that selecting one high-probability “upset” team for one to three late-round games is likely to outperform other strategies, including one with no modifications to the expected value, as long as the upset choice overlaps a large minority of competing brackets while leaving the bracket some distinguishing characteristics in late rounds.


Corresponding author: Ajay Andrew Gupta, Statistics, The Florida State University, 117 N. Woodward Ave. P.O. Box 3064330, Tallahassee, FL 32306, USA, e-mail:

Appendix A: Implementation of likelihood maximization for ratings

In Section 5, Eqn. 7 showed a log-likelihood function ℓ_R(Ω; D) that could be optimized to produce regular-season team ratings for use in tournament prediction. Here, D was a general reference to the data and Ω was the set of parameters. There were strength parameters s(t) for team numbers t, one home-court strength parameter sHC, and a scaling parameter β that converted strength differences for use in the big-win term of the log-likelihood function.

Multiple optimization methods are possible, but I ran 500 cycles of optimizing one parameter at a time while holding all others constant. The order I used was the non-Division I rating first, then the remaining team ratings in alphabetical order, followed by sHC and β. I initialized the parameters sHC and β to 0 and 1, respectively. I then initialized each team strength using Eqn. 11 and constrained the values to lie between –6 and 6. Below, π̂(t) is team t’s proportion of games won.

$$ s_0(t) = \begin{cases} \ln\!\big(2\hat{\pi}(t)\big), & \hat{\pi}(t) \le 0.5 \\[2pt] -\ln\!\big(2\,[\,1-\hat{\pi}(t)\,]\big), & \hat{\pi}(t) > 0.5 \end{cases} \qquad (11) $$
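As a concrete illustration, the initialization step might look like the following minimal C++ sketch (the function and variable names are illustrative, not from the paper's implementation; it assumes win proportions are supplied in team order):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Initialize each team's strength from its proportion of games won (Eqn. 11),
// then constrain the value to the allowed range [-6, 6].
std::vector<double> initializeStrengths(const std::vector<double>& winProportion) {
    std::vector<double> s0(winProportion.size());
    for (std::size_t t = 0; t < winProportion.size(); ++t) {
        const double p = winProportion[t];
        const double raw = (p <= 0.5) ? std::log(2.0 * p)
                                      : -std::log(2.0 * (1.0 - p));
        s0[t] = std::clamp(raw, -6.0, 6.0);  // limits of -6 and 6 from Appendix A
    }
    return s0;
}
```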

For each parameter in each cycle, I ran a Newton-Raphson algorithm to find the point where the partial derivative of the log-likelihood with respect to that parameter equals zero. This point is a maximum, because the log-likelihood is concave in each parameter, as shown by the negative second derivatives in Eqns. 13, 15, and 17. I use the following first and second derivatives of the log-likelihood for β, which are calculated across the set of games G_BW = {g : b(g) = 1}.

$$ \frac{\partial \ell_R}{\partial \beta} = 2\sum_{g \in G_{BW}} c(g)\, d(g) \left[ \frac{1}{1 - 0.5\,e^{-\beta d(g)}}\, 1\big(w(g) = h(g)\big) - 1 \right] \qquad (12) $$
$$ \frac{\partial^2 \ell_R}{\partial \beta^2} = -\sum_{g \in G_{BW}} c(g)\, d(g)^2\, \frac{e^{-\beta d(g)}}{\big[\,1 - 0.5\,e^{-\beta d(g)}\,\big]^2}\, 1\big(w(g) = h(g)\big) \qquad (13) $$

The remaining log-likelihood derivatives needed are in Eqns. 14–17. They use the functions η(t, g), which is 1 if h(g)=t and –1 if l(g)=t, and ν(g), which is 1 if game g is on h(g)’s home court, –1 if it is on l(g)’s home court, and 0 if it is on a neutral court. The number of games in a season is G, and the sets G(t) and GHC are all of team t’s games and all games not played on a neutral court, respectively.

$$ \frac{\partial \ell_R}{\partial s(t)} = \sum_{g \in G(t)} c(g)\, \eta(t,g) \left[ \left( \frac{0.5\,e^{-d(g)}}{1 - 0.5\,e^{-d(g)}} + \frac{\beta\, b(g)\, e^{-\beta d(g)}}{1 - 0.5\,e^{-\beta d(g)}} \right) 1\big(w(g)=h(g)\big) - \big(1 + 2\beta\, b(g)\big)\, 1\big(w(g)=l(g)\big) \right] \qquad (14) $$
$$ \frac{\partial^2 \ell_R}{\partial s(t)^2} = -\sum_{g \in G(t)} c(g) \left[ \frac{0.5\,e^{-d(g)}}{\big(1 - 0.5\,e^{-d(g)}\big)^2} + \frac{\beta^2\, b(g)\, e^{-\beta d(g)}}{\big(1 - 0.5\,e^{-\beta d(g)}\big)^2} \right] 1\big(w(g)=h(g)\big) \qquad (15) $$
$$ \frac{\partial \ell_R}{\partial s_{HC}} = \sum_{g=1}^{G} c(g)\, \nu(g) \left[ \left( \frac{0.5\,e^{-d(g)}}{1 - 0.5\,e^{-d(g)}} + \frac{\beta\, b(g)\, e^{-\beta d(g)}}{1 - 0.5\,e^{-\beta d(g)}} \right) 1\big(w(g)=h(g)\big) - \big(1 + 2\beta\, b(g)\big)\, 1\big(w(g)=l(g)\big) \right] \qquad (16) $$
$$ \frac{\partial^2 \ell_R}{\partial s_{HC}^2} = -\sum_{g \in G_{HC}} c(g) \left[ \frac{0.5\,e^{-d(g)}}{\big(1 - 0.5\,e^{-d(g)}\big)^2} + \frac{\beta^2\, b(g)\, e^{-\beta d(g)}}{\big(1 - 0.5\,e^{-\beta d(g)}\big)^2} \right] 1\big(w(g)=h(g)\big) \qquad (17) $$

Eqn. 18 shows how the Newton-Raphson algorithm updates an estimate θi for parameter θ at Newton-Raphson iteration i.

$$ \theta_i = \theta_{i-1} - \left. \frac{\partial \ell_R}{\partial \theta} \right|_{\theta_{i-1}} \Bigg/ \left. \frac{\partial^2 \ell_R}{\partial \theta^2} \right|_{\theta_{i-1}} \qquad (18) $$
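A minimal C++ sketch of this per-parameter update, assuming the caller supplies the relevant first and second partial derivatives as callables (the fixed six-iteration cap reflects the convergence finding described below; names are illustrative):

```cpp
#include <functional>

// Coordinate-wise Newton-Raphson for a single parameter theta (Eqn. 18),
// holding all other parameters fixed. dLogLik and d2LogLik evaluate the
// first and second partial derivatives of the log-likelihood at theta.
double newtonRaphsonUpdate(double theta,
                           const std::function<double(double)>& dLogLik,
                           const std::function<double(double)>& d2LogLik,
                           int iterations = 6) {
    for (int i = 0; i < iterations; ++i) {
        theta -= dLogLik(theta) / d2LogLik(theta);  // theta_i = theta_{i-1} - l'/l''
    }
    return theta;
}
```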

The advantage of the Newton-Raphson algorithm is speed. Testing revealed that it was safe to assume convergence after only six iterations, so the run-time was tolerable despite this running for each parameter in a procedure with 500 cycles of 347–354 parameters per cycle. I implemented the algorithm in C++ and ran it on a laptop with a dual-core 2.10 GHz AMD A6 processor and 4 GB of RAM. The 2008–2009 through 2013–2014 seasons ran with a mean of 5 min and 35.5 s of run-time, with a maximum of 6 min (2013–2014).

The disadvantage is that Newton-Raphson only converges to the appropriate position if the function used for the root-finding (in this case, the log-likelihood derivative) and its own derivative are both continuous. For β, this holds because different values of β do not change the indicator functions. The s(t) and sHC parameters, however, can cause changes in which team is the higher-strength team and which is the lower-strength team, producing discontinuities.

The Newton-Raphson algorithm can still be used if it stays within a neighborhood where the function for root-finding and its derivative are well-defined. In this application, a game is in such a neighborhood the vast majority of the time. The initialization establishes a proper neighborhood for most cases, and continued iterations should move teams away from a discontinuity more often than toward one. My hypothesis was that strengths which approach a discontinuity would end up somewhere close to the correct point, and that the next cycle could start in a continuous neighborhood and fix the estimate. If a strength diverged, it would eventually be stopped by the limits of –6 and 6, and could be fixed during the next few cycles. The log-likelihood maximization will not find the global maximum, but it was already limited because I optimize each of more than 300 parameters individually. Bracket predictions are based on the relationships between pairs of teams, though, so the ratings can still be useful if they converge to areas where the tournament teams’ ratings are appropriate relative to each other.

The algorithm converges in log-likelihood. A prototypical pattern was a quick spike, followed by a quick drop and then a climb to a plateau. The algorithm generally found some sort of convergence after 30 cycles, but sometimes had a further climb within the plateau that made 500 cycles useful. I used the ending values rather than those with the highest observed likelihood, because they are more stable.

Appendix B: Implementation of regression for tournament strengths

In Section 6, Eqn. 8 shows an equation to produce NCAA tournament strengths sT(t) for teams t. These are meant to be used within the dual-proportion model from Eqn. 3. Eqn. 8 is a linear combination, but least-squares cannot be used because the sT(t) are used within the non-linear log-likelihood function from Eqn. 3. I could not use the Newton-Raphson algorithm from Appendix A for the βn, and its speed was not necessary because I always optimized fewer than ten parameters. The discontinuity problems in the log-likelihood function were more severe for these regression coefficients than for the strength parameters: changes to strengths have a nearly linear effect on log-likelihood in Eqn. 7, whereas the regression coefficients βn are multiplied by their predictors, amplifying any inaccuracy. Instead of using Newton-Raphson, I initialized each βn to zero, then optimized the βn with a brute-force search followed by a modified bisection algorithm. I used 100 cycles of the brute-force search, and 100 cycles of the modified bisection algorithm with 10 bisections per parameter per cycle. In each cycle, I optimized one βn at a time.

For the brute-force search, I determined a maximum and minimum allowable value of βn to keep βnxn from –9 to 9 for all xn in the data. The choice of –9 and 9 was based on the range of –6 to 6 for regular-season ratings. This way, one team could have a perfect 6 rating and I could still test whether amplifying the predictor was useful. Within these extreme values of each βn, there were 99 intermediate points creating 100 equally-spaced steps from the minimum to the maximum allowable βn. I tested the likelihood for each of these 101 values for the coefficient, holding the other coefficients constant, and selected the value with the highest likelihood. The brute-force search tested the same potential values of a coefficient in each cycle, but multiple cycles are necessary to eliminate the effect of the order of predictors optimized.
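A minimal C++ sketch of one pass of this grid search for a single coefficient (the log-likelihood callback and names are placeholders of mine; the bound keeps βn·xn within [–9, 9] for every observed xn, as described above):

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <limits>
#include <vector>

// One pass of the brute-force grid search for a single regression coefficient
// beta_n, holding the other coefficients fixed. The grid consists of 101
// equally spaced candidate values (100 steps) from -bound to +bound.
double bruteForceCoefficient(const std::vector<double>& xValues,
                             const std::function<double(double)>& logLik) {
    double maxAbsX = 0.0;
    for (double x : xValues) maxAbsX = std::max(maxAbsX, std::fabs(x));
    const double bound = (maxAbsX > 0.0) ? 9.0 / maxAbsX : 9.0;  // keep beta*x in [-9, 9]

    double bestBeta = 0.0;
    double bestLogLik = -std::numeric_limits<double>::infinity();
    for (int step = 0; step <= 100; ++step) {
        const double beta = -bound + 2.0 * bound * step / 100.0;
        const double ll = logLik(beta);
        if (ll > bestLogLik) {
            bestLogLik = ll;
            bestBeta = beta;
        }
    }
    return bestBeta;
}
```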

Generally speaking, the bisection method is a root-finding method like Newton-Raphson. A common application in likelihood settings is to the derivative of a log-likelihood function when the log-likelihood ℓT is concave, which is the case here, as shown in Eqn. 19. In these cases, the root of the log-likelihood derivative is the maximum likelihood estimate. For the remaining discussion of the bisection method, I will only address the application to a log-likelihood derivative because more general applications are not relevant to this paper. Below, {g} are games numbered from 1 to G. The difference between the strength parameters for the teams in game g is d(g). The functions h(g), l(g), and w(g) yield the higher-strength team, lower-strength team, and winner of game g, respectively.

$$ \frac{\partial^2 \ell_T}{\partial \beta_n^2} = -\sum_{g=1}^{G} \frac{0.5\,e^{-d(g)}\,\big[\,x_n(h(g)) - x_n(l(g))\,\big]^2}{\big(1 - 0.5\,e^{-d(g)}\big)^2}\, 1\big(w(g)=h(g)\big) \qquad (19) $$

The bisection method starts with an interval over which the log-likelihood derivative is positive at the lower endpoint and negative at the upper endpoint. If continuous, the log-likelihood derivative must pass through zero somewhere in the interval. One divides the interval in two and evaluates the log-likelihood derivative at the dividing point. If the value there is still positive, the maximum likelihood estimate must be in the upper half; if it has become negative, the maximum likelihood estimate must be in the lower half. Whichever half contains the estimate becomes the new interval, which is again divided in two, and the procedure repeats.

The bisection method assumes a continuous function over the interval, and I achieved this by using small intervals where the brute-force search had placed the interval center in a neighborhood with a continuous log-likelihood derivative. As an added precaution, I replaced the derivative of the log-likelihood at the dividing point with the slope passing through the midpoints of the two subintervals. Because a decrease in log-likelihood from the lower midpoint to the upper midpoint implies a negative slope and an increase implies a positive slope, I forwent the slope calculation and instead compared the two log-likelihoods directly.
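A minimal C++ sketch of this modified bisection for one coefficient, assuming a small starting interval centered on the brute-force result (function names are mine, not from the paper):

```cpp
#include <functional>

// Modified bisection for one coefficient: instead of evaluating the
// log-likelihood derivative at the dividing point, compare the
// log-likelihoods at the midpoints of the two subintervals and keep the
// half whose midpoint has the larger value.
double modifiedBisection(double lower, double upper,
                         const std::function<double(double)>& logLik,
                         int bisections = 10) {  // 10 bisections per cycle
    for (int i = 0; i < bisections; ++i) {
        const double mid = 0.5 * (lower + upper);
        const double lowerMid = 0.5 * (lower + mid);  // midpoint of lower half
        const double upperMid = 0.5 * (mid + upper);  // midpoint of upper half
        if (logLik(upperMid) > logLik(lowerMid)) {
            lower = mid;  // log-likelihood increasing: maximum lies in the upper half
        } else {
            upper = mid;  // log-likelihood decreasing: maximum lies in the lower half
        }
    }
    return 0.5 * (lower + upper);
}
```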

The bisection method can only adjust a coefficient slowly because of the small starting interval. In practice, the brute-force search alleviates the need for large adjustments. In every run I checked, the bisection method never moved any coefficients outside of the first bisection cycle’s starting interval. The method always converged almost immediately. The dual-proportion model using Eqn. 10 reaches log-likelihood of –212.897 after four cycles, and finishes at –212.896 after 200 cycles. I used the laptop described in Appendix A to test the run-time for the dual-proportion model using Eqn. 10. The run-time for the combined six seasons was 16 s for finding the regression coefficients. After that, calculating the bracket picks and bracket points took less than a second. This used the simplified bracket pick methodology from Section 6.

Appendix C: Implementation of reference models

I needed to replicate the predictors from previous researchers’ models for my model comparison in Section 7. Kaplan and Garstka’s Las Vegas odds model (labeled Vegas Odds in Tables 1 and 2) required historical point spreads and totals that were not available through their original source. I was forced to use multiple sources. For the 2013 and 2014 tournaments, I averaged the numbers from the Las Vegas Hotel & Casino and the Mirage Hotel & Casino (DonBest 2014). For the 2010, 2011, and 2012 tournaments, I combined the information from many online casinos (OddsPortal 2014). For the 2009 tournament, I relied on information released by SportsBook.Com (1800-Sports 2014). Except for 2009, these numbers were from slightly before the Round of 64 began, so they should be even more market-based than those used by Carlin, Kaplan, and Garstka.

As noted in Section 2.2, Kaplan and Garstka had calculated probabilities of winning for their Las Vegas Odds model using Gaussian CDFs based on parameters λi for teams i. The same is true for their Sagarin rating model, which is labeled Sagarin in Tables 1 and 2. These λi were expected points for a team, so pre-tournament Sagarin ratings are used directly (West 2014). The Las Vegas Odds model deduces them, starting with half the point total from the team’s Round of 64 game. It adjusts by half the point spread, adding this for the predicted winner and subtracting it for the predicted loser (Kaplan and Garstka 2001). Kaplan and Garstka also had a Bradley-Terry model (labeled Bradley-Terry in Tables 1 and 2) fit to all regular-season and conference tournament games in which an NCAA Tournament Round of 64 team or NIT team played against a Division I opponent. In this, 97 teams i had strength parameters si: 64 NCAA Tournament teams, 32 NIT teams, and a combined team for other opponents. The probabilities pi,j that team i defeats team j follow Eqn. 20.

$$ p_{i,j} = \frac{s_i}{s_i + s_j}, \qquad \sum_i s_i = 1, \qquad s_i \ge 0 \ \ \forall i \qquad (20) $$
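For concreteness, a short C++ sketch of the two constructions just described (names and sign conventions are mine; the spread is taken as the favorite's predicted winning margin, a positive number):

```cpp
// Expected points (lambda) for the predicted winner and loser of a
// Round-of-64 game, deduced from the game's point total and point spread
// as in Kaplan and Garstka's Las Vegas Odds model described above.
struct Lambdas {
    double predictedWinner;
    double predictedLoser;
};

Lambdas lambdasFromOdds(double pointTotal, double winningMargin) {
    return {0.5 * pointTotal + 0.5 * winningMargin,   // half the total plus half the spread
            0.5 * pointTotal - 0.5 * winningMargin};  // half the total minus half the spread
}

// Bradley-Terry probability that team i defeats team j (Eqn. 20), given
// non-negative strengths s_i and s_j that sum to 1 across all 97 teams.
double bradleyTerryProbability(double si, double sj) {
    return si / (si + sj);
}
```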

For their three methods, Kaplan and Garstka selected predicted winners in the latest rounds first and the earliest rounds last. If a team is selected for a later round, it is selected for all earlier rounds as specified by an algorithm they labeled UNPACK. If a game did not have a predicted winner, they claimed to select the team which was the argument of the maximum for the expected points in the subsection of the tournament ending with that game. Context suggests that they meant the team with the highest total expected points for the subtournament, if picked to win all games in that subtournament. This is not the argument of the maximum, though, because selecting one team can affect another team’s contributions to the expected points for the combined teams.

The model from West (2006) is labeled West in Tables 1 and 2. Rather than recalculate the probabilities, I used West’s own work because it follows his intended implementation and incorporates his changes to included predictors. West has performed unpublished work over the years using ordinal logistic regression as in his published model. Beginning with the 2009 tournament, though, he changed predictors from those listed in Section 2. For the 2009 tournament, he added squared terms for each of the original four predictors. Only the squared term for the cumulative points scored minus points allowed was retained for subsequent tournaments. For 2010 and later tournaments, West removed the number of wins over top-30 Sagarin-rating teams as a predictor. In place of that, he added the number of assists per game and the ratio of assists to turnovers.

West provided the model’s probabilities, as well as ratings, in a spreadsheet on his web site (West 2014). The regression coefficients are calibrated from previous years’ data, so this model uses the same six validation sets, but with validation in time instead of six-fold cross-validation. I use the probabilities West released rather than the ratings, because the probabilities better suit the purpose of making bracket predictions. I use the same simplified procedure used for the dual-proportion model when translating probabilities to bracket predictions. For bracket points, I used the probabilities as-is. West’s method is designed for potential use by the NCAA selection committee, not for bracket prediction after the tournament layout has been decided, so it normalizes probabilities by round over the combined teams that reach the Round of 64, not by specific matchup (West 2006). For the log-likelihoods in Table 2, the original probabilities are not proper, so I normalized them such that the two teams in each game had a combined 100% probability of winning.
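A one-line illustration of that per-game normalization (hypothetical names; pTeamA and pTeamB are the two teams' original probabilities for the round in question):

```cpp
// Rescale two teams' round-advancement probabilities so that, within their
// head-to-head game, the probabilities sum to 1 (used for Table 2).
double headToHeadProbability(double pTeamA, double pTeamB) {
    return pTeamA / (pTeamA + pTeamB);
}
```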

The Schwertman et al. (1991, 1996) and Jacobson et al. (2011) models are not listed separately in Table 1 because their predicted brackets are all identical to those from Pick the Seeds. When a model considered multiple teams equally strong for a bracket selection, I calculated points from each option and averaged the points. This occurred most frequently with Pick the Seeds, which chooses four #1 seeds to make the Final 4 and then has no way to distinguish among them. It thus gets points if any #1 seed wins Final 4 games, but fewer points than a method that predicts the specific team.

References

“2011 NCAAB Tournament Odds 1st Round Match Ups.” Free Sports Picks. 1800-Sports.Com, n.d. Web. May 29, 2014 (http://www.1800-sports.com/310-800.shtml).

“Basketball Odds Comparison, Basketball Betting Odds & Lines.” OddsPortal.Com. OddsPortal.Com, n.d. Web. May 28, 2014 (http://www.oddsportal.com/basketball/).

Breiter, David J. and Bradley P. Carlin. 1996. “How to Play Office Pools If You Must.” Chance 10(1):5–11. doi:10.1080/09332480.1997.10554789.

Carlin, Bradley P. 1994. “Improved NCAA Basketball Tournament Modeling via Point Spread and Team Strength Information.” The American Statistician 50:39–43.

“College Basketball Scores & History.” Sports-Reference.Com. Sports Reference LLC, n.d. Web. April 30, 2014 (http://www.sports-reference.com/cbb/).

“ESPN – Tournament Challenge – Who Picked Whom.” ESPN. ESPN, n.d. Web. March 28, 2014 (http://games.espn.go.com/tournament-challenge-bracket/2014/en/whopickedwhom).

Geiling, Natasha. 2014. “When Did Filling Out a March Madness Bracket Become Popular?” Smithsonian.com. Smithsonian Magazine, March 20, 2014. Web. May 19, 2014 (http://www.smithsonianmag.com/history/when-did-filling-out-march-madness-bracket-become-popular-180950162).

Jacobson, Sheldon H., et al. 2011. “Seed Distributions for the NCAA Men’s Basketball Tournament.” Omega 39(6):719–24. doi:10.1016/j.omega.2011.02.004.

Kaplan, Edward H., and Stanley J. Garstka. 2001. “March Madness and the Office Pool.” Management Science 47(3):369–82. doi:10.1287/mnsc.47.3.369.9769.

Koenker, Roger, and Gilbert W. Basset, Jr. 2010. “March Madness, Quantile Regression Bracketology, and the Hayek Hypothesis.” Journal of Business & Economic Statistics 28:26–35. doi:10.1198/jbes.2009.07093.

Metrick, Andrew. 1996. “March Madness? Strategic Behavior in NCAA Basketball Tournament Betting Pools.” Journal of Economic Behavior & Organization 30:159–72. doi:10.1016/S0167-2681(96)00855-4.

“NCAA – Men’s College Basketball Teams, Scores, Stats, News, Standings, Rumors.” ESPN. ESPN, n.d. Web. April 30, 2014 (http://espn.go.com/mens-college-basketball/).

“NCAA Basketball Betting Odds, NCAA Tournament Point Spreads & Money Lines.” DonBest. Dodgeball Ventures, Inc., n.d. Web. May 18, 2014 (http://www.donbest.com/ncaab/odds/).

Quintong, James. 2014. “Tournament Challenge: Final Four Update.” College Basketball Nation Blog. ESPN, April 5, 2014. Web. June 12, 2014 (http://espn.go.com/blog/collegebasketballnation/post/_/id/98226/tournament-challenge-final-four-update-2).

Sagarin, Jeff. “Final College Basketball 2013-2014 Through Results of 2014 April 7 Monday – NCAA Championship.” USA Today. USA Today, April 8, 2014. Web. May 20, 2014.

Schwertman, Neil C., T. A. McCready, and L. Howard. 1991. “Probability Models for the NCAA Regional Basketball Tournaments.” The American Statistician 45:35–8.

Schwertman, Neil C., Kathryn L. Schenk, and Brett C. Holbrook. 1996. “More Probability Models for the NCAA Regional Basketball Tournaments.” The American Statistician 50(1):34–8.

West, Brady. 2006. “A Simple and Flexible Rating Method for Predicting Success in the NCAA Basketball Tournament.” Journal of Quantitative Analysis in Sports 2(3). doi:10.2202/1559-0410.1039.

West, Brady. “New Ratings 2014.” Brady West’s Home Page. University of Michigan, n.d. Web. May 31, 2014 (http://www-personal.umich.edu/bwest/new_ratings_2014.xls).

Published Online: 2014-10-11
Published in Print: 2015-3-1

©2015 by De Gruyter
