Ranking the performance of tennis players: an application to women’s professional tennis

McKinley L. Blackburn

doi:10.1515/jqas-2013-0006


Published/Copyright: November 1, 2013


From the journal: Journal of Quantitative Analysis in Sports, Volume 9, Issue 4

Abstract

Paired-comparison models have been previously used in the literature to assess the relative performance of tennis players over a given period of time. In this paper, I discuss how the rankings of tennis players can be modified to address variations in the importance of tennis tournaments, and concerns about under-participation over the tennis season. The methods are applied to the 2011 WTA season, where the WTA-ranked number one player Caroline Wozniacki was often criticized for not being the true top player. The alternative rankings proposed here indicate that Petra Kvitova was the top player in 2011, with Serena Williams a close second. These rankings do appear to perform better in predicting match probabilities in early 2012 than methods based on the official rankings.

Keywords: paired comparisons; sports rankings; women’s tennis

Corresponding author: McKinley L. Blackburn, University of South Carolina – Economics, 1705 College St., Columbia, SC 29206, USA, Tel.: +803-777-4931, Fax: +803-777-6876, e-mail: blackbrn@moore.sc.edu

¹
As an example, Martina Navratilova questioned Wozniacki’s ranking at the 2012 Australian Open, suggesting that the WTA ranking method should be revised to take into account the quality of wins (The Independent, Jan. 23, 2012). Numerous references to other commentators claiming Wozniacki was not truly number one could be obtained throughout the 2011 season.
²
By comparison, the men’s tour had seen Novak Djokovic win three of the major tournaments along with winning 43 consecutive matches to start the season, leaving little doubt at season’s end as to who deserved the number one ranking.
³
The rules are described more completely in Women’s Tennis Association (2012).
⁴
The data were taken from the Tennis.Matchstat web site at tennis.matchstat.com.
⁵
For a discussion and evaluation of non-model-based ranking methods for football, soccer, and rugby, see Stefani and Pollard (2007). Radicchi (2011) employed a network-system analysis of tennis matches for men that also allows for a ranking of players based on weighted connections in matches.
⁶
David (1988) and Cattelan (2012) provide reviews of paired-comparison methods. The outcome measure could be changed to incorporate the score difference in number of sets (as in Clarke and Dyte 2000) or number of games (as in McHale and Morton 2011). I focus only on the won/loss outcome, however, as I wanted to use the same outcomes the WTA considers important, and the WTA ranking system does not take “margin of victory” into account. Similar models applied to college football outcomes (see Harville 1977) include a constant term for home-field advantage, and a dummy for neutral-site games. I did not consider evidence of a “home-country” advantage in tennis; Holder and Nevill (1997) did not find evidence of such an effect at the major tournaments for men.
⁷
The model has been previously applied to modeling tennis match outcomes by Glickman (1999) and McHale and Morton (2011).
⁸
Retired players may ask to be excluded from the rankings after they retire; for example, both Justine Henin and Patty Schnyder retired during 2011.
⁹
Mease uses a probit functional form rather than logit. A similar procedure for the logit model was suggested by Clogg et al. (1991).
¹⁰
This is often criticized because naive models can often perform very well by this measure, though in this case a naive model would be expected to predict only 50% correctly.
¹¹
Both only played six matches during the year, but even when players with a small number of matches are excluded from the rankings (anyone with <20 matches, say) there are still a number of lower-ranked players among the top based on winning percentage (three of the top 20 by winning percentage are outside the top 50 based on WTA rankings).
¹²
This effectively gives the terms in the last two sums a weight equal to the average weight across matches in the sample.
¹³
One exception is the year-end championship (YEC) tournaments, where the payoff per match is substantially higher than all of the other tournaments. I use a weight of $80,000 per match for the YEC, giving it similar importance to the four Grand Slam tournaments (which average from $80,000 to $84,000 per match).
¹⁴
WTA points tend to be proportional across all rounds in comparing tournaments at two different levels.
¹⁵
I use robust standard errors for the estimates that allow for potential correlation in match-outcome errors for multiple matches with the same pair of players.
¹⁶
If the predictions in Table 3 are restricted to matches between players with ranks better than 200 in the WTA rankings, there is only a very slight improvement in the deviance criterion when SE-adjustments are incorporated.
¹⁷
There is also a concern about the consistency of the parameter estimates, which rely on the number of matches for each player approaching infinity. In practice, the number of matches among the top-ranked players is somewhat high (59 per player among the top 20), though the impact of a low number of matches among the lower-ranked players on outcomes for the higher-ranked is not clear.
¹⁸
There were a couple of much lower-ranked players who were winless over a number of matches in 2011 who also receive high difficulty measures, but this may have more to do with the importance of the penalized likelihood in keeping their estimated logit parameter finite.
¹⁹
This uses all matches before the US Open to predict performance starting with that tournament.
²⁰
This was chosen to maximize the deviance for the last 22% of the sample based on estimates from the first 78%. In this case, a correction of 1.1 standard errors was chosen.
²¹
More complete prediction models for tennis matches (though using official rankings or points) are presented in Gilsdorf and Sukhatme (2008) and del Corral and Prieto-Rodriguez (2010).
²²
Note that this is just a logistic regression model with no constant and log(pts_j/pts_k) as the independent variable. Boulier and Stekler (1999), Clarke and Dyte (2000), Klassen and Magnus (2002), and del Corral and Prieto-Rodriguez (2010) all estimate similar models in which the right-hand side is the difference in the rank of the winner and loser, but I found this model did not predict as well as the model based on ranking points. The estimated value for λ was 1.05.
²³
For the latter measure, I used a paired test of proportions (the McNemar test) which had a p-value of 1.00 for the test that the proportions were equal.
²⁴
I use the same standard-error correction as for the estimates for the general year. Serena Williams is top-ranked using just hardcourt matches, while Wozniacki was second.
²⁵
Quality points start at 100 for winning against #1, 75 for winning against #2, 65 for #3, 55 for #4, and so on (until 1 point is received for winning against someone ranked between 250 and 500). Double quality points are received for wins at Grand Slam events.
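As an illustration of the points-based benchmark described in note 22, the following minimal sketch evaluates a logit with no constant and log(pts_j/pts_k) as the only regressor. Only the functional form and the estimate λ = 1.05 come from the note. The function names `win_probability` and `deviance`, the point totals, and the definition of deviance as −2 times the summed log predicted probability of the observed winner are assumptions made for illustration.

```python
import math

LAMBDA_HAT = 1.05  # estimate of lambda reported in note 22


def win_probability(pts_j, pts_k, lam=LAMBDA_HAT):
    """P(j beats k) from a logit with no constant and log(pts_j/pts_k) as regressor."""
    if pts_j <= 0 or pts_k <= 0:
        raise ValueError("ranking points must be positive")
    index = lam * math.log(pts_j / pts_k)
    return 1.0 / (1.0 + math.exp(-index))


def deviance(matches, lam=LAMBDA_HAT):
    """Assumed criterion: -2 * sum of log predicted probability of each observed winner.

    `matches` is an iterable of (winner_points, loser_points) pairs.
    """
    return -2.0 * sum(math.log(win_probability(w, l, lam)) for w, l in matches)


# Hypothetical point totals, not taken from the paper's data.
example_matches = [(7000.0, 3500.0), (2000.0, 4000.0), (5000.0, 5000.0)]
print(round(win_probability(7000.0, 3500.0), 3))
print(round(deviance(example_matches), 3))
```

Under this form the predicted probability equals pts_j^λ / (pts_j^λ + pts_k^λ), so two players with equal points are each given a probability of 0.5 regardless of λ.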

Appendix

For the log-likelihood in equation (2), the first-order condition for the jth player’s parameter (given the other parameter estimates) can be written as the implicit solution to:

The win-loss difference for player j equals the sum of the predicted probabilities of winning for matches won minus the sum of the predicted probabilities of losing for matches lost.
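The verbal condition above can be written out under one set of assumptions: that equation (2) is an unweighted Bradley-Terry logit log-likelihood, with θ_j denoting player j's parameter and p_jk the predicted probability that j beats k. The symbols used below (θ, p, the match sets 𝒲_j and ℒ_j, and the counts W_j and L_j) are notation assumed here and are not necessarily the paper's. If equation (2) includes match-importance weights, each term would carry the corresponding weight.

```latex
\begin{align*}
p_{jk} &= \frac{\exp(\theta_j)}{\exp(\theta_j) + \exp(\theta_k)}, \\
\frac{\partial \ell}{\partial \theta_j}
  &= \sum_{m \in \mathcal{W}_j} \bigl(1 - p_{j k(m)}\bigr)
   - \sum_{m \in \mathcal{L}_j} p_{j k(m)} = 0, \\
W_j - L_j
  &= \sum_{m \in \mathcal{W}_j} p_{j k(m)}
   - \sum_{m \in \mathcal{L}_j} \bigl(1 - p_{j k(m)}\bigr).
\end{align*}
```

Here 𝒲_j and ℒ_j are the sets of matches that player j won and lost, W_j and L_j are their sizes, and k(m) is the opponent in match m. The second line gives W_j as the sum of p over all of j's matches, and subtracting L_j from both sides yields the third line.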

To derive a quality-of-opponent adjustment factor, suppose that a “reference” player with was the opponent in every match. Then the first-order condition for player j would be:

from which it follows that

The difference can then be interpreted as an adjustment factor for difficulty of opponent. If the penalized log-likelihood of equation (4) is used, the quality parameter value if the reference player had always been played becomes:

References

Boulier, Bryan L. and H. O. Stekler. 1999. “Are Sports Seedings Good Predictors?: An Evaluation.” International Journal of Forecasting 15:83–91. doi:10.1016/S0169-2070(98)00067-3

Bradley, Ralph A. and Milton E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39:324–345. doi:10.1093/biomet/39.3-4.324

Cattelan, Manuela. 2012. “Models for Paired Comparison Data: A Review with Emphasis on Dependent Data.” Statistical Science 27(3):412–433. doi:10.1214/12-STS396

Clarke, Stephen R. and David Dyte. 2000. “Using Official Rankings to Simulate Major Tennis Tournaments.” International Transactions in Operational Research 7:585–594. doi:10.1111/j.1475-3995.2000.tb00218.x

Clogg, Clifford C., Donald B. Rubin, Nathaniel Schenker, Bradley Schultz, and Lynn Weidman. 1991. “Multiple Imputation of Industry and Occupation Codes in Census Public-use Samples Using Bayesian Logistic Regression.” Journal of the American Statistical Association 86(413):68–78. doi:10.1080/01621459.1991.10475005

David, Herbert A. 1988. The Method of Paired Comparisons. New York: Oxford University Press.

del Corral, Julio and Juan Prieto-Rodriguez. 2010. “Are Differences in Ranks Good Predictors for Grand Slam Tennis Matches.” International Journal of Forecasting 26:551–563. doi:10.1016/j.ijforecast.2009.12.006

Dixon, Mark J. and Stuart G. Coles. 1997. “Modelling Association Football Scores and Inefficiencies in the Football Betting Market.” Applied Statistics 46(2):265–280. doi:10.1111/1467-9876.00065

Gilsdorf, Keith F. and Vasant Sukhatme. 2008. “Testing Rosen’s Sequential Elimination Tournament Model: Incentives and Player Performance in Professional Tennis.” Journal of Sports Economics 9(3):287–303. doi:10.1177/1527002507306790

Glickman, Mark E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Applied Statistics 48(3):377–394. doi:10.1111/1467-9876.00159

Harville, David. 1977. “The Use of Linear-Model Methodology to Rate High School or College Football Teams.” Journal of the American Statistical Association 72(358):278–289. doi:10.1080/01621459.1977.10480991

Holder, Roger L. and Alan M. Nevill. 1997. “Modelling Performance at International Tennis and Golf Tournaments: Is there a Home Advantage?” The Statistician 46(4):551–559. doi:10.1111/1467-9884.00109

Klassen, Franc J. G. M. and Jan R. Magnus. 2002. “Forecasting the Winner of a Tennis Match.” European Journal of Operational Research 148:257–267. doi:10.1016/S0377-2217(02)00682-3

Mease, David. 2003. “A Penalized Maximum Likelihood Approach for the Ranking of College Football Teams Independent of Victory Margins.” The American Statistician 57(4):241–248. doi:10.1198/0003130032396

McHale, Ian and Alex Morton. 2011. “A Bradley-Terry Type Model for Forecasting Tennis Match Results.” International Journal of Forecasting 27:619–630. doi:10.1016/j.ijforecast.2010.04.004

Radicchi, Filippo. 2011. “Who is the Best Player Ever? A Complex Network Analysis of the History of Professional Tennis.” PLoS ONE 6(2):1–7. doi:10.1371/journal.pone.0017249

Stefani, Ray, and Richard Pollard. 2007. “Football Rating Systems for Top-Level Competition: A Critical Survey.” Journal of Quantitative Analysis in Sports 3(3):1–20. doi:10.2202/1559-0410.1071

Women’s Tennis Association. 2012. 2012 Official Rulebook. St. Petersburg, Florida: WTA Tour Inc.

Published Online: 2013-11-01

Published in Print: 2013-12-01


https://doi.org/10.1515/jqas-2013-0006
