Abstract
This study provides an analysis of performance by round among PGA TOUR players over the 2003–2020 period in connection with drives, approach shots, short shots, and putts. Player performance is evaluated as if each were competing in four separate hypothetical shotmaking contests per round, one contest per shot type, where in each separate contest, scoring is based on regression-adjusted Strokes Gained per shotmaking opportunity. The analysis focuses on contest winners and the relative roles of skill and luck when winning contests of each type. All tests suggest that a substantial amount of luck in Strokes Gained-based scoring is required to win each type of contest, although much more luck is required to win approach shot, short shot, and putting contests than driving contests.
Acknowledgments
The author acknowledges the helpful comments of Natalya Pya Arnqvist, Robert Connolly, William Perreault, David Robb, Gary Smith, Rob Whited, and Zhengwu Zhang, along with those of Mark Broadie in connection with other studies related to this work.
- Research ethics: Not applicable.
- Informed consent: Not applicable.
- Author contributions: The author has accepted responsibility for the entire content of this manuscript and approved its submission.
- Use of Large Language Models, AI and Machine Learning Tools: None declared.
- Conflict of interest: The author states no conflict of interest.
- Research funding: None declared.
- Data availability: The data used in connection with this study is proprietary but has been made available to the author by the PGA TOUR.
A. Expected strokes to finish, further detail
A.1. How distance to the pin is computed
For each shot, other than shots taken from the tee box on par-4 and par-5 holes, the distance to the pin is the straight-line distance computed from the {X, Y} coordinate position of the ball to the {X, Y} coordinate position of the pin. Although one could obtain slightly more accurate distances by including Z coordinates, some Z coordinates are missing and a few seem inconsistent in relation to those of nearby ball positions. Moreover, including Z coordinates makes almost no difference in final computed distances.
In golf, very few par-4 and par-5 holes are intended to be played in a perfectly straight line from tee to pin. Instead, on most par-4s and par-5s, the hole’s natural routing “angles off” (“doglegs”) to the left or right from the intended drive finishing position. In such cases, the straight-line distance from tee to pin will understate the distance of the hole relative to that associated with its intended routing.
For each par-4 and par-5 hole, the median drive finishing position is computed, and it is assumed that this position is where drives are intended to be hit. The distance from the tee to the median position and distance from the same median position to the pin are added together. This becomes the estimate of hole length, tee to pin.
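The distance computations described in this section can be sketched as follows. This is a minimal illustration rather than the author's code: the function names are hypothetical, and the median drive finishing position is assumed to be the coordinate-wise median of the finishing positions, which the text does not specify.

```python
import math
from statistics import median


def straight_line_distance(ball_xy, pin_xy):
    """Straight-line distance using X and Y coordinates only (Z is ignored)."""
    return math.hypot(pin_xy[0] - ball_xy[0], pin_xy[1] - ball_xy[1])


def estimated_hole_length(tee_xy, pin_xy, drive_finish_xys):
    """Tee to median drive finishing position, plus that position to the pin."""
    # Assumption: "median position" is taken coordinate by coordinate.
    median_xy = (
        median(x for x, _ in drive_finish_xys),
        median(y for _, y in drive_finish_xys),
    )
    return (straight_line_distance(tee_xy, median_xy)
            + straight_line_distance(median_xy, pin_xy))


# Made-up coordinates for a hole that doglegs at the drive landing area.
tee, pin = (0.0, 0.0), (300.0, 400.0)
drives = [(290.0, 10.0), (300.0, 0.0), (310.0, -10.0)]

direct = straight_line_distance(tee, pin)          # ignores the dogleg
routed = estimated_hole_length(tee, pin, drives)   # follows the intended routing
print(direct, routed)
```

In this made-up example the straight-line tee-to-pin distance understates the routed length, which is the effect the median-position adjustment is meant to correct.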
A.2. Recovery shots
Broadie (2012) outlines a two-step process for inferring whether a shot is likely to be a “recovery” shot, that is, a shot that is “impeded by trees or other obstacles” (p. 152). Initially, a ball is identified as being eligible for the recovery classification if it lies in the rough 30 or more yards from the pin. Next, in step 1, if the absolute angle created by the finishing position of the shot taken relative to the position of the pin exceeds 15°, the shot taken is classified as a recovery shot. Alternatively, if the distance traveled by the ball once it is hit is less than 40 % of the distance to the pin, the shot is also classified as a recovery shot.
In step 2, a ball not otherwise identified as a recovery shot in step 1 is identified as being in the recovery position if it lies within close proximity to a ball identified in step 1 as being in a recovery position. As explained by Broadie (2012), suppose the balls of two golfers lie in approximately the same position. “The first golfer chips back onto the fairway, and the second golfer attempts a big slice around the trees. Step 1 would identify the first golfer’s shot as a recovery shot; however, this step might not identify the second golfer’s shot as a recovery, although it started in the same position and was significantly affected by trees” (p. 153).
In this study, the methodology for identifying a recovery shot in step 2 deviates somewhat from that described by Broadie. A ball is identified as being in a recovery position if it lies within 3 feet of any ball so identified in step 1. By contrast, Broadie identifies such balls as being in the recovery position if they lie within 9 feet of recovery balls identified in step 1. However, in correspondence, Broadie indicates that he now employs a 3-foot distance in step 2 to identify balls in the recovery position and also indicates that the PGA TOUR no longer employs step 2 in identifying recovery shots. Nevertheless, step 2, as described herein, is retained for the purposes of identifying shots in a recovery position.
Finally, a detailed analysis of records in the ShotLink shot files reveals that approximately 17 % of all shots classified as recovery shots by the PGA TOUR originated from a bunker, and five originated from the fairway. These shots would not have been classified as recovery shots in this study or in Broadie’s work.
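The two-step rule described above can be sketched as follows. This is an illustration under stated assumptions, not the author's or Broadie's implementation: coordinates are assumed to be in feet, the lie labels are hypothetical, the angle is interpreted as the angle between the start-to-pin line and the start-to-finish line, and step 2 is assumed to apply only to balls that meet the eligibility condition (rough, 30 or more yards from the pin).

```python
import math
from dataclasses import dataclass

FEET_PER_YARD = 3.0


@dataclass
class Shot:
    start: tuple   # (x, y) before the shot, in feet (assumed unit)
    finish: tuple  # (x, y) after the shot
    pin: tuple     # (x, y) of the pin
    lie: str       # hypothetical labels such as "rough" or "fairway"


def dist(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])


def is_eligible(shot, min_yards=30.0):
    """Ball lies in the rough, 30 or more yards from the pin."""
    return (shot.lie == "rough"
            and dist(shot.start, shot.pin) >= min_yards * FEET_PER_YARD)


def angle_off_pin_line(shot):
    """Absolute angle (degrees) between start->pin and start->finish."""
    vx, vy = shot.pin[0] - shot.start[0], shot.pin[1] - shot.start[1]
    wx, wy = shot.finish[0] - shot.start[0], shot.finish[1] - shot.start[1]
    return abs(math.degrees(math.atan2(vx * wy - vy * wx, vx * wx + vy * wy)))


def step1(shot, max_angle=15.0, min_fraction=0.40):
    """Step 1: short advance (< 40 % of distance to pin) or angle > 15 degrees."""
    if not is_eligible(shot):
        return False
    short = dist(shot.start, shot.finish) < min_fraction * dist(shot.start, shot.pin)
    return short or angle_off_pin_line(shot) > max_angle


def classify(shots, radius_feet=3.0):
    """Step 2: also flag eligible balls within 3 feet of a step-1 recovery ball."""
    flagged = [step1(s) for s in shots]
    anchors = [s.start for s, f in zip(shots, flagged) if f]
    return [
        f or (is_eligible(s)
              and any(dist(s.start, a) <= radius_feet for a in anchors))
        for s, f in zip(shots, flagged)
    ]


pin = (0.0, 600.0)
shots = [
    Shot((0.0, 0.0), (60.0, 30.0), pin, "rough"),    # short chip-out
    Shot((2.0, 0.0), (100.0, 500.0), pin, "rough"),  # long shot from nearly the same spot
    Shot((50.0, 0.0), (50.0, 550.0), pin, "rough"),  # unimpeded shot
]
labels = classify(shots)
print(labels)
```

The second made-up shot mirrors Broadie's example: it is not caught by step 1 but starts within 3 feet of a ball that is, so step 2 places it in the recovery position.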
B. Regression modeling detail
B.1. Choosing the form of the distance function, $f(d_{i,c})$
In each separate year, the distance function,
B.2. Minimum number of rounds and the specification of player identifiers
During the initial estimation process, no restriction was placed on the minimum number of rounds per player in a given year. However, with no restriction, some estimated fixed player and condition effects, $\mu_i$ and $\gamma_c$, were too large in absolute value to be consistent with what one would normally expect in connection with professional golf competition. This problem was eliminated by treating each player who had recorded fewer than 10 rounds in a given year as the same player when estimating equation (4) and then assigning the $\mu_i$ value so estimated to each such player individually. By treating such players in this way, the distance functions,
B.3. Specification of the condition identifier
With the two exceptions noted below in Section B.4, the condition of each shot taken is defined by the tournament/round combination in which the shot was taken. Initially, the author considered defining shot conditions by tournament/round/hole combinations, rather than tournament/round, to reflect that in a given round, some holes might play more difficult than others, but as described in Section B of the supplementary document, this created collinearity problems, especially for drives.
B.4. Special treatment of recovery shots and shots taken from the green
For recovery shots, when condition effects are defined by tournament/round and/or when player effects are included in the estimation of equation (4), some of the resulting estimates of fixed player and condition effects are too large in absolute value to be consistent with what one would expect for professional golf competition. The same is true in connection with the other shot condition specifications listed in Section B.3 above. As a result, for recovery shots only, equation (4) is estimated without condition and player effects.
For shots taken from the green (i.e., putts), when condition effects are defined by tournament/round in the estimation of equation (4) and then transformed to corresponding Strokes Gained values, such values can be slightly negative for some putts that were holed out from a relatively short distance. This problem is essentially eliminated by defining the shot condition by tournament/round for putts with a remaining distance to the pin greater than 5 feet and by a single identifier common to all putts taken within a given year with a remaining distance less than or equal to 5 feet. The resulting distance function for putts,
C. Relationship between raw and regression-adjusted Strokes Gained and the PGA TOUR’s SGBase
Approximately 59 % of all raw Strokes Gained values and 35 % of all regression-adjusted values computed by the author for years 2004–2020 are within 0.01 strokes of the PGA TOUR’s SGBase value, with 95 % and 76 %, respectively, of the author’s two values falling within 0.05 strokes of SGBase. (Note that the PGA TOUR does not report SGBase values for 2003.) However, there are some large inexplicable differences between the PGA TOUR’s raw Strokes Gained values (SGBase) and the raw and adjusted values computed by the author, especially among recovery shots. For example, among recovery shots, 45 % and 51 % of raw and adjusted Strokes Gained values, respectively, computed by the author differ from corresponding SGBase values by more than 0.12 strokes.
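The agreement percentages reported above are shares of shots whose computed value falls within a given tolerance of SGBase. A minimal sketch of that calculation follows; the function name and the numbers are made up for illustration, and treating "within" as less than or equal to the tolerance is an assumption.

```python
def share_within(values, baseline, tol):
    """Fraction of paired values whose absolute difference is at most tol."""
    pairs = list(zip(values, baseline))
    return sum(abs(v - b) <= tol for v, b in pairs) / len(pairs)


# Made-up Strokes Gained values, not taken from the study's data.
computed = [0.100, 0.130, -0.250, 0.400]
sg_base = [0.105, 0.100, -0.245, 0.900]

within_001 = share_within(computed, sg_base, 0.01)
within_005 = share_within(computed, sg_base, 0.05)
print(within_001, within_005)
```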
References
Bergé, L. (2018). Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm. In: DEM Discussion Paper Series 18-13. Department of Economics at the University of Luxembourg.
Broadie, M. (2012). Assessing golfer performance on the PGA TOUR. Interfaces 42: 146–165, https://doi.org/10.1287/inte.1120.0626.
Broadie, M. (2014). Every Shot Counts. Gotham Books, New York, NY, USA.
Broadie, M. and Rendleman, R. J., Jr. (2013). Are the official world golf rankings biased? J. Quant. Anal. Sports 9: 127–140, https://doi.org/10.1515/jqas-2012-0013.
Connolly, R. and Rendleman, R. J., Jr. (2008). Skill, luck and streaky play on the PGA Tour. J. Am. Stat. Assoc. 103: 74–88, https://doi.org/10.1198/016214507000000310.
Connolly, R. and Rendleman, R. J., Jr. (2011). Going for the green: a simulation study of qualifying success probabilities in professional golf. J. Quant. Anal. Sports 7: 1–48, https://doi.org/10.2202/1559-0410.1308.
Connolly, R. and Rendleman, R. J., Jr. (2012a). Tournament selection efficiency: an analysis of the PGA TOUR’s FedExCup. J. Quant. Anal. Sports 8: 1–34, https://doi.org/10.1515/1559-0410.1495.
Connolly, R. and Rendleman, R. J., Jr. (2012b). What it takes to win on the PGA TOUR (if your name is “Tiger” or if it isn’t). Interfaces 42: 554–57, https://doi.org/10.1287/inte.1110.0615.
Fearing, D., Acimovic, J., and Graves, S. C. (2011). How to catch a tiger: understanding putting performance on the PGA TOUR. J. Quant. Anal. Sports 10: 1268–1274, https://doi.org/10.2202/1559-0410.1268.
Pya, N. and Wood, S. N. (2015). Shape constrained additive models. Stat. Comput. 25: 543–559, https://doi.org/10.1007/s11222-013-9448-7.
Rendleman, R. J., Jr. (2020). The relative roles of skill and luck within 11 different golfer populations. J. Quant. Anal. Sports 16: 237–254, https://doi.org/10.1515/jqas-2019-0028.
Stigler, S. M. and Stigler, M. L. (2018). Luck and skill in tournament golf. Chance 31: 4–13, https://doi.org/10.1080/09332480.2018.1522206.
Storey, J. D. (2002). A direct approach to false discovery rates. J. Royal Stat. Soc. B 64: 479–98, https://doi.org/10.1111/1467-9868.00346.
Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of Statistics 31: 2013–2035, https://doi.org/10.1214/aos/1074290335.
Storey, J. D. and Bass, A. J. (2022). Bioconductor’s qvalue package version 2.29.0. Memo, April 26, 2022.
Storey, J. D. and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. 100: 9440–9445, https://doi.org/10.1073/pnas.1530509100.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/jqas-2025-0046).
© 2025 Walter de Gruyter GmbH, Berlin/Boston