Simply Better: Using Regression Models to Estimate Major League Batting Averages

Dan Neal; James Tan; Feng Hao; Samuel S Wu

doi:10.2202/1559-0410.1229


Published/Copyright: July 19, 2010


Journal of Quantitative Analysis in Sports, Volume 6, Issue 3

We consider the problem of estimating a Major League Baseball player’s batting average in the second half of a season based on his performance in the first half. We fit two linear regression models to players’ averages from each half of the 2004 season, use these models to predict batting averages in the latter half of 2005 and compare the results to those achieved by three Bayesian estimators considered by Brown (2008). The linear models consistently outperform the Bayesian estimators in terms of four measures of error. Since the regression models use data from 2004 as well as 2005, while Brown’s estimators were based strictly on 2005 data, we also compare the performance of the linear models to that of the Bayesian estimators when the Bayesian estimators are based on the same amount of data. We find the linear models to be superior in this case as well. As a further test, we use the same methods to predict on-base percentages in the last half of the 2005 season, and we find that the linear models again do a better job. While we change the question proposed in Brown’s original paper, our results are a valuable reminder of the power of linear regression.
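The fit-then-predict workflow described above can be illustrated with a minimal sketch. It assumes a simple one-predictor least-squares model (second-half average regressed on first-half average) and sum of squared errors as the error measure. The abstract does not specify the form of the two regression models or the four error measures, so these choices, the function names, and all the numbers below are hypothetical and are not the authors' method or data.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict(a, b, x):
    """Apply the fitted line to new first-half averages."""
    return [a + b * xi for xi in x]


def sum_squared_error(pred, actual):
    """One possible error measure (an assumption, not taken from the paper)."""
    return sum((p - q) ** 2 for p, q in zip(pred, actual))


# Made-up illustrative numbers, NOT real player data.
# "Training" season: first-half and second-half averages for five players.
train_first = [0.250, 0.300, 0.280, 0.220, 0.310]
train_second = [0.262, 0.283, 0.276, 0.245, 0.289]

# "Test" season: first-half averages and the second-half values to predict.
test_first = [0.240, 0.320, 0.270]
test_second = [0.262, 0.288, 0.266]

# Fit on the earlier season, then predict the later season's second half.
a, b = fit_simple_ols(train_first, train_second)
regression_pred = predict(a, b, test_first)

# Naive baseline: assume the second half repeats the first half.
naive_pred = list(test_first)

sse_regression = sum_squared_error(regression_pred, test_second)
sse_naive = sum_squared_error(naive_pred, test_second)
print(f"intercept={a:.4f} slope={b:.4f}")
print(f"SSE regression={sse_regression:.6f} SSE naive={sse_naive:.6f}")
```

Because the toy data were invented for illustration, the printed errors say nothing about real players. The sketch only shows the mechanics of fitting on one season and scoring predictions on the next.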

Keywords: batting average; regression; Bayesian



