Simply Better: Using Regression Models to Estimate Major League Batting Averages

  • Dan Neal, James Tan, Feng Hao and Samuel S Wu
Published/Copyright: July 19, 2010

We consider the problem of estimating a Major League Baseball player’s batting average in the second half of a season based on his performance in the first half. We fit two linear regression models to players’ averages from each half of the 2004 season, use these models to predict batting averages in the latter half of 2005 and compare the results to those achieved by three Bayesian estimators considered by Brown (2008). The linear models consistently outperform the Bayesian estimators in terms of four measures of error. Since the regression models use data from 2004 as well as 2005, while Brown’s estimators were based strictly on 2005 data, we also compare the performance of the linear models to that of the Bayesian estimators when the Bayesian estimators are based on the same amount of data. We find the linear models to be superior in this case as well. As a further test, we use the same methods to predict on-base percentages in the last half of the 2005 season, and we find that the linear models again do a better job. While we change the question proposed in Brown’s original paper, our results are a valuable reminder of the power of linear regression.
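The train-on-one-season, predict-the-next workflow described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions. The abstract does not give the form of the two regression models or name the four error measures, so the sketch assumes a single-predictor least-squares fit and a sum of squared errors. The data are simulated rather than real 2004 and 2005 statistics, and the comparison baseline is the naive "carry the first-half average forward" estimate, not the Bayesian estimators of Brown (2008). All function names and simulation parameters are hypothetical.

```python
import random


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def simulate_season(rng, n_players=400, at_bats=200):
    """Simulated (first-half, second-half) batting averages. NOT real MLB data.

    Assumption: each player has a fixed true ability, and each at-bat is an
    independent hit-or-miss trial with that probability.
    """
    first, second = [], []
    for _ in range(n_players):
        ability = min(max(rng.gauss(0.265, 0.025), 0.150), 0.400)
        hits_first = sum(rng.random() < ability for _ in range(at_bats))
        hits_second = sum(rng.random() < ability for _ in range(at_bats))
        first.append(hits_first / at_bats)
        second.append(hits_second / at_bats)
    return first, second


def sum_squared_error(predicted, actual):
    """Total squared prediction error (one assumed error measure)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


rng = random.Random(0)
train_first, train_second = simulate_season(rng)  # stands in for 2004
test_first, test_second = simulate_season(rng)    # stands in for 2005

# Fit on the training season, then predict the second half of the test season.
intercept, slope = fit_simple_ols(train_first, train_second)
regression_pred = [intercept + slope * x for x in test_first]

# Naive baseline: assume the second half repeats the first half.
naive_pred = list(test_first)

sse_regression = sum_squared_error(regression_pred, test_second)
sse_naive = sum_squared_error(naive_pred, test_second)
print(f"slope={slope:.3f}  SSE regression={sse_regression:.4f}  SSE naive={sse_naive:.4f}")
```

In this simulation the fitted slope falls below 1, so extreme first-half averages are pulled toward the overall mean. That shrinkage is what lets the regression improve on the naive carry-forward estimate here.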

Published Online: 2010-7-19

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston

Downloaded on 28.9.2025 from https://www.degruyterbrill.com/document/doi/10.2202/1559-0410.1229/html