Simply Better: Using Regression Models to Estimate Major League Batting Averages
Dan Neal
We consider the problem of estimating a Major League Baseball player's batting average in the second half of a season based on his performance in the first half. We fit two linear regression models to players' averages from each half of the 2004 season, use these models to predict batting averages in the latter half of 2005, and compare the results to those achieved by three Bayesian estimators considered by Brown (2008). The linear models consistently outperform the Bayesian estimators on four measures of error. Since the regression models use data from 2004 as well as 2005, while Brown's estimators were based strictly on 2005 data, we also compare the performance of the linear models to that of the Bayesian estimators when the Bayesian estimators are based on the same amount of data. We find the linear models to be superior in this case as well. As a further test, we use the same methods to predict on-base percentages in the last half of the 2005 season, and we find that the linear models again do a better job. While we change the question proposed in Brown's original paper, our results are a valuable reminder of the power of linear regression.
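The workflow described above (fit on one season's halves, predict the next season's second half, score the predictions) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's actual models: the abstract does not specify the two regression models or the four error measures, so the sketch uses a single-predictor least-squares line and sum of squared errors. The numbers are made up, the function names are hypothetical, and the "naive" baseline is only a stand-in for comparison; it is not one of Brown's Bayesian estimators.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict(coef, x):
    """Apply a fitted (intercept, slope) pair to new first-half averages."""
    a, b = coef
    return [a + b * xi for xi in x]


def sum_squared_error(pred, actual):
    """One illustrative error measure (an assumption, not named in the abstract)."""
    return sum((p - q) ** 2 for p, q in zip(pred, actual))


# Made-up averages for illustration only; NOT real player data.
train_first = [0.250, 0.300, 0.280, 0.220, 0.310, 0.265]   # earlier season, first half
train_second = [0.262, 0.285, 0.270, 0.245, 0.290, 0.268]  # earlier season, second half
test_first = [0.240, 0.320, 0.275]                         # later season, first half
test_second = [0.255, 0.295, 0.270]                        # later season, second half

# Fit on the earlier season, then predict the later season's second half.
coef = fit_line(train_first, train_second)
regression_pred = predict(coef, test_first)

# Stand-in baseline: assume the second half simply repeats the first half.
naive_pred = test_first

sse_regression = sum_squared_error(regression_pred, test_second)
sse_naive = sum_squared_error(naive_pred, test_second)
print(f"slope={coef[1]:.3f}  SSE regression={sse_regression:.6f}  SSE naive={sse_naive:.6f}")
```

In this toy data the fitted slope is below 1, so extreme first-half averages are pulled toward the mean, which is the usual reason a regression line beats simply carrying the first-half figure forward.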
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Articles in the same Issue
- Home Advantage in Three National Netball Competitions: Australia (1997-2007), New Zealand (1998-2007) and England (2005/06-2008/09)
- Relative Importance of Performance Factors in Winning NBA Games in Regular Season versus Playoffs
- An Examination of Judging Consistency in a Combat Sport
- An Improved LRMC Method for NCAA Basketball Prediction
- Bayesian Modeling of Footrace Finishing Times
- Rating/Ranking Systems, Post-Season Bowl Games, and "The Spread"
- A New Approach in the Evaluation of Team Chess Championships Rankings
- A Point-Mass Mixture Random Effects Model for Pitching Metrics
- Tail Modeling, Track and Field Records, and Bolt's Effect
- AccuV College Football Ranking Model
- Validation of Match Notation (A Coding System) in Tennis
- Simply Better: Using Regression Models to Estimate Major League Batting Averages
- Scoring Variables and Judge Bias in United States Dressage Competitions
- The "Bradman Class": An Exploration of Some Issues in the Evaluation of Batsmen for Test Matches, 1877-2006