Relieving and Readjusting Pythagoras
Abstract
Bill James invented the Pythagorean expectation in the late 1970s to predict a baseball team's winning percentage from just its runs scored and runs allowed. His original formula estimates a winning percentage of $RS^2/(RS^2 + RA^2)$, where RS stands for runs scored and RA for runs allowed; later versions found better agreement with data by replacing the exponent 2 with numbers near 1.83. Miller and his colleagues provided a theoretical justification by modeling runs scored and runs allowed as independent Weibull distributions. They showed that a single Weibull distribution did a very good job of describing runs scored and allowed, and led to a predicted won-loss percentage of $(RS_{obs} - 1/2)^\gamma / ((RS_{obs} - 1/2)^\gamma + (RA_{obs} - 1/2)^\gamma)$, where $RS_{obs}$ and $RA_{obs}$ are the observed runs scored and allowed and $\gamma$ is the shape parameter of the Weibull (typically close to 1.8). We show that a linear combination of Weibulls more accurately determines a team's run production and increases the prediction accuracy of a team's winning percentage by an average of about 25% (thus, while the currently used variants of the original predictor are accurate to about four games a season, the new combination is accurate to about three). The new formula is more involved computationally; however, it can be easily computed on a laptop in a matter of minutes from publicly available season data. It performs as well as (or slightly better than) the related Pythagorean formulas in use, and has the additional advantage of having a theoretical justification for its parameter values (rather than just an optimization of parameters to minimize prediction error).
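As a rough illustration of the two closed-form predictors described in the abstract, the following Python sketch evaluates the classical Pythagorean expectation and the single-Weibull form. It does not implement the linear combination of Weibulls, whose details are not given here. The function names, the sample run totals, and the 162-game season length are hypothetical, and the sketch assumes that the observed runs in the Weibull form are per-game averages.

```python
def pythagorean_win_pct(rs: float, ra: float, exponent: float = 2.0) -> float:
    """Classical Pythagorean expectation: RS^e / (RS^e + RA^e)."""
    if rs <= 0 or ra <= 0:
        raise ValueError("runs scored and allowed must be positive")
    return rs ** exponent / (rs ** exponent + ra ** exponent)


def weibull_win_pct(rs_obs: float, ra_obs: float,
                    gamma: float = 1.8, shift: float = 0.5) -> float:
    """Single-Weibull form as written in the abstract:
    (RS_obs - 1/2)^gamma / ((RS_obs - 1/2)^gamma + (RA_obs - 1/2)^gamma).

    Assumption: rs_obs and ra_obs are average runs per game.
    """
    if rs_obs <= shift or ra_obs <= shift:
        raise ValueError("observed runs must exceed the shift")
    scored = (rs_obs - shift) ** gamma
    allowed = (ra_obs - shift) ** gamma
    return scored / (scored + allowed)


if __name__ == "__main__":
    # Hypothetical season: 800 runs scored, 700 allowed, 162 games.
    games = 162
    rs_total, ra_total = 800, 700

    for e in (2.0, 1.83):
        pct = pythagorean_win_pct(rs_total, ra_total, exponent=e)
        print(f"exponent {e}: win pct {pct:.4f}, about {pct * games:.1f} wins")

    pct = weibull_win_pct(rs_total / games, ra_total / games)
    print(f"Weibull form (gamma=1.8): win pct {pct:.4f}")
```

The classical formula depends only on the ratio RS/RA, so season totals and per-game averages give the same answer there. The shifted Weibull form is not scale-invariant, which is why the choice of units matters for it.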