Luck is the usual explanation whenever a pitcher has a significantly lower ERA
than his FIP. There are two statistics where luck plays a huge role: BABIP and
LOB%. Using Steve Staude’s pitching stat correlation tool, we can see that BABIP
has a correlation of only 0.156 from one season to the next, while LOB% has a
correlation of 0.205, for pitchers with a minimum of 30 innings pitched from
2007 to 2013. These numbers are much lower than the correlation of K% or BB%,
suggesting that a large portion of BABIP and LOB% is subject to random
variation and is independent of a pitcher’s skill. However, the correlation is
not 0. These stats are not completely random, and a pitcher can still play a
small role in controlling his BABIP and LOB%. Many writers, including Steve,
have tackled the issue of BABIP using batted ball data. In this article, I
introduce xLOB%, an estimate of a pitcher’s LOB% for the current season. This is
not supposed to be a predictive stat, but a descriptive one. Think of it as an
analogue of FIP. While FIP estimates the pitcher’s ERA using strikeouts, walks
and home runs, xLOB% estimates the pitcher’s LOB% given his other pitching
statistics for the same season. I will be introducing pLOB% in the next article,
which attempts to project a pitcher’s LOB% for the following season.
First, take a look at which statistics correlate most closely with LOB%. Again, I am using Steve’s pitching stat correlation tool and setting the minimum innings pitched at 30 from 2007 to 2013.
| Statistic | Correlation with current year LOB% | Correlation with next year LOB% |
|---|---|---|
| BABIP | -0.452 | -0.127 |
| GB% | -0.050 | -0.047 |
| FB% | 0.103 | 0.059 |
| LD% | -0.135 | -0.030 |
| PU% (Popup%) | 0.166 | 0.106 |
| HR/FB | -0.131 | -0.135 |
| HR/TBF | -0.138 | -0.157 |
| K% | 0.421 | 0.348 |
| BB% | -0.037 | 0.052 |
| HBP% | -0.034 | 0.013 |
| O-Swing% | 0.246 | 0.169 |
| Z-Swing% | -0.040 | -0.057 |
| Swing% | 0.146 | 0.077 |
| O-Contact% | -0.163 | -0.165 |
| Z-Contact% | -0.332 | -0.311 |
| Contact% | -0.331 | -0.307 |
| Zone% | -0.046 | -0.034 |
| SwStr% | 0.345 | 0.302 |
| Foul% | 0.311 | 0.256 |
| rSB | 0.062 | 0.009 |
| rPM | 0.045 | 0.001 |
| LOB% | 1 | 0.205 |
Looking at the first column, a few stats stand out as strongly correlated with
LOB%. BABIP has the strongest correlation, at -0.452. This makes perfect sense,
as a pitcher who gives up a lot of hits will see more of his base runners score.
K% comes next at 0.421. This also makes sense: a strikeout does not advance any
runner, so high-strikeout pitchers should be able to strand more runners
without subjecting themselves to the whims of BABIP. Next comes a series of
stats that are highly correlated with K%, namely SwStr%, Z-Contact%, Contact%
and O-Swing%. Foul%, which has a correlation of 0.311 with LOB%, initially
caught me by surprise. However, a deeper look reveals that it has a correlation
of 0.708 with K%, so it adds little information beyond K%. Both HR/FB and
HR/TBF have a modest but consistent negative association with LOB% (-0.131 and
-0.138), which is to be expected, as a home run scores every runner on base.
What surprises me the most is BB%, which has a correlation of only -0.037 with
LOB%. I had no firm expectation before the study, but I did expect a stronger
association in one direction or the other. On reflection, two opposing effects
are at work. A walk can be positively associated with LOB% because it creates
the least dangerous kind of base runner: unlike a single or an extra-base hit,
it advances only the runners who are forced, and the batter gets no farther
than first base. A walk can also be negatively associated with LOB% because it
still puts another runner on and can move forced runners up, making it easier
for them to score afterward. The two effects seem to cancel each other out,
leaving BB% with almost no association with LOB%. I also tested the fielding
statistics, but they do not appear to have strong associations with LOB% either.
Using multiple regression, I arrived at the model xLOB% = 0.87 - 0.76 BABIP +
0.42 K%. The R-squared value is 31.8%. The standard error is 0.0574, meaning
that xLOB% typically misses actual LOB% by about 5.74 percentage points.
O-Swing%, rSB, FB% and HR/TBF are all significant variables in the model at
α = 0.05. However, none of these variables adds more than 1% to the R-squared
value, so I decided to omit them to keep the model simple.
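As a quick worked example, the regression equation can be wrapped in a small Python function. This is only a sketch: the function name `xlob` is mine, and it assumes BABIP and K% are entered as fractions (0.300 and 0.20, not 30 and 20), which is what the size of the intercept implies.

```python
def xlob(babip, k_pct):
    """Estimate same-season LOB% from BABIP and K%, both given as fractions."""
    return 0.87 - 0.76 * babip + 0.42 * k_pct

# A pitcher with a .300 BABIP who strikes out 20% of the batters he faces:
# 0.87 - 0.228 + 0.084 = 0.726, an xLOB% of about 72.6%.
estimate = xlob(0.300, 0.20)
print(f"{estimate:.3f}")  # prints 0.726
```

Read this way, every 10 points of BABIP (0.010) costs a pitcher 0.76 percentage points of xLOB%, and every additional percentage point of K% adds 0.42.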
Testing out of sample, using data from 2002-2006 with a minimum of 30 innings
pitched, xLOB% has a correlation of 0.573 with LOB%. This is very close to the
correlation coefficient of 0.564 between xLOB% and LOB% in the data from
2007-2013, suggesting that the relationship between BABIP, K% and LOB% is not a
quirk of the 2007-2013 data.
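An out-of-sample check of this kind can be sketched in Python as follows. The coefficients are frozen at the values quoted in this article and nothing is refit; the `held_out` rows are hypothetical numbers for illustration, not the 2002-2006 data, and the `pearson` helper is written out by hand so the calculation is visible.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical held-out pitcher seasons: (BABIP, K%, actual LOB%), all fractions.
held_out = [
    (0.280, 0.25, 0.78),
    (0.300, 0.20, 0.72),
    (0.320, 0.15, 0.68),
    (0.310, 0.18, 0.74),
]

# Apply the 2007-2013 coefficients unchanged to the held-out seasons.
predicted = [0.87 - 0.76 * babip + 0.42 * k for babip, k, _ in held_out]
actual = [lob for _, _, lob in held_out]
r_out = pearson(predicted, actual)

# In-sample, the correlation between fitted and actual values of a least-squares
# regression with an intercept equals the square root of R-squared.
r_in = sqrt(0.318)
print(f"in-sample r = {r_in:.3f}")  # prints in-sample r = 0.564
```

The last lines show where the 0.564 figure comes from: it is the square root of the 31.8% R-squared of the fitted model, so the out-of-sample 0.573 is directly comparable to it.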
How does xLOB% perform as a predictor? Not so well. Using data from 2007-2013,
xLOB% has a correlation of 0.299 with LOB% of the following season. This is
lower than the correlation that K% alone has with LOB% of the following season
(0.348). The reason xLOB% is of little use as a predictor is that BABIP is very
inconsistent from year to year. xLOB% itself has a year-to-year correlation of
only 0.463, which is similar to that of PU%, but much lower than that of K% or
BB%. How can LOB% be predicted? That will be the topic of my next article.
All statistics courtesy of Fangraphs.