Basic Statistics Glossary

Here you will find a list of baseball statistics and other scouting jargon. I’m keeping this page simple, covering only the statistics and terms I use most frequently. If you want to know more about sabermetrics and advanced baseball statistics, I highly recommend picking up “Baseball Between the Numbers” by the crew at Baseball Prospectus. It does a fantastic job of outlining the key components of advanced baseball analysis, and it is an easy read. You can also check out the sabermetrics Wikipedia page.

Statistics

BABIP (Batting Average on Balls in Play) – A core sabermetric measure, BABIP is the rate at which balls put in play go for base hits. Like FIP and the other DIPS statistics below, BABIP is used to judge, at a glance, whether a pitcher has been lucky or unlucky, based on how often the balls put in play against him fall in for hits. An “average” BABIP is around .300, assuming an average defense behind the pitcher. Poor team defenses tend to push BABIP higher, while above-average defenses tend to pull it lower. Extremely high and low BABIPs are generally unsustainable, and regression is usually in order. For example, suppose a pitcher posts a .365 BABIP and a 4.00 ERA in one season. If his peripherals stay the same the following year and his BABIP regresses to around .300, his ERA will likely drop as well. The formula for BABIP is:

BABIP = (H - HR) / (AB - K - HR + SF)
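As a quick check on the formula, here is a minimal Python sketch. The function name and the sample stat line are hypothetical, made up for illustration; H, HR, AB, K and SF are the usual hits, home runs, at-bats, strikeouts and sacrifice flies.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """BABIP = (H - HR) / (AB - K - HR + SF)."""
    # Strikeouts and home runs never give the defense a ball to field,
    # so both come out of the denominator; home runs also come out of the hits.
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("stat line has no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical season: 150 H, 20 HR, 550 AB, 100 K, 5 SF.
print(round(babip(150, 20, 550, 100, 5), 3))  # 0.299
```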

DIPS (Defense Independent Pitching Statistics) – A statistical breakthrough by Voros McCracken in the late '90s, the DIPS statistics attempt to isolate only the outcomes pitchers really have control over, namely walks, strikeouts, and home runs. A number of these statistics have been created, including FIP, dERA and DICE. DICE is the version I prefer, because it is simple and easy to calculate. The formula for DICE is:

DICE = 3.00 + (13*HR + 3*(BB + HBP) - 2*K) / IP

The “scale” is essentially the same as ERA's. A pitcher whose DICE is lower than his actual ERA could be considered unlucky, while a pitcher whose ERA is lower than his DICE could be considered a bit lucky.
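Here is a minimal Python sketch of DICE and the ERA comparison described above. The function names and the pitcher's stat line are hypothetical. One assumption worth flagging: the formula needs true decimal innings, so a box-score figure like "180.1" (180 and one-third innings) is converted first.

```python
def innings_from_box_score(ip: str) -> float:
    """Convert box-score innings like "180.1" (180 and 1/3) to decimal innings."""
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or "0") / 3


def dice(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """DICE = 3.00 + (13*HR + 3*(BB + HBP) - 2*K) / IP, on the ERA scale."""
    return 3.00 + (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def luck_gap(era: float, dice_value: float) -> float:
    """Positive: ERA above DICE (possibly unlucky). Negative: possibly lucky."""
    return era - dice_value


# Hypothetical pitcher: 20 HR, 50 BB, 5 HBP, 180 K in 200 IP, with a 4.00 ERA.
d = dice(20, 50, 5, 180, innings_from_box_score("200"))
print(round(d, 3), round(luck_gap(4.00, d), 3))  # 3.325 0.675
```

A positive gap like this one would suggest the pitcher's ERA ran higher than his walks, strikeouts and home runs alone would predict.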

The Hardball Times developed a statistic called xFIP, or Expected Fielding Independent Pitching, which replaces a pitcher’s actual home run per fly ball rate (HR/FB%) with the league average of 10.6%, since that rate tends to fluctuate from year to year.
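To illustrate the substitution idea, the sketch below plugs into the DICE formula the number of home runs a pitcher would have allowed at the 10.6% league HR/FB rate. This is a hypothetical hybrid for illustration only: the real xFIP applies the substitution to the FIP formula and its own constant, not to DICE. Function names and the stat line are made up.

```python
LEAGUE_HR_PER_FB = 0.106  # league-average HR/FB rate cited above


def expected_home_runs(fly_balls: int, hr_per_fb: float = LEAGUE_HR_PER_FB) -> float:
    """Home runs a pitcher 'should' have allowed at the given HR/FB rate."""
    return fly_balls * hr_per_fb


def dice_with_expected_hr(fb: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """Hypothetical xFIP-style DICE: actual HR replaced by FB * league HR/FB.

    Not the real xFIP, which uses the FIP formula instead of DICE.
    """
    return 3.00 + (13 * expected_home_runs(fb) + 3 * (bb + hbp) - 2 * k) / ip


# Hypothetical pitcher: 200 fly balls, 50 BB, 5 HBP, 180 K in 200 IP.
print(round(expected_home_runs(200), 1))                       # 21.2
print(round(dice_with_expected_hr(200, 50, 5, 180, 200), 3))   # 3.403
```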

Batted Ball Data (GB%, FB%, LD%) – Batted ball data is vital when trying to parse out oddities in both BABIP and DIPS, because each of the three types of contact produces very different results. For example, the 2010 batting lines (AVG/OBP/SLG) by batted ball type were:

Groundballs: .235/.235/.255
Flyballs: .219/.214/.580
Line Drives: .724/.720/.972

These numbers tell a very simple story. A ball hit on the ground has a better chance of falling in for a base hit than a fly ball, but a fly ball is much more likely to produce an extra-base hit. And a line drive is toxic for a pitcher and gold for a hitter.
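To see why the mix of contact matters, here is a rough Python sketch that weights the 2010 batting averages above by a pitcher's GB/FB/LD shares. The simplifications are assumptions of this sketch, not part of the glossary: every batted ball counts as one at-bat, sacrifice flies are ignored, and home runs stay in, so the result is a rough average on contact rather than BABIP. The two pitchers are hypothetical.

```python
# Batting average by contact type in 2010, taken from the lines above.
AVG_BY_CONTACT_2010 = {"GB": 0.235, "FB": 0.219, "LD": 0.724}


def average_on_contact(mix: dict[str, float]) -> float:
    """Rough batting average on contact for a GB/FB/LD mix (shares sum to 1).

    Simplifications: each batted ball is one at-bat, sacrifice flies are
    ignored, and home runs are included, so this is not BABIP.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-6:
        raise ValueError("batted-ball shares must sum to 1")
    return sum(share * AVG_BY_CONTACT_2010[kind] for kind, share in mix.items())


# Two hypothetical pitchers who differ only by five points of LD% vs GB%.
print(round(average_on_contact({"GB": 0.45, "FB": 0.35, "LD": 0.20}), 3))  # 0.327
print(round(average_on_contact({"GB": 0.50, "FB": 0.35, "LD": 0.15}), 3))  # 0.303
```

Shifting just five percent of contact from line drives to ground balls knocks roughly 25 points off the average, which is why a spike in LD% is a common explanation for a high BABIP.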

ISO (Isolated Power) – A statistic used to evaluate a player's raw power. ISO strips singles out of slugging percentage, so it only credits extra-base hits. The MLB average ISO is around .145. The formula for ISO is:

ISO = Slugging % - Batting Average
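A minimal Python sketch showing that SLG minus AVG leaves only extra bases, so ISO can also be computed straight from doubles, triples and home runs. The function names and the hitter's line are hypothetical.

```python
def iso_from_rates(slg: float, avg: float) -> float:
    """ISO = SLG - AVG."""
    return slg - avg


def iso_from_counts(doubles: int, triples: int, hr: int, ab: int) -> float:
    """Same quantity from counting stats.

    SLG - AVG = (1B + 2*2B + 3*3B + 4*HR)/AB - (1B + 2B + 3B + HR)/AB
              = (2B + 2*3B + 3*HR)/AB, so singles cancel out.
    """
    return (doubles + 2 * triples + 3 * hr) / ab


# Hypothetical hitter: 100 1B, 30 2B, 5 3B, 25 HR in 550 AB.
singles, doubles, triples, hr, ab = 100, 30, 5, 25, 550
avg = (singles + doubles + triples + hr) / ab
slg = (singles + 2 * doubles + 3 * triples + 4 * hr) / ab
print(round(iso_from_rates(slg, avg), 3), round(iso_from_counts(doubles, triples, hr, ab), 3))  # 0.209 0.209
```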

HR/FB% (Home Run per Fly Ball Percentage) – As mentioned above with DIPS, the MLB average for home runs resulting from flyballs is 10.6%. Players with a higher percentage of their flyballs turning into home runs may see their home run totals decrease in subsequent seasons as their rate normalizes. Some players are able to maintain higher/lower percentages based on their raw power.
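A small Python sketch of the HR/FB% check: compute the rate, then compare actual home runs with what the 10.6% league rate would produce on the same fly balls. The names and numbers are hypothetical, and a positive gap only flags a possible regression candidate, since some players sustain higher rates on raw power.

```python
LEAGUE_HR_PER_FB = 0.106  # MLB average cited above


def hr_per_fb(hr: int, fly_balls: int) -> float:
    """Share of fly balls that left the park."""
    return hr / fly_balls


def excess_home_runs(hr: int, fly_balls: int, league_rate: float = LEAGUE_HR_PER_FB) -> float:
    """Home runs above (+) or below (-) what the league-average HR/FB rate gives."""
    return hr - fly_balls * league_rate


# Hypothetical hitter: 30 HR on 200 fly balls.
print(f"{hr_per_fb(30, 200):.1%}")          # 15.0%
print(round(excess_home_runs(30, 200), 1))  # 8.8
```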

Concepts

Sample Size – I mention sample size a lot. It's a self-evident term and concept. In statistics, not just baseball statistics but any kind of statistical measure, the larger your sample, the more reliable your results. A sample of 50 plate appearances carries more weight than a sample of 20, just as a sample of 500 plate appearances is more meaningful than a sample of 200. This FanGraphs entry gives you a basic idea of the thresholds at which sample sizes become meaningful. But remember, in all cases, a larger sample does a better job than a smaller one of weeding out statistical noise.
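One way to put numbers on this is the binomial standard error of a rate, sketched below in Python. The model is an assumption of this sketch: it treats each ball in play as an independent trial with the same true hit probability, which real plate appearances only approximate. It shows how far an observed BABIP typically strays from a true .300 talent level by chance alone.

```python
import math


def rate_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an observed rate p over n independent trials."""
    return math.sqrt(p * (1 - p) / n)


# Typical chance spread around a true .300 BABIP at different sample sizes.
for n in (100, 200, 500):
    print(n, round(rate_standard_error(0.300, n), 3))
# 100 0.046
# 200 0.032
# 500 0.02
```

Because the error shrinks with the square root of the sample, quadrupling the sample only halves the noise.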

Regression to the Mean – A companion concept to sample size, regression to the mean is, in a nutshell, the idea that over time and over larger samples of data, performance generally tends to gravitate toward the mean, or average, because “luck” and statistical outliers get neutralized. Regression is not a set-in-stone rule, and some players will regress faster than others. Regression to the mean does not imply that numbers get worse; regression works in both directions. You can check the FanGraphs entry on regression here, and the Wikipedia entry here. An example of a “positive” regression:

A player starts his career with a BABIP of .150 over his first 100 PAs. Over his next 100 PAs, you can expect his BABIP to rise, since the league average BABIP is around .300. In some cases it will rise quickly, in others slowly, and in still others it will stay almost flat for a while before eventually rising.

An example of a “negative” regression is the polar opposite:

A player starts his career with a BABIP of .400 over his first 100 PAs. Over his next 100 PAs, you can expect his BABIP to fall.
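A common, simple way to estimate how much to regress a number is to blend the observed rate with the league mean, weighting each by sample size. The Python sketch below does that; it is one illustrative technique, not necessarily how anyone quoted here does it. The weight `k` (a count of "phantom" league-average observations) is a hypothetical placeholder of 100, and the examples reuse the .150 and .400 starts from above, treating the 100 PAs as 100 balls in play for simplicity.

```python
def regressed_estimate(observed: float, n: int, league_mean: float = 0.300, k: int = 100) -> float:
    """Blend an observed rate with the league mean, weighted by sample size.

    k acts as k 'phantom' league-average observations; the default of 100
    is a hypothetical placeholder, not an established constant.
    """
    return (observed * n + league_mean * k) / (n + k)


print(round(regressed_estimate(0.150, 100), 3))  # 0.225  positive regression
print(round(regressed_estimate(0.400, 100), 3))  # 0.35   negative regression
print(round(regressed_estimate(0.150, 900), 3))  # 0.165  bigger sample, less pull
```

Both extreme starts get pulled toward .300, and the pull weakens as the player's own sample grows, which ties regression back to sample size.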

Defensive Spectrum – A concept I talk about frequently, the defensive spectrum is a way of assigning value (as well as difficulty) to the positions on the field. The generally accepted spectrum, from toughest to easiest, is:

Catcher > Shortstop > Second Base > Centerfield > Third Base > Right Field > Left Field > First Base > DH (duh)

Phuture Phillies