Baseball Beat, February 09, 2010
The Curious Case of Carlos Marmol
By Rich Lederer

After watching my nephew Brett make his PGA Tour debut in the Northern Trust Open at Riviera Country Club last Thursday, my wife and I headed to Palm Desert to hang out for a couple of days while our house was being fumigated for termites.

I woke up on Friday morning, checked my emails, and read the following news in Lee Sinins' daily ATM Report.

The Cubs re-signed P Carlos Marmol to a 1 year, $2.125 million contract, to avoid salary arbitration.
YEAR AGE RSAA  ERA     G  GS   IP    SO   SO/9 BR/9   W   L   SV  NW  NL  TEAM
2007 24   26   1.43   59   0   69.1   96 12.46 10.38   5   1   1   5   1  Cubs         
2008 25   17   2.68   82   0   87.1  114 11.75  8.97   2   4   7   4   2  Cubs         
2009 26    9   3.41   79   0   74     93 11.31 14.59   2   4  15   4   2  Cubs         
CAREER    40   3.42  239  13  307.2  362 10.59 12.34  14  16  23  18  12  
LG AVG     0   4.35           307.2  235  6.88 12.90  17  17

I glanced at Marmol's three-year stat line and noticed that he struck out 11.31 batters per nine innings last season. Not too shabby, I thought. I had been under the impression that he didn't have a particularly good year. Despite his stellar SO/9 rate (more commonly referred to as K/9), Marmol did indeed struggle, as noted in the column immediately to its right. BR/9 stands for "base runners per 9," which is essentially WHIP expressed over nine innings rather than one (although HBP are included in the former and not the latter).

In Marmol's case, hit by pitch is not a trivial statistic. He hit 12 batters last season, bad enough to rank third in the majors. The 28-year-old righthander, in fact, was the only reliever to reach double digits in this category.

A BR/9 of 14.59 means Marmol allowed 1.62 base runners per inning. That's a horrific rate for any pitcher, much less a closer/setup man. Marmol got there in a strange manner. Carlos allowed 43 hits, 65 walks, and 12 hit batters in 74 innings.
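As a quick check on that arithmetic, here is a minimal sketch that rebuilds BR/9 from the hit, walk, and hit-batter totals quoted above. The helper name `per_nine` is only illustrative, and the WHIP line is included just to show the effect of leaving out HBP.

```python
# Marmol's 2009 totals as quoted in the article.
hits, walks, hit_batters = 43, 65, 12
innings = 74.0

def per_nine(events: float, innings_pitched: float) -> float:
    """Scale a counting stat to a nine-inning rate."""
    return 9.0 * events / innings_pitched

base_runners = hits + walks + hit_batters      # 120, HBP included
br_per_9 = per_nine(base_runners, innings)
br_per_inning = base_runners / innings
whip = (hits + walks) / innings                # WHIP leaves out HBP

print(f"BR/9  = {br_per_9:.2f}")       # 14.59
print(f"BR/IP = {br_per_inning:.2f}")  # 1.62
print(f"WHIP  = {whip:.2f}")           # 1.46
```

The gap between 1.62 and 1.46 is entirely the 12 hit batters, which is why BR/9 paints a harsher picture of Marmol than WHIP would.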

Nolan Ryan, one of the most famous high walks/low hits pitchers of all time, only had two seasons when he allowed more walks than hits. Unlike Marmol, Ryan never approached a BB/H ratio of 1.5:1. His worst ratio was 1.13 in 1970 when he was a 23-year-old part-time starter for the New York Mets. Marmol's BB/H ratio was 1.51 last year. Ryan's career ratio was 0.71. Marmol's ratio over his first four seasons? A stunning 1.03.

Among pitchers with 50 or more games, Marmol had the second-best batting average against (.171 vs. .170 for Jonathan Broxton) and the third-best HR/9 (0.24) and HR/TBF (0.60%) even though he is an extreme flyball pitcher. However, Marmol also had the worst BB/9 (7.91), BB/TBF (19.40%), HBP/9 (1.46), and HBP/TBF (3.58%).

You might say that Marmol missed the strike zone and a lot of bats. If so, you would be right. He struck out, walked, or hit a batter more than half the time! Yup, Carlos had a combined 170 SO, BB, and HBP while facing 335 batters in 2009.
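The "more than half the time" claim is easy to verify from the totals in the article. The sketch below just redoes that arithmetic, along with the walk rates per nine innings and per batter faced. The variable names are mine.

```python
# Marmol's 2009 totals as quoted in the article.
strikeouts, walks, hit_batters = 93, 65, 12
batters_faced = 335
innings = 74.0

# Plate appearances that ended without a ball in play.
no_contact = strikeouts + walks + hit_batters
share = no_contact / batters_faced

bb_per_9 = 9.0 * walks / innings
bb_per_tbf = walks / batters_faced

print(no_contact)             # 170
print(f"{share:.1%}")         # 50.7%
print(f"{bb_per_9:.2f}")      # 7.91
print(f"{bb_per_tbf:.2%}")    # 19.40%
```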

What should we make of Marmol? His K/9, BAA, and HR/9 suggest he is one of the best relievers in the game. On the other hand, his BB and HBP rates indicate that he is a wild man and far from a polished product. Like my house, you can throw a tent over Marmol. While I wouldn't want to exterminate him if I were Jim Hendry or Lou Piniella, I might be inclined to sell tickets to his circus act if I were new Cubs owner Tom Ricketts.

By the way, Brett and former major winners Padraig Harrington, Davis Love III, Corey Pavin, Vijay Singh, and Mike Weir all missed the cut last week as Steve Stricker won his fourth tournament in less than a year to pass Phil Mickelson as the No. 2 player in the World Golf Rankings.

Designated Hitter, February 08, 2010
Evaluating Baseball's Managers
By Chris Jaffe

[Editor's Note: Chris Jaffe, writer for The Hardball Times, has written a new book, “Evaluating Baseball’s Managers.” The commentary below is the introductory essay to EBM’s Chapter 5, which is titled “Rise of the Fundamentalists, 1893-1919.”]

The importance of managers peaked at the turn of the century. They inhabited a specific period in the evolution of baseball between two crucial metamorphoses of the game. First, in the late nineteenth century, field generals like Gus Schmelz and Ned Hanlon caused the rise of the modern manager and the extinction of the old business manager. By placing a premium on the preparation of players before contests and on handling strategy during them, the position of manager came into its own. A generation later, the rise of the front office diminished the manager’s position by serving as a rival power source within the franchise. Between these transformations, managerial power in the sport crested. Managers ascended into the ranks of ownership with greater frequency than at any other time in baseball history, as there were fewer steps between themselves and owners. Even those who did not own a share of the club frequently had considerable autonomy. When John McGraw became Giants manager, he told the owners which players to keep or remove from the roster, indicating who called the shots for that franchise. Not all managers wielded such authority in this era, and many held considerable power in the future, but they had their strongest opportunity to control the entire franchise at the turn of the century.

Managerial power also reached its zenith because coaching was more important in this period than any other. Old time baseball is often remembered as a glory era, when players dedicated themselves to the craft of the game in a way that modern players with their supposedly softer attitudes never could. Though this attitude is very frequent in the modern day, ideas that the old-timers were better, wiser, and more dedicated are as old as the game itself.

People look at John McGraw and his devotion to those precious fundamentals. He ordered his players to come to the park to practice and work out for several hours every day, making the athletes perform precisely in accordance with his formidable will. Other managers, like Frank Chance, made a similar fervent push for sound ball. Chance’s Cubs had a well-earned reputation as the sharpest players in the league.

However, not only was the deadball era far from being the golden era of fundamentals, but the evidence used to make it seem like a Mecca of proper execution consists of the very facts that indicate otherwise. John McGraw did not want his players practicing constantly because they were so committed, but because those who earned a spot in major league baseball commonly displayed poor fundamentals. The book Crazy ‘08 by Cait Murphy provides an interesting window into baseball during the 1908 NL pennant race. Despite focusing on teams that diligently practiced their basics – McGraw’s Giants and Chance’s Cubs – examples of shoddy play litter the book. It was not a matter of errors; the gloves and conditions of the day made muffed grounders understandable. The problems went deeper. Virtually every game contained at least one boneheaded play that could not be blamed on the conditions. Flies landed between fielders. A base runner would be doubled off on a pop up. An outfielder would misplay a grounder for an inside-the-park home run. These plays still happen, but not nearly as often. If the Cubs and Giants played like that, imagine how the doormats played. There were also some extremely smart plays, but the floor for proper conduct was much lower in 1908.

It seems strange that teams that practiced so religiously played so poorly, but think for a second. Much of what is now received wisdom was still being worked out. In the last quarter of the nineteenth century, players slowly began figuring out how to work together, or back each other up. For example, what should a catcher do when a base runner is caught in a run-down between first and second? Where should the shortstop go when the runner on first heads for third on a single to right? People are not born knowing the answers.

Look at it from the point of view of someone born in 1879 earning a roster slot in 1900. He grew up in a world where even the best players at the highest levels were still learning the core basics. That knowledge did not trickle down to Iowa’s cornfields or Pennsylvania’s coal mines overnight. Neither TV nor radio existed to teach him how the pros acted. Odds were very good he had never seen a big league game, and he might not have known anyone who had. Sandlot baseball has always been self-regulating, but there is usually at least some fundamental knowledge for kids to rely on. When he started playing semipro ball, his manager was likely another player, probably under 30 years old himself. That man hopefully had some exposure to the basics being threshed out, but that was not guaranteed. Even if the skipper had basic knowledge of fundamentals, perhaps he could not coach well. Depending on the club’s finances, he might be a business manager. If a kid could hit or possessed a strong arm, he would receive playing time, no matter how ignorant he was of fundamentals.

Thus you end up with the following story told by baseball historian Fred Stein. In 1897, a rawboned young buck called Honus Wagner began playing for the Louisville Colonels. His manager, a not-yet-25-year-old Fred Clarke, told the kid to “lay one down” in his next at bat. Instead, Wagner hit a home run. Appreciative of the result but curious as to why the rookie ignored his instructions to bunt, Clarke asked Wagner what happened. Shamefacedly, the future Hall of Fame shortstop admitted he had never heard the phrase “lay one down” before. He had no idea what his manager was talking about. This was the situation Clarke, McGraw, and Chance contended with.

Fundamentals first have to be developed. Then they diffuse. Next, their instruction becomes institutionalized. Once the lessons become second nature to one generation, the next wave can be fully and immediately immersed in them. Nowadays, high schoolers are better versed in solid fundamentals than many big leaguers a century ago. After enough years and decades go by, fundamentals are so ingrained even Little Leaguers learn them, and you assume that everyone getting paid to play the game knows them by heart. Even a poor kid from the Dominican Republic has access to more knowledgeable adults and coaches than was the case for an 1890s Wisconsin farm boy.

This might oversell the point. At SABR’s annual convention in 2007, I heard Cait Murphy talk about what she learned from researching her book, and she was surprised at how advanced the level of play sometimes was. Examples of intelligent play existed – for instance the Cubs had worked out an impressive system of defensive signals amongst each other. However, such plays coincided with embarrassing miscues, as the floor for acceptable play was quite low. A wide discrepancy existed in the quality of fundamental ball played in these years. The more advanced examples of shrewd gamesmanship were often the result of major league managers instilling those values into their charges.

This explains why coaching fundamentals mattered so much for this generation of managers. The basic ideas of how to play had been worked out; now it was time to teach them diligently to the players. McGraw, Chance, and their ilk focused on the fundamentals because their players lacked that knowledge so sorely that these pointers could significantly improve squads.

A century later, in his bestseller Moneyball, Michael Lewis introduced the phrase “market inefficiency” to baseball fans. He argued the 2002 A’s won 103 games despite a low payroll because they realized the baseball world undervalued the importance of on-base percentage. By exploiting this gap between reality and perception, A’s GM Billy Beane made his team a winner. A century earlier, the market inefficiency was fundamentals. The best managers, such as McGraw and Chance, were those who could transform raw clumps of talent into majestic creations. One should not underestimate how important sound play was back then. In the early twentieth century some teams made 100 fewer errors a year than their rivals. Combined with improved base running, solid mental play, and all those other little things, proper fundamentals were worth many wins.

Chris Jaffe is an instructor of history and a columnist for The Hardball Times. He lives in Schaumburg, Illinois. For more information about Chris Jaffe and Evaluating Baseball’s Managers, visit the author’s website.

F/X Visualizations, February 05, 2010
Thoughts on a New Box Score
By Dave Allen

I have fond memories of reading box scores in the newspaper as a child. In the pre-internet days, or at least the pre-internet days in my house, box scores in newspapers were the medium by which I, and I assume most people, consumed baseball data. The data were all there, tightly yet efficiently packed in a format that allowed you to pull out any or all you wanted without feeling overwhelmed. Each was small enough for box scores for all the day's games to fit on one page.

I still read box scores. The medium has changed to the internet, but the box score itself is largely the same. I gather the format has changed little since the mid-1800s. Some of the stats are different, but the layout is very similar. Over 150 years with little change shows that the format is remarkably successful, but that does not mean there cannot be innovations. FanGraphs' WPA charts are not box scores per se, but are a very effective way of presenting what happened in a game.

I thought it would be an interesting exercise to attempt to create a new box score. I wanted it to retain the original box score's quality of presenting a relatively large amount of information in a relatively small space, while keeping that data accessible and not overwhelming. Beyond that, I hoped my new method would give a more immediate feeling for the pace and tenor of the game, like the WPA chart does.

Here is my attempt. The image may be too small, but I kept it that way so that it didn't push out the right margin of the page. You can click on it for a larger version. I used game one of the 2009 World Series for the example.
New Box Score
Each at-bat is represented by a bar, the height of which denotes the base the batter reached. White bars are for outs, black for hits or walks. The batter's progression around the rest of the bases that inning is indicated in gray (steals have a vertical black line through them). Runners on-base during an at-bat are indicated in red: circles for those not moved over in the at-bat, lines to show their progression as a result of the at-bat and an 'ex' if they were thrown or tagged out in that at-bat.

The score can be counted along as the black or gray bars reach the top. That also allows you to count individual batter's runs scored or pitcher's runs allowed. Red lines that reach the top are RBIs.

Compared to a traditional box score it is harder to find an individual player's line. For example, to see that Chase Utley went 2-for-4 with 2 HRs, 2 runs, 2 RBIs, a strikeout and a walk you have to go through, find his at-bats and count all of the events. But the trade-off is, I think, that this formulation gives a better feel for the pace of the game, and allows the events to be easily recreated: in the top of the first CC Sabathia escaped a bases-loaded, two-out jam; Phil Hughes took over to start the eighth and walked the only two batters he faced, both of whom came around to score on Raul Ibanez's single; Utley's two solo HRs were the only runs through the first seven innings; Cliff Lee didn't allow a runner past first until the ninth, and up to that point faced just three batters over the minimum; the Yankees burned through five relievers, who gave up four runs, in the last two innings; the top of the ninth ended with Shane Victorino getting thrown out at home on a Ryan Howard double and the game ended with two more Cliff Lee strikeouts. All of this can be easily seen through a close, but not difficult, reading of the chart.

What do you think of this format: Complicated and poorly laid out? Hard to read? Brilliant? I welcome constructive criticism in light of what you want from a representation of a baseball game.

Touching Bases, February 04, 2010
Hitters by Zones
By Jeremy Greenhouse

Few in MLB can beat a well-located pitch down and away. I wanted to look up those who could, so I broke the plate area down into nine zones, scaling the vertical component of the pitch for the batter’s height. For this analysis, I decided to restrict my sample to only 2009 pitches at which the batter swung. Here’s a crude chart showing the percentage of swings in each zone and how batters fare when swinging, indicated by color.
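To make the zone scheme concrete, here is a rough sketch of how a pitch might be assigned to one of the nine zones. The article does not give the actual cutoffs, so every boundary below is a hypothetical placeholder: a half-foot inner/outer edge, height fractions of 0.36 and 0.45 for the low and high rows, and a catcher's-view coordinate system in which a right-handed batter stands on the negative side. Only the idea of scaling pitch height by batter height and the zone names come from the text.

```python
# All three constants are assumptions for illustration, not the article's values.
EDGE_FT = 0.5      # horizontal edge of the middle column, in feet from plate center
LOW_FRAC = 0.36    # pitch height / batter height below this counts as "Down"
HIGH_FRAC = 0.45   # pitch height / batter height above this counts as "Up"

def classify_zone(px: float, pz: float, batter_height: float, stands: str) -> str:
    """Return a zone label such as 'Down-Away' for one pitch.

    px: horizontal location in feet from plate center, catcher's view
        (assumed: negative is the side where a right-handed batter stands).
    pz: pitch height in feet as it crosses the plate.
    batter_height: batter height in feet, used to scale pz.
    stands: 'R' or 'L'.
    """
    # Flip the sign so that positive always means "toward the batter".
    toward_batter = -px if stands == "R" else px
    if toward_batter > EDGE_FT:
        column = "In"
    elif toward_batter < -EDGE_FT:
        column = "Away"
    else:
        column = "Middle"

    relative_height = pz / batter_height
    if relative_height < LOW_FRAC:
        row = "Down"
    elif relative_height > HIGH_FRAC:
        row = "Up"
    else:
        row = "Middle"
    return f"{row}-{column}"

print(classify_zone(0.0, 2.5, 6.0, "R"))   # Middle-Middle
print(classify_zone(-0.8, 1.8, 6.0, "R"))  # Down-In
print(classify_zone(-0.8, 3.2, 6.0, "L"))  # Up-Away
```

Summing the run value of each swing by batter and zone label would then produce leaderboards in the shape of the ones that follow.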

Zones.jpg

Batters have the advantage when the pitch is middle-middle, and for the other eight zones, the run value is negative.

Let's get right to the leaderboards. There are nine of these, but I’m going to keep the commentary short, and I’ll leave a spreadsheet at the end.

Down-In

Name Runs Swings
Derrek Lee 5.6 57
David Wright 3.8 72
Corey Hart 3.6 60
Hunter Pence 2.8 73
Carlos Delgado 2.6 11
Chase Headley -5.9 58
Ryan Braun -6.1 84
Aubrey Huff -6.2 56
David Ortiz -6.2 64
Ryan Howard -6.6 84

Ryan Howard and David Ortiz are similar types of hitters who like the ball out over the plate but can get beat inside. Carlos Delgado hit a homer, three doubles and a single on his eleven swings at pitches down and in.

Down-Middle

Name Runs Swings
Joey Votto 10.6 193
Brian Roberts 9.9 204
Miguel Cabrera 9.7 191
Dustin Pedroia 6.9 150
Nick Markakis 6.8 160
Garret Anderson -11.7 174
Nate McLouth -12.3 125
Jack Cust -12.7 124
Dan Uggla -13.4 185
Derek Jeter -13.9 173

I’m surprised Derek Jeter’s on this list, as he’s a successful groundball hitter. Dan Uggla and Jack Cust, on the other hand, are fly ball hitters.

Down-Away

Name Runs Swings
Carlos Gonzalez 1.8 69
Denard Span 1.5 68
Ichiro Suzuki 1.4 121
Robinzon Diaz 1.2 18
Trevor Crowe 1.2 17
Hideki Matsui -12.8 107
Adam LaRoche -13.4 145
Jayson Werth -13.5 138
Ryan Howard -13.8 231
Brandon Inge -14.0 120

It appears foot speed is instrumental if one is to succeed by swinging at pitches down and away. I’m assuming the highest percentage of grounders are on pitches in this location, and speed is important to get on base via the grounder. Pitching Howard down in the zone seems to be a good idea.

Middle-In

Name Runs Swings
Martin Prado 13.2 87
Michael Young 10.9 132
James Loney 10.2 83
Mike Cameron 8.8 113
Derrek Lee 8.3 116
Willie Bloomquist -7.1 121
Lyle Overbay -7.2 42
Jeff Francoeur -7.6 172
Edgar Renteria -8.5 132
Mark DeRosa -14.1 125

Derrek Lee likes the ball inside.

Middle-Middle

Name Runs Swings
Prince Fielder 30.7 249
Mark Teixeira 29.9 294
Ryan Braun 29.6 281
Adam Dunn 25.3 294
Andre Ethier 25.2 323
Augie Ojeda -10.9 128
Nick Punto -11.3 191
Luis Rodriguez -11.8 129
Ty Wigginton -12.0 219
Dioner Navarro -13.1 174

This is clearly the most telling list in terms of quality of hitter. To be successful swinging the bat, you have to be able to hit the ball pitched down the middle.

Middle-Away

Name Runs Swings
Adrian Gonzalez 8.2 156
Robinson Cano 7.2 175
Ryan Braun 7.2 101
Nick Markakis 6.3 178
Brad Hawpe 5.9 228
Pedro Feliz -10.5 129
Jimmy Rollins -10.7 301
Chase Utley -11.1 232
Curtis Granderson -13.3 252
Aaron Hill -13.6 152

I already knew that Adrian Gonzalez and Robinson Cano excelled hitting the ball the other way, so it makes sense that they also excel at hitting outside pitches. The Phillies are not so good at hitting the ball when pitched away. They are good at baserunning, however.

Up-In

Name Runs Swings
Casey McGehee 5.1 84
Michael Young 5.0 85
Marco Scutaro 3.8 43
Seth Smith 3.8 14
Pablo Sandoval 3.1 81
Hunter Pence -7.1 77
Matt Holliday -7.7 85
Clint Barmes -8.0 75
Jhonny Peralta -8.6 85
Michael Cuddyer -10.3 123

Michael Young also likes the ball inside. He beat out Lee by six runs last year on pitches at least half a foot inside. Seth Smith had seven hits on the 14 pitches he swung at up and in, including four for extra bases.

Up-Middle

Name Runs Swings
Michael Cuddyer 10.7 186
Raul Ibanez 9.7 114
Aaron Hill 9.6 223
Kevin Youkilis 7.5 172
Todd Helton 7.4 168
Orlando Cabrera -10.3 204
Jason Giambi -11.3 109
Mike Cameron -11.6 122
Jose Bautista -11.9 136
Mark Reynolds -13.5 177

Michael Cuddyer was last at pitches up and in, but first at pitches up and over the plate. I find this very interesting. If you’re a pitcher, you can jam Cuddyer, but you better not miss.

Up-Away

Name Runs Swings
Albert Pujols 5.5 82
Matt Wieters 4.7 42
Chris Coghlan 4.7 76
Matt Kemp 3.9 56
Jacoby Ellsbury 3.8 58
Jimmy Rollins -6.0 110
Rafael Furcal -6.1 93
Jorge Cantu -6.4 56
Brian Roberts -7.3 76
Emilio Bonifacio -8.0 73

It took you a whole article to find Albert Pujols at the top of a leaderboard. My analysis confirms Rich Lederer's preliminary hypothesis. Pujols continues to be good.

Here's a spreadsheet containing all hitters with at least ten pitches swung at in a zone. And why not? Pitchers too.

Change-Up, February 03, 2010
Josh Beckett: To Extend or Not?
By Patrick Sullivan

Whether you think they've shaped up as a bunch of banjo-hitting ninnies or the stingiest run prevention unit this side of the 1968 St. Louis Cardinals, or both, or somewhere in between, the Boston Red Sox have set their 2010 roster for all intents and purposes. While Red Sox players and fans alike gear up for another exciting season with high expectations, it falls to the Boston front office to focus on longer term roster planning, no small task given the personnel shifts that are sure to continue.

In the lineup, David Ortiz, Victor Martinez and Adrian Beltre will become unrestricted free agents at the end of the 2010 season. Red Sox closer Jonathan Papelbon's contract also expires, and given his not-so-subtle eagerness for his big payday, it's fair to say he will probably be moving on. The most critical looming free agent decision, however, will center on Josh Beckett. Beckett will pitch out his age-30 season this year, his fifth in a Red Sox uniform.

The choice to extend Beckett will test Theo Epstein and his Baseball Operations staff. Beckett's popular, both with teammates and Boston's rabid fan base. We all know that Beckett has experienced an inordinate amount of post-season success. And yet, whether it's a nagging injury here or there, his proclivity to give up the gopher ball or the mere fact that he will be 31 in the first season of his new contract, the Red Sox have a number of red flags to consider. Let's take stock of the factors surrounding Beckett's case.

The first thing to understand is that Beckett is a truly elite pitcher. Since he joined the Red Sox, let's look at where he has ranked in the American League in both xFIP and Wins Above Replacement (WAR):

          xFIP      WAR
2006       21       30
2007        4        2
2008        2        8
2009        7        7

In just under 800 total innings pitched since 2006, Beckett has a 116 ERA+, but if you take out his outlier 5.01 ERA season, his first year in Boston, that ERA+ figure jumps to 126 while he averages just under 200 innings per season. To see how he has stacked up since 2007 with other American League pitchers, consider below:

                IP      ERA+
Greinke        553.2     149
Halladay       710.1     141
F. Hernandez   629.2     133
Lackey         563.2     129
Sabathia       593.1     129
Beckett        587.1     126

You get the picture. Josh Beckett is an excellent power arm with historically standout peripherals and dependable durability, and that's a critical part of this equation. He's not Mike Hampton or Barry Zito. And yet, before you commit the sort of dollars it will take to secure Beckett's services, it's essential to understand how pitchers perform from 31 on.

Above, I showed where Beckett stacked up among American League pitchers from 2007 to 2009 with at least 500 innings pitched. Applying the same parameters but extending it out to include the National League and pitchers 31 and older, we get a total of 10 pitchers (as opposed to 35 under 31). Half of them posted ERA+ totals under 100 over that time, and the rest of the list looks like this:

                IP      ERA+
Lilly          588.2     124
D. Davis       542.0     110
Lowe           605.2     108
Pettitte       614.0     104
Washburn       523.1     102

The rest of the list includes Kevin Millwood, Jamie Moyer, Braden Looper, Jeff Suppan and Livan Hernandez. Aside from Ted Lilly, I think the Red Sox would be disappointed with output in line with any of the other 9 pitchers. But let's tinker with the list further. Let's say the Red Sox or any other team giving Beckett 5 years would like him to average 175 innings per season. So let's set the following Play Index list parameters: at least 875 innings (5x175) with an ERA+ of at least 110 from 2000 to 2009, age 31 and older. Here is what we get.

Rk Player ERA+ IP Age Tm
1 Randy Johnson 137 1885.1 36-45 ARI-NYY-SFG
2 Roger Clemens 134 1454.1 37-44 NYY-HOU
3 Curt Schilling 133 1569.1 33-40 TOT-ARI-BOS
4 John Smoltz 132 1058.2 34-42 ATL-TOT
5 Pedro Martinez 126 935.0 31-37 BOS-NYM-PHI
6 Greg Maddux 117 1939.2 34-42 ATL-CHC-TOT-SDP
7 Mike Mussina 116 1790.2 31-39 BAL-NYY
8 Tom Glavine 114 1753.2 34-42 ATL-NYM
9 Andy Pettitte 113 1342.0 31-37 NYY-HOU
10 Al Leiter 111 1096.1 34-39 NYM-TOT
Provided by Baseball-Reference.com: View Play Index Tool Used
Generated 2/3/2010.

Whoa. You might have to go to the very bottom of that list before you even get to a non future Hall of Famer. In Major League Baseball, only the truly elite starting pitchers survive. And Jamie Moyer and Tim Wakefield, I suppose, but that's another story.

The first lesson here is that it's critical to understand that there is a premium to be paid on the unrestricted free agent market, and that you have to recalibrate performance expectations. You might not get the late-aughts Beckett for his next contract, and it might feel like you've overpaid at times, but when you consider how much value Boston got in this last contract, it could all even out. Let's take the John Lackey deal as an example; given Lackey's similarities to Beckett, it's not a bad proxy at all. If you believe the FanGraphs free agent dollar values assigned to each win, all the Red Sox need from Lackey to make the deal worthwhile is output like Scott Baker or Carl Pavano produced in 2009, or Andy Sonnanstine in 2008. Can Beckett do that in his age-31 to age-35 seasons? Maybe.

The second lesson is that, given the odds of a 30-plus pitcher living up to his end of the deal, there are probably better areas to allocate your free agent spend. In Boston's case, this is especially true given the commitment they have made to John Lackey this off-season. As a Red Sox fan, I am not ready to state explicitly that they should let Beckett walk but $35-$40 million committed to Lackey and Beckett annually from 2011-2014 has the potential to hamper Boston's flexibility. As with anything else, this decision will come down to Boston's ability to meld medical, scouting and performance analysis insight to generate an accurate projection of Beckett's future output.

Now don't mess it up!

Behind the Scoreboard, February 02, 2010
There Are Two Types of Players...
By Sky Andrecheck

In this article, I'll attempt to finish the title's sentence by doing a principal component analysis on player statistics. Going into this I had no idea what I would find or whether the principal component analysis would find anything interesting at all.

For those unfamiliar with this type of analysis, the point of it is to reduce a large number of potentially correlated variables down to a few key underlying factors that explain the variables. The researcher feeds the computer a bunch of records (in this case, players) and several key variables (in this case, their statistics). The computer, blind to what those variables actually mean, spits out a set of underlying factors which explain the "true" underlying causes for the variables in question. It does this by maximizing the variability between the players. It's then up to the researcher to interpret what each factor represents. In this case, I'm looking for the one underlying factor that best describes a player.
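For readers who want to see the mechanics, here is a minimal sketch of how a first principal component can be extracted. It is not the software I used: it standardizes each column and then runs power iteration on the correlation matrix, which is one simple way to find the direction of greatest variability. The data are synthetic (three made-up variables driven by one hidden factor), and all function names are illustrative.

```python
import math
import random

def standardize(rows):
    """Z-score each column so every variable has mean 0 and SD 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def first_component(rows, iterations=500):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Synthetic records: columns 0 and 1 rise with the hidden factor, column 2 falls.
random.seed(0)
rows = []
for _ in range(200):
    hidden = random.gauss(0, 1)
    rows.append([hidden + random.gauss(0, 0.3),
                 hidden + random.gauss(0, 0.3),
                 -hidden + random.gauss(0, 0.3)])

loadings = first_component(rows)
print([round(x, 3) for x in loadings])  # two positive loadings, one negative
```

The output is a set of loadings, one per variable. Variables that move together get the same sign and variables that move in opposite directions get opposite signs, and reading that pattern is the interpretation step described above.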

In the baseball world, I wondered what one underlying factor best determined a player's statistics. Normally, this type of analysis would be done on many more variables, but I wanted to see what it would pick out from players' basic, non-team influenced statistics: 1B, 2B, 3B, HR, BB, K.

The principal component analysis spits out a bunch of factors, each with decreasing importance in determining a player's statistics. Only the first one really had much meaning to it, and with only six variables to analyze, this wasn't much of a surprise. The analysis attempts to differentiate players as much as possible, but the big question was how it would divide the players. It could have pitted good players vs. bad players, power hitters vs. contact hitters, patient players vs. free swingers, etc. But what happened?

In fact, the factor loadings for the first principal component were as follows:

1B -.556
2B .132
3B -.259
HR .502
BB .382
SO .456

As it turns out, the analysis shows that if you want to put the players into two distinct camps, one camp (whose overall scores will be positive) is made up of guys who hit with power, walk a lot, and strike out a lot, while another camp (whose scores will be negative) is made up of guys who hit a lot of singles and triples and make contact.

I actually think this makes a lot of sense in describing a player's hitting style in just one number. While of course there are plenty of metrics out there to determine a player's skill and value to a team, there isn't a single metric that describes a player's playing style on a sliding scale. A Batting Style score using these values as weights does just that.

On one end of the spectrum are contact hitters, small-ball, Mike Scioscia/Ozzie Guillen type players who make their living with singles, triples, and not striking out much. The other end are Earl Weaver/Billy Beane type players who hit homers and draw walks. Which type of player a man is best determines his statistics. It's Moneyball vs. small-ball. This one number represents the spectrum of playing styles.

To get a Batting Style score for each player, we can simply multiply their normalized statistics by the weights above. Doing so gives a normally distributed set of players with a range going from about -4 to 4. To make the results a little more intuitive, I converted this to a scale where the average was 100 with a standard deviation of 15. Players with high scores are "three true outcome" type players while those with low scores play with the opposite style.
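Here is a rough sketch of that calculation. The loadings are the ones listed above. Everything else is an assumption for illustration: the four players and their per-plate-appearance rates are made up, and I am treating "normalized" as a z-score of each statistic across the player pool.

```python
import math

# First-component loadings from the table above.
LOADINGS = {"1B": -0.556, "2B": 0.132, "3B": -0.259, "HR": 0.502, "BB": 0.382, "SO": 0.456}

# Hypothetical per-PA rates for made-up players (not real data).
players = {
    "Slugger": {"1B": 0.12, "2B": 0.050, "3B": 0.002, "HR": 0.060, "BB": 0.15, "SO": 0.28},
    "Average": {"1B": 0.16, "2B": 0.045, "3B": 0.005, "HR": 0.030, "BB": 0.09, "SO": 0.17},
    "Slapper": {"1B": 0.22, "2B": 0.040, "3B": 0.012, "HR": 0.005, "BB": 0.05, "SO": 0.08},
    "Mixed":   {"1B": 0.17, "2B": 0.050, "3B": 0.004, "HR": 0.040, "BB": 0.05, "SO": 0.13},
}

def mean_sd(values):
    """Population mean and standard deviation."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def batting_style(players, loadings, center=100.0, spread=15.0):
    """Weight each player's z-scored stats by the loadings, then rescale."""
    stat_params = {s: mean_sd([p[s] for p in players.values()]) for s in loadings}
    raw = {
        name: sum(w * (stats[s] - stat_params[s][0]) / stat_params[s][1]
                  for s, w in loadings.items())
        for name, stats in players.items()
    }
    m, sd = mean_sd(list(raw.values()))
    # Convert to a scale with the chosen average and standard deviation.
    return {name: center + spread * (r - m) / sd for name, r in raw.items()}

scores = batting_style(players, LOADINGS)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {score:6.1f}")
```

With only four invented players the numbers themselves mean nothing, but the ordering shows the idea: the power/walk/strikeout profile lands at the high end and the singles-and-contact profile at the low end.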

How does the Batting Style number look according to 2009 data? The top ten most extreme players of each batting style are shown below:

style2.PNG

Now, it's hard to imagine two more different sets of players. Everything that the first group of players does well, the second group does poorly, and vice-versa. Both sets have some good players and some bad players, and whether a player is good or bad doesn't much affect his Style score. Adam Dunn and Jason Bay provided good hitting value to their clubs, as did Jacoby Ellsbury and Ichiro; they just did it in different ways. A stat like wOBA tells you the value of a particular player. For instance, in 2009 Russell Branyan had a wOBA of .368 and Ichiro had a wOBA of .369. So they seem like pretty much the same player, right? Of course not. Ichiro and Branyan have two completely opposite styles of play. Ichiro has speed, gets a ton of singles and rarely homers, walks, or strikes out. Meanwhile Branyan's entire value is based on the long ball and the base on balls. The Batting Style score shows the immense difference between the two players. Branyan has the fifth highest Batting Style score, while Ichiro has the second lowest score.

Of course, not every player falls into one of these two types. Players who have a "medium" style can have moderate scores on each metric. For example, Ronnie Belliard does everything about average, hence his Batting Style score is about average. The middle of the scale also includes unusual players who don't fall into the usual patterns. Aaron Hill doesn't walk much or strike out much, but he hits home runs. Hence, his overall style falls in the middle. Meanwhile Bobby Abreu walks a lot, but also gets a lot of singles. Hence, he doesn't fall into either extreme. The Batting Style doesn't discriminate based on the skill of the player, although as you might expect, guys who have the power/walk Batting Style are as a whole slightly more valuable, simply because guys who hit a lot of home runs and take a lot of walks are generally more valuable than singles hitters, though the difference is not major. Guys on the contact end of the spectrum have a wOBA about 10 points lower than guys on the power end of the spectrum. You can check out the full list of player Batting Style scores here:

View image

It's also interesting to look at this same list through history. Which players had the most extreme styles during each decade? The list below (including all players with at least 1000 career PA's) shows the top three extreme players in each decade.

style1.PNG

As you might expect, Babe Ruth is the original power/walk/strikeout player. As someone who revolutionized the game in that regard, it comes as no surprise. Harmon Killebrew, Mark McGwire, and Dave Kingman are others that famously fall into that same mold and are identified here. Meanwhile, Willie Wilson, Nellie Fox, and Matty Alou are on the other end of the spectrum - precisely the guys that you would expect. The analysis was run on the dataset as a whole (though to be really correct, it should be run on each individual year). Over time, the styles have definitely shifted away from the contact approach and towards the power/walk style. Overall, there's not really a surprise in the bunch except for the fact that I've never heard of some of the older, more obscure players. Personally, I find both styles of player fun to watch as their extreme styles seem to make them more colorful, though I think that the power guys have historically caught more grief from fans and have been underrated up until the recent sabermetric revolution.

Whether a statistic like Batting Style has any real value to it or not, I think it's fun. Obviously, a line of six statistics isn't too hard to digest, but I like the idea of a single number describing a player's hitting style. In any case, it was interesting that the principal component analysis picked up on the two distinct styles and drew the scale the way it did. I think if you asked fans to name two completely opposite hitters, you would get a lot of Juan Pierre/Adam Dunn responses, which shows that the principal component analysis picked out an intuitive result.