Touching Bases | December 31, 2010
The Bert Blyleven Awards
By Jeremy Greenhouse

In all likelihood, Bert Blyleven will be inducted into the Baseball Hall of Fame next week. This marks Blyleven's 14th year on the ballot, which places his year of retirement at 1992. I have never, not once in my life, watched Bert Blyleven pitch, but I sure have read a lot about the man. Blyleven was a workhorse who amassed piles of strikeouts, shutouts, and wins. His HOF candidacy over the years has taken a roller coaster ride. Detractors point to his merely decent winning percentage and lack of cultural impact, whereas his supporters make note of Blyleven's sterling postseason record and legendary curveball.

What current pitcher is most similar to Bert Blyleven? The nominees:

Adam Wainwright

When you think of big curveballs nowadays, you think of Adam Wainwright. Over the last two years, Wainwright’s curveball has been worth 45.7 runs according to FanGraphs, 20 runs better than the runner-up. Wainwright doesn’t shy away from the pitch, throwing it a quarter of the time, the third-highest rate in the Majors. However, nobody can match the 40% rate Blyleven estimated that he threw in 1978. Blyleven was known for freezing batters with his curve, and Wainwright had at least one such famous moment. Both Wainwright and Blyleven threw their curveballs in unusual fashions. According to pitch grip expert Mike Fast, Wainwright's curve "is not quite a standard curveball grip in that his index finger is completely off the ball. Most pitchers lay it down alongside the middle finger on the ball." Blyleven, on the other hand, said that he "holds both his fastball and curveball across the seams." Blyleven recalled Sandy Koufax and Bob Feller pitching the same way, but at the time knew of no one else who did. I asked Mike Fast, and he is unaware of any current pitcher who exhibits this trait. Here's an image of a potential Blyleven curve.

Roy Oswalt

Like Blyleven, Oswalt has been a durable pitcher, averaging 200 innings per year in his career. According to Blyleven's manager Ray Miller, Blyleven was able to hold up year after year thanks to a smooth delivery with "a lot of leg drive," and Blyleven himself said "my durability as a pitcher comes from my legs more than my arm." 60ft6in's Sven Jenkins describes Roy Oswalt as "the ultimate 'drop and drive' pitcher. He uses his legs to get the most out of his slight frame."

Blyleven's curve was the subject of Baseball Digest stories in 1978 and then again in 1989. Both times, he described two different variations of his curve. One, a "roundhouse curve," had a big, lazy break. The other, his "overhand drop," became his specialty. Several current pitchers throw multiple curves, including Bronson Arroyo, who can add and subtract from all of his pitches, and Chad Billingsley, who mixes in up to seven distinct pitch types. And Mike Mussina would have been a great Blyleven comp, given their durability, their propensity to throw breaking pitches and to throw them for strikes, and their willingness to pitch to both sides of the plate. But Moose retired, so I'm not including him as a nominee. Instead, I think Roy Oswalt's array of curveballs aligns best with Blyleven's description. Oswalt has a standard overhand curve that clocks in the high 70s, but Oswalt has explained that he also throws a slower curveball by choking the ball deep into his hand. Jenkins notes that Oswalt can vary the velocity on his signature 12-to-6 curve from the upper 70s to down into the 60s. On the left side of this image, you can see the distinct clusters forming Oswalt's curveballs. You can also see that the ball's axis of rotation approaches zero degrees at times.

oswaltcurve.jpg


Justin Verlander

Verlander throws a monster breaking ball. He is generally around the plate with his curve, too. Verlander's curve baffles hitters, but more importantly, it fools umpires as well. In one famous incident, Blyleven got so fed up with an umpire's refusal to call his curveball for strikes that he began to throw batting-practice fastballs, afterward saying, "if he's not going to call my curveball for strikes, then I'm just going to throw my fastball down the middle." Verlander had a notable argument with an umpire this year for "not getting the strike call on back-to-back breaking balls around the inside corner."

Here is the called strike zone for Verlander's curve over the last two years.

Verlandercurve.jpg

I guess the only way you can tell whether the zone is fair or not is by counting the number of green points inside the strike zone box and the red points outside it. The method I used in determining that Verlander's curveball was the most umpire-unfriendly in baseball controlled for batter handedness, batter height, and pitch movement. It showed that Verlander has been screwed out of about 50 strikes, 20 more than anyone else. By comparison, here's the curveball strike zone for Javy Vazquez, to whom umpires have been more generous. Pay particular attention to the area down and away from RHBs.

A.J. Burnett

Ranking in terms of "stuff," Stephen Strasburg and a plethora of relievers boast the nastiest curveballs. But for starters with some degree of longevity, Burnett's is the hardest to hit. Burnett's curveball induces whiffs on 45% of swings, an obscene number. That's partially because he's so wild, throwing his curve in the zone under a third of the time. Blyleven and Burnett had similar philosophies about where to throw their curves, if not similar execution. Blyleven said that he "keeps the ball low and away to a righty," which appears to be Burnett's intention. Against lefties, Blyleven would try to "nick the outside corner" or "break it low and in." Again, this fits a visualization of Burnett's curve vs. LHBs. The problem is that where Blyleven threw strikes, Burnett throws wild pitches. Like Blyleven, Burnett is almost exclusively a two-pitch fastball/curveball pitcher, at times tinkering with a show-me change. Blyleven said that he threw his fastball in the low 90s and his curveball in the mid 80s. Burnett comes as close as it gets to fitting that profile.

Burnett also gets a nod for reportedly loosening up the Yankee clubhouse. His trademark is the cream pie, while Blyleven was a master at the hot foot.

Chris Carpenter

Carpenter, like Wainwright, throws a whole lot of curveballs, and he throws them well. Carp and Waino throw with similar velocity, movement, and release points. Few can spin the ball like these two. What sets Carpenter apart is that, like Blyleven, his fastball might be his better pitch. Wainwright's curveball has dominated baseball over the last two years, but Carpenter is the only pitcher in baseball with a fastball ranking in the top ten in terms of run value in addition to his top ten curveball. Blyleven said that, "my fastball was my best pitch, because it set up my curve. The control of your fastball is the key to success for any pitcher -- and not being afraid to pitch hard inside." Just last week, he said on the Jonah Keri Podcast, "my curveball was a very good pitch for me, but it’s my fastball that set it up. Establishing the fastball on both sides of the plate set up my curveball." Carpenter pitches to both sides of the plate with his fastball. Pretty much anywhere so long as it's a strike. And when he is able to set up his curveball with a fastball, nobody has a chance. Carpenter's curve is on average 1.5 runs per 100 pitches above average, but when preceded by his fastball, it's 3.5 runs above average.

I submitted my ballot to Rich Lederer, who was given the final say on whom to elect for the Bert Blyleven Award:

-----

Rich: Jeremy sent an email a few days ago informing me that he wanted to "compare Blyleven to modern-day pitchers using PITCHf/x data for people like me, who never got to see Blyleven pitch." Here is my return email to Jeremy.

I believe Roy Oswalt, Adam Wainwright, Mike Mussina, Josh Beckett, and Chris Carpenter are good comps. Those would be my top five. All of these pitchers make sense if you think in terms of fastball velocity, wCB and wCB/C, WHIP, and K/BB.

Blyleven was a fastball/curveball pitcher. He threw an occasional changeup but it wasn't a significant part of his repertoire. His roundhouse was the so-called "slow curve" and the overhand drop the "12-to-6 hammer curve" that was his out pitch. With no public postings of radar-gun readings in those days to measure his fastball, my guess is that Blyleven threw a low-90s heater with the ability to dial it up to the mid-90s on occasion during the first half of his career. He definitely threw hard but his fastball more or less set up his curve. He could throw strikes with his fastball and curveball on both sides of the plate and at any point in the count.

Bert was also a workhorse. He threw more than 270 innings in eight different seasons. Of note, the 293.2 innings he pitched in 1985 has not been surpassed in the past 25 years. Leading the AL in home runs allowed in 1986 and 1987 had as much to do with ranking first and fourth, respectively, in innings pitched as it did with being around the plate a lot and hanging a few curveballs. However, for Blyleven's career, he was right at the MLB average for allowing homers (2.1% vs. 2.0%) and, in fact, gave up fewer HR/9 than a composite of his eight most similar HOF pitchers.

As it relates to his comps, Oswalt's fastball has averaged 93.1 mph during his career. Wainwright 90.6. Mussina 88.3 since 2002, probably more like 90ish in the earlier part of his career. Beckett 94. Carpenter 91.5 since 2002. The latter took much longer to develop and has missed more time to injuries than Blyleven. I think these are all good comps though. 90-94 mph fastballs with outstanding curveballs, excellent control and command, and somewhat similar K and BB rates.

I didn't realize I had final say on the Bert Blyleven Award (singular) until Jeremy returned with his nominations. The truth of the matter is that I believe a composite of Oswalt and Wainwright would be one heck of a match. A righthanded starting pitcher with a 92 mph fastball and a hellacious curveball with outstanding control and the ability to miss bats.

The winner? Roy Wainwright. Or is it Adam Oswalt? OK, make it Roy Oswright. Or even Adam Wainwalt. Yeah, it's one of those guys.

For what it's worth, here is a statistical comparison between Blyleven's career through his 32-year-old season and Oswalt:

Blyleven-Oswalt.png

Similarly, here is a statistical comparison between Blyleven's career through his 28-year-old season and Wainwright:

Blyleven-Wainwright.png

-----

This marks my final piece as a regular contributor to Baseball Analysts. I'm no longer a student, which means that I now have to make my way out in the real world--the one with all the hard knocks. I'm much obliged to Rich for giving me a writing platform and always providing thoughtful comments on my work. Thanks to my fellow authors at Baseball Analysts for giving it 100% and no more because they knew doing so would be mathematically impossible. And thanks to the readers, especially to those who were generous enough to offer criticism. Catchphrase.

Touching Bases | December 24, 2010
The Year in PITCHf/x Calibration
By Jeremy Greenhouse

This week, I handed in potentially the final paper of my academic career. It was titled, "The History of PITCHf/x." That is to say that I greatly enjoy thinking about, reading about, and writing about PITCHf/x data. So I don't mean to cast PITCHf/x in a negative light by bringing up its calibration issues, but data is kind of worthless without knowing the error involved. And while PITCHf/x is precise within a fraction of an inch, the accuracy is not always there, as some ballparks can report errors more along the lines of fractions of a foot.

The list of public analysts who have completed data correction systems is only a few names long. I believe Mike Fast, Josh Kalk, Harry Pavlidis, and Ike Hall have done some quality work in the area. My first pass is likely not as rigorous as their methods, but I feel I stumbled upon enough points of interest to warrant writing something up. My sample consisted of the fastest 25% of pitches thrown by each pitcher in each game. I compared the actual properties of those pitches to a set of expected values. These expected values were generated by finding the average properties of pitches thrown in other ballparks by the same pitchers. There were five values that I tested: the initial horizontal and vertical position (release point), the resultant horizontal and vertical position (plate location), and the pitch velocity.
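For anyone who wants to try something similar, here is a minimal sketch of the comparison described above. It is not my actual code. The record layout (dicts with `pitcher`, `game`, `park`, `speed` keys) and the function names are assumptions made up for illustration, and it reports one average delta per park rather than the game-by-game deltas plotted below.

```python
from collections import defaultdict
from statistics import mean


def fastest_quartile(outing):
    """Keep the fastest 25% of one pitcher's pitches in one game."""
    ranked = sorted(outing, key=lambda p: p["speed"], reverse=True)
    keep = max(1, len(ranked) // 4)  # always keep at least one pitch
    return ranked[:keep]


def park_deltas(pitches, field):
    """Average (actual - expected) of `field` for each park.

    `pitches` is a list of dicts with hypothetical keys:
    pitcher, game, park, speed, plus the measured `field`.
    """
    # Step 1: per pitcher-game, keep the fastest quarter of pitches.
    by_outing = defaultdict(list)
    for p in pitches:
        by_outing[(p["pitcher"], p["game"])].append(p)
    sample = [p for outing in by_outing.values() for p in fastest_quartile(outing)]

    # Step 2: group the sampled values by pitcher, then by park.
    by_pitcher_park = defaultdict(lambda: defaultdict(list))
    for p in sample:
        by_pitcher_park[p["pitcher"]][p["park"]].append(p[field])

    # Step 3: expected value = the same pitcher's average in all OTHER parks.
    diffs = defaultdict(list)
    for parks in by_pitcher_park.values():
        for park, values in parks.items():
            elsewhere = [v for other, vs in parks.items() if other != park for v in vs]
            if not elsewhere:
                continue  # no out-of-park baseline for this pitcher
            expected = mean(elsewhere)
            diffs[park].extend(v - expected for v in values)
    return {park: mean(d) for park, d in diffs.items()}
```

Calling `park_deltas(pitches, "speed")` gives a rough velocity correction per park; the same call with a release-point or plate-location field covers the other four values. Pitchers who only appeared in one park drop out, since they have no baseline.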

One mid-August homestand in Houston jumped out at me. The graphs I present below contain the actual and expected values as detailed above, as well as the difference between the two, which loosely represents the magnitude of correction needed.

houstonfx.jpg

You can see that the actual release points and the expected release points follow each other quite well over the first half of the season. For instance, when two left-handed pitchers start, the average release point jumps to the opposite side of the graph. But then in August, the blue delta line spikes by a foot. I created a gif comparing all of Brett Myers' release points leading up to his August 13 game and his recorded release points in that game. Without context, it would be easy to draw the conclusion that Myers had altered his approach.

make gif

Some parks were consistently miscalibrated the entire year. Or perhaps the rubber on the pitching mound was off-center. Kansas City had on average a three-inch difference between the actual and expected horizontal release points. This was certainly the fault of Dayton Moore.

KCfx1.jpg

More importantly, Kansas City overstated velocity, a trend fortunately spotted by Jeff Zimmerman early on in the season. Here, the delta line is plotted on a different axis.

KCfx2.jpg

On average, the delta was 1.1 miles per hour, the exact same number reported by Mike Fast.
Mike published his own 2010 velocity corrections on THT, and I found the correlation coefficient between his and mine to be 0.8.

Texas was at the other end of the spectrum.

texasfx.jpg

And Detroit was fine until the final months of the season.

Detroitfx.jpg

Like Kauffman, Dodger Stadium was on average three inches off with its horizontal release points. Several parks deviated a couple inches from what we'd expect with their vertical release points. Again, rubber position and mound heights are not standardized across MLB, so it could be that pitchers do throw from different release points depending on the stadium. Citizens Bank and Yankee Stadium reported high release points, while Safeco and Petco came in lower.

Plate location adjustments are much harder to nail down. For one, the values reported by PITCHf/x around the plate are generally accurate, as they are more directly observed by cameras, as opposed to the release points which are extrapolated. Furthermore, pitchers vary their intended pitch locations much more than they do their release points. The park with the greatest pitch location abnormality is Yankee Stadium, and the reason is clear. The Yankees possess such a disproportionate number of left-handed batters that pitchers throw to the third-base side of the plate more than they would against any other team.

Correcting PITCHf/x data seems hard. Differences in a ballpark's configurations and a pitcher's intentions are difficult to separate from an oddity in PITCHf/x calibration. Including batter handedness appears vital, given that pitchers shift their position on the rubber or throw to a different side of the plate depending on batter handedness. I do not think that an automated correction system is the answer to correcting PITCHf/x data. I envision how hard it would be to pick up on sudden shifts in the data that stem from recalibrations without picking up on the random game-to-game noise. It would possibly be easiest to simply eyeball a span of time during which one fixed level of adjustment is needed.


Touching Bases | December 16, 2010
More Observations on Pace
By Jeremy Greenhouse

One month ago, Lucas Apostoleris explored how much time pitchers take in between pitches, and FanGraphs added pace to its player pages shortly thereafter. Dave Allen went on to analyze batter's pace and make some other observations. It's taken a while for this PITCHf/x timestamp data to be mined, but I've finally decided to get my hands dirty with it.

Like Dave, the way I'm calculating pace results in a 22.4-second difference between pitches, which is slightly slower than the FanGraphs calculation. (FanGraphs' method excludes pickoffs, which I'm not sure I agree with. I've always felt that a pitcher is pitching slowly if he throws to first a bunch.) Dave found that two-strike counts are the most time-consuming. There's certainly something there, but even more significant might be the pitch sequence of the at-bat. On average, 20 seconds pass between the first and second pitches of an at bat, while 30 seconds pass between the 10th and 11th pitches.

Pitches   Seconds
1-2       19.7
2-3       22.2
3-4       23.2
4-5       24.3
5-6       26.0
6-7       27.4
7-8       28.2
8-9       28.7
9-10      29.0
10-11     30.0

Batters are more likely to step out of the box the deeper into the at-bat they go, and pitchers take more time to decide on their pitch selection. There is no such clear trend in the relationship between overall pitch count and pace.
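The at-bat breakdown above is simple to reproduce once the timestamps are in hand. Here is a minimal sketch, assuming each at-bat has already been reduced to a list of pitch timestamps in seconds; that input layout and the function name are my own inventions for illustration. Because only pitches carry timestamps here, any pickoff throws are counted inside the gap, which matches the way I prefer to measure pace.

```python
from collections import defaultdict
from statistics import mean


def pace_by_sequence(at_bats):
    """Map 'n-(n+1)' to the mean seconds between consecutive pitches.

    `at_bats` is a list of at-bats, each a list of pitch timestamps in
    seconds, in the order thrown (a hypothetical input layout).
    """
    gaps = defaultdict(list)
    for stamps in at_bats:
        for i in range(1, len(stamps)):
            # Gap between pitch i and pitch i+1 of this at-bat.
            gaps[f"{i}-{i + 1}"].append(stamps[i] - stamps[i - 1])
    return {label: mean(g) for label, g in gaps.items()}
```

One-pitch at-bats contribute nothing, and the later rows rest on far fewer at-bats than the early ones, so it is worth keeping the counts alongside the means.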

Pitchers start out blazing coming out of the gate. Many pitchers don't even think, but rather try to solely establish the fastball. Pitches 10-20 cover the most difficult part of the batting order, when it is also likely that there are runners on base, so the pace slows down dramatically. After that, the data smooths out, and pitchers slow down the further along they go.

Back in April, Mike Fast* used the timestamp data to check on why Yankees vs. Red Sox games take so long, and he found that the reason was more than simply batters and pitchers taking a lot of time between pitches. It turns out that the average time between innings is a little over two-and-a-half minutes, which can fluctuate depending on teams. I believe that the umpire, under directions to restart the game following commercial breaks, controls the time between innings. Home teams with a lot of nationally-televised games (Dodgers, Mets, Yankees, Braves) are those that take over 2:40 between innings, while others (Royals, Blue Jays, Athletics) take under 2:30.

Mike has also done a very cool study on pace and defense.

Mid-inning relief changes last on average 3:15. Interestingly, Colorado, where there is an average break length between innings, allows pitchers the most time to warm up at 3:29. It is notoriously difficult to pitch in Coors, so it would make sense for relievers to be given some leeway with warm-up time. In Oakland, mid-inning changes only last 2:54 on average. Furthermore, the incoming reliever can dictate when he resumes play. Mike Adams and, unsurprisingly, Jonathan Papelbon, are in a league of their own, as it takes them four minutes to pick up play. A few A's pitchers (Andrew Bailey, Brad Ziegler, Jerry Blevins) keep it well under three.

The average time between at bats is 50 seconds. Carlos Pena is slow.

Pitchers only spend 11 seconds between pitches when issuing intentional walks. Otherwise, the game moves most quickly following called strikes. Balls in the dirt result in a loss of 10 seconds as compared to regular balls. Fouls with the runner going result in a loss of 10 seconds as compared to regular fouls.

How else might a game's pace be affected?

Touching Bases | December 04, 2010
Thoughts on In Depth Baseball
By Jeremy Greenhouse

I like baseball heat maps. Really like them. They have captured the heat map that is my heart. I feel I should get that out of the way before I provide my thoughts on In Depth Baseball, TruMedia's baseball analytics platform.

During the 2010 postseason, I became aware of a new baseball analytics blog that specialized in such heat mappery. Behind the blog was one Rafe Anderson. Anderson had been a Boston Red Sox employee for six years before moving to TruMedia Networks, where he holds the titles of President and CEO. Now, Anderson has, along with programmer Jeff Stern, developed an analytics platform being marketed to MLB teams. I've had the opportunity to speak with Anderson on a couple of occasions, and he was generous enough to offer me a demo of In Depth Baseball (IDB).

IDB enters the marketplace in the same year as Bloomberg Sports (BBG). As they are in direct competition, I thought it would be natural to start by comparing IDB to BBG. Admittedly, I have had little experience with BBG.

BBG has a far sleeker layout than IDB. Here, take a look at screenshots of leaderboards from BBG and IDB. But IDB prides itself on not being "flashy," a possible dig at BBG's Flash-based platform. Consequently, IDB runs much more smoothly than BBG, while potentially at the same time making more sophisticated computations.

Now we arrive at the heat maps, a department that sets IDB apart from any platform I've seen before. Let's say you want to see the best contact hitters in the league. You go to the leaderboard and sort by contact rate, just as you would do on FanGraphs or anywhere else. But meanwhile, you can see an adjacent heat map showing the league average contact rate by strike zone location. And then, if you want to break that down into splits, such as LHBs vs. LHPs, both the leaderboards and heat maps update instantaneously. Furthermore, the heat maps are interactive in that you can isolate zones you want to look at by dragging your mouse into a certain area. After that, you can see who the best player in the league is in that zone, click on his name, and be taken to his player page, where the chosen filters remain constant. Other heat maps that I'm aware of are created in R, and it would take, conservatively, over a minute to process that much data. But it's not like the R ones even look any better than IDB's. The explanation I've been given is that Stern custom developed his own program, borrowing some fancy techniques that are used by chemical engineers. Well it's great, whatever it is. You can find quality heat mapping using IDB here and here.

Where IDB's heat maps sometimes fail is with smaller samples. For example, check the in play slugging heat maps used here. It's impossible to tell whether the observed trends are anything more than noise. Anderson says that the heat maps consider statistical significance, but from my experience, I've found that determining the right smoothing parameters is often more art than science. I would rather have an over-smoothed heat map than an under-smoothed one, as a heat map that shows no trends will at least tell you the player's mean performance, whereas a heat map with too much noise can lead you to draw false conclusions. It might be a failing of the analyst more so than the system to draw conclusions from such heat maps, because when you're looking at individual players, you probably want to choose metrics that stabilize quickly, like contact rate, called strike rate, or pitch frequency. But for analysts who don't regularly work with this sort of data, it would help if the smoothing parameters were refined for metrics such as in play slugging, which will rarely have a large enough sample to be highly consistent for individual players.
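To make the over- versus under-smoothing point concrete, here is a toy sketch. It is not IDB's method, which I haven't seen; it swaps in plain shrinkage toward the player's overall rate in place of real spatial smoothing. The zone labels and the strength parameter `k` (measured in pseudo-attempts) are assumptions for illustration.

```python
def shrink_cells(cells, k):
    """Pull each cell's rate toward the player's overall rate.

    `cells` maps a zone label to (successes, attempts); `k` is a
    hypothetical smoothing strength in pseudo-attempts and should be > 0.
    """
    total_s = sum(s for s, _ in cells.values())
    total_n = sum(n for _, n in cells.values())
    overall = total_s / total_n
    # Each cell behaves as if it had k extra attempts at the overall rate.
    return {zone: (s + k * overall) / (n + k) for zone, (s, n) in cells.items()}
```

With a small `k`, a cell built on four swings keeps most of its noise. With a very large `k`, every cell collapses to the overall rate, which is the "shows no trends but at least tells you the mean" case.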

While the heat map is the bread and butter of In Depth Baseball, I feel that the most important part of any database system is how well it integrates video. Just as you can click on a player's splits to view different heat maps or spray charts instantaneously, his pitch-by-pitch log also updates. I don't think I can overstate how strongly I feel that every team should be using something more sophisticated than BATS to view video, and IDB obviously qualifies as a solution. The problem is that the pitches aren't directly linked to video streams, and instead, one must select certain pitches to a queue before watching them. If you want, you can pull up video of all Ryan Howard vs. LHP off-speed pitches in the last two years, but it would take a lot of clicks. I think it would make more sense if every video from the pitch log started on the queue, and then if you wanted to filter from there by using the splits section, videos would subsequently be removed.

I was highly impressed by the video quality, an area where IDB truly is "flashy." The Flash Player allows one to use slow motion, go frame by frame, or even change camera angles if multiple ones are available. I'm sure the playlists can be exported easily to hard drives if scouts don't want to come up with them on their own.

Bloomberg Sports holds an agreement with MLBAM, but IDB is fully independent outside of its team partnerships. Therefore, IDB has no license to video, and must borrow from teams. IDB has been able to work around this, as one thing Anderson stresses is that they use an Open API. You might be able to infer what that means, but from the TruMedia site, "This enables our partners to seamlessly integrate MLB analytics with relevant pitch by pitch video play lists within their own customizable user interface. Most importantly it allows organizations to keep their algorithms and metrics confidential." IDB has tools to incorporate HITf/x data or any other advanced data.

IDB already has an impressive advisory board, which gives them saber cred. I wouldn't be surprised if the fine folks at Complete Game Consulting have already played a hand in developing some of IDB's more advanced metrics. They have incorporated the "paint" set of metrics I believe to have been invented by Dan Brooks. IDB features "expected" values, too, and although I'm not quite sure how these are calculated, any metric with the word expected before it grabs my attention.

Another big thing is their "PVX" and "PVZ" values, which measure angular velocity at the plate. They sound like something Matt Lentzner and Mike Fast discussed at this year's PITCHf/x summit, and if I understand PVX and PVZ correctly, they could be the future way we measure movement (from the batter's point of view as opposed to the ball's). In addition, there are PVX vs. PVZ heat maps, so you can break down players by pitch movement the same way as by pitch location.

Alongside player heat maps are standard spray charts. The spray charts unfortunately use Gameday data, showing where the ball was picked up as opposed to where it was hit. Though you can mouse over a single hit to see the pitch details and video of it, for some reason you can't isolate zones like you can with the heat maps. So I don't have the option of seeing video of all of a player's ground balls to the right side of the infield. It would make sense for IDB to add this feature.

There are other tools besides the league leaderboards and player dashboards, which contain the spray charts, heat maps, and video. One section which I didn't spend much time on is the "graphs" section, where you see a bunch of line graphs: a pitcher's fastball usage over the course of the season; a batter's contact rate by pitch velocity; a frequency distribution of a batter's ground ball angle. Pretty much any stat in line graph form. There's also a "comparisons" section, where you get an assortment of a player's heat maps side by side, such as how he does in different counts or by pitcher/batter handedness.

According to Anderson, umpire reports will be launched for the 2011 season, and they plan to venture into defense eventually as well.

While Bloomberg employs a team of programmers in research and development, Stern mostly by himself has created an incredibly powerful and efficient tool. Now, I've been wildly blown away by every database platform I've come across, but IDB certainly exceeds what is out there at all but a handful of MLB teams. What I could see making IDB so attractive to teams is that it is web based, and therefore available at all times. IDB looks fantastic on the iPad (I don't own one, so I guess everything I've seen on an iPad looks fantastic). Imagine watching a game in real time with iPad in hand and taking one click to instantly update a set of heat maps based on a change in the count or batter. So far, according to the Sports Business Journal, IDB calls the Padres and one other undisclosed team their clients. I have little doubt that IDB will continue to expand into a number of front offices, and with the news that TruMedia will be collaborating with Sportvision to provide MLB clubs with a minor league analytics platform, I am confident that the product will be that much better come Opening Day. I just hope that by then I'll still have the chance to see what IDB has had in store.

Touching Bases | November 26, 2010
The Decade in Basic Fielding: Adjustments
By Jeremy Greenhouse

Last week, I looked at the decade's leaders in plays made per ball in play. Now, I'll take a look at the context in which they played.

This might not qualify as basic anymore, given the intensive amount of computation time that goes into these adjustments, but I do find them intuitive. I attempted to replicate the "without" part of Tom Tango's "With or Without You" system by finding how many plays the average fielder would have made given a specific fielder's set of circumstances. That entails deciding on a situation to control for, finding how often a fielder was in that situation, and calculating the rate of plays other fielders made in that situation. For example, third basemen are twice as likely to record an out on a ball in play if the batter is right-handed as opposed to left-handed. Therefore, if Eric Chavez faced right-handed batters 60% of the time this decade, while the league normally faces 58%, then we would need to take away a couple dozen plays made by Chavez to adjust for his advantage.
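Here is that arithmetic as a minimal sketch. The 60% and 58% shares come from the Chavez example above; the 20,000 balls in play and the 14% and 7% play rates (chosen so that right-handed batters are twice as favorable) are invented numbers for illustration, not his actual figures, and the function name is my own.

```python
def handedness_adjustment(bip, share_rhb, league_share_rhb, rate_vs_rhb, rate_vs_lhb):
    """Plays to ADD to a fielder's total to neutralize his batter mix.

    All inputs are illustrative: `bip` is balls in play while he was on
    the field, the shares are fractions of right-handed batters faced,
    and the rates are league-average plays per ball in play at his position.
    """
    def expected_rate(share):
        return share * rate_vs_rhb + (1 - share) * rate_vs_lhb

    # Extra plays an average fielder would get from this mix versus the league mix.
    extra = bip * (expected_rate(share_rhb) - expected_rate(league_share_rhb))
    return -extra  # an easier-than-average mix means plays are taken away


adj = handedness_adjustment(
    bip=20000,              # hypothetical
    share_rhb=0.60,
    league_share_rhb=0.58,
    rate_vs_rhb=0.14,       # hypothetical
    rate_vs_lhb=0.07,       # hypothetical
)
```

Under those assumed inputs the adjustment comes to about -28 plays, in line with the "couple dozen" above. The other adjustments follow the same pattern, swapping in a different situation for batter handedness.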

Below, I present the chart for batter handedness adjustments. The adjustment figure is the number of plays you would need to add to or subtract from each fielder's plays made due to context. The adjusted rate incorporates that adjustment.


batadj.jpg

A batter handedness adjustment doesn't make much of a difference for catchers, pitchers, or center fielders, but for players in the corners, it can be huge.

Last week, I found that Freddy Sanchez had the lowest rate of plays made at 2B, and Jack Wilson the highest at SS, and here we see that the Pirates must have faced a ridiculous number of RHBs.

The number of left-handed hitters the Yankees roll out could impact the numbers, as Nomar Garciaparra and Bill Mueller earn large adjustments, while Manny Ramirez comes out even worse than before.

Feel free to click on the links below to see similar charts to the one above.

Pitcher Handedness Adjustment

Pitcher handedness adjustments correlate with batter handedness adjustments. It seems to me, however, that batter handedness adjustments are way more useful in measuring fielding.

Those are easy to calculate and to comprehend. The next several are trickier. Calculating park adjustments when some players play every single day limits the "without" part of the sample. Here, take a look.

Ballpark Adjustment

Let's use, you guessed it, Derek Jeter and the Yankees as our starting point. Both Yankee stadiums have seemingly played exceedingly difficult for shortstops. Shortstops make plays in the Bronx on under 11% of balls in play, and the average is 12%. You might be thinking that Jeter drags down the average, but remember, I controlled for this by finding the rate of plays made when he wasn't on the field. You might also be thinking that the Yankees' wide array of left-handed hitters drags down the average. That, I didn't account for. So it's tough to say what can be attributed to the ballpark. Maybe the grass is shorter or greener or something. Or maybe the Yankees play with poor fielding shortstops and hit with players who don't hit to that side of the field. The same could be said for Jimmy Rollins, who has dominated the shortstop position for the Phillies over the last decade, and his own lineup is also dominated by left-handed hitters. I think it would be too hard and probably not worthwhile to try to determine ballpark adjustments for infielders.

The conclusion that I think can be drawn from these ballpark adjustments is that Coors Field kills outfielders.

Pitcher Adjustment

I think there's some good stuff in there.

Jimmy Rollins and Orlando Cabrera have played in front of stingy pitchers, whereas Miguel Tejada and Rafael Furcal have benefited from pitcher generosity. Chipper Jones as both a left fielder and third baseman moves close to average when you control for the pitchers he's had to deal with.

Batter Adjustment

Rollins, playing behind pitchers who were unfriendly, fielded in front of hitters who helped him out a fair deal.

These next two will be heavily biased, but I thought they might be interesting.

First Baseman Adjustment
Center Field Adjustment

There are a lot of confounding factors here, as first basemen and center fielders might play every day with their teammates, killing the "without" sample, and they share a ballpark every day, bringing in other effects.

Anyway, I wasn't surprised to see that Chase Utley and Jimmy Rollins might have made more plays with a different first baseman.

With center fielders, I was looking for evidence of ball-hogging, but don't think I found any.

Batted Ball Adjustment

This is the only time I'm not using the entire 2000-2009 dataset, as a significant portion of balls were not classified. Most, if not all, unclassified balls went for hits, so the adjusted rates are all higher than the league average rates.

Three of the top five pitcher adjustments go to guys who played for the Braves, which means they generated a lot of ground balls. This results in the Joneses getting underappreciated as outfielders, especially Andruw, who I showed last week was one of the best at catching balls in the air, and now we see that he had hundreds fewer opportunities than he would have had playing for another team.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

Touching BasesNovember 18, 2010
The Decade in Basic Fielding: Leaderboards
By Jeremy Greenhouse

The Gold Gloves were announced last week, and I know what you're thinking; if only there was another metric to evaluate fielders. Well, sorry to disappoint, but I don't have it in me to come up with an original acronym. Anyway, there was this really interesting thread on The Book Blog in which Tangotiger posted a simple yet powerful leaderboard consisting of outs made per ball in play for all active shortstops. Derek Jeter came in last. Spanning the entire 2000-2009 timeframe, one would have to have faced extraordinary luck to not deserve one's place at the very top or bottom of such a basic leaderboard. There's really no arguing with it. (If you want to argue, Colin Wyers went in depth on the subject at Baseball Prospectus.)

I found every fielder's out-per-ball-in-play rates as well as the average conversion rates at each position. Nothing special. No handedness or batted-ball adjustments, no plays-to-runs conversion. Below, I present the top five and bottom five at each position sorted by total plays above and below average.
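The arithmetic behind "plays above and below average" can be sketched in a few lines. The fielder names, counts, and league rate below are made up for illustration; only the method (plays made minus the league-average rate times opportunities) follows the description above.

```python
def plays_above_average(plays_made, balls_in_play, league_rate):
    """Plays made minus what an average fielder would make on the same chances."""
    return plays_made - league_rate * balls_in_play


# Hypothetical fielders at one position: (plays made, balls in play while fielding).
fielders = {
    "Fielder A": (1350, 10000),
    "Fielder B": (1100, 10000),
    "Fielder C": (1210, 10000),
}
league_rate = 0.12  # assumed positional average conversion rate

# Sort from most plays above average to fewest, as on a leaderboard.
leaderboard = sorted(
    ((name, plays_above_average(p, bip, league_rate)) for name, (p, bip) in fielders.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, paa in leaderboard:
    print(f"{name}: {paa:+.0f}")
```

Because this is a counting stat, it rewards playing time as well as rate, which is why a decade-long sample pushes everyday players to the extremes of the list.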

allfield.jpg

Going from one to nine:

Greg Maddux was probably something like three standard deviations from the Major League mean with his pitching ability. That pales in comparison to his fielding prowess. He turned balls in play into outs as often as Carl Crawford and Ichiro Suzuki. Daniel Cabrera did not do a single thing well on the baseball field other than throw hard.

I've always said that Yankee fans should give Jorge Posada more credit for his fielding. Wait, that's not right. Maybe I mean Brett Gardner. Seeing Posada top a defensive leaderboard is throwing me off.

Albert Pujols: good at baseball.

Orlando Hudson is over 100 plays better than the next closest fielder at any position. You might say he's the basic man's Adam Everett. Freddy Sanchez rates as well as Hudson in several advanced fielding metrics. Considering Jack Wilson played counter Sanchez for many years, there could be a large ball-hogging effect going on.

There has been no ball-hogging effect on the left side of the Yankee infield. A-Rod finishes last for third basemen, and of course Jeter lags all shortstops.

I wonder why Carl Crawford never picked up center field, considering his greatness in left. I've noticed that Garret Anderson is often called underrated by television announcers, given his ability to rack up hits. When I learned about secondary offensive skills, I decided then he was overrated. Then I saw his fielding numbers, and it turns out he's pretty good in left. Maybe he's been rated properly all along.

Darin Erstad had a run where he was something like a true +30-run center fielder. The astute reader will notice a similarity between the LF and CF leaderboards. Foreshadowing.

Yes, Randy Winn appears in the top five of all three outfield positions. Also, Brad Hawpe: bad at fielding.

I made a bunch more leaderboards by varying the data I used as opposed to adjusting the original dataset, which I will do next week. For example, I restricted my sample to only RHBs or only LHBs.

Right-Handed Batters
Left-Handed Batters

If you click on the links, you will see an image similar to the one I used in this article. Different data, same methodology. I don't expect anyone to click on more than a couple, so I will provide brief commentary.

Batters pull grounders and go the other way on fly balls. This results in shortstops making fewer outs against left-handed batters than second basemen, first basemen, left fielders, or center fielders. At some point, it must be optimal for fielders to switch positions depending on the batter's tendencies. I'm sure once that started to happen, a rule would be put in place to deter such delays.

Mariano Rivera turned 10.55% of balls in play into outs himself when facing LHBs. Maddux was 7.52%, the league average was 4.36%, and Cabrera came in at 1.76%. That 10.55% mark can explain a fair amount of Rivera's extraordinary .263 career BABIP. He's a gifted athlete who is said to play a quality defensive center field. Plus jamming LHBs with his cutter can result in easy bouncers right back to the mound.

Right-Handed Pitchers
Left-Handed Pitchers

I don't know if any advanced fielding metrics control for pitcher handedness, but I'd imagine any adjustments made would be negligible.

Ground Balls
Air Balls
Bunts

Jeter has been very good at catching balls in the air in his career, but that only highlights his inability to field grounders. At least he might be better than Yuniesky Betancourt. A-Rod showed up in the top five among shortstops on air ball plays, but bottom five among shortstops and third basemen on grounders. Robin Ventura blew away the third base field by converting over 20% of grounders into outs. Damion Easley was first on grounders and close to last on balls in the air. Jason Varitek was last on grounders and first on popups.

Ichiro has forced out four players on ground balls.

There's a massive range for pitchers in how often they field their own bunts. Javier Vazquez and Carlos Zambrano control 50% of bunts themselves, while Jon Lieber and Ben Sheets make outs on under 25%.

Two Outs
Less Than Two Outs

Overall defensive efficiency is ten points higher with two outs than it is otherwise. I don't know if it follows that there should be a fielding adjustment.

Fenway Park
Coors Field

DERs at Coors and Fenway were .665 and .676, respectively. Brad Hawpe and Manny Ramirez were both 80 plays below average in their respective parks. It's tough to say if Jason Bay played good defense in Fenway or if Manny's insane awfulness made it appear that way. I've been under the impression that J.D. Drew is a really good defensive outfielder, yet he's made only 6.6% of plays in Fenway's oddly-shaped right field, while most RFs turn around 7.5% of balls into outs. Maybe there's a Coco Crisp ball-hog effect?

Juan Pierre showed up on the bottom five overall list for center fielders, but he played in an impossibly difficult Coors Field, and actually did well there.

Next week I'll take a look at basic fielding adjustments.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711. 2010 data is out!

Touching BasesNovember 11, 2010
Thoughts on the AL Cy Young
By Jeremy Greenhouse

I don't much mind groupthink so long as I'm part of the group. Well, then I don't really consider it groupthink, do I? Just a bunch of people being right. And I like being right.

So when Baseball Prospectus released its Internet Baseball Awards, I was confused. Felix Hernandez won the greatest consensus of any category. My pick for AL Cy Young was Cliff Lee. Either I'd badly miscalculated, or people have been converging on an opinion that could well be wrong.

Now, I'm not saying people are wrong. (Of course I do think they're wrong. I chose Cliff Lee.) It's just that there's no way Felix was so dominant that he deserves 80% of the vote. Lee and Price and Liriano and Lester and Weaver and Sabathia were all fantastic. So what makes Felix stand out?

Felix led the league in both innings pitched and ERA. I'm not really sure that I care about innings or ERA, though. Hold on. I obviously do care about innings pitched and ERA. But I see the numbers and I just think wouldn't it be nice if some smart people converted those numbers into a total value metric? Fortunately, the good folks at Baseball Reference, FanGraphs, and StatCorner have taken it upon themselves to provide us with WAR. Felix tops the AL on B-Ref, while Cliff Lee leads the Majors in WAR according to FanGraphs and StatCorner.

The difference between the methodologies is that Baseball Reference relies on ERA, whereas the others use defense-independent metrics. And why did Felix have such a superior ERA compared to Lee's all-time great strikeout-to-walk ratio?

Cliff Lee suffered a .347 BABIP with men on base while Felix Hernandez held opponents to a .239 mark.
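For readers who want to reproduce a split like this, BABIP is commonly defined as (H - HR) / (AB - K - HR + SF). The sketch below applies that standard definition to invented men-on-base totals; the counts are placeholders, not Lee's or Felix's actual numbers.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play, using the common definition."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical men-on-base line for a pitcher.
men_on = babip(hits=40, home_runs=5, at_bats=150, strikeouts=30, sac_flies=2)
print(f"{men_on:.3f}")
```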

It's easy to attribute ball-in-play results and event sequencing to luck, but if I were to do that I wouldn't have much else to write about. Therefore I looked into Lee's and Felix's pitching approaches with men on base and nobody on.

Felix's first full season was 2006, when he allowed a .357 BABIP with men on base. Since then, he has lowered his BABIP by at least 24 points in each successive year. If you believe in such trend analysis, then this would be evidence that Felix is doing something right with men on. Cliff Lee, in the two years since his reinvention, has allowed .306 and .264 BABIPs in man-on situations, indicating this year could have been nothing more than a fluke.

Most pitchers throw somewhat softer with men on base than with nobody on. Pitching from the stretch can lead to diminished velocity. Trying to induce groundballs means sacrificing velocity for movement. Justin Verlander is one guy who pitches with another gear at times. I found that he adds over a mile per hour to his fastball with men on, while previously it was shown that he adds velocity in high leverage spots and with higher pitch counts. On the other hand, Stephen Strasburg not only went to his two-seamer more often with men on base, but he also suffers pitching from the stretch.

Both Felix and Lee throw slightly harder with men on base, and both also significantly up their groundball rates. Lee throws more cutters with men on while Felix throws more tailing fastballs. The thing is, they've kept rather constant approaches from 2009-2010. Considering that Lee has better DIPS numbers with men on than does Felix, I fail to see evidence that Felix deserves credit for achieving better results than Lee. Felix added a full win in Clutch value this year. Lee lost a win. I don't think either deserved their respective fortunes.

I've looked at the numbers for quite a while, and I'm not all too confident with my pick. But I don't see how everyone else can be that confident with theirs. The competition was really really tight. I think Felix winning the AL Cy would mark a sign of progress for sabermetric thought. Felix winning by a landslide could mark a step backwards.

Touching BasesNovember 08, 2010
Batted Ball Location Leaderboards
By Jeremy Greenhouse

There has been a distinct void in batted ball leaderboards this year, as Dave Studeman has been saving up all the good stuff for this year's THT Annual. (Buy it!) This is the third and final year I'll be writing my tangential column, and now you can find the relevant data yourself in FanGraphs' player splits section. Without further ado, here are the best and worst pull hitters of 2010. I have a feeling who will be number one.

Value of Pulled Batted Balls

2010pull.jpg

The past couple of years, I've had an idea who would be the top pull hitter. Some hitters like Ryan Howard, Jim Thome, Adrian Gonzalez, and Derek Jeter are renowned for their opposite-field prowess. But this year, Jose Bautista's batted ball distribution made the mainstream, getting written up in Sports Illustrated, USA Today, and ESPN. His 57-run mark is the highest I've ever seen, and his 47 pulled homers are the fourth most in the Retrosheet era. The requisite spray chart:

joeybats.jpg

There was definitely something about the Blue Jays' approach this year as a team. They led the league in home runs with 257, 46 more than any other team. They hit fewer line drives but more fly balls than anyone. Their 13.6% home run per fly ball rate was also tops, and way higher than last year's 10.4% mark. Bautista added ten percentage points to both his fly ball rate and his home run per fly ball rate. Less noise was made over Vernon Wells, who somehow went from 0 WAR to 4 WAR, also posting a career-high number of pulled homers.

Dan Uggla, aided by Florida's short left-field fences, has added at least 30 pulled runs of value in every year of his career. Albert Pujols is the only player to appear in the top ten each of the last three years.

Juan Pierre appeared at the plate 734 times this year. Is that not dumbfounding? He is a really good baserunner, I suppose.

Value of Center Field Batted Balls

center10.jpg

Josh Hamilton hit 19 homers out to center, which is impressive. But I'm more interested to know about the ground rules concerning that lawn out there in straightaway center in the Ballpark at Arlington. Is it like the black at the old Yankee Stadium?

Good things happened when Hamilton, Carlos Gonzalez, and Joey Votto got the bat on the ball, as they boasted respective BABIPs of .384 and .390 and .361. Gonzalez and Votto have the 22nd and 23rd highest BABIPs of all-time. Only Shin-Soo Choo, Ichiro Suzuki, and Derek Jeter have higher BABIPs among active players, and I wouldn't be surprised to see CarGo or Votto pass all three of them.

Carlos Lee hit more balls to center field this year than he did last but wound up with 23 fewer hits.

Value of Opposite Field Batted Balls

Oppo10.jpg

The pull list is dominated by right-handed hitters while the opposite-field list is dominated by left-handed hitters, which suggests there is value to hitting the ball to left field. The BABIP on balls to left is about 40 points higher than on balls to right.

It's crazy that Adrian Gonzalez does this damage in PETCO. As a Padre he's hit only a third of his home runs at home. He'd make a lot of sense on the Marlins or Red Sox.

Votto did it. He ended the year without a single popup. He hits the ball with power to all fields. His worst results came from pulling the ball, and when he did so he still added 16 runs thanks to a 47.6% HR/FB.

Jim Thome slugged .125 on grounders and 1.405 on balls he put in the air. On first thought, I thought that might be something to exploit, but interestingly, according to Baseball Reference, he has the exact same career .958 OPS facing groundball pitchers and flyball pitchers.

Aaron Hill had a .196 BABIP, so unless he was pulling the ball in the air, it wasn't happening for him.

Remember, you can now find all of this stuff on FanGraphs. My hope is that soon we'll be able to take the next step and analyze these numbers using HITf/x.

Touching BasesNovember 04, 2010
MLB Salaries Over Time
By Jeremy Greenhouse

minwage2.gif

Several data sources were used including most prominently The Baseball Almanac and The Bureau of Labor Statistics.


Touching BasesOctober 28, 2010
Three-Ball Strategies
By Jeremy Greenhouse

The folks at Basketball Prospectus recently found that three-balls were undervalued. Does that mean that there's been an inefficiency in the accepted market inefficiency? I don't know. Commenter Guy also had ideas for how to study three-ball strategies.

Some batters never swing on 3-0 counts. Take the difference between the average 3-0 swing zone and strike zone.

30strategy.gif

Overall, only 6% of those pitches are swung at. Of those 6%, I estimate 17% would be called balls. If batters never swung at 3-0, that means they would walk about 36% of the time on that pitch, as opposed to the current rate of 35%. Sounds negligible, and it's likely that if batters are able to do damage on 3-0, then they're right to swing at times.
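The walk-rate arithmetic can be checked directly. This sketch uses only the three rounded figures quoted above (6% swing rate, 17% of swung-at pitches outside the zone, 35% current walk rate), so the result is approximate.

```python
swing_rate = 0.06            # share of 3-0 pitches that batters swing at
ball_share_of_swings = 0.17  # estimated share of those swings at would-be balls
current_walk_rate = 0.35     # share of 3-0 pitches that currently end in a walk

# Walks forfeited by swinging: pitches that were swung at but would have been ball four.
forfeited_walks = swing_rate * ball_share_of_swings

# If batters never swung on 3-0, those pitches would become walks too.
never_swing_walk_rate = current_walk_rate + forfeited_walks
print(f"{forfeited_walks:.4f} {never_swing_walk_rate:.4f}")
```

The difference is about one percentage point, which matches the move from 35% to roughly 36%.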

Upon swinging, batters hit .390 with a .760 slugging average. That does not include the 54% of swings that either miss or result in fouls, thereby bringing the count to 3-1. Using linear weights, I estimate that batters currently add about a run per 100 pitches by swinging on 3-0 rather than always taking. I don't think the pure strategy "always take 3-0" is correct. That said, I also think that there are some pitchers who are so bad at throwing strikes or hitters so bad at hitting that such a strategy would be viable.


Along similar lines, batters are more likely to swing at full-count pitches than 2-2 pitches. What if we were to map out 2-2 strategies on 3-2 pitches? Well, I'm not entirely sure this makes sense, but I tried to do it.

I made the payoffs on 2-2 counts equal to those on 3-2 counts, then predicted run value while controlling for batter/pitcher handedness and pitch type. Mapping both predictions onto the 3-2 distribution, I found the overall difference in expected output to be similar to the difference I found between never swinging on 3-0 and the current strategy. Again, the current strategy proved more optimal. Unfortunately, graphing the differences didn't produce anything intelligible.

Decades of baseball evolution have brought us to the point where radical changes to current strategies can mostly be ruled out. But achieving equilibrium is a complicated process, and we would be doing the game of baseball and baseball players a disservice to think that there is no room for improvement. I'm more comfortable saying that batters might swing too often on three-ball counts than I am suggesting what their strategy should be.

Touching BasesOctober 21, 2010
Count Oddities
By Jeremy Greenhouse

I've been doing a lot of thinking about game theory and how it relates to pitch selection and swing rates. I finally decided to run some numbers to find the baselines for swinging, pitch selection, and strike throwing based on the ball/strike count.

The rate at which pitchers throw strikes aligns perfectly with the average run expectancy in each count. However, batters' swing rates are not likewise dictated by run expectancy. Instead, batters like to swing more the deeper they get in the count.

Batters swing 74% of the time on full counts, by far the highest percentage of any count. At the other end, they swing at only 6% of 3-0 pitches.

Pitchers simply aren't good enough at throwing strikes on 3-0 to warrant batters mixing their strategy between swinging and taking. Pitchers hit the zone only about 60% of the time on 3-0, whereas I believe they would need to hit it at least 70% of the time to make batters consider swinging. Strangely, batters are eight times as likely to swing on 3-1 as they are on 3-0. I think straight takes on 3-1 might be a viable strategy at times.

We already know and accept that batters don't act completely rationally on the first pitch. Some players just don't like swinging 0-0, so they don't, and that's that. Yet they up their swing rates from 27% on 0-0 to 40% on 1-0, even though pitchers have similar pitch selections and locations and, more importantly, the reward of taking is greater.

There is a 50/50 split between fastballs and off-speed pitches on 0-2 and 1-2 counts. Naturally, fastballs are thrown in the zone at a higher frequency. What's odd is that batters swing at more off-speed pitches on those counts.

The big question is, How much do batters learn from pitch to pitch? The deeper into his repertoire a pitcher must go, the greater the advantage is for the batter. There are probably advantages to taking pitches besides drawing balls. I don't think this applies to the full count, though, which might be why the swing rate is too damn high.

Here's the relevant data. I should note that I used the same strike zone model for all counts, which means that more pitches would be called strikes on 3-0 than listed as being in the zone, and fewer strikes would be called than listed on 0-2.

Count   FB%     Zone%   Swing%
3-0     95.2%   58.5%    6.6%
3-1     85.0%   57.5%   54.3%
2-0     81.6%   55.3%   40.0%
3-2     69.4%   54.0%   73.7%
2-1     68.5%   52.6%   58.7%
1-0     68.6%   52.0%   40.7%
0-0     68.1%   50.2%   26.7%
1-1     56.4%   46.5%   52.9%
2-2     54.0%   43.8%   65.4%
0-1     55.3%   41.8%   46.1%
1-2     49.2%   35.7%   57.8%
0-2     52.4%   29.0%   49.4%
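
A few of the claims in this post can be checked against the table itself. The sketch below hard-codes the rows exactly as listed and computes the 3-1 to 3-0 swing ratio, the count with the highest swing rate, and whether Zone% and FB% fall steadily in the listed order. The variable names are just for illustration.

```python
# (count, FB%, Zone%, Swing%) copied from the table above, in the listed order.
rows = [
    ("3-0", 95.2, 58.5, 6.6),
    ("3-1", 85.0, 57.5, 54.3),
    ("2-0", 81.6, 55.3, 40.0),
    ("3-2", 69.4, 54.0, 73.7),
    ("2-1", 68.5, 52.6, 58.7),
    ("1-0", 68.6, 52.0, 40.7),
    ("0-0", 68.1, 50.2, 26.7),
    ("1-1", 56.4, 46.5, 52.9),
    ("2-2", 54.0, 43.8, 65.4),
    ("0-1", 55.3, 41.8, 46.1),
    ("1-2", 49.2, 35.7, 57.8),
    ("0-2", 52.4, 29.0, 49.4),
]

swing = {count: s for count, _, _, s in rows}
swing_ratio_31_to_30 = swing["3-1"] / swing["3-0"]
highest_swing_count = max(swing, key=swing.get)


def is_non_increasing(values):
    """True if each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


zone_sorted = is_non_increasing([z for _, _, z, _ in rows])
fastball_sorted = is_non_increasing([fb for _, fb, _, _ in rows])

print(round(swing_ratio_31_to_30, 1), highest_swing_count, zone_sorted, fastball_sorted)
```

The rows turn out to be ordered by Zone%, with fastball rate following closely but not perfectly, while swing rate jumps around from count to count.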
Touching BasesOctober 16, 2010
A Look at Optimal Swing Rates
By Jeremy Greenhouse

There's been some discussion over at The Book Blog on whether or not batters swing too often at full count pitches. For me, this line of thought started when I read Dave Allen's research that showed that batters are more likely to swing 3-2 than 2-2. I'll get back to Dave's work in a moment, but first an aside on my theoretical understanding of the situation.

In equilibrium, pitchers want to throw strikes at such a rate that batters are indifferent toward swinging. The way I've figured it, and I really might have figured it wrong, that means that on 0-2 and 1-2 counts, pitchers want to throw at least 80% balls, while on 3-0 and 3-1 counts, they want to throw at least 70% strikes. In turn, that means that batters want to swing at 0-2 and 1-2 counts when they are at least 20% sure that a pitch is a strike and take on 3-0 and 3-1 when they are at least 30% sure that a pitch is a ball. The benefit of taking a pitch on 3-2 is obviously much greater than it is on 2-2, as the reward of a ball is a walk. What I've found unique about the 3-2 count, and again, my theoretical prediction might be off, is that it is the only hitter's count that dictates that pitchers throw more balls than strikes and that batters swing at pitches that are probably balls.
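The indifference idea can be written down as a small calculation. A batter who believes a pitch is a strike with probability p is indifferent when the expected value of swinging equals the expected value of taking. The sketch below solves for that p. The four run values are invented purely so that the answer lands on the 20% threshold mentioned above for two-strike counts; they are not estimates from real data.

```python
def indifference_strike_prob(swing_strike, swing_ball, take_strike, take_ball):
    """Strike probability p at which swinging and taking have equal expected value.

    Solves p*swing_strike + (1-p)*swing_ball == p*take_strike + (1-p)*take_ball.
    Assumes swinging is better on strikes and taking is better on balls.
    """
    gain_from_swinging_at_strike = swing_strike - take_strike
    gain_from_taking_ball = take_ball - swing_ball
    if gain_from_swinging_at_strike <= 0 or gain_from_taking_ball <= 0:
        raise ValueError("no interior indifference point for these payoffs")
    return gain_from_taking_ball / (gain_from_swinging_at_strike + gain_from_taking_ball)


def should_swing(p_strike, threshold):
    """Swing when the batter's strike probability meets the threshold."""
    return p_strike >= threshold


# Hypothetical two-strike run values (batter's perspective, in runs).
threshold = indifference_strike_prob(
    swing_strike=-0.05,  # swinging at a strike: usually in play or fouled off
    swing_ball=-0.01,    # chasing a ball
    take_strike=-0.25,   # called strike three
    take_ball=0.04,      # count improves by a ball
)
print(round(threshold, 2), should_swing(0.30, threshold), should_swing(0.10, threshold))
```

Taking a strike is so costly with two strikes that even a pitch judged only 20% likely to be a strike is worth a swing under these made-up payoffs. On 3-0 and 3-1 the costs flip, which pushes the threshold the other way.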

Back to Dave's work, because it turns out that he did a followup study asking, "do batters swing too often in a full count?" Dave showed the difference in value between taking a pitch and swinging at a pitch based on pitch location. The area in which batters are just as well off swinging as they are if they were to take should also be the area where batters swing 50% of the time. However, on a full count, batters swing 75% of the time in that area, according to Dave's research. I really like his methodology, and to me it is proof that batters do swing too often on full counts. Unless I'm missing some flaw, which is why I tried to repeat Dave's process at the player level.

The first player I tried was Albert Pujols, and he proved to be a good test case.

dispujols.jpg

Red means swing, blue means take, and white means indifference. The black contour line estimates the player's 50% swing rate.

The best hitter in the game seems to know exactly when he should be indecisive, so to speak.

My hope was that this type of analysis would vindicate guys like Vladimir Guerrero and Brett Gardner, above average hitters with unique hitting styles. Unfortunately, the data indicate that Vlad swings at too many pitches out of the zone and Gardner at too few. It's easy to say that Jeff Francoeur should learn to take a pitch, but to offer such advice to Vlad is tricky, and probably wrong. And if umpires didn't call such an absurd strike zone to Gardner, it's possible that he would be correct to swing so little.

One hitter who never swings, and correctly so, is Elvis Andrus. He must recognize his historic lack of power.

disandrus.jpg

And J.D. Drew knows where his bread is buttered.

disdrew.jpg

It was difficult to find evidence of any batter who should swing at pitches out of the strike zone. I was hoping that would be the case with Vlad. Miguel Cabrera is one such batter who might have good reason to be a free swinger.

discabrera.jpg

And lastly, Colby Rasmus is the most extreme low-ball swinger in the league, and this type of graph shows that he's also a low-ball hitter.

disrasmus.jpg

I like the type of information that these charts display. Using it as a prescriptive tool to say how often a specific batter should swing would be wrong, but I continue to think that on a league-wide level, batters swing too often on full counts.

Touching BasesOctober 14, 2010
Two Potential Reasons for Lower Scoring
By Jeremy Greenhouse

This year, scoring is down by almost a quarter of a run per game.

At the beginning of the season, Mike Fast showed that fastball velocities were rising. FanGraphs data indicates a continued upward trend. I spotted only two pitchers from 2009 who threw 96 MPH and were out of the league in 2010 (Juan Morillo and Tyler Yates), while there were about a dozen rookies who came in throwing that (Aroldis Chapman, Jordan Walden, Stephen Strasburg, Dan Cortes, Andrew Cashner, Alexi Ogando, Joe Bisenius, Jhan Martinez, Chris Sale, Greg Holland, Sergio Santos, Gregory Infante). I suppose it's normal for there to be more hard-throwing rookies entering the league than hard-throwing veterans retiring. Still, only 30 pitchers averaged 96, and that nearly half of them were rookies sounds exceptional.

Also, I checked to see whether the strike zone has changed. Red zones indicate a higher rate of called strikes, and blue lower.

zonediff.jpg

I'm not too confident in drawing any conclusions from this, but it appears that umpires might have gotten better at calling strikes on pitches at the knees.

Touching BasesOctober 07, 2010
Searching for Unusual Pitch Selections
By Jeremy Greenhouse

Michael Lewis and Bill Simmons have written that "baseball is an individual sport masquerading as a team one." Some have reasoned that this is why baseball lends itself to statistical analysis, but I don't think that's the reason. Sure, some individual sports, like tennis, are great for analysis, but with others like boxing, I wouldn't know where to begin. I believe that what sets baseball apart from other team sports is that it can better be classified as a sequential game as opposed to a simultaneous one.

Basketball, hockey, and soccer are good examples of simultaneous games, as concurrent player interaction makes it extremely difficult to isolate any single event from the play as a whole. Football is difficult to categorize, as there are ten minutes of high-octane game action which I'd call simultaneous play, but the rest of the game involves more discrete decision-making. Play calling lends itself beautifully to analytics. As for baseball, most of the game is played in turns. Each defender positions himself, the pitcher chooses a pitch type and location, and the batter decides whether or not to swing. The rest is a matter of execution.

David Gassko wrote an awesome article using game theory to explore the batter-pitcher match-up. In his analysis of pitch selection, he used Brad Lidge as his example. Lidge throws a fastball and a slider. Really good ones at that. His task is to mix his pitches in such a way that the batter cannot gain an advantage by anticipating one way or the other. That mix will depend on the batter (it's often convenient to assume that pitchers have perfect information with regard to the batter; they do not), the park, the umpire, and a bunch of other stuff. I'm going to focus on the count. The count should only matter in determining the rate at which he chooses to throw strikes. Now, there's strong evidence to suggest that baseball players don't act rationally with regard to the count. Dave Allen has shown that batters swing more often 3-2 than they do 2-2. But most pitchers will follow the count in the sense that they throw more fastballs when they need strikes and mix in their harder-to-control off-speed pitches when they can afford balls. Here is Lidge's pitch mix for his career, data courtesy of FanGraphs.

Lidgeps.jpg

That seems fine to me. I ordered the ball/strike count from highest run expectancy to lowest, which should theoretically follow with highest fastball percentage to lowest.

A.J. Burnett, like Lidge, mainly sticks to two pitches. He might even adhere more strictly to the count than Lidge. When he falls behind, he refuses to throw a breaking ball. He hasn't thrown a 3-0 curveball since 2008. But when he has two strikes, he relies heavily on it.

Burnettps.jpg

And the best example of pitch selection based almost entirely on the count comes from Tim Wakefield.

Wakefieldps.jpg

I wanted to find a few pitchers who defy this trend. "Pitching backwards" is a common way to describe such an approach. I looked at a fair number of pitchers, and while some guys depend less on the count in selecting pitches than others, I didn't think I would find anybody who truly "pitched backwards." I e-mailed Rich Lederer, and he suggested I look into Bronson Arroyo. You should too.

Arroyops.jpg

I would guess that something funky's going on here. Arroyo's changeup probably isn't like your normal change. But since 2002, Baseball Info Solutions video scouts have been consistent in calling that pitch -- whatever it is -- a changeup. I don't know what to make of that. Still, how can he throw his curveball 30% of the time on a 2-0 count and 8% on an 0-2 count? Has Arroyo ever given an interview explaining his thought process? Are there any other pitchers at all similar to Arroyo?

The other way that pitchers can defy convention, other than by pitching backwards, is by not following a trend at all. Certain pitchers will only employ a certain pitch in certain counts.

Bobby Jenks has embraced the idea of the "out pitch." He’s a fastball-slider pitcher early in the count. When he gets to three balls, he’ll use the fastball exclusively. But when Jenks gets the count to 0-2 or 1-2, he busts out a curve nearly half the time. He neglects the pitch on other counts, but it’s this huge weapon in these scenarios.

Jenksps.jpg

Jenks isn’t alone. Another A.L. Central closer who embraces his curveball as an out pitch is Joakim Soria. Soria, a four-pitch pitcher, mixes his fastball, slider and change regularly. His curveball, however, he keeps in his pocket until he gets to two strikes, at which point it enters the hitter's mind.

Soriaps.jpg

If you can think of any pitcher whose pitch selection puzzles you, please let me know.

Touching BasesSeptember 23, 2010
Pitching vs. Pitchers
By Jeremy Greenhouse

Somehow I got it in my mind that Vicente Padilla was the villain of baseball. Opponents hate him and teammates hate him even more. I'm not really sure how the idea got implanted in there (inception?), but it did, and I began to envision games in which Padilla simply exchanged beanballs with the opposing pitcher. This would go on until Joe Torre brought in Scott Proctor to relieve.

In fact, Padilla has been hit by a pitch once since 2004 and hasn't hit any opposing pitcher, although he does have one of the highest overall HBP rates of all-time. I started wondering whether any pitchers are prone to hitting other pitchers or getting hit themselves and the answer is no. Sure, big fat Joe Blanton has been hit twice this year, but that's only because he's big and fat. Kind of like Padilla. And Chris Volstad has hit three pitchers while having faced 131, which is rather impressive when you think about it. But he's never been hit himself. Unfortunately, I didn't find any evidence of pitchers retaliating against each other. Still, I had the data, so I looked into how some approach pitching vs. pitchers.

Again, I had envisioned Padilla breaking out his eephus pitch against other pitchers and embarrassing them, which would result in nobody throwing him any fastballs. Not the case. To think that there's some "I throw you fastballs, you throw me fastballs" code is rather silly. There's no correlation between throwing fastballs against pitchers and receiving them in return. It's more a matter of good hitting pitchers like CC Sabathia, Dontrelle Willis, Yovani Gallardo, Micah Owings, Mike Leake, Adam Wainwright, and Carlos Zambrano who receive a fair amount of breaking stuff. And for some reason pitchers like throwing Brad Penny junk, even though he can't hit. Jhoulys Chacin is one guy who has proven inept enough with the bat to be fed nothing but fastballs.

Cliff Lee is an interesting case. He's thrown over 100 offerings to pitchers, and all but 2% were fastballs. Furthermore, his fastballs against pitchers have been clocked one mile per hour faster than against regular batters. That means that he's probably not even throwing his cutter against pitchers, but instead only throwing his straight fastballs for easy strikes. The thing is, he's only been average against pitchers, and he's been Cliff Lee against everyone else.

Lee is an exception as a guy who throws his fastball harder against pitchers than others, which might only be the case because I'm including his cut fastballs, which skew the data. Only 10% of pitchers recorded higher fastball velocities against pitchers than otherwise. Roy Halladay has treated pitchers and non-pitchers most evenly. Andrew Miller, Javier Vazquez, Homer Bailey, Felipe Paulino, and Edinson Volquez all ease up a lot on their fastballs when facing pitchers.

Even fewer—under 5%—throw a higher rate of fastballs against pitchers than against non-pitchers, and Andrew Miller is the biggest oddity in that regard. I suppose a bigger enigma surrounding Miller is why he's still pitching in the Majors.

Although Lee throws the highest rate of fastballs against pitchers, that isn't especially exceptional, considering his already high usage of fastballs against everyone (75-80%). Knuckleballer R.A. Dickey throwing nearly half fastballs against pitchers might be the biggest change in approach of any pitcher. I was surprised to learn that Jorge De La Rosa trusts his 93-plus mile per hour fastball enough to throw it to pitchers 85% of the time, while in normal situations he throws it only 59% of the time. Other notable pitchers who throw more fastballs while facing their counterparts: Rich Harden, Edwin Jackson, Edinson Volquez, Pedro Martinez, Ian Kennedy, Ted Lilly, Chris Carpenter, Tim Lincecum.

James McDonald has allowed a .375 OBP against pitchers in his career.

Touching BasesSeptember 16, 2010
Year to Year Spray Charts
By Jeremy Greenhouse

Rich Lederer covered Jose Bautista's home run scatter plot on Tuesday, noting that he has yet to hit one out the other way. Bautista's spray chart this year differs sharply from last year's as well.

Perhaps Bautista's new patterns can be explained through mechanical changes. According to Frankie Pilliere, Bautista is moving his hands through the zone quicker, is starting his leg kick slightly sooner, and is opening up on inside pitches.

Still, former teammate Alex Gonzalez, whom Rich profiled way back when, also adopted the Blue Jays' swing-for-the-fences approach. Can his change in batted ball locations be explained by a new-found approach?

On the other hand, Elvis Andrus is no longer pulling the ball, and has seen his ISO drop to 40 points, the lowest mark in the leagues, and he plays half his games in Arlington.

Similarly, Matt Kemp, possibly the most disappointing player in the league this year, evidently hasn't gotten around on pitches. He might have lost speed over the offseason, considering he went from a plus center fielder/baserunner to a guy with right around the worst UZR and stolen base numbers I've ever seen, and maybe he lost bat speed too.

I tend to think of BABIP luck for a batter as a dying quail that drops in for a hit once in a while. He controls where he hits it, but not how often it falls in. I'm beginning to think that I've underestimated the amount of randomness that can affect a batter's spray charts. A split-second difference in timing is the difference between hitting the ball well and popping it up or rolling it over or something. Even though Bautista is undoubtedly hitting the ball with more authority, he's probably lucky to have done so. While I think that looking at spray chart differences can signal a change in approach, I would still expect all of these guys to regress heavily to their mean next year, both in terms of performance and batted ball locations.

Touching BasesSeptember 09, 2010
Another Quantitative Approach to Studying Release Point Consistency
By Jeremy Greenhouse

Jeff Sullivan in this very space on January 19, 2006:

We know an awful lot about pitchers. We know how hard they throw, how many batters they strike out, what kinds of pitches they have, and whether their deliveries are fluid and easy or violent and rough. This is all objective and indisputable information that has a lot of value when it comes to projecting a pitcher's future health and success.

One thing we don't know much about, though, is the consistency of a pitcher's release point. The fact that we don't have a good way of measuring what's arguably the most important part of being a good pitcher is one of the more ironic twists of modern analysis.

Well, by 2007, PITCHf/x had become all the rage. The data is available now, but I'm not sure how widely release points have been studied.

PITCHf/x estimates the ball's location at a mark 50 feet from home plate. Pitchers often shift their spot on the rubber, resulting in variations of the horizontal component of the release point. This doesn't necessarily mean that the pitcher isn't repeating his delivery, though. Therefore, I decided to only look at the vertical component. Furthermore, some pitchers use different arm slots for different pitch types, and curveballs have a higher initial trajectory than fastballs. My methodology was to find the standard deviation of a pitcher's vertical release point for the fastest 20% of his pitches. Since cameras are calibrated ever so slightly differently in every ballpark, and even in every series to some extent, I looked at pitchers at both the season and game level.
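
For anyone who wants to try this at home, here is a minimal Python sketch of that calculation. It is not the code behind the numbers in this post. The tuple layout (speed in miles per hour, vertical release point in feet) and the function name are assumptions for illustration, and it uses the sample standard deviation, which is one reasonable reading of "standard deviation" here.

```python
from statistics import stdev


def release_point_spread(pitches, fastest_share=0.20):
    """Standard deviation of vertical release point over a pitcher's fastest pitches.

    `pitches` is a list of (speed_mph, release_z_ft) tuples for one pitcher
    in one season or one game. Both field names are hypothetical.
    """
    if len(pitches) < 2:
        raise ValueError("need at least two pitches to measure spread")
    # Rank by speed so the hardest pitches (mostly fastballs) come first.
    ranked = sorted(pitches, key=lambda p: p[0], reverse=True)
    # Keep the fastest 20%, but never fewer than two pitches.
    keep = max(2, int(len(ranked) * fastest_share))
    heights = [release_z for _, release_z in ranked[:keep]]
    return stdev(heights)
```

Calling the function once per pitcher-season and once per pitcher-game gives the two levels described above; a smaller number means a more repeatable vertical release point.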

While intuitive reasoning would suggest release point consistency is automatically a positive, I didn't immediately notice anything that would allow for such a broad claim. Still, I did see how release point consistency correlates with some other things.

Pitchers with lower arm slots have more trouble with release point consistency. This makes sense because pitchers with low arm angles tend to be less skilled and practiced than more traditional over-the-top pitchers. The sidearm motion could be naturally harder to repeat. It could also be a PITCHf/x issue. Higher variance in release points coincides with higher variance in movement and velocity as well.

On to some examples. Javier Lopez, a sidearmer, is the worst at maintaining a consistent release point.

Lopezrelease.gif

Perhaps he's changing his arm slot intentionally. Jose Contreras has been an effective pitcher who deals from multiple release points. Unlike Lopez, though, Contreras has separate, consistent release point clusters, which makes it easy to see that it is part of his approach.

Contrerasrelease2.gif

And now for something different, Alberto Castillo:

castillorelease2.gif

David Huff is a good example of a pitcher who has a very consistent release point.

Huffrelease2.gif

In fact, Frank Viola said in 2009, "Huff has textbook mechanics. Everything is right there. His release point is consistent with all his pitches."

Chris Carpenter and Kevin Slowey have consistent release points, too.

Touching BasesAugust 24, 2010
Contrasting Swing Zones
By Jeremy Greenhouse

One of my favorite players in baseball is a gritty corner outfielder who plays for my hometown team, and although fans derided him as a backup during the off-season, he's proven the doubters wrong so far by playing in 116 games in spite of his lack of power and ridiculed style of hitting. I decided to compare him to Brett Gardner.

FrancoeurGardner.png

What you see above are the players with the highest swing rate in the league (60.9%) and the lowest (31.1%). The contour lines indicate the area inside which each batter is at least 50% likely to swing at a pitch. This means that Jeff Francoeur is as likely to swing at a pitch that might hit his knee as Gardner is to swing at a pitch right down the pipe.

These graphs are all from the catcher's point of view, and the handedness of the batter is indicated by which side his name is on.

Finding players who have the biggest and smallest swing zones is the easy part. What about inside/outside? Among left-handed hitters, Andres Torres and Justin Morneau differ most sharply.

TorresMorneau.png

As for righties, Michael Young and Shane Victorino are notable. Victorino, like Torres, is a switch-hitter, but I only included pitches when they were batting from the relevant side of the plate.

victorinoyoung.png

I was surprised to learn that Colby Rasmus extends his 50-50 swing zone a foot below the strike zone. Ronny Paulino hits from the opposite batter's box, which makes his zone appear shifted sideways, but it's actually very similar to that of Rasmus, only a foot higher.

RasmusPaulino.png

And the only player to compare to Pablo Sandoval is himself.

sandoval.png

Touching BasesAugust 19, 2010
On Count-Based Linear Weights
By Jeremy Greenhouse

Ever since the work of Joe P. Sheehan, pitch-by-pitch run values have been a staple of PITCHf/x analysis. More recently, Bloomberg analysts Craig Glaser and Pat Andriola really got me thinking about what these values might mean.

We all know that Cliff Lee's walk rate is otherworldly. But last week, Jeff Sullivan wrote, "Of the 201 pitchers in baseball with at least 50 innings pitched, Lee's three-ball count rate is lower than 67 individual walk rates." That is an awesome piece of information. Let's say you have a pitcher who somehow manages a walk rate identical to Lee's, and we can say he has the same strikeout and home run rates too. But what if we knew that this pitcher had, say, twice as many three-ball counts as Lee? They may have been of equal value, but surely Lee projects better going forward.

FanGraphs has a whole assortment of what they call plate discipline stats. In essence, these stats are trying to separate the process from the results. A pitcher has a high strikeout rate. Does he throw a lot of strikes or does he induce out-of-zone swings? A batter has a high strikeout rate. Does he never swing or does he never make contact?*

*To those who do such things, please don't use contact rate to predict strikeout rate.

Here's where count-based linear weights come into play. Everything that happened before the result of a plate appearance can be summed up best by the count. A pitcher who walks nobody has better process if he never even goes to three-ball counts, like Cliff Lee.

Using Retrosheet data since 2002, I found the expected run value of the final pitch of every plate appearance, excluding intentional walks. So if a player homers on the first pitch of an at-bat, that goes down as 0 runs toward his count-based linear weights. In turn, a pitcher will have a worse score if he walks a batter on a 3-0 count than a 3-2 count. Here are the values straight from Joe's article. Harry Pavlidis and others have used updated values.

Count  Runs/PA
3&0    0.207
3&1    0.137
2&0    0.097
3&2    0.062
2&1    0.035
1&0    0.034
0&0    0.000
1&1   -0.016
2&2   -0.037
0&1   -0.043
1&2   -0.083
0&2   -0.104
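
To make the bookkeeping concrete, here is a minimal Python sketch of how the table can be applied. It is an illustration rather than the Retrosheet code used for this post. The run values are copied from the table above; the function name and the (balls, strikes) tuple layout are assumptions. Values are from the batter's point of view, so a pitcher wants his total to be as negative as possible.

```python
# Run value of each count, keyed by (balls, strikes), copied from the table above.
COUNT_RUN_VALUES = {
    (3, 0): 0.207, (3, 1): 0.137, (2, 0): 0.097, (3, 2): 0.062,
    (2, 1): 0.035, (1, 0): 0.034, (0, 0): 0.000, (1, 1): -0.016,
    (2, 2): -0.037, (0, 1): -0.043, (1, 2): -0.083, (0, 2): -0.104,
}


def count_lwts(final_counts):
    """Sum the run value of the count on the final pitch of each plate appearance.

    `final_counts` is an iterable of (balls, strikes) tuples, one per plate
    appearance, with intentional walks already removed (hypothetical layout).
    """
    return sum(COUNT_RUN_VALUES[count] for count in final_counts)
```

A first-pitch home run ends the plate appearance at (0, 0) and adds nothing, while a walk issued on 3-0 adds 0.207 against the pitcher and a walk issued on 3-2 adds only 0.062.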

Barry Bonds and Curt Schilling stand unparalleled in getting into quality counts. Angel Berroa and Kirk Rueter not so much. Players who get into good counts but have bad results more often than not are burned by BABIP.

As for the top and bottom performers of 2009, here are the hitters:

Player           PAs  Total lwts  Count lwts
Chipper Jones    577         9.5        11.1
Lance Berkman    546        24.6         8.7
Albert Pujols    665        59.0         8.6
Adrian Gonzalez  657        35.8         8.2
Nick Swisher     655        15.6         7.7
Ivan Rodriguez   447       -20.8       -10.9
Jose Lopez       642        -5.8       -10.9
David Eckstein   553       -16.4       -10.9
Miguel Olivo     414        -1.8       -11.1
Clint Barmes     607       -10.9       -16.1

And the pitchers:

Player            PAs   xFIP  Count lwts
Cliff Lee         1103  3.69       -20.9
Roy Halladay       962  3.05       -20.8
Justin Verlander   968  3.26       -19.6
Johan Santana      689  4.13       -16.6
Cole Hamels        885  3.69       -16.1
Kyle Davies        532  5.12         4.8
Doug Davis         871  4.68         5.1
Joe Saunders       837  4.8          5.2
Trevor Cahill      764  4.92         5.2
Zach Miner         405  4.86         6.1

After spending some time with the data, I've unfortunately yet to find much predictive power in the metric, beyond what we can get out of normal peripheral stats. Nevertheless, I think there's value to a count-based linear weight as a DIPS-type metric for pitchers.

Touching BasesAugust 13, 2010
Prince or Hall vs. Paul?
By Jeremy Greenhouse

Paul Maholm was recently named the most underhyped player in baseball. Perusing his opposing batter history on Baseball Reference, I could see why some would think he was underhyped. Prince Fielder has a .071/.152/.071 line against Maholm in 46 career plate appearances. On the other hand, Bill Hall, sporting a .581/.639/1.032 clip in 36 PAs, probably doesn't really see what all the fuss is about. So who would you rather have against Paul Maholm?

Going by The Book, first we look at career numbers to get the largest possible sample. Better yet, we can look at a projection system, which distills those career numbers, adjusts them for age and weights them by season. ZiPS projects Fielder at a .401 wOBA and Hall at a .302 wOBA. Fielder is a superstar while Hall is a utility man. We've got that out of the way. So how to explain the Maholm divide?

The Book says to next look at platoon splits. Fittingly, Hall and Fielder have identical .348 wOBAs against southpaws. Furthermore, Maholm has a massive career platoon difference of 100 points in wOBA. That closes the gap, and that's about as far as The Book goes. To get the rest of the way there, I thought PITCHf/x might come in handy, so using movement, velocity, and location as my inputs against LHPs, I tried to predict their success against Maholm's offerings.

Maholm throws both his two-seam and four-seam fastballs around 88-90 miles per hour, and throws them on just over half of his pitches. His two-seamer has better movement in my opinion, and has certainly achieved better results, yet interestingly, he throws it less often to same-handed hitters. I'm not sure this is a wise move overall—he might be handicapped by wanting to throw his two-seamer only to his arm side—but against Prince Fielder, his choice of fastball has certainly paid off. I grabbed 1,000 fastballs against Fielder from LHPs and plotted Fielder's success (RV100) by pitch movement. I also added lines to indicate the average movement of Maholm's two fastballs.

Princemovement.png

Fielder is above average on risers but below average on sinkers. Movement is not the only reason that Maholm's four-seam fastball stifles Fielder. The location of Maholm's four-seamers also coincides with Fielder's weakness.

Princelocation.png

In fact, Fielder has swung at 23 Maholm four-seamers. All but five he has either fouled off or swung through. As for the five he put into play, all of them were grounders, and only one was a single. Two were double plays. So to sum up, Maholm uses his four-seam fastball a lot facing lefties, and it just so happens that said fastball matches up perfectly against Fielder.

Furthermore, Maholm's slider, his best pitch, is death on Fielder, and LHBs in general. However, Maholm only uses his slider 7% of the time against righties. Instead, he takes the changeup out of his pocket and also uses the curve a bit more. But his changeup isn't as good a pitch as his slider, even when accounting for the platoon differential. And against Hall, Maholm's choice of off-speed pitches is asking for trouble.

Maholm's changeup comes in at 83, his slider at 80, and his curve at 73, and they follow the PITCHf/x spectrum of movement. His changeup is in the top-right quadrant, dropping the least out of his off-speed pitches and moving the most toward his arm side. His slider is right near the origin, with average values of 0 inches in horizontal and vertical movement. And his curveball is diametrically opposed to his changeup, as it breaks down and in towards righties. Conventional wisdom and PITCHf/x analysis both say that the slider has the largest platoon split of all off-speed pitches, so perhaps Maholm is right to scrap it against righties. But Hall apparently isn't a normal righty. Against off-speed pitches, here is how he does based on horizontal and vertical movement:

Hall.png

The troughs in both charts appear in the areas where Maholm throws his slider. This means that sliders might be the best pitch to throw Hall. There's room inside to throw the slider, and he's also willing to chase them in the dirt when LHPs try to backfoot him. But Hall destroys offspeed pitches left out in the zone.

Hall has been thrown twelve curves from Maholm. He swung at three of them, connecting for two singles and a double. He was also hit by one of them, and most of the rest went for balls. Hall's put four changeups into play, good for a groundout, a single, a double, and a home run. Again, most of the rest were balls.

Prince Fielder is soon to sign a contract worth over $100 million, while Bill Hall might be out of baseball in a year. Yet in certain contexts, Hall might be the better player. Given both batters' substantial platoon splits, and more importantly the large platoon split of Paul Maholm, you could project Fielder and Hall to hit Maholm equally. And digging deeper, it is evident that Maholm's strengths match Fielder's weaknesses and Maholm's weaknesses match Hall's strengths. The case can be legitimately made that Bill Hall projects to be a better hitter than Prince Fielder against Paul Maholm.

Touching BasesAugust 12, 2010
WAR and the Rule 5 Draft
By Jeremy Greenhouse

The Rule 5 Draft dates back over a century, and Retrosheet has a fair chunk of Rule 5 data. The Rule 5 draft as we know it began somewhere around 1965, so I took all drafted players since then and their WAR in the following years. As it turns out, the Rule 5 Draft is a market for more-or-less freely-available replacement-level talent.

Most years, 80-90% of one-time Rule 5 picks either don't play or accumulate 0 WAR. That means that in the first year after being drafted, 35% don't play, while 55% occupy a Major League roster and play at replacement level. Five years removed, 70% of Rule 5 picks aren't playing, but at least most of those who do are competent Major Leaguers.

Many Rule 5 picks don't play for the team that drafted them. For example, Bobby Bonilla was a Pirate before he was taken by the White Sox, but he was traded back to Pittsburgh before he became Bobby Bonilla. Johan Santana was drafted by the Marlins, but that was only in a pre-arranged swap of picks with the Twins. And Josh Hamilton played only a year for the Reds, yet that in turn was only because the Reds were able to buy him from the Cubs, who had selected him in the Rule 5 Draft.

Only 14 players have amassed 2 WAR the year after they were taken. Doug Corbett picked up a whopping 5.9 WAR. Ted Abernathy, 10 years into his Major League career, was somehow a Rule 5 pick, and he quickly had the best year of his career at 5.6 WAR, finishing 20th in MVP voting. After that, the familiar faces of Joakim Soria, Dan Uggla, and Josh Hamilton made the most immediate impacts. 14 players have been drafted twice, and Shane Victorino is the most successful.

The Twins have been the best drafters, and that doesn't even count their trade for Santana. Minnesota was the team that got that value out of Corbett, and the Twins also sapped all the talent out of Shane Mack after selecting him in December of 1989, which you can see from the table below.

Year  Age  Tm   Lg   PA   WAR  Salary
1987  23   SDP  NL  267  -0.2  $62,500
1988  24   SDP  NL  140   0.7  $73,500
1990  26   MIN  AL  353   2.5  $105,000
1991  27   MIN  AL  489   4.9  $270,000
1992  28   MIN  AL  692   6.0  $1,075,000
1993  29   MIN  AL  553   1.1  $3,050,000
1994  30   MIN  AL  347   4.0  $3,250,000
1997  33   BOS  AL  146   0.7  $1,850,000
1998  34   OAK  AL    2  -0.1  $450,000
1998  34   KCR  AL  229   0.2

The Pirates have seemingly been pillaged by the Rule 5 draft, but again, they were able to reclaim Bonilla, which offsets some of their losses. The real question is, why didn't the Pirates protect Bonilla in the first place? They took another hit when they let Bip Roberts go. The Pirates had drafted Roberts twice, and were able to sign him when they used their first-round pick on him the second time, but he was plucked clean by the Padres, and went on to develop into a nice player. The Diamondbacks, in their short time, have only had a handful of players taken from them, but those include Dan Uggla and Luis Ayala.

The Giants and Red Sox have made about 20 Rule 5 picks each, and have had 0 pan out as players, unless you want to count Javier Lopez. I don't. In fact, many teams have gotten no return from the Rule 5.

Evaluating a Rule 5 pick is in part straightforward. The drafted player will make the league minimum salary. $50,000 per selection is $50,000. The tricky part is how much value to place on losing the flexibility of a 40-man roster spot. Most Rule 5 picks never become more than replacement level, especially not in that first year when they're guaranteed a roster spot. I'd say that five players a year are, or become, better than replacement level, while 15 picks are made per year. So if a team covets a player, using a Rule 5 pick on him can be worth the while, but 10 picks in, teams are just as well off passing on their selections, which they often do. I don't see any hidden value in the Rule 5 Draft. I struggle to even see the purpose of this outdated draft model. A boring draft makes for boring analysis.

Touching BasesAugust 05, 2010
Does Pedigree Matter?
By Jeremy Greenhouse

Ben Zobrist was a sixth-round pick who had done nothing special in the first three years of his Major League career, but then put up one of the best seasons in baseball in 2009. Ryan Zimmerman, the fourth overall pick of the 2005 draft, put all the pieces together and became one of the best players in baseball in 2009. We perceive them differently mainly because Zimmerman is a much better player, but the point I'd like to make is that their original draft status--their pedigree--also factors into how we think of these guys. Should it affect our projections going forward?

Projections are hard. Instead, I broke players into three groups depending on whether they surpassed their previous year's WAR, fell short of their previous year, or they didn't play at all. Data courtesy of baseballprojection.com. And using Retrosheet, I broke players' pedigrees into five grades. Top 10 draft picks, rest of the first round, second-third rounds, third-tenth rounds, and anything after that.
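
The grouping can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the charts. The letter grades extend the "As and Bs" shorthand used below to all five tiers, round three is assigned to grade C because the tiers as listed overlap there, a player who exactly matches his previous WAR is counted as having fallen short, and the tuple layout is hypothetical.

```python
from collections import Counter, defaultdict


def pedigree_grade(round_number, overall_pick):
    """Map draft position to one of five grades (A = top 10 pick ... E = after round 10)."""
    if round_number == 1:
        return "A" if overall_pick <= 10 else "B"
    if round_number <= 3:  # assumption: round 3 goes with rounds 2-3
        return "C"
    if round_number <= 10:
        return "D"
    return "E"


def outcome(prev_war, next_war):
    """Classify the following season; next_war is None when the player did not play."""
    if next_war is None:
        return "did not play"
    # Assumption: matching the previous WAR exactly counts as falling short.
    return "improved" if next_war > prev_war else "fell short"


def outcome_rates(players):
    """players: iterable of (round_number, overall_pick, prev_war, next_war) tuples."""
    counts = defaultdict(Counter)
    for round_number, overall_pick, prev_war, next_war in players:
        counts[pedigree_grade(round_number, overall_pick)][outcome(prev_war, next_war)] += 1
    # Convert raw counts to shares within each grade.
    return {
        grade: {label: n / sum(tally.values()) for label, n in tally.items()}
        for grade, tally in counts.items()
    }
```

Running `outcome_rates` once per pair of seasons (year one to two, year two to three, and so on) yields the kind of improvement rates by grade that the charts compare.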

As you can see below, from the first year in the Majors to the second, first-round draft picks (the As and Bs) have a much higher improvement rate than lesser prospects.

Draftonetwo.jpg

There are a lot of things going on here. First of all, the better prospects are younger, and are therefore more likely to improve. Also, the better prospects are given more leeway to fail, so there is a much lower percentage who do not play in the subsequent year. And yes, I think that at this point, they are probably better players than their counterparts, production being equal.

How about year two to year three?

drafttwothree.jpg

More of the same. Higher pedigree players are still improving at a higher rate.

draftthreefour.jpg

This effect is starting to appear consistent. Let's keep going.

Draftourfive.jpg

We need until the fifth year to see pedigree becoming negligible.

draftfivesix.jpg

Controlling for the quality of the player by creating a projection is necessary to draw any conclusions. Nevertheless, I think the matter warrants further consideration. Projection systems are sometimes built to use data as far back as college, but I haven't heard of any that include draft position, really the only prospect grading system for which there is a large volume of discrete data. A draft pick provides a snapshot of what up to 30 MLB teams, each with a presumably independent and sophisticated evaluation process, thought of a single player at a single time. That picture fades, but even when a player makes the Majors, it's still part of his history.

Touching BasesAugust 03, 2010
Three Up Three Down: Gaining Steam and Losing Gas
By Jeremy Greenhouse

Grouping by pitch count, I averaged the 25% fastest pitches for every starter. Here are three who possess another gear:

And three who don't:

Touching BasesAugust 03, 2010
Velocity and Height
By Jeremy Greenhouse

Although tall pitchers have the advantages of long arms and long strides, there is a larger player universe of shorter pitchers, so shorter pitchers compensate with other attributes. Therefore, one would expect there to be little correlation between height and fastball velocity for Major Leaguers. I took the top 10% fastest pitches for each pitcher 2008-2009, and found little relationship between velocity and height.
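
As a rough sketch of that check in Python: average each pitcher's fastest 10% of pitches, then measure the linear relationship with height. The post does not say which statistic was used, so Pearson's r is an assumption here, as are the function names and input layout.

```python
from math import sqrt
from statistics import mean


def top_speed(speeds, share=0.10):
    """Average of a pitcher's fastest pitches (top 10% by default, at least one pitch)."""
    ranked = sorted(speeds, reverse=True)
    keep = max(1, int(len(ranked) * share))
    return mean(ranked[:keep])


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def height_velocity_correlation(pitchers):
    """pitchers: list of (height_inches, [pitch speeds in mph]) pairs (hypothetical layout)."""
    heights = [height for height, _ in pitchers]
    velocities = [top_speed(speeds) for _, speeds in pitchers]
    return pearson_r(heights, velocities)
```

A value near zero from `height_velocity_correlation` would match the "little relationship" described above.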

heightvelo.jpg

Clearly, that's Wakefield who is the only pitcher unable to reach 80 MPH. Only pitchers between 6 feet and 6'6" have hit 100, which I suppose is interesting.

The inherent advantages of being tall are masked by looking at things this way. Still, I thought that I would be able to find some sort of height benefit by looking beyond raw velocity. I was wrong.

My idea was to look at the difference between the velocity normally estimated at the 50-foot mark by PITCHf/x cameras and the velocity estimated at home plate. Hypothetically, I thought, tall pitchers should release the ball closer to home plate than shorter pitchers, and therefore there should be a smaller difference between the starting and ending velocity of such a pitch. The data didn't back that up, and the more I think about it, the less the hypothesis makes sense.

I tried to fit a model using height, velocity, release point, and spin to predict the drop in velocity from start to finish. Kei Igawa lost the least velocity, and Igawa's been known in his brief time for an extremely long stride, while Shaun Marcum lost the most velocity, and I was able to find some research showing he had a short stride. Yet I think the list is mostly random.

If two objects, acted upon by different forces, are traveling at the same velocity at any given point with the same atmospherics, then the original point of impetus shouldn't really make a difference in their rate of deceleration. So unless a pitcher is releasing the ball within 50 feet, I think the initial velocity is the only PITCHf/x recording one needs, and height doesn't matter with this type of data.

Be sure to check out Eric Seidman's work on perceived velocity.

Touching BasesJuly 29, 2010
The Bridge to Mariano
By Jeremy Greenhouse

Once upon a time, there was a man named Jeff. A man named Jeff and a man named Joe. Well, maybe you already know how the story begins.

The Great Mariano Rivera, the Hammer of God, had been banished to the bullpen, a failed starter. But John Wetteland welcomed him with open arms.

“You hand the ball to Buck,” Wetteland explained. “And Buck hands the ball to me.”

“Thank God for that,” said Mo.

But on October 8, 1995, Game 5 of the ALDS, Mariano handed the ball to Buck, and Buck handed it to Jack McDowell.

A man named Jeff. Jeffrey Allan Nelson had an idea. And a man with an idea is a powerful thing. Nelson was sitting in the Mariners bullpen during this, the first night of the Yankees Dynasty. Instead of celebrating his team’s victory, Nelson lost himself in thought. If only Wetteland had followed Mariano. What if bullpen roles were rigidly defined? No way would the Yankees give up runs! Bullpen roles so defined that the Yankees can forfeit wins by adhering to meaningless statistics used only in rotisserie leagues, arbitration cases and in deciding the Rolaids Relief Man Award!! Mmm, Rolaids.

Within a month, Joe Torre replaced Showalter as Yankees manager. Another month, and Nelson was shipped to the Bronx. The rest, as they say, was history, as they say.

In 1996, Nelson pitched in a team-leading 73 games, Rivera became the best reliever in baseball, and the Yankees won their first World Series in 18 years. And Wetteland won his Rolaids Relief Man Award.

But Wetteland left New York, and here’s where the story gets interesting.

Jeff pitched his plan to Joe.

Step 1: Assemble the best group of position players and starting pitchers in baseball so that the bullpen doesn’t really matter.
Step 2: Install Rivera as closer, ensuring a dominant bullpen.
Step 3: Build a fucking bridge.

And so it was. Joe Torre commissioned the building of a bridge. The Bridge to Mariano. Jeff was the architect, but he recruited his childhood friend Mike Stanton to help him build. Together, alternating shifts, they built the bridge. And what a bridge it was. It had aqueducts and arches and triangles and suspensions and all that stuff that makes bridges not spectacularly collapse. Quieter than the Bridge on the River Kwai. More flip than the Flipper Bridge. It was the most important bridge in the history of bridges. From 1997-2000, Stanton pitched to a 4.17 ERA and Nelson pitched to a 3.08. Their pitching was fine, and not much was made of it at the time. But what a bridge! How can you blame them for being pedestrian relievers when they were so busy building a fucking bridge?!?

Alas, in 2000, Jeff was passed over for the All-Star team by Joe, and upon leaving the Yankees, Nelson bitterly decreed, “Tear down this bridge.” Mariano was left bridgeless.

“Thank God for that,” said Mo.

The Yankees Dynasty crumbled with the departure of Nelson. Who could have known that the guy pitching 70-80 slightly leveraged innings per year could have been so influential? But as it turned out, Jeff was more than baseball. Jeff had pioneered, engineered and maintained the Bridge to Mariano. And Jeff left the bridge in ruins.

Upon Jeff’s departure, trolls could be seen patrolling the remains of the Bridge to Mariano. Yes, the trolls were the only ones who had realized the importance of the bridge. To the trolls, Jeff had been more than a decent relief pitcher. Old Nellie had also been blessed with the ability to try to pick a runner off first when there was already a runner on third! The gall! The ingenuity! There was once a dream that was the Yankees Dynasty, the trolls thought. And we fear that it will not survive the offseason. The trolls sought the bridge’s resurrection.

The Yankees acquired better relievers in those later years, having led the Majors in WPA in the decade since, but nary a relief man could pay the troll toll. Not a Flash, not a Proctor, not even the Rules Joba could recreate the Bridge to Mariano. For Farnsworth’s fastball flew forever straight. The eighth inning! And the dulcet melodies of the rotation beckoned Hughes. The eighth inning! Who can be the bridge to Mariano? The eighth inning!!

Years from now, when the Yankees struggle to find Mariano’s successor, most fans will miss the Greatest Closer of All-Time. But let this serve as a reminder: the trolls were right. Bullpen is principal to victory, yet Rivera was never key to the bullpen. It was always the Bridge to Mariano.

So we march on, analysts against the trolls, traversing an endless bridge to nowhere.

Touching BasesJuly 26, 2010
Working Hard or Working Fast?
By Jeremy Greenhouse

"The wrong way, but faster." Max Power

I could point to a dozen articles discussing the varying shapes and sizes of the strike zone, but when my friend Don asked whether umpires really change their zone depending on the score, I drew a blank. Factors such as the identity of the pitcher and the ball-strike count influence an umpire's process, but only so that he can do the job to the best of his ability. Yet for some reason, it's been casually accepted by some that umpires might be so unprofessional that they call a larger strike zone in a blowout to quicken the pace of the game.

Fortunately, this assertion is not backed up by any evidence, as umpires appear to call consistent zones regardless of the score. Below, I plot the 25%, 50%, and 75% contour lines for called strikes based on four different score differentials. The zones are jumbled and mostly indistinguishable, so, on the whole, umpires do not call to the score.

scorezones.gif

Perhaps there are some umpires who regularly schedule early dinner reservations, but the only ump I'm willing to openly critique is the only umpire who invites such criticism: Joe West.

I graphed West's strike zone at the point where he is as likely to call a strike as a ball. I also dug up the two Red Sox vs. Yankees games that West umpired, and plotted those ball/strike calls. West, you may remember, publicly denounced the length of these games. However, I found no evidence of bias. If anything, West has squeezed batters in Sox/Yanks games and batters in blowout games (blue line).

westzone.gif

Umps aren't alone in being accused of unprofessionalism. Weeks ago, Patrick Sullivan* questioned the commonly-held wisdom that players try to get out of the ballpark ASAP during getaway games. It's hard to believe that batters would swing at bad pitches just because they're playing in the final game of a series, but that's what I checked for.

*You can follow Sully on Twitter, if only to observe him incessantly hound the insufferable Boston media. For example, "Shaughnessy on May 9: 'Beltre is emerging as an Edgar Renteria or Rasheed Wallace, take your pick.'"

            Getaway    Rest of Series
Time        2:56:27    2:55:15
Day Game    68%        14%
Innings     9.20       9.16
Runs        4.54       4.67
Hits        8.94       9.03
Errors      0.60       0.60
P/PA        3.83       3.81

You'd be hard-pressed to find statistical evidence that umpires and players sacrifice quality for expediency.

Touching Bases - July 20, 2010
How Does Time on the DL Affect Fastball Velocity?
By Jeremy Greenhouse

Rotobase's injury database contains disabled list data dating back to 2002. Incidentally, that is as far back as FanGraphs carries Baseball Info Solutions' velocity data. So my question is, how long does it take a pitcher to get back up to speed?

First, I've plotted the number of days a player spends on the DL against the difference in velocity between the month he was put on the DL and the month he returns.

VeloDL.jpg

Joe Martinez, returning from three hairline fractures caused by a line drive off his skull, displayed the biggest jump in velocity, as you can see in the 2009 section of this graph. And Brad Penny in 2008, who was plagued with tendinitis in his right shoulder, took the biggest hit of any pitcher, as demonstrated here.

To the point, there's no correlation between the two variables. That's not to say that the severity of an injury has no bearing on fastball velocity; it most certainly does. It means that the sampling biases in this study may overwhelm the effects of an injury. No pitcher will return to Major League Baseball if his injury is too debilitating. The pool of players who do return from injury is strongly biased towards those players who were not cripplingly injured. Even so, perhaps pitchers continue to show effects after they return from the DL to the Majors. Below, I present a table showing the average difference in fastball velocity between the month a pitcher hit the DL and each subsequent month after he came off it.

Return Month Velocity Difference
First 0.02
Second 0.07
Third 0.23
Fourth 0.41

Velocity increases the further removed a pitcher is from the DL. Players are continually recovering. Still, velocity is generally higher in months after hitting the DL than the immediate month before. What if we look at the month before that? If a player was on the DL from June 1 to June 30, then how did he throw in April as compared to July?

Return Month Velocity Difference
First -0.20
Second -0.12
Third 0.07
Fourth 0.12

Not surprisingly, this shows that pitchers exhibit symptoms of injury (diminished velocity) in the immediate month prior to hitting the DL more so than in the preceding months.

This effect was exacerbated when I looked at pitchers recovering from Tommy John surgery. Because recovery from Tommy John takes over a full year, this was the only time that I used data from different seasons for a single pitcher, but I still identified over 50 cases where a pitcher recovered from TJ.

Return Month    Before (1) Difference    Before (2) Difference
First           -0.12                    -0.51
Second          0.08                     -0.29
Third           0.41                     -0.02
Fourth          0.26                     -0.15

Tommy John alumni pick up velocity the longer they are allowed to stay in the Majors. But most of them do not find the velocity they had in the months before the surgery.

Back to the original 826 players who made the trip to and from the DL in a single year. The injury database is set up in such a way that there are many binary variables indicating whether the injury was to this body part or that, so what else to do but run a linear regression? Nothing was statistically significant, but upper arm injuries seem to exhibit the greatest negative effect on velocity.

Methodological Details:

FanGraphs provides monthly velocity splits, so, for every pitcher who hit the DL, I found all the months he pitched before going on the DL and all the months he pitched after coming off it. So if a pitcher's stint was from June 15-July 15, I used his June month as before (1) and his July month as return (1). May and August would therefore be before (2) and return (2), respectively. If a pitcher was on the DL from June 5-June 25, then I excluded June, and used May and July as the before and after months. I adjusted each pitcher's fastball velocity reading by the month and by his team. More pitchers go on the DL in April than come off it, which could have skewed results, as seasonal temperature effects could throw off velocity by a full MPH. So the two Chicago teams and the Indians were bumped up nearly a percent in fastball velocity in April, while the Angels in July were knocked down a bit, for example.
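For anyone who wants to reproduce the month bookkeeping, here is a minimal Python sketch of the before/return assignment described above. The function names and the (year, month) representation are my own illustration, not the code behind the study, and the sketch skips the team and month velocity adjustments entirely.

```python
from datetime import date

def add_months(year, month, delta):
    """Shift a (year, month) pair by delta months."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1

def before_and_return_months(dl_start, dl_end, n=2):
    """Return ([before(1), before(2), ...], [return(1), return(2), ...]).

    Assumed rule, paraphrasing the text: if the stint starts and ends in
    the same calendar month, that month is excluded; otherwise the start
    month is before(1) and the end month is return(1).
    """
    start = (dl_start.year, dl_start.month)
    end = (dl_end.year, dl_end.month)
    if start == end:
        first_before = add_months(start[0], start[1], -1)
        first_return = add_months(end[0], end[1], 1)
    else:
        first_before, first_return = start, end
    before = [add_months(first_before[0], first_before[1], -k) for k in range(n)]
    after = [add_months(first_return[0], first_return[1], k) for k in range(n)]
    return before, after

# June 15 to July 15: June is before(1), July is return(1).
split_stint = before_and_return_months(date(2009, 6, 15), date(2009, 7, 15))
# June 5 to June 25: June is dropped, so May and July are used instead.
single_month_stint = before_and_return_months(date(2009, 6, 5), date(2009, 6, 25))
```

The velocity difference for a given return month is then just the pitcher's adjusted velocity in that month minus his adjusted velocity in the chosen before month, averaged across pitchers.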

Again, thanks to Rotobase and FanGraphs.

Touching Bases - July 15, 2010
Stuff of the Futures
By Jeremy Greenhouse

One of my favorite qualities of the incredibly rich PITCHf/x data is that it allows one to analyze a small sample and draw some substantial conclusions about a pitcher. Harry Pavlidis has been publishing his Arms of the Week series for some time, and he's already taken a look at the southpaws of the Futures Game. Twenty-four pitchers unveiled their stuff to a world-wide audience on Sunday, and here's what I got.

When I say that conclusions are there for the drawing, I mean that with a guy like Tanner Scheppers, whose fastball reads 98 miles per hour, we can comfortably say that he could fit right in with the Rangers' bullpen. The Rangers, to their credit, want Scheppers to start, but he's got the classic power fastball you see from late-inning dynamos like Jonathan Broxton, Brian Wilson, and Daniel Bard. Scheppers flashed a breaking pitch twice, which was very solid. As a starter, he profiles as A.J. Burnett 2.0.

Scheppers was the most impressive, whereas Jeremy Hellickson was the most important. Hellickson is breathing down the neck of Wade Davis, and his performance did little to quell the fears of the Rays' fifth starter. Reportedly a pitcher who sits 91-93, Hellickson was able to work at 93-94 with average movement on his fastball. He probably was dialing it up a bit for his brief stint in the limelight. There have been reports that he's been tinkering with a two-seam fastball, and he might have thrown a couple, but I'd say it's his weakest pitch, unless it is used exclusively to same-handed batters. His breaking pitches were fine (he throws two types of curves), he didn't show his cutter, and I try to stay away from analyzing the effectiveness of changeups based on velocity and movement (his was an 84-MPH straight change).

The next-best prospect who pitched was Julio Teheran. He showcased his 96-MPH four-seam fastball, which should be a plus pitch. His breaking stuff is advanced enough that it's easy to see why he would be dominating the low levels of the minors. I'd guess his perfect-world comp would be Josh Beckett.

Henderson Alvarez of the Blue Jays is currently starting, and impressing, in High-A, but to me he profiles more as a right-handed reliever. His best pitch appears to be a sweeping low-80s slider, and his hard fastball runs away from RHBs, so unless his changeup develops into something, Alvarez looks like a sinker/slider guy out of the pen.

Simon Castro has a good enough slider, but his fastball lacked luster. A 91-MPH tailing fastball will get hit in the Majors, so he'll need to cut down on his walk rate. He pitches with very little separation between his fastball and his change.

The Rays' Alexander Torres displayed some strong stuff, but he obviously has trouble commanding it, with a career Minor League walk rate above five per nine. His boring fastball ran 94-95 and he threw one breaking pitch with serious life. Unfortunately, it sailed a foot high. Very similar pitcher to Gio Gonzalez for me.

Trystan Magnuson's best pitch is a cut fastball that comes in at 88, moving across the plate. He also throws a split-finger fastball at 88. And his actual fastball is only a bit harder at 92-93, which makes for a unique repertoire. I don't know how much success it'll have.

What exactly is Anthony Slama the future of? He's 26 years old and he strikes guys out in relief. Fastball, slider, change. He'll destroy righties, but I don't think he'll ever be a closer/setup guy due to his projected massive platoon split.

Jordan Lyles' off-speed stuff has developed past his limited fastball. His changeup dives away from lefties, his slider can neutralize righties, and his curve will most definitely play. But it's telling that in a game where he had to throw a total of 15 pitches, only six of them were fastballs. They say pitching backwards can work in the N.L. Central, though.

Bryan Morris threw exactly one pitch, and oh what a pitch it was. 93.3 miles per hour. Bad movement. 0.38 StuffRV/100. Thanks for coming.

I like Mike Minor. Renowned as a collegiate, command, polished, you might as well say crafty, lefty, he came out with a surprisingly strong fastball. 93 with life. He threw changeups as his other offering, neglecting to toss in a breaking ball.

Stolmy Pimentel's pitch of note is his curve. Thrown at only 72 miles per hour, it moves nearly a foot across the plate, but doesn't drop much at all. Bronson Arroyo has a curveball like that in his arsenal, but not many others do.

Zach Britton threw only fastballs and sliders, but both of those pitches are more than big league ready. He has a hard, heavy sinker that will give lefties nightmares, can add some velocity with his four-seamer, and he boasts a true slider. You just don't see a left-handed pitcher with that biting slider and power fastball too often, and when you do, he can dominate. I think Britton's a stud, and the strikeouts will come.

Shelby Miller's got a live arm, and if you didn't know about his 95-MPH rising fastball, now you do.

Hector Noesi has been terrific this year, with a 6.35 strikeout-to-walk ratio in the minors. One of many Yankees between A and AA dominating the competition. His stuff, highlighted by a 93-MPH heater, does profile as a back-end guy, but that doesn't mean his impeccable command can't pull him to the front end.

Philippe Valiquette might have been throwing two types of fastballs. He might not have been. Tune in next time to find out. Why was this guy pitching in this game? Bleh.

Jeurys Familia dialed it up to 98. I'm very surprised to see that he's a starter in the minors, considering. At 20 years old, he can afford to throw one off-speed pitch out of a dozen offerings. Lots of time to work on that secondary stuff and that command. For now, that velo will do.

Zach Wheeler, a 2009 draft pick, throws hard, and he threw a single changeup with extreme movement. Very good changeup. He didn't get a chance to use his curve, which he called his out pitch last year.

Christian Friedrich threw three fastballs, and that was it. It was a rising fastball, and you never know how that will play in Coors.

Eduardo Sanchez also threw nothing but fastballs. A couple ticks harder than Friedrich, but he doesn't have the advantage of being left-handed. The most interesting note about Sanchez is that he was born a week apart from me. Therefore, I will pretend to be his distant cousin in order to obtain free access to Redbirds games. He will gain more from our relationship than I ever could.

Touching Bases - July 13, 2010
Testing Outfield Arms
By Jeremy Greenhouse

Over at The Hardball Times, John Walsh used to write one of my favorite pieces of the year: a ranking of the game's best outfield arms. Walsh would find every outfielder's "kill" and "hold" rates in five distinct situations. Walsh has taken a hiatus from the exercise this year, so I'd like to pick up on the research, adding Gameday's hit location data to the mix.

Walsh has already covered 2008, yet I've chosen to use both 2008 and 2009 data in my study. The hit location coordinates provided by Gameday make it difficult to decipher the exact distance of a ball to the outfield. But the batted ball angle relative to home plate can be calculated. Fortunately, Walsh outlined two parameters in which distance is more or less immaterial, and only the angle matters.

1. Single with runner on first base (second base unoccupied).
2. Single with runner on second base.

All singles land somewhere in front of the outfielder. And it turns out, the success of the base runner depends little on whether the outfield single was a grounder, a line drive, or a fly ball.

Excluding all two-out plays, I found the rates at which base runners advanced or were thrown out attempting to advance, depending on the batted ball angle.
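As a rough illustration of that step, the sketch below turns a hit location into an angle relative to home plate and then tallies attempt and thrown-out rates by angle bin. The home plate coordinates are passed in as parameters because I am not asserting where Gameday places the plate, and the tuple layout for plays is hypothetical.

```python
import math
from collections import defaultdict

def spray_angle(x, y, home_x, home_y):
    """Batted ball angle in degrees relative to home plate.

    Assumes image-style coordinates (x grows to the right, y grows
    downward), so 0 is straight up the middle, negative is toward left
    field and positive is toward right field.
    """
    return math.degrees(math.atan2(x - home_x, home_y - y))

def rates_by_angle(plays, bin_width=10):
    """plays: iterable of (angle, attempted, thrown_out) tuples.

    Returns {bin_start: (attempt_rate, out_rate_per_attempt, n_plays)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for angle, attempted, thrown_out in plays:
        bin_start = math.floor(angle / bin_width) * bin_width
        tally = counts[bin_start]
        tally[0] += 1                 # singles hit into this bin
        tally[1] += int(attempted)    # runner tried for the extra base
        tally[2] += int(thrown_out)   # runner was thrown out trying
    result = {}
    for bin_start, (n, attempts, outs) in sorted(counts.items()):
        out_rate = outs / attempts if attempts else 0.0
        result[bin_start] = (attempts / n, out_rate, n)
    return result

# Made-up plays purely to show the shape of the output.
sample = [(5, True, False), (7, False, False), (-12, True, True), (3, True, True)]
binned = rates_by_angle(sample)
```

A real version would want narrower bins or a smoother, but the bookkeeping is the same.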

FirstThird.gif

On singles directly at the left fielder, base runners attempt to advance first to third only 5% of the time. Right at the center fielder, base runners risk it 15% of the time, and 25% of the time on balls to the right fielder. 40% in the left-center gap and 60% in the right-center gap. These figures coincide with how often balls are hit to each location, meaning that outfielders align themselves sensibly. What doesn't make sense is that runners are thrown out trying to advance on balls to center as often as they're thrown out trying to advance on balls to right. Sure, right fielders have better arms than center fielders, but center fielders are closer to third base and get to the ball faster. I don't know what the numbers should look like if base runners advanced optimally, but I do know that the rate at which runners attempt to advance should be directly proportionate to the rate at which runners are thrown out attempting to advance.

That theorem holds when base runners on second try to score on singles.

SecondHome.gif

Singles targeted at corner outfielders are 50-50 plays for the third-base coach/base runner, and that risk/reward proposition can fluctuate depending on the number of outs, the upcoming batter, the current pitcher, and all that stuff. Center fielders, who are positioned farther from the plate and have to circumvent the pitcher's mound with their throws, are tested at a 75% rate. There is a higher frequency of singles to center, specifically of the ground ball variety, with a man on second than with a man on first due to the infield alignment.

I compared the expected rates to what actually happened to evaluate base runners and outfield arms. So if a runner advanced first to third on a ball right at the right fielder, they would both accumulate .75 extra bases and -.05 extra outs.
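The accounting in that example can be written out in a few lines of Python. This is only a sketch of the arithmetic: the function names are mine, and the expected rates of .25 and .05 are simply the ones implied by the example above.

```python
def play_credit(advanced, thrown_out, expected_advance, expected_out):
    """Return (extra_bases, extra_outs) for one single.

    Each value is what actually happened (1 or 0) minus the expected
    rate for a ball hit at that angle.
    """
    extra_bases = (1.0 if advanced else 0.0) - expected_advance
    extra_outs = (1.0 if thrown_out else 0.0) - expected_out
    return extra_bases, extra_outs

def total_credit(plays):
    """Sum credits over (advanced, thrown_out, exp_advance, exp_out) tuples."""
    bases = outs = 0.0
    for advanced, thrown_out, exp_advance, exp_out in plays:
        b, o = play_credit(advanced, thrown_out, exp_advance, exp_out)
        bases += b
        outs += o
    return bases, outs

# Runner goes first to third on a single right at the right fielder.
example = play_credit(True, False, 0.25, 0.05)  # (0.75, -0.05)
```

The same credit goes on both the runner's ledger and the outfielder's, so a positive extra-bases total is good for a runner and bad for an arm.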

Here are my top five and bottom five base runners at advancing on singles.

Name Extra Bases Extra Outs
Chone Figgins 23.1 -2.4
Erick Aybar 15.7 -0.5
Troy Tulowitzki 12.0 -1.3
Ian Kinsler 11.0 -1.4
Matt Kemp 10.1 -1.6
Carlos Lee -10.0 1.4
Brian McCann -14.0 0.5
Bengie Molina -13.5 0.7
Jorge Posada -4.1 4.4
Prince Fielder -12.6 2.4

The Angels are a very aggressive base running team, which pays off with guys like Figgins and Aybar. Matt Kemp's fielding and base running production have taken significant, almost shocking, hits this year. Jorge Posada is the worst base runner I've ever seen, and he's probably one of the worst of all-time. Considering his defense, which has never drawn positive reviews either, his Hall of Fame case will be very interesting.

To evaluate outfield arms, I included a regressed version of these base running scores.

Name Extra Bases Extra Outs
Hunter Pence -11.3 8.3
Michael Bourn -1.1 6.3
Jeff Francoeur -1.2 5.8
Jayson Werth -11.0 3.4
Adam Jones -17.6 1.2
Jason Bay 6.7 -2.9
Jermaine Dye 10.3 -2.3
Shin-Soo Choo 11.9 -2.1
Brad Hawpe 13.5 -2.5
Brian Giles 11.4 -3.3

Baseball Reference actually carries these stats. Hunter Pence, a right fielder, was tried 100 times on singles with a man on second. He held the runner at third 40 times, leaving 60 tries for him to nail the runner. He succeeded on ten, which is quite an impressive rate. All three of the Phillies outfielders have been successful holding the running game. Bourn was also an above average base runner, and Ichiro, renowned for his arm and base running, was merely good in each. Shin-Soo Choo is the biggest surprise I found, as I've heard that he has "80" arm strength before.

And the rest:

Touching Bases - July 07, 2010
Stolen Bases and PITCHf/x
By Jeremy Greenhouse

I tend to think that pitchers have more control over the running game than catchers. Catchers control their "POP" times, while the pitcher controls his time to the plate, pickoff move, pitch location, and pitch type. The last two factors are probably the least significant in determining the success of a stolen base attempt, but they're the most quantifiable thanks to PITCHf/x.

Below is the success rate of stolen base attempts from 2008-2009 based on the pitch location.

Steal%20Location.gif

The trendline is clear. The catcher has no chance at throwing out a baserunner on anything that's less than a foot off the ground. Balls at the belt and up give the catcher a 70% chance at throwing out the runner, and pitches (pitch outs) in either batter's box really level the playing field.

Looking at these charts, I don't see why there aren't more lefty-throwing catchers. SB success rates are even at 76% regardless of the batter handedness, so throwing through the batter doesn't pose much of a problem. In fact, a pitch has to be located a foot off the plate for the batter's handedness to have a 10% difference on SB%. And, again, most of that is just due to pitch outs being thrown in the opposing batter's box.

Speaking of pitch outs, base runners were safe only 45% of the time on pitches classified as 'PO' by Gameday stringers. And considering the 70-75% success rate on regular fastballs, a couple tenths of a run are gained by pitching out when the runner is on the move. However, the data I'm using show that runners were in fact running during only 15-20% of all pitch outs. Furthermore, the difference between a pitch out and a regular fastball in terms of pitch type linear weights is at least a tenth of a run. Therefore, as currently employed, pitch outs would have to nab runners about 90% of the time to break even. I've always believed that the pitch out (and hit-and-run) have been over-utilized tactics, and I'm waiting to see some data refute that.

Jorge Posada and Mike Napoli, who both struggle throwing out runners anyway, call for a very high rate of pitch outs. Backups Jeff Mathis and Jose Molina, who are far from defensively challenged, also call for their share of pitch outs, so those calls are likely coming from the bench. Humberto Quintero and the lethally armed Lou Marson never call for pitch outs.

There are several reasons to throw more fastballs with a man on first than with the bases empty; there's a chance for the double play, incentive to avoid the passed ball, and, of course, to control the running game. John Baker and Joe Mauer both caught about 60% fastballs with nobody on, but 70% with a man on first. Along that same line, here's a snippet of the leader board for pitchers who throw more fastballs with a man on than with the bases empty.

Rank: Pitcher Bases Empty Man On First Team
1: Anibal Sanchez 53.20% 71.30% Marlins
2: Sean West 59.08% 76.69% Marlins
3: Josh Johnson 61.83% 79.05% Marlins
4: Boof Bonser 53.72% 70.38% Twins
5: Matt Guerrier 52.94% 69.35% Twins
6: Chris Volstad 62.31% 78.64% Marlins
8: Leo Nunez 53.67% 66.89% Marlins
9: Burke Badenhop 65.83% 78.80% Marlins
11: Scott Baker 59.56% 71.87% Twins
12: Mark Buehrle 69.74% 81.39% White Sox
13: Andrew Miller 66.02% 77.34% Marlins
14: Nick Blackburn 53.02% 64.23% Twins

I don't know if it's Mauer and Baker, or if these organizations stress this strategy, or pure coincidence, but it's something. I included Mark Buehrle on the list because the man is, or he surely should be, legendary at fielding his position.
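The ranking in that table is just the gap between the two fastball rates, sorted from largest to smallest. Here is a small Python sketch using four rows copied from the table; the function name is mine.

```python
# (bases empty %, man on first %) copied from four rows of the table above.
fastball_rates = {
    "Anibal Sanchez": (53.20, 71.30),
    "Sean West": (59.08, 76.69),
    "Mark Buehrle": (69.74, 81.39),
    "Nick Blackburn": (53.02, 64.23),
}

def fastball_increase(rates):
    """Return (pitcher, percentage-point increase) pairs, largest first."""
    rows = [
        (name, round(on_first - empty, 2))
        for name, (empty, on_first) in rates.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

leaders = fastball_increase(fastball_rates)
for name, increase in leaders:
    print(f"{name}: +{increase:.2f} points")
```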

Of note, Gerald Laird, a terrific thrower, received fewer fastballs with a man on first than otherwise. Bartolo Colon and Trevor Hoffman shied away from their fastballs with a man on.

As for the other pitch types, base runners are successful 80-85% of the time running on off-speed pitches. Interestingly, on SB attempts of third, the lowest success rate has come on the knuckleball. Tim Wakefield must do a better job holding runners on second than he does on first. From the following, you can see that due to Wakefield, it appears that diminished velocity deters steals of third.

Velo%20Steals.gif

Combining velocity and location in a regression doesn't accomplish as much as I was hoping in terms of sorting out what catchers have had more or less difficult opportunities to gun down runners. Every catcher, save two, was expected to throw out 74-79% of base runners based on these factors alone. Only Wakefield's personal catchers Kevin Cash and George Kottaras have been forced to throw on especially difficult pitches. And they still have better numbers than Jason Varitek and Victor Martinez.

Touching Bases - July 01, 2010
Strike Zone Sizes Crouching Batters
By Jeremy Greenhouse

We consider the strike zone a static area, although, in reality, it is a moving target. "As the batter is prepared to swing at a pitched ball," an umpire has to guess the height of the batter's letters and his knees. This moment is imprecise, yet PITCHf/x analysts must try to capture the top and bottom of the strike zone to get the most out of the PITCHf/x data.

As I see it, there are several ways to either directly observe or infer the parameters of the strike zone. One is to follow the work of John Walsh, Dan Fox, Ike Hall, Josh Kalk, Dan Turkenkopf, Mike Fast, Jeff Zimmerman, and others, who all find the probability of a pitch being called a strike at any given location. It is helpful to know the edges of the zone without such rigorous analysis as these, as they necessitate large volumes of data. Instead, we know the plate is 17 inches wide. That serves just fine for the width of the zone. And we hope that we know the batter's height. Unlike weight, which varies year to year and is sometimes a touchy subject for athletes, height is consistent throughout a player's playing career, and should be fairly accurate. In some Pedroian cases, we'll hear that the guy is even smaller than listed. That's not a big problem, though. The issue with using height, and height alone, is that batters have different stances. Fortunately, there are stringers at every game who mark what they believe to be the top and bottom of the strike zone for each batter. By linking the Retrosheet and Gameday databases, I found each batter's height and average top and bottom strike zone values.

Mike Fast has looked into the subject before, and I'm borrowing ideas from him, as well as an image from him below. The other guy whose data has proven useful to me in this study is actually the "Batting Stance Guy." BSG claims to offer "the least marketable skill in America," though, for me, it's quite useful.

BattingStanceGuys.gif

You can estimate the top point of a batter's strike zone as 56% of his height, and the bottom as 26%. But I think we can do better. I took 130,000 pitches vs. RHBs that crossed over the heart of the plate, spanning a foot in width. Using the top and bottom strike zone values provided for each pitch, the average top and bottom strike zone values for each batter, the batter's height, and finally a regressed version using the 2nd and 3rd categories, I found the percent of pitches that agree with the umpire's ball/strike call.

          Stringer SZ    Average SZ    Height    Regress'd
High      85.08%         88.03%        89.92%    89.99%
Strike    85.46%         85.78%        85.83%    85.95%
Low       95.60%         97.28%        97.66%    97.67%

It would appear that height is the best predictor, but certainly the values inputted by the stringers can add some value. Yet there are still outliers.
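To make the height-only method concrete, here is a minimal Python sketch. It applies the 56% and 26% rules of thumb and counts how often the resulting zone agrees with the umpire. The tuple layout is hypothetical, the ball's radius is ignored, and only pitch height is checked, since the sample was already limited to pitches over the heart of the plate.

```python
TOP_FRACTION = 0.56     # top of the zone as a share of batter height
BOTTOM_FRACTION = 0.26  # bottom of the zone as a share of batter height

def height_zone(height_ft):
    """Return (bottom, top) of the strike zone in feet from height alone."""
    return BOTTOM_FRACTION * height_ft, TOP_FRACTION * height_ft

def agreement_rate(pitches):
    """Share of pitches where the height-based zone matches the call.

    pitches: iterable of (batter_height_ft, pitch_height_ft, called_strike).
    """
    agree = total = 0
    for height_ft, pitch_height_ft, called_strike in pitches:
        bottom, top = height_zone(height_ft)
        predicted_strike = bottom <= pitch_height_ft <= top
        agree += int(predicted_strike == called_strike)
        total += 1
    return agree / total if total else 0.0

# Made-up pitches to a six-foot batter (zone runs from 1.56 to 3.36 feet).
sample = [
    (6.0, 2.5, True),   # mid-zone, called strike: agree
    (6.0, 3.5, False),  # above the zone, called ball: agree
    (6.0, 1.6, False),  # just inside the zone, called ball: disagree
    (6.0, 1.0, False),  # well below the zone, called ball: agree
]
rate = agreement_rate(sample)
```

Swapping `height_zone` for the stringer values or a batter's average values gives the other columns of the comparison.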

Toby Hall is one of the crouchiest players in baseball, and Batting Stance Guy demonstrates as much in this video. He also stresses the bent knees of Vernon Wells and Albert Pujols, whose crouches I can envision, but unfortunately they can't be fully captured in a regression. And Alex Rios has a big crouch, which was even commented on by Christina Kahrl in a past BP Annual. She wrote, "Alex Rios' stance reminds me of Von Hayes--spread low, slightly knock-kneed, and will he, like Hayes, always just be that slightly less than expected but still-good player," to which I say, bite your tongue, Christina Kahrl. Von Hayes is an icon.

Most of the batters who have higher strike zones than their height would indicate are pitchers. Many pitchers stand at the plate stiff as a board. As for position players, BSG accentuates the straight front leg in Adrian Gonzalez's stance. Jhonny Peralta's stance is unique, too. And Chase Utley also has an upright stance, which is somewhat notable, but more importantly, Batting Stance Guy also does an impression of the Von Hayes crouch in the linked Phillies video, and any time you have the opportunity to reference Von Hayes, it's a no brainer.

Touching Bases - June 29, 2010
Shifts Happen
By Jeremy Greenhouse

Last week, I explored the difference between those players who hit with the shift and those who do not. It would be useful to show that the shift does, in fact, play a part in BABIP, and the observed effect was not only a product of different player pools. So I took the 16 players I believe to be semi-regularly shifted and found their groundball data with men on base vs. with no men on. This serves as a proxy that shows whether the defense is shifting them or not. Below is a plot of the 16 batters' groundball average based on trajectory angle and, below that, a plot showing the frequency at which these batters hit to each angle.

Shift.gif

With men on, these pull hitters are able to pick up more hits on balls up the middle and in the 3-4 hole. The shift is most effective on balls in these locations, so it makes sense that these vacated holes result in hits. However, I think balls directly at the first baseman go for hits more often with men on base because the first baseman has to hold the runner on and not because the shift is off. The only place where there is an improved BABIP when the bases are empty is on balls down the third base line.

I've heard the argument that the shift takes away the outer part of the plate from the pitcher. Under this logic, the shift actually works to the hitter's advantage, as any ball that's on the outer half can be easily taken the other way for an automatic hit, and therefore the pitcher must pitch predictably inside. Using the same sample, I split the plate into halves and found the groundball distribution.

Shiftinout.gif

I think the takeaway here is that it's not natural for these guys to hit down the third base line. So unless they decide to change their approach dramatically, i.e. bunt, the defense can vacate third base, and the pitcher can pitch outside with no fear of a hit going right down the line.

The other unusual infield alignment, besides the shift, is the infield in. I searched for all grounders with a man on third in the seventh inning or later, which is when the infield might be drawn in. I just began the process of linking the Gameday database to Retrosheet, so unfortunately, I don't yet have data that indicates the number of outs or the score during each at bat. Instead, I broke the data into two groups based on whether the final score of the game was close (one or two runs) or not. In a blowout, teams never bring the infield in.

Infieldin.gif

I don't have much confidence in the crude distinction between these two groups. This neither proves nor disproves that batting average on groundballs goes up 100 points with the infield in. There might be evidence that bringing the infield in surrenders hits on balls in the holes, but not necessarily at the fielders.

Finally, I looked at bunts. I took all bunts that occurred with the bases empty, so I knew the batter was bunting for a hit, and split the data by handedness.

Bunts.gif

RHBs are most successful bunting down the first base line, where they bunt more often than LHBs. LHBs are most successful bunting toward third, where they bunt more often than RHBs.

I feel like there are wins to be had here. The difference between a third baseman playing in for a bunt or playing behind 2nd base in a shift isn't trivial in preventing runs. I don't know if it would be asking too much for the bench coach to study spray charts and plan defensive alignments for the opposition, but then again, I don't know what a bench coach does. What does a bench coach do?

Touching Bases - June 24, 2010
Shift Morneau Shift?
By Jeremy Greenhouse

Inspired by my possible doppelganger Ben Lindbergh, I decided to revisit the topic that brought me to this here very site: the shift. Ben wrote an in-depth piece at Baseball Prospectus about J.D. Drew and the shift on Monday, concluding that, "We don’t know precisely how Drew would respond to an escalation of the shift, and if the current state of affairs persists, we never will, but it’s probably worth it for teams to find out; it seems fairly certain that Drew is winning this battle of offense-against-defense game theory thus far." So my question is, who else might benefit from an altered defensive alignment?

Max Marchi and Ricky Zanker have explored aspects of graphing batted ball distributions. Building on their work, I came up with my own model. Using MLBAM-provided batted ball location data from 2008-present and Peter Jensen's gameday translations, I found the batted ball angle of all non-bunt grounders from left-handed hitters with no one on base, as well as whether or not the batter reached safely. I sorted the data into two groups, the first of which contained 2,500 grounders from 15 "shifted" batters, your Howards and Giambis. The rest of the 32,000 grounders formed the second group. I then fitted a binomial LOESS smoothing curve to the data. Here is the resulting model:

LHBShifts.gif

Allow me to explain. The top portion of the graph shows BABIP on grounders. There are three big differences between the red line (shift) and the blue line (no shift). First, at -15 degrees, shifted players have the benefit of a vacated shortstop position, and are therefore better than twice as likely to pick up a hit on a batted ball to that vector. Next, at 0 degrees, straight up the middle, shifted players have under a 50% chance at reaching base, while non-shifted players are up above 60%. And finally, balls directed toward the 3-4 hole are much more likely to go for hits when there is no shift. So, to sum up the obvious, implementing a shift allows hits on batted balls toward left field, but in exchange, balls up the middle and in the hole are converted into outs at a higher rate. On the bottom of the graph is a histogram. On average, shifted players hit a higher percentage of balls toward the second baseman, and many fewer balls toward the shortstop. The other notable difference is that shifted players have hit fewer balls up the middle than their counterparts, even though the defense is aligned to prevent hits on balls up the middle.

While it would be nice to have reliable measures of batted ball speed and batter speed (the two other considerations that help determine groundball average), I had to make do without. So I applied both of the above fits to my dataset to come up with expected averages for shift and no shift. Here's how the shifted players stack up:

"Angle" is the average batted ball angle. "BABIP" is the rate at which the batter reaches base safely. "No Shift" is the predicted BABIP using the no shift model, and "Shift" is the predicted BABIP using the shift model.

Name            Angle  BABIP  No Shift  Shift
Ryan Howard      19.7   .167      .238   .199
Carlos Pena      23.7   .212      .213   .174
Adam Dunn        19.6   .203      .232   .193
Jim Thome        16.6   .141      .239   .201
Jason Giambi     20.1   .147      .226   .190
Jack Cust        18.5   .205      .223   .189
Chase Utley      17.9   .273      .242   .208
David Ortiz      19.2   .168      .221   .189
Travis Hafner    10.9   .203      .257   .227
Ken Griffey      17.4   .180      .225   .196
Mark Teixeira    18.6   .259      .227   .198
Carlos Delgado   18.1   .128      .234   .207
Prince Fielder   11.0   .271      .261   .235
Mike Jacobs      16.4   .199      .227   .202
League Average   11.9   .243      .246   .225
Justin Morneau    9.8   .239      .248   .231

You might notice that the league-average BABIP on non-shifted players is 20 points higher than it is for shifted players. This doesn't mean that the shift uniformly lowers BABIP by 20 points. This means that the type of player who gets shifted is bad at reaching base via groundballs. So when comparing the two models, keep the averages in mind, and for players who are speedy, such as Jimmy Rollins, understand that the shift may not be a viable option.

I might be wrong about Justin Morneau, and maybe he isn't shifted regularly, but if he is, it's a mistake. So when it comes to Shift Morneau Shift,* I say "No Shift!"

*Credit to my friend Pat for starting the baseball T.V. shows Twitter topic and my buddy Steve for coming up with Deal Morneau Deal.

Carlos Pena has far and away the most skewed groundball angle toward his pull side. Most of these guys are obvious shift candidates. Fielder and Morneau maybe not so much. But these aren't the only players for whom the shift matters. So how about the non-shifted guys?

I found the difference between the "Shift" column and the "No Shift" column for those batters with at least 25 groundballs hit. Three rookies and J.D. Drew himself top the list. Brennan Boesch, Jason Heyward, and Ike Davis have all been hugely successful, exceeding even the most optimistic of expectations. But maybe their pace will slow once defenses learn how to play them. The exaggerated infield shift is certainly an option. It's also likely that their luck will soon run out, as their grounders have simply found holes. Luck has nothing to do with J.D. Drew's success on grounders. If people would just take a look at his spray chart data, they'd know to shift him, but unfortunately, too many subscribe to the line of thought that it doesn't matter how you play him, since he's hit 30 homers in a season only once and is paid $70 million. J.D. Drew does something funny to people's minds.

Here are five players I would strongly consider shifting against, followed by the rest of my dataset.

Name            Angle  BABIP  No Shift  Shift
J.D. Drew        18.6   .253      .256   .203
Garrett Jones    17.4   .304      .248   .203
Chase Headley    20.0   .236      .241   .196
Adam LaRoche     20.4   .178      .225   .187
Alex Gordon      19.4   .309      .234   .197
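
The ranking step is just a subtraction and a sort. Here's a small sketch using the five rows from the table above, with averages stored as whole "points" (thousandths) to sidestep floating-point fuzz. The function name is made up for illustration, and the 25-groundball minimum is not applied here since the table doesn't list groundball counts.

```python
# (name, No Shift predicted BABIP, Shift predicted BABIP), in thousandths,
# copied from the five-player table above.
rows = [
    ("J.D. Drew", 256, 203),
    ("Garrett Jones", 248, 203),
    ("Chase Headley", 241, 196),
    ("Adam LaRoche", 225, 187),
    ("Alex Gordon", 234, 197),
]


def shift_gaps(rows):
    """Return (name, gap) pairs sorted by No Shift minus Shift, largest first."""
    gaps = [(name, no_shift - shift) for name, no_shift, shift in rows]
    # sorted() is stable, so players with equal gaps keep their table order.
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


for name, gap in shift_gaps(rows):
    print(f"{name}: {gap} points")
```

By this measure Drew's predicted average drops 53 points under the shift model, the largest gap of the five.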

Touching BasesJune 22, 2010
Expected Platoon Splits
By Jeremy Greenhouse

A couple of weeks ago, MGL formulated a regression equation that estimated platoon splits based on different pitch types. Max Marchi has found the average run values for different pitch types by batter handedness as well. I ran my own regression equation using pitch velocity and movement to find an expected value of pitches against batters of different handedness.

Pitchers are often placed in the bullpen if they prove incapable of getting opposite-handed batters out. In relief, the ability to get same-handed batters out can be leveraged. In fact, the majority of players with large expected platoon splits are relievers.

Mike MacDougal, a sinker/slider pitcher with a tailing sinker and a sweeping slider, has the largest expected platoon split in my sample. As for left-handed pitchers, I was very surprised to learn that Daniel Ray Herrera has a strong platoon split. The changeup is the great neutralizer when it comes to the platoon advantage, and I've always thought of the screwball as a mutant changeup in that it also moves toward same-handed batters. But Herrera is useless against righties. That Herrera has a high LOOGY score is just another mark in his favor for sabermetric fans. I hope by now we all know about the joy of his screwball. But even when he was in college, one stat-savvy fan wrote a ballad for Herrera, and Herrera has since become the mascot for collegesplits.com.* Similarly, Hideki Okajima, whose over-the-top delivery I would think allows same-handed hitters to see the ball out of his hand, actually has much greater success against lefties than righties.

*I like to think of Yankee farmhand Pat Venditte as the current Herrera. Seen as trick-pitchers by scouts (Herrera because of his screwball, Venditte because he's a switch-pitcher), both Herrera and Venditte have encountered nothing but success. Venditte has been putting up better numbers in the Minors than he did as a walk-on-turned-All-American at Creighton. At 25 years old, Venditte has thrown 36 innings in High-A this year, striking out 48, walking 9, and allowing one homer. People say that his gimmick won't work when he has to face Major League hitters, but I say the game's the same, just gets more fierce. I fear that the only reason the Yankees have yet to promote him is that they don't want to disrupt the structure of every baseball database in the world, as pitcher-handedness has never been tracked by at-bat. Anyway, if I had to guess, I'd think Venditte would perform better as a southpaw, given that he has subpar stuff from both sides, yet he still tries to get it done conventionally as a righty. His sidearm approach as a lefty could at least give Major Leaguers a different look.

Sinkerballer Fausto Carmona has the largest expected platoon split for a starter. He's struck out as many lefties as he's walked in his career, but for some reason he's found more success as a starter than he did in the bullpen, where he had one of the most disastrous runs as a closer of all time. Carmona's former battery-mate CC Sabathia is also Carmona's counterpart when it comes to left-handed starters' expected platoon splits. However, Sabathia is fine against righties, and otherworldly against lefties, which is why he's never been considered as a reliever.

I think J.A. Happ would have the most to gain of any starter by being placed in the bullpen, in spite of his quality changeup. Dontrelle Willis, too. Why hasn't he been tried in the bullpen? Junkballer Matthew Mahoney has one of the few expected reverse platoon splits, although that hasn't come to fruition in his time in the Majors. Chris Tillman, too, has an expected reverse platoon split, so I think it's wise that the Orioles break him in as a starter and keep him in the rotation if only at AAA. And Jenrry Mejia's cutter, like Mariano Rivera's, should be as good or better against lefties as it is against righties, so that's another reason he should be given every attempt to start. It's Oliver Perez who might be better suited for the bullpen, as he would have utility as a LOOGY.

Joe Maddon and the Rays have surrendered the platoon advantage against changeup specialists a couple of times this year. Maddon has stacked the lineup with same-handed batters against such pitchers, and even ordered switch-hitters to bat from their unnatural side. The switch-hitter thing is just crazy, but maybe there's something to reverse platoon splits with changeup guys. The Rays' front office is known for going the extra 2%, which includes PITCHf/x analysis. But if the decision is coming from any higher up than Maddon, I don't know what data they're looking at. (If Maddon is making the decision, it's off of splits from this year and whatever biases come from being no-hit twice by changeup artists.) RHP Shaun Marcum and LHP John Danks have been better against opposite-handed batters than same-handed batters, but I don't see anything in their PITCHf/x profile that would suggest their projected platoon splits should be so far from the mean. It's much easier to say which pitchers' reverse platoon splits are fake (I'd say a couple of Giants in Jeremy Affeldt and Sergio Romo) than whose are real.

In doing this analysis, the pitcher in whom I was most interested was Justin Masterson. Ever since he broke into the Bigs, the word was that his sidearm delivery was more suited for relief than starting. His performance has been acceptable as a starter, but his enormous platoon split has reinforced the notion in some minds that he should relieve. I didn't include him in my sample, since he's a sidearmer, but I predicted his out-of-sample performance anyway. His slider is a fine pitch to both RHBs and LHBs. To righties, both of his fastballs are truly unique pitches, and have been hugely successful. The problem is that his sinker is his best pitch, and he chooses not to throw it to lefties. And his four-seam fastball is rendered ineffective against LHBs, so he's handcuffed himself to only his breaking ball. Without another offering, I don't think he'll ever be able to get lefties out.

Touching BasesJune 17, 2010
Stuff on Stuff
By Jeremy Greenhouse

So I ran my StuffRV numbers yesterday, and you know what that means? Gallimaufry!

  • Where else to start but Strasburg? Best stuff ever for a starter? Best stuff ever. Stephen Strasburg has been compared to Ubaldo Jimenez in terms of stuff, but after taking a closer look at the PITCHf/x data, I don't think they're especially close. Strasburg's four-seamer comes in at 98, faster than Ubaldo throws, and his 97-MPH sinker moves more than Ubaldo's fastball. Strasburg's 91-MPH changeup would make for an excellent fastball, given its negative vertical movement. Strasburg's curve, the best pitch in baseball for my money, is much sharper than Ubaldo's, though Ubaldo does boast an impressive slider.

  • If Ubaldo Jimenez threw submarine, "The U-Boat" would be the greatest nickname of all-time.

  • Strasburg, No. 6, is the only starting pitcher ranked in the top 25 of overall stuff. Topping the list by a wide margin is Matt Thornton. The dearth of southpaws who throw 96 likely skews the results in his favor. The other top-five pitchers all sport sterling fastball-slider repertoires: Henry Rodriguez, Daniel Bard, Kevin Jepsen, and Brian Wilson. Rodriguez has actually lost a fair amount of stuff from last year, when he threw half his pitches at 100 MPH and above. Now he's down to 97. Also, Jonathan Broxton, despite sacrificing a couple ticks of velocity in favor of control, remains in the top ten. When Broxton's .371 BABIP regresses, maybe his 0.92 ERA will start to look a bit more like his 0.67 FIP. I say buy low on him.

  • I have no idea if Citi Field's PITCHf/x system is calibrated correctly, but Jenrry Mejia has been throwing a fair share of fastballs that cut toward his glove side. Most fastballs tail at least somewhat to the arm side. Mejia still needs to command his pitches, but I believe a couple decades ago there was another Latin American 20-year-old learning to harness a fastball with incredible cutting movement who went on to close games in New York. At least the Yankees let Mo fail as a starter before he moved to the pen.

  • Speaking of Mariano Rivera, he still has terrific stuff, but he has taken a downturn from past years. Not just in velocity (93 to 91), but in movement as well (loss of an inch). Clayton Kershaw is another elite pitcher when it comes to stuff, probably the top starting left-hander, but he, too, has lost some of his velocity from last year. He's negated that by favoring his slider over his curve. When Kershaw was a top prospect, his curve rose to fame after Vin Scully dubbed it "Public Enemy Number 1," but the pitch has either lost a lot of its snap, or was overrated to begin with, and the decreased usage is a wise decision.

  • Francisco Liriano has risen a long ways to nearly reach the summit at which Kershaw has plateaued. Liriano's return from surgery has been well-documented, and the fact that his stuff now ranks up there with Kershaw and Brett Anderson makes me yearn for his PITCHf/x data back when Liriano was throwing 95 pre-injury.

  • As a testament to the importance of stuff, Carlos Marmol threw 91 in 2006 when he struck out 6.9 batters per nine. In the following three years, he threw 93-94 and managed K/9 rates between 11 and 13. This year, his stuff has taken another impressive leap, including an uptick in velo to 95 MPH, and he has a strikeout rate of 17. His slider is nuts.

  • I'm convinced that if he's not already a good pitcher, Charlie Morton will become one. Like Morton, Evan Meek of the Pirates had gaudily awful numbers a couple years ago, but the Bucs stuck with him, and his 95-MPH fastball and electric curveball certainly play now.

  • Chad Cordero is back and pitching in the Major Leagues. I predict that it, like this, won't end well.

    Touching BasesJune 15, 2010
    Bimodal Distributions
    By Jeremy Greenhouse

    Dave Allen has written at length about Mariano Rivera's pitch locations. PITCHf/x has recorded over 2,500 Mo-thrown pitches, and from the following graph, you can see that Rivera spots his fastball on either side of the plate, but is able to avoid the middle.

    Riverabimodal.gif

    Dave described this horizontal scattering as a bimodal distribution, which Rob Neyer in turn called his "new favorite baseball term." Chris Moore, too, was intrigued, and he found that Rivera is indeed the best at hitting the corners. "On average, Rivera places his pitches 4.4 inches away from the very edge of the plate."

    I'm interested in who can throw to both sides of the plate, but avoid the middle. So I broke the plate into thirds and counted the number of each pitcher's separate pitch types in each zone. Overall, I came up with a list of about 60 pitchers who threw fewer pitches in the middle zone than they did in either of the outer thirds. Andy Pettitte, Carl Pavano, Jake Peavy, and Livan Hernandez command multiple pitches on both sides of the plate. Rivera, of course, stood out, as he throws only 20% of pitches over the plate in the middle third, while other pitchers are 25% and up. But using the invaluable Texas Leaguers' PITCHf/x tool, which provided the above graph for Rivera, I'd like to take a look at some other pitchers who manage visible bimodal distributions.
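
A rough sketch of the thirds-counting idea follows. It assumes each pitch's horizontal location is given in feet from the center of the plate (as PITCHf/x's px field is), splits the 17-inch plate into three equal strips, and ignores pitches off the plate. Whether off-plate pitches should count toward the outer thirds is a judgment call, and the function names are invented for illustration.

```python
PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide


def plate_third(px):
    """Classify a horizontal pitch location (feet from the plate's center)."""
    if abs(px) > PLATE_HALF_WIDTH_FT:
        return None  # off the plate; ignored in this sketch
    edge = PLATE_HALF_WIDTH_FT / 3  # boundary between middle and outer thirds
    if px < -edge:
        return "left"
    if px > edge:
        return "right"
    return "middle"


def avoids_middle(px_values):
    """True if fewer pitches land in the middle third than in either outer third."""
    counts = {"left": 0, "middle": 0, "right": 0}
    for px in px_values:
        third = plate_third(px)
        if third is not None:
            counts[third] += 1
    return counts["middle"] < min(counts["left"], counts["right"])
```

Running `avoids_middle` once per pitcher and pitch type would flag the kind of bimodal candidates discussed here.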

    Here's Shaun Marcum, who throws the third-softest fastball in the American League, but commands it better than nearly anyone.

    Marcumbimodal.gif

    Livan Hernandez's fastball shows a bimodal distribution, but unlike Rivera and Marcum, he doesn't keep the batters guessing. He only throws his fastball outside.

    Livan vs. RHB:

    LivanRHB.gif

    Livan vs. LHB:

    LivanLHB.gif

    Since Livan demonstrates the ability to throw his fastball to both sides of the plate, shouldn't he keep hitters honest by coming in on them once in a while?

    Hiroki Kuroda follows a similar approach to Livan, but more impressively, he avoids the heart of the plate with his slider.

    Kuroda vs. RHB:

    KurodaRHB.gif

    Kuroda vs. LHB:

    KurodaLHB.gif

    But I prefer pitchers who can throw the same pitch to both sides of the plate against the same batter, like Jamie Moyer's cutter to righties.

    MoyerRHB.gif

    Touching BasesJune 10, 2010
    Spitballing on Command
    By Jeremy Greenhouse

    At best, quantifying command is really difficult. At worst it's a foolish endeavor. The reason is that, while we may know the precise location of a pitch thanks to PITCHf/x data, we have no idea of the pitcher's intention. Perhaps pitchers could fill out a survey after every inning, or perhaps someone could track the target of the catcher's glove. Maybe these data are being collected somewhere, but they certainly aren't publicly available. But we beat on.

    Mike Fast in the 2009 Hardball Times Annual took a shot at measuring Cliff Lee's command, and Dave Allen tried with Mariano Rivera. Borrowing ideas from both of them, I attempted to rank a group of pitchers by command.

    My sample consists of pitches that I have classified as four-seam fastballs in RHB vs. RHP matchups on 0-0 counts. 100 pitchers have thrown at least 200 such pitches, giving me over 60,000 data points.

    First, I came up with a heat map. It shows what you'd expect. Fastballs up-and-in or down-and-away are most successful. Then I predicted each pitch's expected run value based on such location. Here are the top five:

    Greg Maddux
    Trevor Hoffman
    Yusmeiro Petit
    Phil Hughes
    Paul Byrd

    Maddux's command is legendary, so it speaks well that he ranks so highly. I'm pretty sure all of these guys have good reputations for command. And the bottom five:

    Seth McClung
    Fausto Carmona
    Rich Harden
    Dennis Sarfate
    Matt Albers

    Looking at a pitcher's walk rate usually suffices in grading command. Since 2007, all of these guys have surrendered their fair share of walks, and all those balls show up in the numbers.

    So I think that method has legs. I controlled for a fair number of things (batter/pitcher handedness, count, pitch type), but one could go even further and regress the league-wide locational run values to each batter's own heat map. The sample sizes get small, so for left-handed fastballs to left-handed batters, I'd probably combine 0-2 counts with 1-2 counts, and use both two-seam and four-seam fastballs. Regression to the mean and stuff.

    I also tried clustering analysis. In a situation as specific as RHB vs. RHP, 0-0 count, pitchers generally have more types of pitch offerings to choose from than pitch locations. With fastballs, you either go high heat or throw at the knees. With sliders, there's back foot or back door. Curves are intended to be thrown either anywhere in the dirt or anywhere in the zone. Anyway, those are the assumptions you need to make if you believe clustering makes sense. Furthermore, if you're limited to k-means clustering, you might as well assume that all pitchers have two intended locations for their fastballs. That's what I did, anyway. So I gave each pitcher his own two separate cluster centers, and found each pitch's standard deviation from those centers, grouping by pitcher. Here were the leaders:

    Greg Maddux
    Brett Myers
    Joakim Soria
    Ricky Nolasco
    John Lackey

    Maddux is no Rivera, but he's head-and-shoulders above the other 99 pitchers in my sample when it comes to command, so it lends validation to the power of PITCHf/x that two rudimentary analyses can pull out Maddux's needles from the haystack. The bottom five:

    Matt Garza
    Brandon Morrow
    Seth McClung
    Rich Harden
    David Aardsma

    I believe that Aardsma's four-seam fastball is an outlier in several ways. Though I'm not disregarding this piece of data, I don't think it means what it's supposed to mean. But all of these guys are prone to the walk. It would be weird if somebody had excellent command outside the strike zone, so that his expected run values based on location graded out poorly, but he had really tight clusters of pitches. This would indicate good command but poor approach. I always get that feeling watching Dice-K.

    So Maddux, Nolasco, Hughes, and Petit are in the top ten of both lists. I know Maddux and Nolasco have great reputations for control; I'm unsure about the other two. Garza, Sarfate, Harden, and McClung show up in the bottom ten of both lists; Sarfate and McClung definitely have no aptitude for command.
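
The clustering approach can be sketched in a few lines. This is only one reading of it: a bare-bones k-means with two centers per pitcher, seeded with the lowest and highest pitch (an arbitrary choice made here so the result is repeatable), with "spread" taken as the root-mean-square distance from each pitch to its assigned center. The function names are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math


def two_means(points, iterations=20):
    """Plain k-means with k=2 on (x, z) pitch locations; returns (centers, labels)."""
    # Seed with the lowest and highest pitch, a simple deterministic choice.
    centers = [min(points, key=lambda p: p[1]), max(points, key=lambda p: p[1])]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every pitch to its nearer center.
        labels = [
            min((0, 1), key=lambda k: math.dist(p, centers[k])) for p in points
        ]
        # Move each center to the mean of the pitches assigned to it.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels


def cluster_spread(points):
    """Root-mean-square distance from each pitch to its assigned center."""
    centers, labels = two_means(points)
    sq = [math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels)]
    return math.sqrt(sum(sq) / len(sq))
```

A smaller `cluster_spread` means tighter groupings around the two presumed targets, which is the sense in which a pitcher would "lead" a list like the one above.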

    The ultimate goal here is to evaluate pitchers. I feel confident that with a sample of 50 pitches, I could assess a guy's stuff. I think a pitcher would need to have thrown over 1,000 pitches, assuming he's not walking the ballpark, to provide an ample PITCHf/x sample for evaluating command, given the need to drill down the data by pitch types, batter types, and counts. And it takes precisely 4,242 pitches to get a good read on a pitcher's intangibles.

    Touching BasesJune 08, 2010
    Dollars per WAR
    By Jeremy Greenhouse

    When it comes to free agent signings, baseball fans love making snap decisions and playing GM. Some contracts, like Evan Longoria's or Ryan Howard's, are rather easy to judge. To objectively evaluate others, you need a whole lot of context. I'd like to provide a bit of that context using the informative and interactive Google Motion Charts. (If you want to view the charts, you need Flash, and if you're using Chrome, you need to open them in a new tab or incognito. For some reason, Google doesn't want its browser to have access to its apps.)

    The baseball databank has salary data going back to 1985, and Sean Smith's WAR database well covers that time frame. As the Collective Bargaining Agreement stands, players in their first few years of MLB service time have their salary set by the team (league minimum $400,000). After that, players face several years where they are eligible for arbitration, and finally, with over six years of service time, they can become free agents. Here is how each group of players has been valued over time.

    The less experienced players have seen their salaries rise steadily since 1985. But I'd like to focus more on the more interesting group of players who have seven-plus years of experience. Many mark 1998 as the year that baseball recovered from 1994. Indeed, from 1998 to 2003, the market rate for "free agent" WAR rose $500,000 per year, which signifies financial health. Consequently, over half of all MLB salaries went to these "free agent" players during the time period. However, these players produced approximately 75-80% of the league's WAR, whether they accounted for 40% or 65% of the league's salary. Free agents are no longer in vogue, as teams realize the value of the more inexperienced players, and are less willing to pay for production from more experienced players. From the chart, you can see that over the last couple years, free agent prices might be on the decline, while cheap talent has become less cheap.

    I'm also interested in dollars per WAR at the team level. I broke down the data into five increments of five years apiece stretching from 1985-2009, and found the average yearly WAR, salary, and dollars per WAR for all 30 teams. You might be familiar with a graph of this nature, plotting a team's payroll against a team's success.

    This demonstrates the positive, non-linear relationship between pay and performance. The size of each point represents whether a point falls above or below an imagined regression line. I've highlighted both teams from Florida, and both teams from New York. The Marlins and Rays, occupied by the smallest dots, appear to get the most out of limited resources since 2005. But have they identified market inefficiencies, or are they just cheap? The Yankees and Mets portray the most bloated dots, and perhaps dole out the most bloated contracts. So are their payrolls driven by reckless spending, or is the free agent market more practical to them?

    In Baseball Between the Numbers, Nate Silver penned a seminal piece in which he stated that a marginal win is most valuable for teams closest to the playoffs. Many point out that the more a team spends, the more it wins. Few point out that the more a team wins, the more it should spend. Breaking the data down further, I ran the salary and WAR numbers by team for only players with over six years experience. This way, we can see if the Rays and Marlins have shrewdly spent in the free agent market, or if they simply stayed away from signing veterans altogether, thereby controlling costs. If the Yankees and Mets have been winning games by outbidding other teams in free agent auctions, they would be afflicted by the winner's curse. They would pay above-market rates for free agents. However, they do not, as evidenced by the color of their dots. The shading of each point represents the Dollars per WAR paid for a team's most experienced players. Due to their position in the standings, the Mets and Yankees find more value in the free agent market than others do, so New York teams allocate more resources in it. But they spend about as efficiently as others.

    While the Marlins may spend their money efficiently, this is only because they more or less avoid free agents, not because they make wise free agent signings. In fact, the teams that have spent least on free agents over the last five years have been less successful when dipping their toes in the free agent waters. The average Dollars per WAR for seven-plus year players has been around $4.5 million, which shows up as greenish-yellowish in the chart. The yellow/red points indicate teams that have spent inefficiently on free agents. Turns out, Seattle, San Francisco, Baltimore, San Diego, Washington, Kansas City, Pittsburgh, and Florida have had the worst fortune in the free agent market. None of these teams have dabbled too heavily, but they've all paid well above market rate, and the Padres are the only one of them to have made the playoffs. Meanwhile, the Yanks and Mets pay right around market rate. The Blue Jays have somehow managed to acquire good, experienced players on the cheap.
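
For anyone who wants to reproduce a team-level Dollars per WAR figure, here is a minimal sketch. It assumes the ratio is total salary divided by total WAR over whatever player-seasons you pass in (summing first, then dividing), which is one reasonable reading but not necessarily the exact calculation behind the charts. The record layout and function name are invented for illustration; filtering to veterans or to a five-year window would happen before the call.

```python
from collections import defaultdict


def dollars_per_war(records):
    """Total salary divided by total WAR for each team.

    records: iterable of (team, salary_in_dollars, war) tuples, one per
    player-season. Teams whose total WAR is not positive are skipped,
    since the ratio is not meaningful there.
    """
    salary = defaultdict(float)
    war = defaultdict(float)
    for team, pay, wins in records:
        salary[team] += pay
        war[team] += wins
    return {team: salary[team] / war[team] for team in salary if war[team] > 0}
```

With made-up inputs, a team paying $10 million in total for 2.5 WAR comes out at $4 million per win, a bit under the roughly $4.5 million average cited above.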

    Touching BasesJune 03, 2010
    WAR Aging Curves
    By Jeremy Greenhouse

    WAR, short for Wins Above Replacement, is an all-encompassing metric of a player's value. It incorporates hitting, defense, baserunning, durability, and spits out one number. Using Sean Smith's invaluable WAR database, I studied positional player aging.

    We know that speed and defense peak early and that power and walks peak late. With WAR, we can throw everything together. Overall player value was originally posited to peak between ages 28-32, but the subject has been revisited and peak age revised to somewhere around 26-30. Here's my basic aging curve.

    To develop this curve, I found all examples of players playing in two consecutive seasons, excluding the first and last years of a player's career, since those tend to be somewhat fluky. I then computed the average difference in WAR between such seasons.
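
Here's a small sketch of that paired-seasons calculation. The input layout and function name are hypothetical, and it takes "excluding the first and last years" to mean dropping those two seasons from each career before pairing up neighbors, which is an assumption about the method.

```python
from collections import defaultdict


def aging_deltas(careers):
    """Average year-over-year WAR change, keyed by the age of the later season.

    careers: dict mapping player -> list of (year, age, war) tuples.
    """
    changes = defaultdict(list)
    for seasons in careers.values():
        seasons = sorted(seasons)
        # Drop the first and last seasons of each career.
        inner = seasons[1:-1]
        for prev, curr in zip(inner, inner[1:]):
            if curr[0] == prev[0] + 1:  # consecutive seasons only
                changes[curr[1]].append(curr[2] - prev[2])
    return {age: sum(d) / len(d) for age, d in sorted(changes.items())}
```

Chaining the average changes from age to age gives the shape of the curve; where it sits vertically is arbitrary, since only differences are measured.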

    While players between 30 and 35 years old are often the best in the Majors, they are likely in decline. In general, I find that players improve at a decreasing rate until they're 27 or so and then decline at an increasing rate. I'm not trying to toss my hat into the J.C. Bradbury vs. MGL debate, but I'm using that as my benchmark for further aging curves.

    My intention is to find how players, given a certain set of characteristics, age as compared to others. Height and weight are fairly consistent attributes, but unfortunately, height and weight data are unreliable for baseball players. Nevertheless, it would make sense that players with different body types would age their own separate ways, so I used body mass index to differentiate between big and small players.

    Bigger is better, although the aging curves move along more or less parallel lines. You might say that bigger players age less gracefully than smaller players, but that could be just because they are better and therefore have more room to collapse. Regression to the mean works more heavily on players farther from the mean.

    Next, I separated players by career defensive ability, as defined by the sum of the positional and total zone components of WAR.

    Bad defenders are good hitters, otherwise they wouldn't play. I would imagine that during a bad defender's peak, he is a passable fielder. But as he ages and his defense deteriorates at a pace that outstrips the offensive decline of good defenders, the good defenders become better all-around players than the bad ones.

    Separating by career hitting value,

    Bad hitters peak two years earlier than good hitters. My guess is that good hitters use their power, which peaks late, while bad hitters get by with their speed, which peaks early.

    Bill James once submitted that "young players with old player's skills...tend to peak early and fade away earlier than other players." Old player skills consist of striking out, walking, hitting for power, and being slow. Separating players by career baserunning value yielded no trend. I also looked at strikeout and walk rates. To do so, I had to limit my sample to years after 1954.

    This evidence indicates that high-strikeout players do indeed peak a year earlier than low-strikeout players, but they also have a smoother aging curve than their counterparts. If they fade away faster, it's only because they weren't as good in the first place.

    By walk rate,

    High-walk players actually peak a year later than low-walk players, but fade faster.

    There are some lessons on regression to the mean in here. Better players appear to decline quickly because there's more room for them to collapse in case of an injury. I'm not making any conclusions about aging curves for types of players with old player skills or any such subset, since the more specifically I drill down a type of player, the smaller the sample becomes. Even so, big or small, old player skills or no, the Ryan Howard contract was a mistake.

    Touching BasesMay 27, 2010
    A PITCHf/x Look at Drew Storen
    By Jeremy Greenhouse

    Drew Storen is, for a variety of reasons, one of my favorite baseball players. I interviewed Storen this time last year, after which (because of which?) he was drafted with the tenth pick by the Washington Nationals due to his ability to throw 92 with movement.

    Storen is one of the few players I've seen comment on the PITCHf/x system, telling Baseball Prospectus' interview laureate David Laurila,

    "It’s awesome because you’re able to see how much movement you get on the ball, although it almost feels like you need a college degree to check out and understand some of the graphs they have on that Brooks site. But it’s interesting to see how much movement you get on your fastball, because you don’t really realize it. When you’re on the mound it’s kind of tough to see the movement that you have and a lot of times you have to rely on the catcher. "How was that?" or "What do you think?" It’s good to be able to see what the difference in movement is that you get on each pitch."

    Storen fast-tracked his way to the big leagues, posting a gaudy 64-11 strikeout-to-walk ratio (his stated metric of choice) in the minors, and has made five appearances in middle relief for the Nats in the month of May, throwing nearly 100 pitches.

    Storen has thrown four pitch types thus far: two types of fastballs and two types of breaking pitches.

    Starting off with his fastball, Storen throws a four-seamer between 94 and 96 miles per hour and his two-seamer a tick slower. His four-seamer flies a little too true for my liking, averaging ten inches in vertical movement, which is a danger zone for a pitch of that velocity. Coming into last night, Storen had used his four-seamer 13 times, twelve to right-handed hitters, throwing only two of them in the strike zone. Last night, however, he threw the pitch eight times, inducing four swinging strikes. His two-seamer is a quality pitch, similar in velocity and movement to an A.J. Burnett two-seam offering. He throws both types of fastballs to any hitter, regardless of batter handedness. His choice of fastball depends on whether he wants to locate the pitch on his arm side or his glove side.

    Storen%20Fastball.jpg

    As for his off-speed pitches, he throws a true slider you often see from power righties coming in from the bullpen, and he also has mixed in a slurve a handful of times. Only two miles per hour slower than his slider, Storen's slurve achieves seven inches greater movement. Few pitchers (Burnett, Felix, Jepsen, Anderson, Lindstrom) can make a breaking ball drop seven inches at the type of velocity Storen throws his slurve, so I hope he mixes it in even more than he has.

    Coming up as Stanford's closer, Storen supposedly threw about 92, getting by thanks to excellent command. He's continued to throw strikes as a pro, but from what he's shown in the Majors, his velocity was either being under-reported, or he's kicked it up a notch, and his breaking pitches also have shown good bite. I look forward to watching him close games for the Nationals in the near future.

    Touching Bases
    May 25, 2010
    90>95?
    By Jeremy Greenhouse

    Power vs. finesse. It's the classic debate. Spanning over 60 feet 6 inches, the difference between a 90 mile-per-hour fastball and a 95-MPH heater makes up a couple hundredths of a second. More importantly, those 5 MPH represent the difference between fringe stuff and an above-average Major League fastball. So how do pitchers compensate for shortcomings in velocity?
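
    To put a number on "a couple hundredths of a second," here is a back-of-the-envelope check in Python. It assumes the ball holds a constant speed over the full 60 feet 6 inches, which is a simplification: pitchers release the ball well in front of the rubber and the ball slows in flight. The function name is just for illustration.

```python
# Time for a pitch to cover 60 feet 6 inches at a constant speed.
# Simplifying assumption: no deceleration and no release extension,
# so these are rough figures only.

FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600
DISTANCE_FT = 60.5  # 60 feet 6 inches


def flight_time(mph: float, distance_ft: float = DISTANCE_FT) -> float:
    """Return the seconds needed to travel distance_ft at a constant mph."""
    feet_per_second = mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return distance_ft / feet_per_second


slow = flight_time(90)
fast = flight_time(95)
print(f"90 MPH: {slow:.3f} s")
print(f"95 MPH: {fast:.3f} s")
print(f"difference: {slow - fast:.3f} s")
```

    Under those assumptions the gap works out to roughly 0.024 seconds.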

    Throwing left handed is the simplest solution. The demand for southpaws is so great and the supply so scarce that the price for a lefty far surpasses that of an equally talented righty. Put another way, left-handed pitchers can accomplish more with less. So left-handed pitchers were excluded from my sample.

    My sample consisted of over 100,000 pitches from the past two calendar years. I grouped pitches by batter handedness as well as by velocity--depending on whether the velocity rounded off to 90 MPH or 95.

    First, I looked at pitch location. The color scales that portray run value are the same for both images, so you can compare them directly.

    9095.png

    Soft tossers can't survive by living up in the zone. A 90-MPH pitch can be thrown in the perfect spot in on the hands, and it still won't have the same success on average as a 95-MPH pitch that misses by half a foot. However, pitchers who throw 90 experience just as much success throwing down and away to same-handed batters as pitchers who throw 95. In this regard, pitch location can be a true equalizer. Joakim Soria locates his 90-MPH fastball so well that it's in the upper echelon of all fastballs, while Daniel Cabrera has located his 95-MPH fastballs so poorly that he's out of the league.

    I also looked at pitch movement. The magnitude of the effect of pitch movement is much smaller than that of pitch location. Below, run value is plotted against horizontal movement in the solid-line portion of the graph, while a histogram for horizontal movement can be found at the bottom.

    9095hmov.png

    A 90-MPH pitch with average movement is a disaster. Even a 90-MPH pitch with great tail can't match an average 95-MPH pitch unless the 90-MPH pitch also has sink on it. But if a pitcher can really cut the ball so that it acts as a cutter, or even a slider for some, it can match an average 95-MPH fastball.

    And vertical movement:

    9095vmov.png

    I find this to be an interesting trend. The 90-MPH pitchers are better off throwing rising fastballs, while 95-MPH pitchers are just as well off throwing sinkers or risers, so long as they stay out of that ten-inch danger zone to which the batter is accustomed.

    In combining both horizontal and vertical movement, it's evident that Peter Moylan generates enough movement on his fastball to throw it at elite levels, while Cabrera, again, has a mediocre-to-awful fastball in spite of his velo. Remember, I'm only including 95-MPH pitches, so imagine how bad his fastball must have been in 2009 at 91 MPH. Cabrera is the poster boy for pitchers who can throw gas but have no command or movement, rendering their fastball ineffective. Kevin Jepsen, Jonathan Broxton, and Brian Wilson are examples of pitchers whose 90-MPH pitches are better than most pitchers' 95s, since those guys are throwing off speed at 90. Also of note: Jenrry Mejia's fastball has excellent movement.

    Mixing location and movement into a regression, here are the best 90-MPH fastballs with at least 100 thrown:

    Jared Burton
    David Robertson
    Peter Moylan
    Ryan Franklin
    Brian Sanches
    Joakim Soria
    Zack Greinke
    Cory Wade
    Roy Halladay
    Mariano Rivera

    David Robertson continues to be the man. No pitcher's 90-MPH fastball penetrates the top tenth of my sample, but all of these pitchers are squarely above average. They show that 90 MPH can beat 95, especially when the 95 is coming from the likes of:

    Manny Acosta
    Jason Bulger
    Mitchell Boggs
    Craig Hansen
    Daniel Cabrera

    Cabrera's 95-MPH fastball was the third-worst fastball in my sample, and no other 95-MPH fastball fell in the bottom 40. The 90-MPH version of Cabrera's fastball was arguably better than his previous iteration.

    Touching Bases
    May 20, 2010
    Lidge's Pitches
    By Jeremy Greenhouse

    Brad Lidge is a two-pitch pitcher. His arsenal consists of a mid-90s fastball and a high-80s slider. From 2008-2009, Lidge faced a few hundred 0-2 and 1-2 counts in which he had to choose a putaway pitch. While Lidge generally splits his pitch selection right down the middle, in situations when he's well ahead of the batter, he goes to his slider over 60% of the time. And he gets results.

                 Fastball   Slider
    Strikeout    9%         26%
    Ball         57%        43%

    PITCHf/x analysts like to use a metric called run value to assess the value of a pitch. Basically, you control for the count and measure the change in run expectancy for a given pitch. So for Lidge, his fastball has been worth a negative 1.5 runs per 100 pitches, while his slider has been worth a positive 1.5 runs per 100. In these 0-2 and 1-2 situations, the trend is similar. So why does he throw fastballs at all if the slider is his bread-and-butter?
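
    For anyone who wants to see the bookkeeping, here is a minimal Python sketch of run value per 100 pitches. The count and event values are made-up placeholders rather than real run expectancies, the four sample pitches are invented, and every name in the code is hypothetical. It only shows the mechanics: take the run expectancy before the pitch, subtract the run expectancy after it, and score the difference from the pitcher's side.

```python
# Run value per 100 pitches, from the pitcher's point of view.
# All numbers below are illustrative placeholders, NOT real run expectancies.

# Expected runs (relative to an average plate appearance) for a few counts.
COUNT_VALUE = {
    (0, 2): -0.10,
    (1, 2): -0.08,
    (2, 2): -0.04,
}

# Expected runs for events that end the plate appearance.
EVENT_VALUE = {"strikeout": -0.28, "out": -0.28, "single": 0.47}


def state_value(state):
    """Look up a state: a (balls, strikes) tuple or a terminal event name."""
    if isinstance(state, tuple):
        return COUNT_VALUE[state]
    return EVENT_VALUE[state]


def pitcher_run_value(before, after):
    """Runs saved on one pitch; positive is good for the pitcher."""
    return state_value(before) - state_value(after)


def runs_per_100(pitches):
    """Average pitcher run value over (before, after) pairs, scaled to 100 pitches."""
    total = sum(pitcher_run_value(before, after) for before, after in pitches)
    return 100 * total / len(pitches)


# Four invented sliders thrown in 0-2 and 1-2 counts.
sliders = [
    ((0, 2), "strikeout"),
    ((0, 2), (1, 2)),
    ((1, 2), "strikeout"),
    ((1, 2), (2, 2)),
]
print(round(runs_per_100(sliders), 1))
```

    Because each pitch is scored against the count it was thrown in, a ball on 0-2 costs the pitcher far less than a ball on 3-2 would, which is the "control for the count" part.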

    Well, we don't really care about the result of the pitch as much as we do the outcome of the at bat. So how did Lidge ultimately fare at the end of each plate appearance?

                 Fastball   Slider
    Out Made     81%        74%

    Turns out, Lidge's fastball wasn't ineffective. In a way, it was more effective than his slider. That 57% ball rate might be intentional. Perhaps his advantage in the count allows him to use his fastball as a setup pitch.

    Against righties, Lidge threw 50 fastballs that resulted in a prolonged plate appearance. He proceeded to strike out over half of these batters and allowed only six to reach base. Of course, any pitcher's numbers will seem otherworldly when the context is restricted to two-strike counts, but as Dave Allen has shown, a fastball generally makes for a better setup pitch than a slider.

    How Lidge's slider works off his fastball.

    Lidge.jpg

    Whether or not Lidge tries to raise the eye level of the batter with his mid-90s fastball, when his heater goes for a ball, it's the perfect setup for his slider.

    While some pitchers' off-speed pitches exhibit superior run values, the fastball's grunt work may be the driving force behind such off-speed success.

    Touching Bases
    May 13, 2010
    Pitching to the Ump
    By Jeremy Greenhouse

    A couple of days ago, Ben Walker of the Associated Press reported that teams are scouting umpires. I decided to check on the data to see whether pitchers have been changing their approach based on the umpire.

    Umpires' zones vary from game to game, yet some umpires develop reputations around the league for perhaps calling the high strike or maybe sleeping next to an ice bucket. For most umpires, the PITCHf/x system has recorded enough data for an analyst to create a strikezone probability distribution. I'm not going to name any specific umpires, since that might come off like I was trying to evaluate them, which I'm really not, but I did make these probability distributions for the league on average as well as for each umpire, controlling solely for batter handedness. I hypothesized that the difference in a pitcher's expected called strike percentage without controlling for the umpire vs. the same pitcher's expected called strike percentage while controlling for the umpire could be attributed to the pitcher's knowledge of the umpire.
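
    A stripped-down Python sketch of that comparison follows. It assumes the strike zone has been binned into half-foot cells with a called-strike probability in each, once for the league and once for a single umpire. The probabilities, pitch locations, cell size, and function names are all invented for illustration, and real distributions would be smoothed and split by batter handedness.

```python
import math

# Called-strike probability by grid cell. Hypothetical numbers for illustration.
LEAGUE_ZONE = {(0, 4): 0.9, (1, 4): 0.5, (0, 3): 0.8}
UMPIRE_ZONE = {(0, 4): 0.9, (1, 4): 0.3, (0, 3): 0.9}


def cell(x_ft, z_ft, size=0.5):
    """Map a plate location in feet to a (column, row) grid cell."""
    return (math.floor(x_ft / size), math.floor(z_ft / size))


def expected_called_strike_rate(pitches, zone, default=0.0):
    """Average called-strike probability over a pitcher's taken pitches."""
    probs = [zone.get(cell(x, z), default) for x, z in pitches]
    return sum(probs) / len(probs)


# Invented (horizontal, vertical) locations of pitches the batter took.
pitches = [(0.2, 2.1), (0.7, 2.2), (0.1, 1.6), (0.6, 2.4)]

league_rate = expected_called_strike_rate(pitches, LEAGUE_ZONE)
umpire_rate = expected_called_strike_rate(pitches, UMPIRE_ZONE)

# A negative gap means these pitches fit this umpire's zone worse than average.
gap = umpire_rate - league_rate
print(f"league: {league_rate:.3f}  umpire: {umpire_rate:.3f}  gap: {gap:+.3f}")
```

    Averaging that gap over every umpire a pitcher has drawn, weighted by pitches thrown, would give one number for how well his locations matched the zones he actually faced.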

    I found that, given the internal consistency in the data, there is certainly some skill to this effect, but I think the magnitude of the effect is small. Livan Hernandez, who you may recall was on the same page as Eric Gregg back in 1997, has actually done the worst job of adjusting for the umpire by the numbers: his pitches were 4% less likely to be called strikes given his distribution of umpires than given an average umpire. While the reliability tests I ran showed that Livan was consistently below average at "pitching to the umpire," I dug deeper, and I can't shake the feeling that luck plays a huge part in it. Sorting through umpires, I couldn't find any difference in Livan's approach. But maybe that's the problem. His approach is consistent, and it's the umpires who change. Here, I present a pair of charts displaying data on Livan Hernandez pitching to an umpire who has called a couple of his games.

    I've taken the difference between the average strike zones and a given umpire's strike zone. Blue areas represent spaces where this umpire calls fewer strikes than average, and red areas represent spaces where an umpire is more generous. I made a density estimation to model the distribution of Livan Hernandez's pitches against batters of each handedness, and then plotted a contour line that displays where he's generally pitched over the last few years. I finally plotted the locations of the individual pitches that Livan has thrown with this specific umpire calling the game.

    LivanLHB.jpg

    It turns out that against righties, this has been Livan's favorite umpire. The ump does a great job calling pitches below the knees, and he gives pitchers the down-and-away strike, which is right in the center of where Livan generally pitches. So Livan, who has been 7% more likely to have a pitch called a strike with this umpire behind the plate than an average ump, hasn't actually done anything different. This ump just suits his style.

    Meanwhile, against lefties, Livan pitches exclusively away, and he hasn't changed up his approach, even though this umpire does not tend to give pitchers that call. So in this way, Livan, without doing anything differently, is failing to "pitch to the ump."

    This type of information could also be of value to a manager in deciding whether to throw a sinkerballer who pitches down in the zone or a power pitcher who goes up the ladder. I don't think that pitchers should, or do, change their approach much based on the umpire behind the plate. However, every inch counts, so the information can't hurt.


    Touching Bases
    April 28, 2010
    Some Research on BABIP Using PITCHf/x Data
    By Jeremy Greenhouse

    The advent of PITCHf/x has created a contingent of DIPS apostates. Dave Allen has done a substantial amount of research on how to evaluate the quality of a pitch in terms of run value, and I'd like to use similar methods while focusing solely on BABIP.

    First, heat maps for plate location, a topic which Dave has already researched. You can click on the image to enlarge it, but the gist is that pitchers who can jam batters or force them to put low-and-outside pitches in play will achieve low BABIPs, while pitches extending from down-and-in to up-and-away yield high BABIPs.

    However, few pitchers actually have significant control over both the location of the pitch and whether or not the batter puts it in play. It turns out that the range of expected BABIP for pitchers based on the location of pitches put in play is 25 points, except for one outlier. The average BABIP in RHP vs. LHB matchups is around .310, and the maximum expected BABIP for such situations was .325. The second-lowest was Scott Feldman at .300, whose actual BABIP against lefties these last couple of years was .265. I think Feldman's cutter has successfully jammed lefties, and if you look at the RHP vs. LHB heat map, you can see a thick blue area up at the hands where lefties manage a BABIP of about .100.

    Mariano Rivera's expected BABIP against LHBs based on pitch location came out to .270 compared to an actual BABIP of .225. No other pitcher had an expected BABIP below .290. Dave has written extensively about Rivera's ability to control his BABIP by commanding his pitches. I think Mo is unique in this regard. Maybe Greg Maddux in his prime was controlling BABIP by locating his pitches, but I think any pitcher who can consistently force batters to put well-located pitches into play is an exception.

    Next, release points. You can see that those pitches thrown at extreme release points result in different BABIPs than pitches at traditional release points. Some of this is the nature of local regression not regressing, or "smoothing," enough for outliers, but nevertheless, I think sidearmers can legitimately control BABIP. The range in expected BABIP for pitchers when based on release points is three times as large as it is when based on pitch location.

    Darren O'Day, Peter Moylan, Joe Smith, Justin Masterson, J.P. Howell, Brian Shouse, and Trever Miller all throw at low arm angles and I think that is why they have been able to control BABIP against same-handed batters. Hideki Okajima and Trevor Hoffman, while not sidearm, also have unusual release points against same-handed batters that I think have contributed to deflated BABIPs.

    Sidebar: Dave jinxed Brett Anderson with his fantastic post on FanGraphs about Anderson's release points varying by batter handedness. Even though Anderson has switched to a uniform release point regardless of the batter, he still has had one of the ten most extreme differences in horizontal release points depending on batter handedness. Alberto Castillo shifts 2.5 feet on the rubber, while Ben Sheets, Hoffman, Fu-Te Ni, and Francisco Liriano are the only other pitchers who move approximately a foot in the direction of the batter. At the other end, Jose Contreras, Darren O'Day, Felipe Paulino, and Manny Corpas shift about a foot the other way. Turns out there's no evident relationship between how much pitchers move on the rubber and their platoon splits. I suppose if there was a correlation, you'd see more guys doing it.

    The effect of release points on BABIP might actually be the effect of pitch movement. I've yet to break BABIP down by pitch movement, but I did find the average BABIPs on pitch types.

    Pitch Type   RHP vs. RHB   RHP vs. LHB   LHP vs. RHB   LHP vs. LHB
    CB           0.289         0.299         0.302         0.307
    CH           0.303         0.292         0.290         0.296
    F2           0.309         0.330         0.319         0.306
    F4           0.300         0.312         0.301         0.309
    FC           0.275         0.295         0.305         0.298
    SF           0.303         0.289         0.303         0.304
    SL           0.286         0.309         0.308         0.276
    KN           0.283         0.284

    Part of the reason sinker/slider guys have large platoon splits is because those two pitches exhibit the largest BABIP platoon splits. Changeups and splitters show reverse platoon splits with regards to BABIP. The first group of pitchers found with the ability to maintain a sub-.300 BABIP was knuckleballers, and knuckleballs do indeed have the lowest BABIP of any pitch type.
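
    The splits can be checked straight from the table. The Python sketch below uses the table's BABIP figures and, for each pitch type, takes BABIP against opposite-handed batters minus BABIP against same-handed batters. Averaging the right-handed and left-handed pitcher splits with equal weight is a simplifying assumption, since the table doesn't give sample sizes. The knuckleball is left out because its row has only two values.

```python
# BABIP by pitch type, copied from the table:
# (RHP vs. RHB, RHP vs. LHB, LHP vs. RHB, LHP vs. LHB)
BABIP = {
    "CB": (0.289, 0.299, 0.302, 0.307),
    "CH": (0.303, 0.292, 0.290, 0.296),
    "F2": (0.309, 0.330, 0.319, 0.306),
    "F4": (0.300, 0.312, 0.301, 0.309),
    "FC": (0.275, 0.295, 0.305, 0.298),
    "SF": (0.303, 0.289, 0.303, 0.304),
    "SL": (0.286, 0.309, 0.308, 0.276),
}


def platoon_split(row):
    """Opposite-handed minus same-handed BABIP, averaged over RHP and LHP."""
    rhp_same, rhp_opp, lhp_opp, lhp_same = row
    return ((rhp_opp - rhp_same) + (lhp_opp - lhp_same)) / 2


# Express each split in points of BABIP (.001 = 1 point).
splits = {pitch: round(1000 * platoon_split(row), 1) for pitch, row in BABIP.items()}
ranked = sorted(splits, key=splits.get, reverse=True)

for pitch in ranked:
    print(f"{pitch}: {splits[pitch]:+.1f}")
```

    By this rough measure the slider and two-seamer come out on top, and the changeup and splitter are the only pitch types with negative splits.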

    Touching Bases
    April 22, 2010
    Clusters in the Outfield (Part 2)
    By Jeremy Greenhouse

    Last week in this very space, I used cluster analysis to try to quantify a hitter's spray chart. Commenter "Nightfly" asked, "Are the sample sizes for switch-hitters large enough to run a comparison of, say, Victor Martinez against himself, from each side of the plate?" So instead of comparing hitters to each other as I did last time, I'm going to juxtapose players against themselves. I ran the numbers to see which switch-hitters had the biggest gap between cluster centers, grouping by handedness. It turns out, Carlos Beltran is a pull hitter from both sides of the plate, which forces outfielders to shade five yards in either direction depending on whether he's batting righty or lefty. And to answer your question, Nightfly, no, Victor Martinez cannot throw out baserunners.

    Beltranclusters.jpg

    I changed the color scheme and symbols of the graph at the suggestions of commenters Studes and Alex, and as always, I'd appreciate any advice on how to improve the visuals provided.

    That outfielders position themselves differently based on the batter's handedness is intuitive, but what other more subtle clues might improve outfielder positioning? Rich Lederer and commenter Fat Ted suggest I incorporate PITCHf/x data into my analysis.
    For the upcoming analysis, I'm going to adhere to Peter Jensen's advice that I only look at balls that were caught by outfielders, which improves the accuracy of the data but limits the sample.

    First, I looked at how batted ball location fluctuates based on pitch type. It turns out that an outfielder only has to move several feet in general if he knows whether a fastball (two-seam, four-seam, cut) or an off-speed pitch (curve, slider, change, split, knuckle) is coming.

    pitchtypeclusters.jpg

    Juan Rivera, a right-handed batter, is one player who really gets around on off-speed pitches.

    Riveraclusters.jpg

    Meanwhile, Miguel Montero, a left-handed batter, nearly broke my clustering algorithm with his inability to pull fastballs. A visiting right fielder might fare just as well turning balls in play into outs by positioning himself in the Chase Field pool when Montero is gearing up for a fastball.

    Monterospray.jpg

    I also looked at patterns dealing with pitch location by splitting the plate into halves. In addition to the fact that batters tend to go the other way with outside pitches and pull inside pitches, balls on the outer half are also driven slightly farther than balls inside.

    pitchlocationclusters.jpg

    Some hitters, like Jacoby Ellsbury and Ian Kinsler, can't drive inside pitches the other way with authority, which I imagine would be useful information to outfielders.

    Ellsburyclusters.jpg

    Touching Bases
    April 15, 2010
    Clusters in the Outfield
    By Jeremy Greenhouse

    "I waved in my outfielders. When they got in around me, I said, 'Sit down there on the grass right behind me. I'm pitching this last guy without an outfield.'" -- Satchel

    Outfielder positioning has been a hot topic over at The Book Blog recently. Max Marchi has done some great research on the topic of defense positioning.

    Using MLBAM data, which reports the location of where the ball was fielded, as well as Peter Jensen's Gameday translations, I queried the hit locations of balls in the air that left the infield but stayed in the ballpark. I restricted my sample to only hitters who had at least 100 balls in the air from one side of the plate through 2008-2009. I then ran a k-means algorithm that split the spray chart into three different clusters. I wouldn't say that the centers of each cluster indicate where a fielder might be positioned, since a lot more than just getting to balls goes into positioning, but one might put it that they indicate the middle of a fielder's area of responsibility. I think of it as a tidy way to quantify someone's spray chart.
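
    For readers unfamiliar with k-means, here is a bare-bones version in Python. It is a sketch of the general technique (Lloyd's iteration), not the exact code or settings behind the charts in this post. The nine hit locations are invented, with home plate at the origin and distances in feet, and the starting centers are picked deterministically so the example is repeatable.

```python
import math


def kmeans(points, k=3, iterations=50):
    """Plain Lloyd's-algorithm k-means on a list of (x, y) tuples."""
    # Deterministic start: take k points spread evenly through the x-sorted list.
    ordered = sorted(points)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: each ball goes to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)

        # Update step: move each center to the mean of its group.
        # An empty group keeps its previous center.
        new_centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # converged: nothing moved
            break
        centers = new_centers

    return centers, groups


# Invented fly-ball locations: three to left, three to center, three to right.
points = [
    (-200, 220), (-190, 240), (-210, 230),
    (0, 330), (10, 320), (-10, 340),
    (200, 220), (190, 230), (210, 240),
]
centers, groups = kmeans(points)
for center, group in zip(centers, groups):
    print(center, len(group))
```

    The three centers are the "middle of a fielder's area of responsibility" described above, and the size of each group is that fielder's share of the fly balls.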

    For example, Joe Mauer hits the ball in the air the other way a lot. The left fielder is responsible for three times as many fly balls off Mauer's bat as the right fielder. Conversely, Carlos Pena pulls a fair share of his fly balls. Assigning each ball to a fielder yields the following chart:

    PenaMauer.jpg

    Logically, a fielder would get to the most balls the fastest by standing in the middle of his zone. Again, that often doesn't align with the actual job of the fielder, which is to prevent runs. Averaging the clusters produces the following centers:

    PenaMauer2.jpg

    So the difference in the average hit locations between a great pull hitter and a great opposite-field hitter comes out to around 30 feet.

    The most interesting and informative chart is probably the one that splits batters by handedness.

    RHBLHB.jpg

    On average, corner outfielders have to move 15-20 feet depending on the handedness of the batter. This is the result of pulled balls traveling farther than opposite-field balls. The center fielder only moves five feet in general. Grouping by pitcher handedness didn't produce any visibly different results.

    Now, I'll look at some of the most extreme differences in cluster centers. While Pena and Mauer have an extreme difference in the rate of balls they put in play to each field, their clusters were in close proximity as compared to Scott Podsednik and Ray Durham, whose centers were 50-100 feet apart.

    PodsednikDurham.jpg

    As for right-handed batters, Derek Jeter is the only player who hits a higher rate of balls in the air to the opposite field than Joe Mauer. Jeter leaves the right fielder responsible for over half of his fly balls, and he forces the right fielder to play closer to the line than any other right-handed batter. I'll compare him to Jesus Flores.

    JeterFlores.jpg

    All of the previous charts have dealt with fly ball angle, but fly ball distance is just as important in outfield positioning. The first pair I noticed was Cody Ross and Gregor Blanco.

    RossBlanco.jpg

    Here, we see some of the unreliability in either the GameDay location data or the pixels-to-feet conversion. Cody Ross has power, and power to center, but something is off. He doesn't routinely hit 400-foot flies that stay in the ballpark. Oh, well.

    Looking at pull power to left, there's a more realistic difference between Chris Iannetta and Ryan Roberts.

    IanettaRoberts.jpg

    And the obvious choice for the final coupling is Luis Gonzalez and Paul Bako.

    GonzalezBako.jpg

    The only player for whom my clustering algorithm spat out something funky was Clete Thomas. His spray chart is unusual in that he appears to have decent power to left-center, but not so much to right-center, which creates a distinct region in left-center where no fielder would ever play, and leaves a neighboring vacancy where the center fielder is traditionally positioned.

    CleteThomas.jpg

    Touching Bases
    April 05, 2010
    Finally Joining the Old Guard
    By Jeremy Greenhouse

    Bill Simmons is one of my favorite writers on the planet. An inspiration. The highlight of my 11th-grade Physics class was being pulled from class by a friend who told me that I had made Simmons’ mailbag. Reading his latest piece on Friday made me smile. Profusely. But I thought it would be funny for a Sabermetrician to write the exact opposite type of piece.

    Question: Who’s going to have the biggest decline in baseball this year -- Ben Zobrist, Joel Pineiro, J.A. Happ, or Joe Mauer?

    Answer: None of the above. The answer is me.

    See, I’ve loved writing about baseball these past two years, developing stats too complicated for the common fan’s liking. Did I respect the work of ESPN, Murray Chass, Dan Shaughnessy, Mike Lupica, and everyone else in that community? Of course not. I just hated the ignorance of it, the concept that opinion could trump data. If whimsies always prevailed, what was the point of analysis? I longed for the future when I could say things like, “Brett Gardner has had the 11th highest WAR rate among outfielders in the last two years. Calling him a fourth outfielder is batshit crazy.” And there wouldn’t be some dude calling WFAN and WEPN saying, “Well, I think…”

    Look at that last sentence again.

    Fundamentally, it’s fundamental. I just admitted I longed to be objective with my analysis.

    My first favorite player was Scott Brosius, New York's flappable third baseman. I don’t know why. I was a third baseman as a kid and I guess I just thought he was a really good defender. Fun to watch. Don’t really care if he stood the test of time, although he does rate well by WOWY. I just enjoyed watching him barehand bunts and hit World Series home runs off BYK like any True Yankee. Hence, my attitude for the past few years could be summed up like this:

    “Who cares if your favorite player/team sucks? I’m just presenting the data, no need to take offense. Shouldn’t change how you enjoy watching the game.”

    Things shifted this winter when a guy told me that I live in my mother’s basement. Instinctively, I understood that I don’t live in my mother’s basement. I live in a dorm room on campus. Why would somebody tell me that I live in my mother’s basement when he has no bearing for that remark? Why would you try to purposely offend me, when you don’t even know me? Why are you so angry? Calm down, bro. Have a beer.

    Baseball friends I trusted kept telling me, “Think of it existentially. The mother’s basement is a metaphor for “the past,” and the guy was really talking about himself. So in actuality, he was saying that he lives in the past and feels that you’re encroaching upon his territory. He’s getting older and more out of touch, and he’s uncomfortable with change.”

    I wanted to believe it. Cautiously, nervously, I started researching where my tuition and room and board were coming from, begrudgingly coming to one conclusion: I do “live in my mother’s basement.” My mother’s basement is a painfully unoriginal insult disguised as a cliché. I am my parents’ genes with arms and legs. I am dependent on my parents. Does this paragraph make sense? No. Ignoring logic…that’s the trick. And the nonsense indicated that my sensibilities were wrong.

    Little did I know, the ball was rolling for me. I spent March making myriad friends and clearing my acne and losing my virginity and GTLing, and not speaking with a nasally voice for mostly unselfish reasons (The world is a better place when I’m socially active.), but also because I realized that the only way to avoid insults from the old guard is to conform. I even understand why mainstream guys take it so personally whenever a stat junky spouts out an informed baseball study. It’s too hard to be a Sabermetrician these days. Takes a lot more time than you might think.

    Without further ado, I am leaving the world of Sabermetrics. Getting out of my mother’s basement makes life more fun. At least for me.

    Touching Bases
    April 03, 2010
    Stakeholders - Minnesota Twins
    By Jeremy Greenhouse

    From now through the beginning of the regular season, we will not be posting in-depth round-tables previewing each division like we have in years past. Instead we will feature brief back-and-forths with "stakeholders" from all 30 teams. A collection of bloggers, analysts, mainstream writers and senior front office personnel will join us to discuss a specific team's hopes for 2010. Some will be in-depth, some light, some analytical, some less so but they should all be fun to read and we are thrilled about the lineup of guests we have teed up. Today it's Aaron Gleeman on the Minnesota Twins.

    Jeremy Greenhouse: If you were the Twins new stats guy, what would be your first order of business?

    Aaron Gleeman: Order lunch. I never crunch numbers on an empty stomach. After that, I'd push to set up a meeting with the decision-makers to present some of the concepts and stats I'd be using, because the analysis means nothing if the front office doesn't understand or value the underlying concepts and based on their statements so far they don't yet.

    JG: Bill Smith seems to be guided more by faith than science. So is Locke his best "Lost" comp?

    AG: Well, he can't be Hurley any more because he dropped something like 50 pounds, so Locke might be the best comp. Right now I suspect the new stat guy's best "Lost" comp is probably Artz or maybe even the pilot who got yanked out of the plane in the first episode. Also, if there's a Kate comp working in the Twins' front office my head may explode.

    JG: How many wins does the loss of Joe Nathan cost the team, and how would you handle the Twins bullpen?

    AG: My best guess is that Joe Nathan's injury costs the Twins three or four wins. I'd like to see them try a true "closer-by-committee" because they have 4-5 capable right-handers and Jose Mijares is death on lefties, but despite Ron Gardenhire using that phrase to describe his ninth-inning plans I think he'll settle on one guy for the job within a few weeks.

    JG: Twins starters don't strike out many guys. The defense rated poorly in terms of UZR last year. Something's gotta give. Do you think the defense turns it around this year, or would the staff be better served with starters who strike out more than 4.5 batters per nine (Nick Blackburn)?

    AG: It'll be an interesting experiment, for sure, because the Twins' pitching staffs have long been fly-ball heavy with great control and mediocre strikeout rates, yet their outfield defense has the potential to be pretty bad if past numbers prove accurate and their infield defense has the potential to be very good with J.J. Hardy and Orlando Hudson up the middle and Nick Punto getting most of the starts at third base. Beyond that no one knows how the new ballpark will play and they're switching from turf to grass. I think the key will be whether Denard Span's scouting reports or small-sample size UZRs end up telling the story about his ability in center field.

    JG: Delmon Young. Positive or negative WAR?

    AG: Positive, but not by a ton. Delmon Young lost a bunch of weight this offseason, he's still pretty young, and everyone takes any positive thing he does as a sign that it's all coming together finally, but I'm definitely not a believer. He swings at everything, his bat speed is often sluggish, and he's yet to show any of the supposed power potential Twins fans have been hearing about for years now. He's also a horrible, clumsy defender, so he'd need to really have a strong year at the plate to post a solid WAR.

    JG: Joe Mauer. That's not a question. That's a statement of fact.

    AG: Preach.

    JG: What are you hearing about Target Field in terms of aesthetics and how it will play?

    AG: Everyone seems to love it, which I think is a combination of the Twins doing a really nice job putting the place together and the fact that Minnesotans have been watching baseball in a warehouse for a couple decades. It seems very tough to predict how new ballparks will play, but I suspect it'll be more hitter-friendly than the Metrodome was in recent years. I'm just hoping it's not too extreme in either direction.

    JG: Have you ever considered calling yourself Aaron Gleeman III to gain credibility with Twins fans?

    AG: I don't have the je ne sais quoi to pull that off like LaVelle E. Neal III (or LEN3, if you're nasty). I'd probably go with "Trey" in that scenario, although Hillman has kind of ruined that for all the III's out there.

    JG: The Twins are the best, most talented team in the division to be sure. So what are you most nervous about heading into the season?

    AG: I think the impact of losing Nathan has generally been overstated, but the bullpen is definitely in flux right now and whether or not the closer role is overvalued he's still one hell of a reliever. I'm probably most nervous about that, along with Justin Morneau's health. But at the end of the day I think you're right that they have the most talented team in what figures to once again be a pretty weak division.

    Aaron Gleeman is the Senior Baseball Editor at Rotoworld and owner of aarongleeman.com. He was the co-founder and main operator of The Hardball Times before leaving to write for NBC Sports, where he writes the Baseball Daily Dose column for Rotoworld and, along with Craig Calcaterra, D.J. Short, and Drew Silva, writes the constantly updated HardballTalk blog.

    Touching Bases
    April 01, 2010
    Whose Stuff Plays Up?
    By Jeremy Greenhouse

    Relievers hold several advantages over starters. For one, relievers don't have to worry about pacing themselves. Moreover, they never have to face the same batter twice in one outing. So Steve Treder has determined that throughout history, "reliever ERAs have been consistently better, almost always by a factor of between 5% and 10%." To prove that the difference in ERA is, in fact, a difference in difficulty rather than skill level, you need to find pitchers who have both started and relieved, and compare their performance in each role. Tangotiger has come up with a rule of thumb to quantify what you'd expect if you were to convert a starter to a reliever. "Basically, use the “rule of 17”: difference in BABIP is 17 points higher as starter. K/PA is 17% higher as reliever. And HR per contacted PA is 17% higher as starter. Walk rate is FLAT."
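
    Here is the rule of 17 as a few lines of Python. This is one reading of the quote, labeled as such: "17% higher as reliever" is taken to mean multiplying the starter's K/PA by 1.17, and "17% higher as starter" is taken to mean dividing the starter's HR rate by 1.17. The sample starter line and the names in the code are made up.

```python
from dataclasses import dataclass


@dataclass
class Line:
    babip: float           # batting average on balls in play
    k_per_pa: float        # strikeouts per plate appearance
    hr_per_contact: float  # home runs per contacted plate appearance
    bb_per_pa: float       # walks per plate appearance


def starter_to_reliever(starter: Line) -> Line:
    """Project a starter's rates into a relief role using the rule of 17."""
    return Line(
        babip=starter.babip - 0.017,                    # 17 points lower in relief
        k_per_pa=starter.k_per_pa * 1.17,               # 17% higher in relief
        hr_per_contact=starter.hr_per_contact / 1.17,   # starter rate is 17% higher
        bb_per_pa=starter.bb_per_pa,                    # walk rate is flat
    )


# A made-up starter, for illustration only.
starter = Line(babip=0.300, k_per_pa=0.170, hr_per_contact=0.040, bb_per_pa=0.080)
reliever = starter_to_reliever(starter)
print(reliever)
```

    For this invented starter, the projection is a .283 BABIP, a K/PA just under 20%, and a HR rate on contact of about 3.4%, with the walk rate untouched.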

    But every pitcher is different. You'll hear every now and then that somebody has a "bullpen mentality." And some are more suited for the bullpen because their stuff "plays up." So I went into my PITCHf/x database and pulled out the pitch-by-pitch data for all 118 pitchers who had thrown at least 100 fastballs as both a starter and a reliever from 2007-2009.
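
    The selection step might look something like the sketch below. The record layout (dictionaries with "pitcher", "role", and "pitch_type" keys) is hypothetical and not the schema of my actual database, and treating F4, F2, and FC as the fastball types is an assumption for the sake of the example.

```python
from collections import Counter

# Assumption: four-seamers, two-seamers, and cutters count as fastballs.
FASTBALL_TYPES = {"F4", "F2", "FC"}


def qualifying_pitchers(pitches, min_fastballs=100):
    """Return pitchers with at least min_fastballs fastballs in BOTH roles."""
    counts = Counter(
        (p["pitcher"], p["role"])
        for p in pitches
        if p["pitch_type"] in FASTBALL_TYPES
    )
    names = {pitcher for pitcher, _role in counts}
    return sorted(
        name
        for name in names
        if counts[(name, "starter")] >= min_fastballs
        and counts[(name, "reliever")] >= min_fastballs
    )
```

    A Counter returns zero for a missing key, so a pitcher who never worked in one of the two roles simply fails the threshold.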

    FBV.jpg

    85% of the variance in a pitcher's fastball velocity when he switches roles can be explained by his previous fastball velocity. In general, pitchers add about 0.7 miles per hour to their fastball by making the switch from starter to reliever. But there are exceptions. Hong-Chih Kuo is a true outlier. His fastball has been 3.4 MPH faster in the pen. It's possible that Kuo has built up arm strength since he quit starting a couple years ago. But maybe he was simply more suited for the pen, and the Dodgers found the right position for him. Conversely, Felipe Paulino has pitched to better results as a starter, which could be attributable to his unusual ability to throw harder in that role. As a starter, he's managed to break the 95-MPH threshold with his fastball, which makes him a breakout candidate for 2010, especially considering his career 6.40 ERA vs. 4.23 xFIP.
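
    The "variance explained" figure is the R-squared from a simple linear regression of one velocity on the other. Here is a rough sketch of that calculation in plain Python. The velocities in the example are made up and are not from my data set, and the helper name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares of ys on xs; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variance in ys explained by xs
    return slope, intercept, r_squared


# Made-up average fastball velocities (MPH) for four hypothetical pitchers.
as_starter = [88.0, 90.0, 92.0, 94.0]
as_reliever = [88.9, 90.5, 92.8, 94.6]

slope, intercept, r_squared = fit_line(as_starter, as_reliever)
average_gain = sum(r - s for s, r in zip(as_starter, as_reliever)) / len(as_starter)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.2f} gain={average_gain:.2f} MPH")
```

    Pitchers like Kuo and Paulino would show up as the points sitting farthest from the fitted line.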

    How about changes in pitching styles? In the bullpen, a pitcher can survive with only two pitches, while starters need to keep extra pitches in their back pocket for the third and fourth times through the order.

    FBP.jpg

    Pitchers throw 3% more fastballs in relief, but there are wide swings depending on the pitcher. I'm still including only pitchers with at least 100 fastballs thrown in both roles, and it takes much longer for fastball rate to stabilize than fastball velocity, so that explains some of the variance. I think that some pitchers throw more breaking balls in the bullpen because they've picked up a platoon advantage. This certainly applies to Julian Tavarez, who has used his breaking balls more often than his fastballs since entering a relief role and becoming something of a ROOGY.

    Finally, a Google Motion Chart containing number of pitches, StuffRV/100, fastball percentage, and fastball velocity for the 118 pitchers in my data set.

    Touching Bases
    March 25, 2010
    Most Improved PITCHf/x Pitches of 2009
    By Jeremy Greenhouse

    At FanGraphs, you can find the most valuable pitches in baseball. FanGraphs uses Baseball Info Solutions data and assigns pitches a run value based on the results of each pitch. Tim Lincecum's changeup comes out on top. A couple weeks ago, I tried my hand at finding the best pitches of 2009 by using PITCHf/x data and assigning each pitch a run value based on the pitch's physical characteristics. I didn't grant a winner, but gun to my head,* I'd have to say Matt Thornton's four-seamer or Zack Greinke's slider. As I learned in 8th-grade tee ball, no award series is complete without handing out trophies for the most improved. (Thanks again Coach Hover!)

    *Actually, gun to my head, I'd have to say, "Please stop holding a gun to my head." I can't imagine anyone would be willing to use lethal force to obtain my opinion on this matter.

    A quick-and-dirty, yet effective, way to tell whether or not a pitcher has improved in a given year is to simply find the difference in his fastball velocity from the previous year. Check out Dave Allen's take from this time last year. In 2009, Homer Bailey, Carl Pavano, Barry Zito, Justin Verlander, and Jon Lester all exceeded expectations. Their success could be tracked back to substantial increases in fastball velocity. I'm hoping that by looking at differences in "fxRV," which incorporates measures of velocity as well as movement and location, I will be able to find some pitchers who improved their fastballs by sacrificing velocity in favor of movement or command. Here are the top five most improved fastballs of 2009. The velocity delta is represented in terms of miles per hour and the fxRV delta in units of run value per 100 pitches.

    Pitcher          Pitch Type  Velo Delta (MPH)  fxRV Delta (runs/100)
    Mark Lowe        F4           1.66             -0.87
    Wandy Rodriguez  F2           0.87             -0.69
    Joel Pineiro     F2          -0.65             -0.60
    Scott Feldman    FC           0.91             -0.55
    David Aardsma    F4          -0.57             -0.50

    Mark Lowe's fastball jumped from Jon Garland to Jonathan Broxton quality. Velocity was evidently the trick for Lowe, who upped his pre-2009 four-seam velocity from 94.6 MPH to 96.2 MPH. Wandy Rodriguez also greatly benefited from a boost in velo, but at the same time, he managed to add sink to his two-seamer. That's a tough task to pull off. Scott Feldman's cutter was one of the most valuable pitches in baseball last year, and there's good reason why. He broke the 90-MPH threshold with the cutter while generating an extra inch of horizontal movement. He threw it about twice as often in 2009 as he did in 2008. It wasn't the best cutter in the game—we know who that belongs to—but it was easily the most improved.

    And then there are Joel Pineiro and David Aardsma. I'm not sure what I can possibly add to the discussion concerning Joel Pineiro and his sinker. I love that the numbers back up the excessive number of stories. Pineiro traded velocity for movement and command, and it made his sinker a better pitch that yielded better results. Pineiro's fastball was thrown 71% of the time last year as compared to sub-60% in years past, and its effectiveness went from 20 runs below average to 20 runs above average. I think that PITCHf/x data can be an aid to coaches in that the data can show what pitchers might want to focus on in terms of release point, velocity, movement, or location. I think Dave Duncan might inherently possess this knowledge. There's an adage that sinkerballers with tired arms throw heavier and better sinkers. PITCHf/x data can determine if the adage holds water.

    Aardsma threw the highest rate of fastballs in the league last year at 87%, and he did so because he traded in velocity for overall quality. And like Pineiro's sinker, Aardsma's impressive four-seamer was well chronicled. Geoff Baker doesn't miss a beat.

    The other key was Wetteland, pitching coach Rick Adair and manager Don Wakamatsu convincing Aardsma he didn't have to blow hitters away by overthrowing. They told him his fastball could still get hitters out if he took a little off it in order to hit his targets more consistently.

    "If your strength is a fastball, then your strength is a fastball," Wetteland said. "Just work on where you can control it to all spots of the zone. Get the control down to the point where you can do what you do best..."

    "The key to me was, if I was already beating them with my fastball, don't try to throw it any harder," Aardsma said. "Whenever I get in trouble, it isn't because I'm throwing fastballs, it's because I'm getting behind in the count with them."

    In addition, Dave Allen found a reason for Aardsma's four-seam improvement.

    At the other end, Rich Hill's four-seamer was the antithesis to Lowe's. Pre-2009, both pitchers' fastballs were mediocre. Lowe's became one of the best in baseball whereas Hill's became possibly the worst.

    As for the most improved breaking balls...

    Pitcher           Pitch Type  Velo Delta (MPH)  fxRV Delta (runs/100)
    Ubaldo Jimenez    SL           0.24             -0.66
    Erik Bedard       CB           0.23             -0.62
    Justin Verlander  CB          -0.28             -0.62
    Barry Zito        CB           2.42             -0.59
    Zach Duke         CB           0.25             -0.57

    Ubaldo Jimenez found his slider last year, and he didn't shy away from it. In 2008, Ubaldo ran his fastball at 94.9 miles per hour. Even though no starting pitcher threw harder than his 96.1 MPH in 2009, Ubaldo actually dropped his fastball usage to 62.7% in 2009 against 69.8% in 2008. That's because his slider was his most improved pitch. I'm having trouble pinpointing exactly what Jimenez changed, but I think it was just a matter of throwing more strikes. Justin Verlander's curve was an entirely different animal last year. Same velocity, but twice as much movement. Erik Bedard's curve has always been really, really good. It was possibly the most unhittable pitch in baseball last year, though.

    Meanwhile, Cole Hamels' curveball regressed so badly last year that he might want to rethink the pitch. He didn't throw it for strikes, he didn't get any swings, he didn't get any whiffs. I'm not sure what he was trying to accomplish with the curve last year, but he didn't get it done. Buster Olney reports that Hamels is indeed working on his curve.

    I'm skeptical that the fxRV system adds any value to measuring the effectiveness of changeups and other off-speed pitches, since they're mainly built on deception and sequencing. Anyway, as compared to past years, Justin Verlander's change had better fading action, and Ryan Dempster's splitter had better bottom.

    Touching Bases
    March 18, 2010
    Kevin Jepsen: Sleeper
    By Jeremy Greenhouse

    He's got Brian Wilson's fastball/slider combo plus A.J. Burnett's curve. Kevin Jepsen's stuff is that good.

    I first noticed Jepsen when he topped my "Stuff" leaderboard back in September. He had only thrown 330 pitches on the year at that point, so I didn't make much of it, but the numbers ranked him right up there with Wilson.

    He then burst upon my radar in the ALCS last year when his stuff blew away a couple Yankees as well as Carson Cistulli and myself. In 2002, Francisco Rodriguez was the Halo rookie who made waves in the playoffs. In 2008, Jose Arredondo captured some of that K-Rod magic. Now I'm not saying Jepsen will have the subsequent success of K-Rod or the sophomore slide of Arredondo. But I'm thinking he's closer to the former than the latter.

    Jepsen had allowed 5.4 walks per nine innings before being called up to the Majors in 2008. Since then, he's proven that he can harness his electric stuff in 63 regular season innings. His career MLB BB/9 is 3.3, better than both Wilson's and Burnett's. His strikeout rate has been somewhat lower than expected, though at nearly eight Ks per nine, it's nothing to sneeze at. Kept the ball on the ground? Check. Career 55% ground ball rate. So what's with that glaring 4.86 ERA that's holding him back from being widely regarded as a potential breakout candidate in 2010? A .360 BABIP and 61.9% strand rate. Gotta love it when bad-luck indicators line up like that. Jepsen's career 2.86 FIP is a full two runs lower than his ERA. In the last two years, Damaso Marte's ERA-FIP of 1.32 is the next closest to Jepsen's among relievers with at least 60 innings pitched.

    PECOTA and ZiPS project Jepsen for an earned run average well north of five. CHONE is more bullish, projecting an ERA of 4.14. Still, every projection system forecasts major regression in 2010 from last year, which is fair, considering he has outperformed in MLB compared to his Minor League numbers. Why should you believe that Jepsen can continue to outdo his pre-2008 track record?

    On the PITCHf/x front, The Orange County Register's Sam Miller's got you covered. The whole article is worth a read, but allow me to quote heavily from it.

    Here’s what changed:

    The first photo shows Jepsen’s pitch movement (from the catcher’s perspective) in April and June. The second one is his pitch movement from July 1 on.

    In the first photo, Jepsen basically has two pitches — a fastball that usually bores in on righties, and a curveball with downward movement. In the second photo, allllll that space in between them is filled with the green pitch. It’s labeled a slider, and it behaves like a hard slider — 89 mph, with movement away from the right-hander — but is perhaps best called a cutter. The LA Times quoted the Angels as saying it’s a cutter that Mike Butcher taught Jepsen in early July. Later in the season, Jepsen seems to refer to it as his slider. Cutters and sliders aren’t that different, and the label isn’t as important as how well it worked.

    When hitters swung at it from Aug. 1 on, they got nothing but air 55 percent of the time. Zack Greinke’s slider might be the best in the game, and he got whiffs on 42 percent of swings. Frankie Rodriguez gets whiffs on 28 percent. Billy Wagner, 37 percent. Mariano Rivera’s cutter: 25 percent. It was, for Jepsen, a massively good pitch. (Here’s a not-very-good example of it.)

    There's not really much to add to that. Miller concludes that Jepsen "now projects as a possible future closer. Maybe by the end of this year." I'm inclined to agree. Brian Fuentes wavered down the stretch last season, which planted a seed of doubt in manager Mike Scioscia's mind. Pre-All-Star break, Fuentes added 1.4 WPA, but from the Midsummer Classic on, he lost 0.5 WPA.

    "Both guys have been an important part of the back end of the bullpen," Scioscia told Brittany Ghiroli in mid-September. "But if there are some matches that could be advantageous [to use Jepsen], we will try to take advantage of [them]."

    Fuentes had the lowest fastball velocity of his career since he inherited the closer role. His 19.7% whiff rate fell well short of his 26.4% career average. He also threw only 47.7% of his pitches in the strike zone compared to a 51.95% career rate. While Jepsen's FIP has fallen short of his ERA, Fuentes pitched to better results than his peripherals would suggest. His tentative hold on the ninth-inning job is slipping. If you're playing fantasy baseball, I doubt you'd even need to draft Kevin Jepsen to own him. But be ready to scoop him up off the waiver wire, because I have a feeling that once the season starts and he gets another chance to show everybody his stuff, he's going to pick up helium.

    Touching Bases
    March 04, 2010
    Best PITCHf/x Pitches of 2009
    By Jeremy Greenhouse
    The PITCHf/x system uses two cameras to track pitches between pitcher and batter, determining the coordinates of the ball x(t), y(t), z(t) at times t in 1/60-sec intervals. The resulting trajectory is a nine-parameter (or 9P) fit corresponding to constant acceleration in each of the three coordinates. The 9P fit is an approximate solution to the exact equations of motion. All quantities reported in the PITCHf/x data base, such as the pitch speed, the location of the pitch as it crosses the plate, the break (or pfx) of the pitch, etc., are derived from the fitted trajectory rather than from the original data. -- Alan Nathan
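
    To make the nine-parameter idea concrete, here is a minimal sketch of a constant-acceleration fit: three parameters (initial position, initial velocity, acceleration) for each of the three coordinates. This is a plain least-squares fit I wrote for illustration under my own assumptions, not the actual PITCHf/x implementation, and the function names and sample numbers are hypothetical.

```python
def fit_constant_acceleration(times, positions):
    """Least-squares fit of p(t) = p0 + v0*t + 0.5*a*t**2 for one coordinate.

    Returns (p0, v0, a).
    """
    # Normal equations for the polynomial basis [1, t, t^2].
    s = [sum(t ** k for t in times) for k in range(5)]
    b = [sum(p * t ** k for t, p in zip(times, positions)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    c0, c1, c2 = (m[i][3] / m[i][i] for i in range(3))
    return c0, c1, 2.0 * c2  # the t^2 coefficient is half the acceleration


def fit_9p(times, xs, ys, zs):
    """Nine parameters in total: (p0, v0, a) for each of x, y, and z."""
    return tuple(fit_constant_acceleration(times, c) for c in (xs, ys, zs))


# Made-up samples at 1/60-second intervals for a single coordinate.
times = [i / 60 for i in range(25)]
ys = [50.0 - 130.0 * t + 0.5 * 20.0 * t * t for t in times]
print(fit_constant_acceleration(times, ys))
```

    Everything else (speed, plate location, break) would then be computed from the fitted parameters rather than from the raw camera samples, which is the point of the passage above.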

    Velocity, movement, location, and release point are age-old terms in the baseball lexicon that have been quantified thanks to PITCHf/x. Chris Moore in August published a groundbreaking study ranking the best fastballs in baseball using factors given by PITCHf/x, including velocity, horizontal location, vertical location, horizontal movement, and vertical movement. I will try my hand at a similar analysis. The goal is to measure a pitch's quality using only the inputs provided by PITCHf/x. I've decided to use the same five parameters as Moore, also opting against adjusting for release point, and instead simply excluding all pitchers I classified as sidearm. I've tried to control for count and handedness as well. I'm calling the metric fxRV, as its units are in terms of run value.

    Top Five Fastballs

    Player            Type  Pitches  Usage   rv100  fxRV100  Velocity
    Matt Thornton     F4        857  75.04%  -1.42    -1.27     95.74
    Lance Cormier     FC        617  51.25%  -1.68    -1.11     87.46
    Cliff Lee         FC        587  15.31%  -0.38    -0.96     85.85
    Justin Verlander  F4       2220  57.05%  -1.29    -0.81     95.99
    Jason Motte       F4        634  68.32%   0.04    -0.77     96.17

    Matt Thornton has top-five stuff of any reliever in baseball and Justin Verlander has top-five stuff of any starter. That type of velocity from a lefty and a starter, respectively, is unparalleled. Clayton Kershaw, as a left-handed starter, will be entering that territory soon with his 94-MPH fastball. Verlander elevates his fastball more than just about anyone in the game with the exception of Kevin Millwood. According to FanGraphs, Lance Cormier has increased his cutter percentage each of the last four years to the point that he is now throwing it over half of the time. And looking at his pitch type values, he might want to entirely scrap his four-seam fastball, since it has never been an above-average pitch while his cutter was fantastic last year. I'm puzzled by Motte's poor run value on his fastball. He's too good to fail as a reliever. Patience, TLR.

    My numbers say that Danys Baez' fastball is in line for some regression this year, despite successful results. At the other end of the spectrum, Baez' teammate Chris Tillman has a quality fastball, even though it was ten runs below average last year. And Barry Zito's fastball is aggressively bad.

    Top Five Breaking Balls

    Player            Type  Pitches  Usage   rv100  fxRV100  Velocity
    Erik Bedard       CB        438  32.57%  -1.50    -1.85     77.67
    Zack Greinke      SL        765  22.12%  -2.90    -1.57     85.63
    Gio Gonzalez      CB        515  28.61%  -1.41    -1.48     78.68
    Bronson Arroyo    CB1       596  17.85%  -2.00    -1.47     75.00
    Daniel Bard       CB        221  25.46%  -2.12    -1.46     83.93

    Erik Bedard* and Gio Gonzalez both have big yakkers. Watching these guys on TV is fun, since a sweeping curveball from a left-handed pitcher as viewed from the off-center center field camera appears to be heading right for a left-handed batter's skull only to break over the inside part of the plate, hopefully as the batter's knee buckles: the old Barry Zito phenomenon. Joe Posnanski has called Zack Greinke's slider "devastating," "the best in the American League," and his "God-given gift." It's a good pitch. Bronson Arroyo is to pitch classification systems as Bronson Arroyo's name is to Tim McCarver's brain. Nevertheless, his curveball(s?) are good pitches.

    Kevin Jepsen didn't qualify for the leaderboard, but his curveball is superb. It gets similar movement to Bedard's curve, but comes in six miles per hour faster, albeit from the right side. Jepsen gets his curve down in the zone very well, too. He also throws a 96 MPH fastball and 90 MPH slider. I'm very, very high on Kevin Jepsen. Jonathan Broxton's four-seam fastball and slider were both within a spot of the top five. Daniel Cabrera? Yeah, he's bad.

    *Ironically**, there's also a Canadian speed skater named Eric Bedard. If short track were regularly televised, I swear I would watch.

    **I find it ironic that I don't know what irony means.

    Top Five Off-Speed Pitches

    Player            Type  Pitches  Usage   rv100  fxRV100  Velocity
    Burke Badenhop    CH        142  12.40%  -1.88    -1.09     82.19
    Bronson Arroyo    CH        518  15.51%  -1.98    -0.92     79.29
    Jered Weaver      CH        594  16.56%  -1.36    -0.84     80.21
    Brandon League    SF        181  16.54%  -2.32    -0.80     85.22
    Sean O'Sullivan   CH        145  16.08%  -3.56    -0.78     76.13

    The four pitchers besides Brandon League are all on this list because they can command their off-speed pitches. Nothing in my system accounts for the deception of a change. League's splitter, however, was labeled by Matthew Carruth as the toughest pitch in the league to hit because of its 35% whiff rate. Burke Badenhop does a terrific job of getting his changeup down and away from opposite-handed hitters, and his pitch has a lot of "sink." Jered Weaver and Sean O'Sullivan generate a lot of "rise" on their changeups, though that's not necessarily a good thing, since Clayton Kershaw gets the second most rise on his change in the league, but it's a highly crude pitch. He can't locate it either.

    Interestingly, Jonathan Papelbon had one of the worst splitters in baseball last year. He rarely threw it in the strike zone. I was happy to see that Daniel Ray Herrera's screwball was listed as a quality off-speed pitch. The world needs more screwballs.

    Touching Bases
    March 02, 2010
    Stakeholders - Pittsburgh Pirates
    By Jeremy Greenhouse

    From now through the beginning of the regular season, we will not be posting in-depth round-tables previewing each division like we have in years past. Instead we will feature brief back-and-forths with "stakeholders" from all 30 teams. A collection of bloggers, analysts, mainstream writers and senior front office personnel will join us to discuss a specific team's hopes for 2010. Some will be in-depth, some light, some analytical, some less so but they should all be fun to read and we are thrilled about the lineup of guests we have teed up. Today it's Joe P. Sheehan on the Pittsburgh Pirates.

    Jeremy Greenhouse: As an alumnus of Baseball Analysts, is it difficult dealing with the constant presence of fans and media?

    Joe Sheehan: No, but I get confused with the other Joe Sheehan a lot.

    JG: Can you describe Neal Huntington's style as general manager, and if you'd like, you can also compare him to a character from "The Wire."

    JS: I've never really seen "The Wire."

    JG: I recommend it.

    JS: I really don’t have anything to compare him to. He’s been very open to different ideas. I don’t work directly with him, though. It appears he listens to the different sides of an argument whether it’s what (director, baseball systems development) Dan Fox has to say or a scout. It seems as if he’s not wedded to one side or the other. I don’t want to over-state what I do, as I only have a slightly closer perspective than an outsider. I don’t want to make it sound like I know what Neal’s doing. It appears he’s doing what we would expect—using all forms of information he can get to make informed decisions. Some work out, some aren’t 100%, but that's the nature of decisions. It's very comforting to know that the process appears to be sound.

    JG: Turning to baseball, I'm most interested in a Pirates' outfield that has a lot of potential. Can you talk about your expectations for all of them?

    JS: Andrew McCutchen is great. Watching him last year come up from the minors without missing a beat to replace a lot of the production we were getting from Nate McLouth was exciting. He handles himself really well. His style defensively is fun to watch. He hit a couple triples that when watching the game, it’s like, "Oh my God. He hit another gear going second to third." Garrett Jones, I don’t want to say came out of nowhere, since we liked him as a minor league free agent, but I don’t think anybody expected him to do what he did this year at the start of last year. Even though he was old for a rookie, he has a shot of building on what he did last year. As for Lastings Milledge, for a long time Milledge was known with Cole Hamels for their facts, but he's coming along. I’m not really that connected with the player development side, but everything you hear since we've acquired him, the work he's put in, everything was positive. He's still on the younger side. While he hasn’t had the tremendous success at the Majors that he has at AAA, we hope that he can continue some of that minor league success going forward. Ryan Church is solid, and he'll find some at bats. And our Rule 5 pick John Raynor is going to contribute, and we've got Brian Myrow banging on the door at AAA too, depending on whether he plays first base or the outfield.

    JG: I assume you still work with pitchf/x data, so what minor league pitcher do you most look forward to pitchf/xing?

    JS: This year, probably Brad Lincoln because he’s the closest out of our minor leaguers. Rudy Owens is another interesting guy, but in terms of guys who are close, I’d probably say Lincoln. Rudy was in A-Ball this year, so he's further away. In the future, I'm looking forward to seeing all the high school pitchers we drafted last year.

    JG: I was doing some pitchf/x work of my own and I noticed that Ryan Doumit can’t lay off pitches below his knees. He probably already knows he doesn't have the best plate discipline, but if you find something like that, will you approach the player or how does the team go about doing that? What's that process like?

    JS: I haven’t interacted with any players. It’s a little tough to go to a player with very specific instructions, because it's almost like you don’t want to make them over-think things. If you tell any player that a pitcher is throwing 55% fastballs, 40% something else, 5% something else, then the right play is to wait for the fastball. But if you tell that to the player, and he doesn't get fastballs for two at bats, then he’s not going to trust you anymore. Over a huge timeline you would be right, and you’d come out ahead, but if for two at bats he’s listening to you and you get bad luck, you lose some trust and he'll think you don’t know what you’re talking about.

    JG: So do you filter information through the coaching staff?

    JS: That’s primarily where the interaction will take place. Dan or an advanced scout, there's something they might see, and they might communicate it to (pitching coach) Joe Kerrigan or (batting coach) Don Long. You can tell the coaching staff different stuff you can't tell players because if you're overwhelming the player, it's slowing their at bat down, and they're missing pitches. So you can talk to the coaching staff in more detail. I would think that that’s more the way the process happens.

    JG: What are your and the Pirates' goals for this season?

    JS: It’s to improve. It's to be better than we were last year. That's the goal for every team every year. I don’t know if there’s a number you want to say if you don’t win "x" games, you fail, and if you win "x" games you succeed. We want to get better. We want to improve our depth at the minor league level and get better at the major league level. I want to get better at my job. Everyone wants to get better at their jobs. If we do that for a good stretch of the season, the talent, wins and results will come.

    Joe P. Sheehan is the Baseball Operations Data Analyst for the Pittsburgh Pirates. Before that, he wrote the Command Post column for Baseball Analysts.

    Touching Bases
    February 28, 2010
    Stakeholders - New York Mets
    By Jeremy Greenhouse

    From now through the beginning of the regular season, we will not be posting in-depth round-tables previewing each division like we have in years past. Instead we will feature brief back-and-forths with "stakeholders" from all 30 teams. A collection of bloggers, analysts, mainstream writers and senior front office personnel will join us to discuss a specific team's hopes for 2010. Some will be in-depth, some light, some analytical, some less so but they should all be fun to read and we are thrilled about the lineup of guests we have teed up. Today it's Pat Andriola on the New York Mets.

    Pat was one of the first people to introduce me to sabermetrics. I returned the favor by introducing him to "The Wire", which he had finished the night before our interview. We used that as a jumping off point.

    Jeremy Greenhouse: If Omar Minaya were a character from "The Wire," who would he be?

    Pat Andriola: I need a minute to think about this...You know who I think it is, it’s Pryzbylewski. Prezbo is clearly a guy, like Omar as a GM, who is thrown into a certain situation. Prezbo was in the police department where everything lines up for him to be there, but maybe it’s not the best situation for him. Like Prezbo was better off at school, maybe Minaya should be on the sidelines as a scout—head of scouting—because he gets a deer in the headlights look as GM. He makes some silly signings, like Prezbo shoots a cop accidentally. I think that’s it. That’s my on the spot answer.

    JG: Nice one. I like that. Let’s talk about the core a little. Or you can just rant on Francesa.

    PA: I wrote an article a couple years back on MetsGeek about the core. Right now, Wright, Reyes, Beltran, and Santana I would say is the core.

    JG: Is Bay in that core?

    PA: Right, I mean what is the core? It means nothing. It’s such a silly term. It’s basically a group of really good players. Like a lot of teams have a core of really good players. The Phillies have a core of really good players. The Yankees have a core of really good players. The question is: can you surround this bunch of really good players with other good players to be competitive? I think Wright is going to have a really good year this year. I think Reyes is going to have a nice year. Santana, we’ll see about the surgery. We’ll see about Bay and how he handles left field in Citi. I think they’ll all be fine. I’m not really worried about them. There are bigger question marks than the core.

    JG: So what are your thoughts on Citi Field so far? How do you think Wright and Bay handle it this year?

    PA: Aesthetically, I love Citi Field. And I think it does work well for the Mets. It’s very simplistic, but it really does help Reyes to have more room in the outfield to spray the ball and get triples. I mean he didn’t have enough time to take full advantage of it and understand the park and play to the park. If you saw Angel Pagan, Pagan had a bunch of triples last year. And for Pagan to be able to hit liners into the gap and get to third base, that’s the least Reyes could do.

    JG: What has to happen for the Mets to make the playoffs?

    PA: For the Mets to make the playoffs, I think it comes down to the rotation. Basically, you have Johan at the front. I think he’ll be fine. I think Pelfrey will have a better year than he did last year. I’m a huge Pelfrey fan. So basically it comes down to Perez, Maine, and Niese or whoever else they put in the fifth spot. I’m overly optimistic about the rotation. I’m not about the lineup. But I feel like Perez is going to have a good year. People forget he had some pretty good years 2-3 years ago. I think Maine's fine as a fourth starter. Niese I’m a huge fan of. He’s coming back from a really, really tough injury—the guy literally collapsed on the mound—so it’s tough. Even if it doesn’t work out, they got some good backup options. I wrote an article on The Hardball Times a couple weeks ago about how much I like Nelson Figueroa. I think he can step in if necessary. And if the Mets are competitive at the deadline, they have the prospects to trade for a starting pitcher.

    But will the offense produce? Obviously there are so many question marks. Other than David Wright, who’s something of a question mark in himself, there’s no guarantee. We don’t know how Bay’s going to adjust to Citi Field and the NL. We don’t know about Beltran. We know about Francoeur, but that’s a different story. Murphy and Tatis at first, Castillo at second, Reyes coming back, the catcher is now Barajas, Thole, Santos, Chris Coste, everyone else you want to throw in there. The offense has so many question marks. It's clearly possible, they have enough talent, the question will be when they play out the season, how’s the talent going to come together?

    JG: How many WAR would you say for that first base platoon?

    PA: Assuming for just the guys on the Mets right now, basically just Tatis and Murphy, it all depends on how Murphy does defensively. I think Murphy will put up one WAR. I think Tatis will put up—I say two WAR combined. I think they both put up one. That’s basically because I think Murphy will be pretty good defensively this year.

    JG: How good defensively? I mean considering the positional adjustment. Do you think he’s a league average hitter?

    PA: Oh yeah, he’s definitely a league average hitter. I’m not a big Murphy fan personally. I don’t think he’s good enough to play first base every day. I definitely think he’s good enough to hit .270/.335/.4-whatever.

    JG: I know you're an atheist, but how do you explain the existence of Jenrry Mejia?

    PA: If you’re going to say that it’s God, it has to be that God hates the Dominican Republic to the point where he makes it so destitute that the only option young kids can turn to is baseball, and that’s why Mejia is so good. So maybe that’s the only God point rather than God created his right arm.

    I love Mejia, I’ve talked about him forever. I’m really worried the Mets are going to put him in the bullpen to start the season. I hope that doesn’t happen. I hope they put him back in Binghamton next year. His peripherals in Binghamton were really solid last year. I hope he continues to prosper there and move up the ranks. I don’t want to see him get thrown in. He has that look of a set-up guy or closer that people can think "Oh, this is one of those late-inning guys, a K-Rod because of that electric arm." And they can forget that he can actually be a very good starter if they leave him in the minors for long enough.

    JG: Where would you rank Fernando Martinez in the top 100?

    PA: You saw what I wrote on THT. I got a little heat for that. Project Prospect, which I think is the premier web site for prospect analytics right now, they put him 10. I would actually be less bullish than that. I would probably put him at 20 right now. So I did my rankings for the Mets, I put F-Mart first. He’s proven so much at such a young age, I don’t buy into the ceiling argument for Mejia just yet because I think F-Mart’s ceiling is just as high if not higher. So I would put F-Mart 20, and I need to see more from Mejia than just the one year. I know the scouts drool over him. I drool over him. But I would still put him around 40-45ish.

    Pat Andriola is a junior at Tufts University who writes for The Hardball Times. He just finished an economics internship in Major League Baseball's Labor Relations Department. He can be followed on Twitter @tuftspat.

    Touching Bases, February 25, 2010
    Shot Location Efficiency
    By Jeremy Greenhouse

    A couple weeks ago, I wrote an article using data from basketballgeek showing shot location visualizations. The logical next step from visualizing the data is to use it for more analytical purposes. So I set about to build a model to predict points based on shot location.

    Here is the expected field goal percentage based on shot location. The data set runs from 2006-2007 to this year's All-Star Break and contains over 600,000 shots.

    nbafg.jpg

    That is the starting point for my model. I take the expected field goal percentage for a given spot on the floor, and multiply it by either two or three, depending on whether the shot is an attempted two pointer or three pointer.

    Another part of my model is offensive rebounding rate. From the field goal percentage chart, you can see that some three point locations are as high percentage shots as some two point locations, yet the value of a three pointer is inherently higher. Offensive rebounding rate on three pointers as compared to long two pointers is another reason that mid-range jumpshots are inefficient plays.

    nbaor.jpg

    The value of an offensive rebound is contested in the basketball analytics community, as I recently learned. I understand why player evaluations based on linear weights don't work at all in basketball, but I'm not sure why they wouldn't work on the team level. Why can't we say that the average value of an offensive rebound is roughly equal to the average value of adding another possession? If somebody can enlighten me on if and why this assumption is faulty, I would appreciate it. Regardless, the average possession yields something like 1.05 points, so for each shot location, I multiplied the expected missed field goal percentage by the expected offensive rebounding percentage and again multiplied that by 1.05.
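
    A minimal sketch of the first two terms described above, for a single floor location. The function name and the sample inputs are hypothetical; only the 2-or-3 multiplier and the 1.05 points per possession come from the text.

```python
def shot_value_with_rebound(fg_pct, shot_points, oreb_pct, pts_per_possession=1.05):
    """Expected points from the make itself plus the value of an offensive rebound."""
    if shot_points not in (2, 3):
        raise ValueError("shot_points must be 2 or 3")
    # Term 1: expected field goal percentage times the value of the shot.
    make_term = fg_pct * shot_points
    # Term 2: a miss rebounded by the offense is treated as one extra average
    # possession (the team-level assumption the text asks readers to challenge).
    rebound_term = (1 - fg_pct) * oreb_pct * pts_per_possession
    return make_term + rebound_term


# Illustrative inputs only, not values taken from the data set.
long_two = shot_value_with_rebound(fg_pct=0.40, shot_points=2, oreb_pct=0.25)
corner_three = shot_value_with_rebound(fg_pct=0.40, shot_points=3, oreb_pct=0.30)
print(round(long_two, 4), round(corner_three, 4))
```

    With equal shooting percentages, the three is worth more on both terms whenever its offensive rebounding rate is at least as high as the long two's.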

    Then, I found the shooting foul rate based on shot location. This was a challenge, since the play by play files don't chart foul locations. I therefore used three resources to try to predict shooting foul locations. Ryan Parker collected data that tracks the locations of nearly every event over ten games, including 200 or so shooting fouls, which definitely helped. 82Games has charted shooting fouls, though the data isn't very granular, and they don't mention the magnitude of the study. Lastly, I found the shot locations of all made baskets where there was an and1. Here's what I came up with.

    nbasf.jpg

    I think the above graph is reasonable, though it's too smooth. There is probably a steep breaking point where players stop taking mainly jump shots and start playing with their backs to the basket. Jump shots are much less likely to draw fouls than post-ups, but my model can't capture that since I use smoothing techniques. The play-by-play data does include shot type information, so if I had a do-over, I would do some testing based on jumpers vs. other shot types. Anyway, what I do with my shooting foul model is multiply the rate of missed shots at a given location by the shooting foul percentage at that location, then by either 2 or 3 (the number of free throws a player earns for a shooting foul on a missed shot), and again by either 0.76 or 0.81 (the made free throw rate on fouled twos and threes, respectively). I also multiplied the rate of made shots by the expected And1 percentage, which is much lower than the shooting foul percentage.
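
    The foul arithmetic above can be sketched as follows. The function and argument names are hypothetical, and the free throw rate applied to the And1 case is an assumption, since the text only gives rates for fouls on missed shots.

```python
def foul_points(fg_pct, shot_points, foul_pct, and1_pct, and1_ft_pct=0.76):
    """Expected free throw points attached to one shot from one location."""
    if shot_points not in (2, 3):
        raise ValueError("shot_points must be 2 or 3")
    # Made free throw rates quoted in the text: 0.76 on fouled twos, 0.81 on fouled threes.
    ft_pct = 0.76 if shot_points == 2 else 0.81
    # Fouled on a miss: the shooter gets as many free throws as the shot was worth.
    missed_and_fouled = (1 - fg_pct) * foul_pct * shot_points * ft_pct
    # Fouled on a make (And1): one extra free throw.
    # and1_ft_pct is an assumption; the text gives no rate for this case.
    made_and_fouled = fg_pct * and1_pct * and1_ft_pct
    return missed_and_fouled + made_and_fouled


# Illustrative inputs only, not values taken from the data set.
at_rim = foul_points(fg_pct=0.60, shot_points=2, foul_pct=0.20, and1_pct=0.05)
deep_three = foul_points(fg_pct=0.35, shot_points=3, foul_pct=0.01, and1_pct=0.0)
print(round(at_rim, 4), round(deep_three, 6))
```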

    Put that all together, and here's my ultimate point expectancy model.

    nbaex.jpg

    The average is up around 1.25. That's about 0.2 points better than the average possession, since plays that don't result in shots either end up as personal fouls or turnovers, mainly turnovers, which net 0 points. I applied the model on five-man units as well as individual players.

    First, the top and bottom five five-man units in shot location efficiency, or expected points per shot. Ideally, some of the shooting, free throw, and rebounding percentages would be customized, but I'm using league average rates for this entire study. Minimum 500 shots.

    Unit | Shots | Efficiency | eFG%
    Dwight Howard, Hedo Turkoglu, Jameer Nelson, Maurice Evans, Rashard Lewis | 768 | 1.34 | 60.81%
    Boris Diaw, Gerald Wallace, Nazr Mohammed, Raymond Felton, Stephen Jackson | 629 | 1.31 | 51.67%
    Amare Stoudemire, Leandro Barbosa, Raja Bell, Shawn Marion, Steve Nash | 598 | 1.31 | 55.52%
    Dwight Howard, Hedo Turkoglu, Jameer Nelson, Keith Bogans, Rashard Lewis | 1480 | 1.30 | 55.07%
    Boris Diaw, Emeka Okafor, Gerald Wallace, Raja Bell, Raymond Felton | 1147 | 1.30 | 53.10%
    ...
    Antonio McDyess, Chauncey Billups, Rasheed Wallace, Richard Hamilton, Tayshaun Prince | 1974 | 1.20 | 50.48%
    Derrick Rose, Joakim Noah, John Salmons, Luol Deng, Taj Gibson | 546 | 1.20 | 45.79%
    Kevin Garnett, Mark Blount, Mike James, Ricky Davis, Trenton Hassell | 1243 | 1.20 | 51.05%
    Brandon Roy, Joel Przybilla, LaMarcus Aldridge, Martell Webster, Steve Blake | 1036 | 1.19 | 50.43%
    Earl Watson, Jeff Green, Johan Petro, Kevin Durant, Nick Collison | 517 | 1.19 | 45.16%

    I'm happy to see that the Eastern Conference Champion Magic are the top team on this list because I'd always assumed that their offense last year was extremely efficient. The Magic had two options on offense. Dwight Howard took shots at the rim, while Hedo Turkoglu and Rashard Lewis hoisted threes. That unit was also by far the best in effective field goal percentage in the league, so they were getting high percentage shots, making high percentage shots, and though I can't include their free throw rates or offensive rebounding rates since those would be pains to calculate, I'm sure that with Dwight Howard, the Magic were successful at getting to the line and grabbing rebounds. The Suns, of course, are one of the top five teams. The Bobcats, surprisingly, take highly efficient shots, but don't make many of them. On the other end, we already knew the Bulls run an inefficient offense, and I'm not surprised to see the Pistons do too. That Thunder offense last year must have been absolutely brutal.

    Now turning to defense, teams that force the least efficient shots.

    Unit | Shots | Efficiency | eFG%
    Dikembe Mutombo, Juwan Howard, Rafer Alston, Shane Battier, Tracy McGrady | 766 | 1.20 | 46.54%
    Dwight Howard, Hedo Turkoglu, Jameer Nelson, Maurice Evans, Rashard Lewis | 845 | 1.21 | 48.58%
    Aaron Brooks, Luis Scola, Ron Artest, Shane Battier, Yao Ming | 670 | 1.21 | 46.19%
    Bruce Bowen, Fabricio Oberto, Michael Finley, Tim Duncan, Tony Parker | 858 | 1.21 | 48.95%
    Chuck Hayes, Rafer Alston, Shane Battier, Tracy McGrady, Yao Ming | 1007 | 1.21 | 43.15%
    ...
    Emeka Okafor, Gerald Wallace, Jason Richardson, Jeff McInnis, Raymond Felton | 766 | 1.29 | 52.87%
    Marc Gasol, Mike Conley, O.J. Mayo, Rudy Gay, Zach Randolph | 1700 | 1.29 | 52.56%
    Jeff Green, Kevin Durant, Nenad Krstic, Russell Westbrook, Thabo Sefolosha | 1691 | 1.29 | 49.91%
    C.J. Miles, Deron Williams, Mehmet Okur, Paul Millsap, Ronnie Brewer | 768 | 1.29 | 54.75%
    Boris Diaw, Gerald Wallace, Nazr Mohammed, Raymond Felton, Stephen Jackson | 631 | 1.29 | 55.23%

    It's no surprise that the Rockets force teams into low percentage shots, as they boast three of the top five five-man units. That defensive lineup containing Chuck Hayes, Shane Battier, and Yao must be impregnable. And what do you know, but the Magic offense that generated the most efficient shots also had the defense that allowed the second most inefficient shots. Interestingly, the Bobcats offense that ranked second in shot efficiency actually allowed the most expected points per shot on the other end of the floor. I don't think I've watched a Bobcat game this year, but I'd be interested to know what's going on with that unit. A couple surprises on the bottom five list. The Thunder have made noise throughout the league for their much-improved defense, yet it's not a matter of holding opponents to inefficient shots. Instead, their opponents have gotten quality shots off, but have not made them, which would point to an impressive ability to contest shots. Also, the Thunder might do a good job of defensive rebounding and not fouling, which wouldn't appear in the numbers I'm showing.

    The next table includes defensive stats for individual players, but still uses data based on the entire five-man opposition. I raised the minimum to 1,000 shots.

    Name | Shots | Efficiency | eFG%
    Dikembe Mutombo | 2826 | 1.21 | 43.77%
    David Harrison | 1360 | 1.21 | 47.65%
    Shaquille O'Neal | 8909 | 1.22 | 49.34%
    Yao Ming | 8181 | 1.22 | 45.74%
    Jacque Vaughn | 2878 | 1.22 | 45.95%
    ...
    Sam Young | 1334 | 1.28 | 50.64%
    Salim Stoudamire | 1815 | 1.28 | 48.51%
    Russell Westbrook | 6876 | 1.29 | 49.45%
    Chris Douglas-Roberts | 2590 | 1.29 | 51.51%
    Louis Williams | 3580 | 1.29 | 49.50%

    I could've guessed that the top defenders at forcing low percentage shots would be centers, since preventing shots at the rim is the best way to force inefficient jump shots. Dikembe Mutombo, even at (insert whatever made-up hilarious age here), remained an astonishingly good defender. He forced opposing teams into inefficient shots, and no player held rivals to as low an effective field goal percentage as Deke. I'm not sure if any of the guys who show up on the bottom five have reputations as poor defenders. Basketballvalue exhibits poor defensive ratings for Russell Westbrook and Louis Williams and says that by adjusted +/- Sam Young has been a flat-out awful player in general this year, though the guy who runs basketballvalue is the stats guy for Sam Young's team, the Grizzlies.

    This table shows how a player's five-man unit performed while he was on the court.

    Name | Shots | Efficiency | eFG%
    Steve Francis | 1405 | 1.31 | 48.01%
    Eddy Curry | 5605 | 1.30 | 49.02%
    Stephon Marbury | 5168 | 1.30 | 48.13%
    Renaldo Balkman | 3842 | 1.30 | 47.07%
    Donyell Marshall | 2634 | 1.30 | 46.70%
    ...
    Antonio McDyess | 9614 | 1.21 | 48.42%
    Cuttino Mobley | 8225 | 1.21 | 46.33%
    Earl Barron | 1592 | 1.21 | 44.91%
    Will Solomon | 1127 | 1.21 | 49.56%
    Sam Cassell | 4017 | 1.20 | 46.93%

    The top four players were all Knicks during this time frame, as were three of the next eight on the leaderboard. All this is telling us is that Stevie Franchise, Starbury, and Baby Shaq all excel at hanging and banging, and that Isiah is attracted to that type of player. Sam Cassell, on the other hand, can't get to the rim. So I decided to take out a player's own shots, and include only shots by a player's teammates while he was on the floor.

    Name | Shots | Efficiency | eFG%
    Steve Francis | 1161 | 1.32 | 49.61%
    Jameer Nelson | 7475 | 1.31 | 52.49%
    Stephon Marbury | 4137 | 1.31 | 48.65%
    Steve Nash | 11406 | 1.30 | 56.11%
    D.J. Augustin | 2986 | 1.30 | 48.44%
    ...
    Amir Johnson | 3566 | 1.21 | 47.80%
    Joel Anthony | 3196 | 1.21 | 47.70%
    Roko Ukic | 1124 | 1.21 | 50.09%
    Joel Przybilla | 7054 | 1.20 | 48.28%
    Erick Dampier | 8368 | 1.20 | 49.46%

    At one end are players who spread the ball around and at the other end are players who inhibit floor spacing. Steve Nash's teammates had easily the highest effective field goal percentage, and oh by the way, Nash's own eFG% beats out that of his teammates. Erick Dampier and Joel "Prezbo" Przybilla clog the paint like a hot fudge sundae clogs one's arteries.
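
    The teammates-only filter behind the last table can be sketched as below. The record layout (shooter, on_court, expected_points) is hypothetical and stands in for play-by-play rows that already carry the model's value for each shot location.

```python
from statistics import mean


def teammate_shot_efficiency(shots, player, min_shots=1):
    """Mean expected points on teammates' shots while `player` is on the floor."""
    values = [
        shot["expected_points"]
        for shot in shots
        # Keep shots taken with the player on the court, excluding his own attempts.
        if player in shot["on_court"] and shot["shooter"] != player
    ]
    if len(values) < min_shots:
        return None  # not enough shots to report
    return mean(values)


# Hypothetical records; players are labeled A through F.
shots = [
    {"shooter": "A", "on_court": {"A", "B", "C", "D", "E"}, "expected_points": 1.40},
    {"shooter": "B", "on_court": {"A", "B", "C", "D", "E"}, "expected_points": 1.10},
    {"shooter": "C", "on_court": {"A", "B", "C", "D", "E"}, "expected_points": 1.30},
    {"shooter": "B", "on_court": {"B", "C", "D", "E", "F"}, "expected_points": 1.00},
]
print(teammate_shot_efficiency(shots, "A"))
```

    Setting min_shots to 1,000 would mirror the cutoff used for the individual player tables.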

    Touching Bases, February 18, 2010
    The Verducci Effect
    By Jeremy Greenhouse

    On Monday, Will Carroll noted that the Verducci Effect was being discussed on MLB Network. On Tuesday, Tom Verducci posted his ten young pitchers at risk of the Effect. Then to top it off, yesterday Josh Hermsmeyer unveiled a free player injury database. I've been meaning to research the Verducci Effect for some time, so this seemed like as good a time as any.

    The Verducci Effect, also known as the Year-After Effect, is defined by BP as "a negative forward indicator for pitcher workload." Specifically, pitchers under the age of 25 who have 30-inning increases year over year are at risk. David Gassko's research pointed to the opposite. With pitch by pitch data from FanGraphs and disabled list data from Rotobase, I attempt to expand on Gassko's preliminary analysis, although purely numerical research on injury prediction and pitch limits will never come close to showing conclusive results.
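
    As a rough sketch, the rule can be written as a simple check. The function name and defaults are hypothetical; the age cutoff is left as a parameter because the definition above says under 25 while the sample used in this article covers ages 25 and under.

```python
def fits_verducci_effect(age, innings_prev, innings_curr, max_age=25, min_increase=30.0):
    """Return True when a pitcher-season meets the workload-jump rule.

    max_age is inclusive here (an assumption matching the 25-and-under sample);
    pass max_age=24 for the stricter "under the age of 25" reading.
    """
    jump = innings_curr - innings_prev
    return age <= max_age and jump >= min_increase


# Illustrative pitcher-seasons, not real players.
print(fits_verducci_effect(23, 48.3, 126.5))   # large jump at a young age
print(fits_verducci_effect(27, 100.0, 150.0))  # large jump, but too old
print(fits_verducci_effect(24, 120.0, 140.0))  # young, but only a 20-inning jump
```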

    I found 340 pitchers who pitched three consecutive years in MLB at ages 25 and under since 2002. 140 of them fit the Verducci Effect, while 200 did not. Here's the data.

    Verducci Group

    Season | IP | FBV | K/9 | BB/9 | GB% | BABIP | WHIFF | ZONE | DL | DAYS
    Year One | 48.3 | 90.4 | 7.3 | 3.7 | 41.5% | 0.313 | 20.6% | 51.4% | 18.71% | 51.6
    Year Two | 126.5 | 91.1 | 7.0 | 3.4 | 44.2% | 0.302 | 19.7% | 52.0% | 20.14% | 36.6
    Year Three | 113.1 | 90.9 | 7.2 | 3.3 | 43.3% | 0.305 | 19.6% | 51.4% | 29.50% | 59.4
    Difference | -13.3 | -0.2 | 0.2 | -0.1 | -0.8% | 0.003 | -0.1% | -0.6% | 9.35% | 22.8

    Non-Verducci Group

    Season | IP | FBV | K/9 | BB/9 | GB% | BABIP | WHIFF | ZONE | DL | DAYS
    Year One | 80.5 | 91.3 | 7.0 | 3.6 | 42.1% | 0.301 | 20.6% | 52.0% | 20.49% | 48.9
    Year Two | 65.7 | 91.3 | 7.1 | 3.6 | 42.2% | 0.326 | 20.2% | 51.4% | 34.63% | 69.5
    Year Three | 85.2 | 91.5 | 7.3 | 3.5 | 41.9% | 0.306 | 20.7% | 51.0% | 36.10% | 57.6
    Difference | 19.5 | 0.1 | 0.2 | -0.1 | -0.3% | -0.019 | 0.5% | -0.4% | 1.46% | -11.9

    The first point of interest is the decrease in innings pitched for those under the influence of the Verducci Effect. I should preface the rest of this analysis with a few popular credos: TINSTAAPP, regression to the mean, and small sample size. First, pitching is an inherently risky business. Dave Cameron recently wrote a great piece on how successful young pitchers often peak early. This problem is exacerbated by the nature of the Verducci Effect, which dictates that pitchers establish a career high in innings pitched. If you take any group of players who establish a career high in any category, chances are that they will regress to the mean the following year. Finally, my sample again only contains 140 Verducci pitchers. One can't draw important conclusions from a sample of that size. You've been given fair warning.

    In general, 25-and-under pitchers improve their peripherals in their third year. Their strikeout rate trends up while their walk rate trends down. Gassko found similar results. I'm not so interested in whether or not young pitchers improve; I'm looking to see where Verducci Effected pitchers differ from other pitchers.

    Therefore, the Difference row is the row of interest, as it represents the change from the innings-jump year to the Year After. There are four terms in the Difference row that report different positive/negative signs (besides innings pitched) between the two groups: BABIP, velocity, whiff rate, and days per DL trip. That Verducci Effected pitchers suffer worse luck based on BABIP and that their counterparts exhibit better fortune speaks to the infallibility of regressing to the mean. I'm not so interested in the contact rate of pitchers, but I decided to further explore the possible velocity and injury aspects of the Verducci Effect. So I turned to the statistical technique of regression analysis.
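
    The sign comparison can be checked directly from the two Difference rows. This is only a small sketch; the variable and function names are hypothetical, and the numbers are copied from the tables above.

```python
METRICS = ["IP", "FBV", "K/9", "BB/9", "GB%", "BABIP", "WHIFF", "ZONE", "DL", "DAYS"]

# Difference rows (Year Three minus Year Two) as published in the two tables.
verducci_diff = [-13.3, -0.2, 0.2, -0.1, -0.8, 0.003, -0.1, -0.6, 9.35, 22.8]
non_verducci_diff = [19.5, 0.1, 0.2, -0.1, -0.3, -0.019, 0.5, -0.4, 1.46, -11.9]


def opposite_sign_metrics(names, first, second):
    """Return the metrics whose changes point in opposite directions."""
    return [name for name, a, b in zip(names, first, second) if a * b < 0]


flipped = opposite_sign_metrics(METRICS, verducci_diff, non_verducci_diff)
print(flipped)  # ['IP', 'FBV', 'BABIP', 'WHIFF', 'DAYS']
```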

    First, I tried predicting fastball velocity using several separate variables for age, past velocity, and past workload. I've looked at the topic of velocity curves before. Velocity generally peaks during a pitcher's mid twenties. Here are the regression results, which I've broken down by variable type.

    Age | Coefficient | P-Value
    32 Up | NA | NA
    29-32 | 0.14 | 0.03
    26-29 | 0.30 | 0.00
    26 Down | 0.55 | 0.00

    Younger pitchers have a .5 MPH advantage over older pitchers in velocity.

    Velocity | Coefficient | P-Value
    Year One | 0.18 | 0.00
    Year Two | 0.78 | 0.00

    Fastball velocity from the previous year has more than four times as much predictive value as fastball velocity from two years ago.

    Workload | Coefficient | P-Value
    Year One Pitches (1000) | 0.06 | 0.11
    Year Two Pitches (1000) | -0.14 | 0.00
    Verducci Effect | -0.30 | 0.02

    The previous year's workload helps predict velocity. Throwing a thousand pitches in a year coincides with a drop in velocity of more than a tenth of a mile per hour. This could represent the difference between starters and relievers, in that starters throw more pitches at a lower velocity than relievers. Also, pitchers who have undergone the Verducci Effect have thrown softer than non-Effected pitchers to the tune of 0.3 MPH.
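
    To see how the coefficients combine, here is a minimal sketch that compares two pitcher profiles. The regression intercept is not reported, so the code only computes the gap between two predictions, where the intercept cancels. Treating the "32 Up" row as the baseline with a coefficient of zero is an assumption, as are the field names and the sample profile.

```python
# Coefficients copied from the three tables above.
AGE_COEF = {"32 Up": 0.0, "29-32": 0.14, "26-29": 0.30, "26 Down": 0.55}  # "32 Up" assumed baseline
VELO_COEF = {"year_one": 0.18, "year_two": 0.78}    # MPH per MPH of past velocity
PITCH_COEF = {"year_one": 0.06, "year_two": -0.14}  # MPH per 1,000 pitches
VERDUCCI_COEF = -0.30


def linear_terms(p):
    """Sum of coefficient times value for one pitcher, without the unreported intercept."""
    return (
        AGE_COEF[p["age_group"]]
        + VELO_COEF["year_one"] * p["velo_year_one"]
        + VELO_COEF["year_two"] * p["velo_year_two"]
        + PITCH_COEF["year_one"] * p["pitches_year_one"] / 1000
        + PITCH_COEF["year_two"] * p["pitches_year_two"] / 1000
        + (VERDUCCI_COEF if p["verducci"] else 0.0)
    )


def predicted_velocity_gap(a, b):
    """Predicted velocity of pitcher a minus pitcher b; the intercept cancels."""
    return linear_terms(a) - linear_terms(b)


# Hypothetical profile; Year Two is the most recent season.
base = {
    "age_group": "26 Down", "velo_year_one": 91.0, "velo_year_two": 91.0,
    "pitches_year_one": 1000, "pitches_year_two": 2000, "verducci": False,
}
jumped = dict(base, verducci=True)           # same pitcher, flagged by the Effect
heavy = dict(base, pitches_year_two=3000)    # same pitcher, 1,000 more pitches last year
print(round(predicted_velocity_gap(jumped, base), 2))
print(round(predicted_velocity_gap(heavy, base), 2))
```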

    Next, I ran another linear regression to predict days spent on the disabled list in a pitcher's third consecutive year of pitching.

    DL History | Coefficient | P-Value
    Year One DL Trips | 4.19 | 0.09
    Year Two DL Trips | 5.90 | 0.01
    Year One DL Days | -0.03 | 0.46
    Year Two DL Days | 0.18 | 0.00

    First off, predicting future health is hard. While I was able to predict nearly 90% of a pitcher's fastball velocity without developing a very sophisticated model, the disabled list model explains only 6% of a pitcher's health. Nevertheless, injuries from the previous year are significant, as each trip to the DL tends to yield another several days on the DL the following year.

    Age | Coefficient | P-Value
    32 Up | NA | NA
    29-32 | -1.31 | 0.60
    26-29 | -3.72 | 0.13
    26 Down | -0.16 | 0.96

    Age isn't a very strong predictor of future injuries. Pitchers on either extreme of the age spectrum are most at risk, but the results aren't significant. Verducci might've chosen a wise cutoff at age 25, as this table shows that there could well be a point at which pitchers grow less vulnerable.

    Workload | Coefficient | P-Value
    Year One Pitches (1000) | 0.6 | 0.78
    Year Two Pitches (1000) | 4.3 | 0.06
    Verducci Effect | 0.63 | 0.90

    The Verducci Effect, like most everything else I tested, is not significant in predicting future injuries. Injuries are hard enough to predict as is, and there's certainly no straightforward rule of thumb. A high workload does coincide with a trip to the DL the following year, though the causative effect may be that pitchers who throw a lot of pitches have more opportunities to get injured, rather than the pitches placing more stress on their arms.

    Verducci identifies the likes of Felix Hernandez and Josh Johnson as pitchers at risk. Verducci Effect or not, those guys aren't going to replicate their spectacular seasons. But Verducci also points to lesser pitchers such as Homer Bailey and Joba Chamberlain, who failed to live up to their prodigious potential last year. Bailey's fastball velocity leaped up three MPH last year while Joba's velocity dipped by a similar amount. I say if they stay healthy, they both improve on their performance from last year, but chances are at least one of them hits the DL. The data show that workload and age help predict production, velocity, and injuries, but the jury's still out as to whether the Verducci Effect helps explain the nexus between injury and risk beyond what one would expect from young pitchers with taxing workloads.

    Touching Bases, February 11, 2010
    Shot Location Visualizations
    By Jeremy Greenhouse

    There's been an influx of publicly-available NBA data over the last few years. While there's no data with the detail of pitchf/x or databases with the sophistication of FanGraphs that analysts can get their hands on for basketball, there have been gradual improvements. My favorite type of basketball data to look at is shot location data, which is why I regularly visit HoopData. On Saturday, I came across the last few years of raw shot location data on BasketballGeek. I'm far from an expert in APBRmetrics, and I don't know whether the basketball blog-dome has its own Dave Allen, but I felt like it might be fun to produce some visualizations using this data. Eli Witus has previously charted this data in several ways, so I'm going to be reproducing some of his work. Click on images for a larger view.


    shotlocation.jpg

    Each point represents one square foot and the goal is located 5.25 feet from the baseline and 25 feet from the sideline.

    The most efficient shots are those at the rim or those from three. The least efficient are ten-foot jumpers it would seem. None of this data includes free throws or offensive rebounding, so the only inputs are missed shots, made two-point shots, and made three-point shots. Witus' chart on offensive rebounding suggests that mid-range jumpers, in addition to being low-percentage shots, yield the lowest rate of second-chance points.

    Something I find interesting in the shot location frequency chart is that there are equally-spaced patches along the three-point arc as well as the 17-foot arc where players like to shoot, which I call the corner, the wing, and the middle. I understand a lot of this has to do with floor spacing, and the corner three has such a high frequency since it is 1.75 feet closer to the basket than threes along the arc. Nevertheless, I feel like players are predisposed to wanting to take shots from normal angles (0, 45, 90 degrees). Maybe it's just me.
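
    For anyone who wants to check the geometry, here is a small sketch that turns a floor coordinate into a distance and an angle relative to the hoop. The coordinate convention (x measured from one sideline, y from the baseline) is an assumption about how the data is laid out; the hoop position comes from the chart description above, and the 22-foot and 23.75-foot figures are the NBA three-point distances in the corner and along the arc.

```python
import math

HOOP_X = 25.0   # feet from the sideline, per the chart description
HOOP_Y = 5.25   # feet from the baseline, per the chart description


def distance_and_angle(x, y):
    """Return (distance in feet, angle in degrees) of a floor point relative to the hoop.

    0 degrees lies along the baseline (a corner shot); 90 degrees is straight on.
    """
    dx = abs(x - HOOP_X)
    dy = y - HOOP_Y
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


corner = distance_and_angle(3.0, 5.25)       # 22 feet out along the baseline
top_of_arc = distance_and_angle(25.0, 29.0)  # 23.75 feet out, straight on
print(corner, top_of_arc)
print(top_of_arc[0] - corner[0])  # 1.75
```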

    I also made splits of the above graphs. Home vs. away. 1st, 2nd, 3rd, and 4th quarters. The first minute of a quarter vs. the final two minutes of the fourth or overtime.

    I chose to only include points where a significant number of shots have occurred, and therefore didn't need to use any smoothing. The charts are plenty smooth already. But I did smooth out and pretty up the chart I made for field goal percentage.

    FG%25.jpg

    I also thought it might be nice to break down this data on the team and player level. The first team I considered was of course everybody's favorite statistically-oriented team, the Houston Rockets. You may recall that, nearly a year ago to the day, Daryl Morey penned a self-aggrandizing self-profile in the New York Times titled "Moreyball."* In it, Morey wrote

    "The 3-point shot from the corner is the single most efficient shot in the N.B.A. One way the Rockets can tell if their opponents have taken to analyzing basketball in similar ways as they do is their attitude to the corner 3: the smart teams take a lot of them and seek to prevent their opponents from taking them."

    The Chicago Bulls are not what you would call one of the smart teams, if this statement is taken at face value. According to HoopData, the Bulls lead the league in long twos attempted, but are last in threes attempted. That makes no sense. I've plotted each point where the Rockets and Bulls have attempted at least ten shots since 2006 along with the points per shot.

    HouChiShotLocations.jpg

    You can see that the Bulls have a much fuller area where they shoot long twos, those shots from 15 feet out to the three-point line. The Rockets' area outside the arc contains a higher number of points. Also, the Rockets' paint area is green, representing 0.8-1.2 points per shot by the scale, while the Bulls' paint area is blue, good for 0.6-1.0 points per shot by that scale.
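
    The filtering behind a chart like this can be sketched as follows. The tuple layout (x, y, made, shot_points) is hypothetical; the one-square-foot cells and the ten-attempt minimum follow the description above.

```python
from collections import defaultdict


def points_per_shot_by_cell(shots, min_attempts=10):
    """Map each (x, y) cell with enough attempts to its points per shot."""
    attempts = defaultdict(int)
    points = defaultdict(int)
    for x, y, made, shot_points in shots:
        cell = (int(x), int(y))  # one-square-foot cells
        attempts[cell] += 1
        if made:
            points[cell] += shot_points
    return {
        cell: points[cell] / count
        for cell, count in attempts.items()
        if count >= min_attempts
    }


# Hypothetical sample: ten corner threes (four made) and two mid-range twos.
sample = [(3, 5, True, 3)] * 4 + [(3, 5, False, 3)] * 6 + [(25, 20, True, 2)] * 2
chart = points_per_shot_by_cell(sample)
print(chart)  # {(3, 5): 1.2}
```

    The mid-range cell is dropped because it has only two attempts, which is the same reason sparse spots are missing from the team charts.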

    *I wouldn't be Daryl Morey first of all. I wouldn't write the story "Moreyball." I understand that when you write a profile, you want to be the hero. That is apparently what Morey has done. But it's not going to make him popular with the other GMs or the other people in basketball.

    Now I didn't actually read the piece, as why would I want to read a story about a computer that gives computer numbers? After all, how do you think we got Madoff? But if Morey is so smart, then why hasn’t he won a championship? Statistics don’t tell the whole story, especially with players like Shane Battier. I mean, if Morey thinks Shane Battier is so good, then how come he only scores six points a game? The Rockets have only made the playoffs because 75% of basketball is play from the center and Houston lucked out by drafting Yao Ming.

    Finally, I wanted to look at individual players. Since players have taken at most 5,000 shots or so over the last few years, I decided to smooth out their heat maps. I also added contour lines showing where players like to shoot. Here's a look at the consensus two best players in the game:

    LeBronKobeShotLocation.jpg

    They have similar shot location distributions. Both shoot from anywhere on the floor, but are especially drawn to the three point shot from either wing. Kobe also likes to step in from the right wing and pull up from the free throw line extended. LeBron takes a higher rate of shots at the rim.

    As for their success when shooting, Bryant would appear to trump James by color alone. Note that the color scales are different, but even so, Kobe has a better mid-range game than LeBron. LeBron has blue patches where he earns less than 0.6 points per shot, while Kobe has no points from reasonable shooting locations on the floor where he shoots that poorly. Thing is, there's that tiny little area right underneath the rim that accounts for over a third of James' shots, and he's the best player in the league when shooting from the restricted area. The color scale for LeBron extends up to 1.9 points, while it only goes up to 1.6 for Kobe, and those figures represent how effective each player is when shooting from spots in close proximity to the rim.

    I made these graphs for several other players I was interested in, which you can view by clicking on the player names. Dwyane Wade, Tim Duncan, Kevin Garnett, Kevin Durant, Chris Bosh, Carmelo Anthony, Dirk Nowitzki, Paul Pierce, Steve Nash, Rashard Lewis, and Joe Johnson.

    Touching Bases, February 11, 2010
    Shooters by Zones
    By Jeremy Greenhouse

    Last week, I looked at Hitters by Zones, and I'm going to use the same format this week. My sample includes all NBA regular season games since the 2006-2007 season up to Saturday. Data from BasketballGeek. First, a crude chart showing the percentage of shots in each zone and how players fare when shooting, indicated by color. I didn't include any data on free throws, so the only inputs are missed shots, made two-point shots, and made three-point shots.

    ShooterZones.jpg

    Shots at the rim yield the highest return, followed closely by three pointers, specifically the corner three. Mid-range jumpers are the worst.

    Getting right to the leaderboards, highlighting the top five and bottom five. There are sixteen of these this time, but I’m going to again leave the commentary short and I’ll leave a spreadsheet at the end. The listed leaderboards will be limited to players with at least 50 shots in a zone, but I'm including all players in my spreadsheet, and you might just want to skip straight to that.
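
    For anyone working from the spreadsheet instead, a leaderboard of this kind can be rebuilt with a few lines. The row layout (player, zone, shots, points) and the zone labels are hypothetical; the 50-shot minimum and the top five and bottom five cut come from the description above.

```python
def zone_leaderboard(rows, zone, min_shots=50, n=5):
    """Return (top n, bottom n) players by points per shot in one zone."""
    eligible = [
        (player, shots, points / shots)
        for player, row_zone, shots, points in rows
        if row_zone == zone and shots >= min_shots
    ]
    ranked = sorted(eligible, key=lambda row: row[2], reverse=True)
    # With fewer than 2 * n eligible players, the two lists will overlap.
    return ranked[:n], ranked[-n:]


# Hypothetical rows, not real players.
rows = [
    ("P1", "right corner 3", 60, 90),
    ("P2", "right corner 3", 100, 120),
    ("P3", "right corner 3", 40, 80),   # below the 50-shot minimum
    ("P4", "left corner 3", 70, 70),    # different zone
    ("P5", "right corner 3", 50, 35),
]
top, bottom = zone_leaderboard(rows, "right corner 3", n=1)
print(top, bottom)
```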

    I'm defining the side of the floor as that side you would face if you were standing on a basketball court, so the left side of the chart provided is actually the right side of the floor.

    Right-Corner Threes

    Player Shots Points/Shot
    J.J. Redick 54 1.69
    Mike Miller 70 1.53
    Vince Carter 73 1.52
    Courtney Lee 69 1.43
    Anthony Parker 256 1.42
    Rasheed Wallace 64 0.70
    Baron Davis 64 0.70
    Earl Watson 56 0.70
    Shawn Marion 126 0.69