Archive for the ‘Statistical Analysis’ Category

300 Game Winners by Year

April 11, 2009

One of the baseball stories I get a little tired of, and one that is going to pop up again pretty soon, is the demise of the 300 game winner.  In a few weeks Randy Johnson will win his 300th (he’s at 295 as I write), and there will be a trove of media stories about how pitchers don’t win as much any more, they’re not as tough as the old days, and that we may never see a 300 game winner again, at least not in our lifetimes.

I’ve been doing quite a bit of research on wins and other stuff to do with pitching, and I’m going to start sharing some of it with you.  Here’s the first piece, and we’ll start with a chart:

300 game winners

From this, you can tell that I like ugly charts, and it’s probably too small to read.  Click it to go to Flickr and see it larger if you wish.

This chart is showing you two things:  the red dots are the year that a pitcher reached 300 wins (mostly one a year, in a few cases two reached in the same year), and the blue line is the number of active 300 game winners by year.

The blue line does need a little clarification:  it means when a 300 game winner was pitching, not when a guy who would win 300 was pitching.  For example, Greg Maddux won his 300th in 2004, so he counts from 2004-2008, the years he had 300 or more wins, but he doesn’t count from 1986-2003, the part of his career when he had fewer than 300.
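One way the blue line could be computed is sketched below. It assumes each pitcher is represented by the season of his 300th win and his final season, and that he pitched in every season in between. That holds for the three recent pitchers used here, but it isn't guaranteed in general. The input is illustrative only: it covers Clemens, Maddux and Glavine, not all 23. `active_300_winners` is a made-up helper name.

```python
from collections import Counter

# Illustrative input: (season of 300th win, final season).
# Only the three most recent 300-game winners before Randy Johnson are listed.
MILESTONES = {
    "Roger Clemens": (2003, 2007),
    "Greg Maddux": (2004, 2008),
    "Tom Glavine": (2007, 2008),
}

def active_300_winners(milestones):
    """Count, per season, pitchers who already had 300+ wins and were still active.

    A pitcher counts from the season of his 300th win through his final season,
    so Maddux counts for 2004-2008 but not for 1986-2003. Assumes no gap seasons.
    """
    counts = Counter()
    for reached, last in milestones.values():
        for year in range(reached, last + 1):
            counts[year] += 1
    return dict(sorted(counts.items()))

print(active_300_winners(MILESTONES))
```

The peaks of a series like this are what the chart's blue line shows. Feeding it all 23 pitchers would reproduce the whole line.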

What I am attempting to show here is scarcity.  There have only been 23 pitchers with 300 wins in baseball history.  If you take the history of baseball as being from 1870 to the present, that’s about one every six years.  If you count from 1888 (the first year someone won 300), it’s still only about once every five years.  But here’s something interesting:  they appear to cluster.  Take the last five years, for example, and you have three guys who made it.  From 1983-90, you’d expect about one and a half, but you actually have six!  The peaks of the blue line show how clustered things seem to be.
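The rates in the paragraph above come from simple division. Here is a sketch of that arithmetic. It assumes “present” means 2008, the last completed season when this was written, and it counts seasons inclusively. The helper names are made up for illustration.

```python
def years_per_winner(first_year, last_year, winners):
    """Average number of seasons per new 300-game winner over an inclusive span."""
    seasons = last_year - first_year + 1
    return seasons / winners

def expected_winners(span_seasons, years_per):
    """How many milestones a window 'should' hold if they arrived at an even rate."""
    return span_seasons / years_per

rate_1870 = years_per_winner(1870, 2008, 23)
rate_1888 = years_per_winner(1888, 2008, 23)
print(round(rate_1870, 1), round(rate_1888, 1))
print(round(expected_winners(8, rate_1888), 1))  # 1983-90 is eight seasons
```

The gap between that even-rate expectation and the actual count is what the clustering argument rests on.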

Now, the point of all this was the meme that the 300 game winner is going the way of the dodo.  The point I am making is that we’ve gone through a recent history where there have been more 300 game winners than ever before, and although this is a historical blip, somehow writers are assuming it has been the norm, and they’re thinking that things are going wrong.  Fact is, there have been more 300 game winners in the last 25 years than in any 25 year period before.  The only comparable time was the late 1800s, when a group hit 300, in the days when baseball was a lot different than today.

To suggest that 300 game winners are dying because we are dropping down from a peak is like suggesting that home runs are disappearing because we’re coming off recent records.  We went from the 1930s to the 60s with just a single 300 game winner, and from the 30s to the 80s with just three.  At those times you might have had a reasonable argument that the 300 game winner had disappeared, but they came back with a bang.  Incidentally, this may in some ways devalue the 300 game winners of the 80s, because it suggests they were somehow lucky to be pitching in the time they were, where for whatever reason a group of guys made it to 300 together.

Next time I’ll show you how the 300 game winners got there.  If you thought this chart was ugly, you’ll think the next one was drawn by my four year old.

Back from the dead

October 12, 2008

This is horrendous. This is an indictment of Rangers management past and present. And of the future, at least for some time, but when will that happen? 2010? 2012? 2020? 2050?

Idling through some numbers the other day, I came up with this chart:

MLB Standings 2000-08

Team W L PCT
NYY 862 592 0.593
BOS 825 632 0.566
STL 822 635 0.564
OAK 815 641 0.560
ATL 806 650 0.554
ANA 803 655 0.551
CHW 778 681 0.533
MIN 776 682 0.532
SFG 767 688 0.527
LAD 767 691 0.526
HOU 758 699 0.520
PHI 757 700 0.520
SEA 752 706 0.516
CLE 751 707 0.515
NYM 745 711 0.512
ARI 735 723 0.504
TOR 730 727 0.501
FLA 724 732 0.497
CHC 724 733 0.497
SDP 694 765 0.476
TEX 689 769 0.473
COL 677 782 0.464
CIN 673 785 0.462
MIL 661 796 0.454
WSN 652 805 0.447
DET 643 814 0.441
BAL 634 822 0.435
PIT 619 837 0.425
TBD 610 845 0.419
KCR 607 851 0.416

Uhh, yeah. 21st out of 30. Isn’t that about where you’d expect the Rangers, or maybe a little high? After all, only one winning season in that time, but most of the time they’ve been just a bit below mediocre, in the 70-80 wins range. As I’ve said many times, not good enough to compete, not bad enough to tear it all down and start fresh. Instead we get, year after year, the same old blather about needing to sign just one or two more starters and we’ll be there. Uhh, no.

The Rangers have a shot at getting to 20th (five wins behind a terrible San Diego team), but not 19th (35 wins behind a couple of teams). They could also fall a couple of spots, if Colorado or Cincinnati come on. The truly amazing thing: even if the Rangers went 162-0 and the Yankees 0-162, the Yankees would still have a better record for the decade. And the Rangers are 63 games back of Seattle, the next worst team in the AL West.
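The 162-0 claim is easy to check against the table above. This is a quick sketch using the 2000-08 records from the table. The helper names are illustrative. "Games behind" uses the usual formula: half the win difference plus half the loss difference.

```python
# 2000-08 records (wins, losses) taken from the table above.
STANDINGS = {"NYY": (862, 592), "SEA": (752, 706), "TEX": (689, 769)}

def pct(wins, losses):
    return wins / (wins + losses)

def add_season(record, wins, losses):
    """Tack one more season onto a cumulative record."""
    w, l = record
    return (w + wins, l + losses)

def games_behind(leader, trailer):
    """Standard games-behind: half the win gap plus half the loss gap."""
    (lw, ll), (tw, tl) = leader, trailer
    return ((lw - tw) + (tl - ll)) / 2

tex = add_season(STANDINGS["TEX"], 162, 0)   # a perfect season
nyy = add_season(STANDINGS["NYY"], 0, 162)   # a winless season
print(f"TEX {tex} {pct(*tex):.3f}")
print(f"NYY {nyy} {pct(*nyy):.3f}")
print("TEX games behind SEA:", games_behind(STANDINGS["SEA"], STANDINGS["TEX"]))
```

Because Seattle and Texas played the same number of games, games behind works out to the same 63 as the raw win difference.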

Of the current crop of Rangers starting pitchers, who do you think should be there when the Rangers are trying to compete? Padilla maybe, but he only has one year left on his contract (I think). Millwood has flamed out. The others have all been terrible. Of the 15 pitchers who started a game for the Rangers this year, only 3 had an ERA+ over 100: Ponson, who got dumped because he was a cancer in the clubhouse, McCarthy, who had five starts in an injury-plagued year, and AJ Murray, who pitched 7 innings. Feldman, who a few folks laughingly said had a good year? ERA+ of 82. Matt Harrison? 79. At least he’s young. Feldman sucks, and will never be in a rotation that could go to the playoffs.

And that’s the point, isn’t it? Our expectations have become so low about Rangers pitchers, we’re reduced to thinking that Scott Feldman might be a solution.  Or Dustin Nippert, good grief! For the first time since 1988 (not including strike shortened 1995) the Rangers drew fewer than 2 million fans. Even I gave up on them, hardly bothering to watch, let alone post here, after the football season came on. When you’ve killed the fans so many times, lied to them about how the pitching is going to be better, and told them that this was a good season (take out about three weeks in May and they’d fall back below 70 wins), they just know it’s going to keep being bad, and you suck the life out of the city’s interest in baseball. Bringing in Nolan Ryan as the savior was kind of like putting lipstick on a pig.

Everyone will tell you that there’s a great crop of kids coming through from the minors, and in a couple of years we’ll be loaded. I say show me the money. Come back here in those couple of years and see how many of them have actually made it. Do you think we’ll get a whole rotation’s worth out of them? Here’s a thought for the day: if you could graduate just one decent starter a year from the minors, you could comfortably turn over your rotation every five years, and that might make you competitive. The guys at the top get sold off for something before they hit free agency, and you plug in someone at the bottom. I know what you’re thinking, this is not a novel idea, it’s what Oakland does. Yep. And look at them in the chart. Fourth place, and they’ll crow about their payroll being half of everyone else’s.

I honestly cannot tell you who will be in the Rangers rotation next year, let alone in four or five years. I know there’s a bunch of guys down in the minors who are winning (as Jamey tells us), but like I said above, the history of the Rangers is to ruin them or trade them before they get to Arlington. Give me a list of your top 20 pitching prospects in the Rangers system right now, and we’ll see if even two of them are in the rotation in 2012.

I really shouldn’t do this, either:

Millwood 9-10
Padilla 14-8
John Danks 12-9
Edinson Volquez 17-6
Armando Galarraga 13-7

Wow.  What if, huh?  Just by themselves they’re 65-40.  Of the three the Rangers gave away (Danks, Volquez and Galarraga), Galarraga had the fewest starts, 28.  Millwood and Padilla each started 29, then you go to Feldman’s 25, then Harrison’s 15.  The Rangers don’t just give arms away, they destroy the ones they have.  I’ll go back again to Volquez’s quote about going to Cincinnati, where he was encouraged to pitch, not just throw.  Losing Mark Connor is the first step in resolving that, but there’s a lot more to go.

Sorry to be so negative.  Every time I’ve wanted to write for the last three months, all I can think of is negative things to say about the Rangers.  This is one of those years where you feel even more beat down at the end.  I can’t raise my hopes to look at the horizon, because I don’t see anything coming any time soon.  I think 2009 is going to be a real trough of a season in terms of fan interest.

Oh yeah, one last note:  Shame on you, Rangers, for not having Chris Davis shirts available, even at the end of the season.  He was one of the very few feel-good points of the season.

Life, Interrupted

July 1, 2008

Hi there! It’s been a while. Too long, in fact. I hope you missed me, or at least didn’t delete me from your feed reader. Actually, it’s been about six or seven weeks since I last wrote here. I’ve been meaning to, every day I’ve come up with a different idea or theme or something. I just haven’t gotten around to writing them down. And, once the days started slipping into weeks, it just became easier and easier not to write. Makes Jamey Newberg look like a freaking firehose, doesn’t it?

Fact is, I got a new job in mid-May, and it’s been sucking up all my time and energy. When you go from a job you don’t really care about, to one that you’re passionately interested in, that’ll happen. Is it my dream job? No, probably not, since I don’t get to a) swing a bat, b) kick a ball, or c) dive into a swimming pool full of cash every day. But in terms of what I enjoy doing, which is messing around with software, it’s pretty high up there. It’s with a huge company that you may or may not have heard of, but who I’m not allowed to name since they have rules about blogging. And, although there’s no swimming pool of money involved, I can certainly take off my shoes and splash around in a puddle. Life, right now, is pretty darn good.

And you’d be forgiven for thinking that life is good for the Rangers. One of the things you haven’t missed lately is my dragging down the mood of the party in Arlington. There’s some perception going on that the Rangers are doing well. They are, if you consider that we’re now in July (by 16 minutes as I write) and they haven’t been mathematically eliminated yet. Let’s take a look at a chart:

AL West Race through June 29

You’re probably familiar with this kind of chart. It tracks how many games above or below .500 each team is. The Rangers have, of course, spent the last 40 games or so hanging right around .500. But compare their low point, 9 games below, to where Anaheim was at the time, 5 games up. That’s a 14 game difference. Now, today, the Rangers are one over, and Anaheim is 16 over, a 15 game difference. That’s right, for all that you think the Rangers have been playing well lately, they haven’t gained on the leaders at all. Sure, they haven’t gone into freefall like the Mariners (and like the Rangers usually do), but is there reason for hope?
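The chart comparison is just subtraction, but it is worth knowing how it maps to the standings. A team's games-over-.500 figure is wins minus losses. The usual "games behind" number works out to exactly half the gap between two teams' games-over-.500 figures. This is a small sketch with made-up helper names, using the figures from the paragraph above.

```python
def chart_gap(leader_over, trailer_over):
    """Vertical distance between two teams on a games-over-.500 chart."""
    return leader_over - trailer_over

def games_behind_from_gap(gap):
    """GB = ((W1 - W2) + (L2 - L1)) / 2 = ((W1 - L1) - (W2 - L2)) / 2, i.e. half the gap."""
    return gap / 2

low_point = chart_gap(5, -9)   # Anaheim +5, Rangers -9
today = chart_gap(16, 1)       # Anaheim +16, Rangers +1
print(low_point, today, games_behind_from_gap(today))
```

So a 15 game difference on the chart is 7.5 games back in the standings. Either way, the gap got wider, not narrower.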

Cool Standings says that the Rangers have about a 10% chance to make the playoffs right now. Is that reason for hope? The Hardball Times yesterday said that the Rangers are performing above their talent level. Is that reason for hope? In fact, they say they’re about a 73 win team, playing about 85 win baseball. I predicted a 70 to 75 win season at the start of the year, and I see nothing to change that. What’s sad is that they’re hitting the heights of mediocrity, and people are using that as a pointer to them being good, a poor substitute for the reality of being good.

My fear is that ownership is going to be deluded enough, or pressured by the media enough, to do something stupid in a few weeks. Bringing in another Carlos Lee comes to mind. At least let it be a pitcher this time, and yes, I’ve heard the name CC Sabathia bandied about. Boy oh boy, wouldn’t it be great to see him here? Just think of all the prospects we’d have to throw out to get him.

I would rather the GM do nothing than go get someone like CC, who will be out of here as soon as he can. We have a plan, stick to it, plan on competing in 2010 and just pretend that we’re in it right now, so you can keep Michael Young happy a little longer.

Okay, now to a few other things that have been rolling around in my head for a while:

The TV broadcast lost TAG for a few weeks, and I am very glad to see him back. No offense, but Victor Rojas at times didn’t sound like he knew what was going on with the Rangers this year. Josh Lewin would say something, and Victor would be like “huh?”, and Josh would have to explain. Of course, Josh normally has to explain his jokes, but these were some pretty obvious things (and I can’t think of an example right now).

I emailed the booth tonight to tell them that Josh’s story about the Yankees getting pinstripes to make Babe Ruth look thinner is a myth (they first had pinstripes in 1912, and he joined the team in 1920). Maybe they’ll mention it tomorrow. There have been a few cases lately where they’ve irritated me enough that I’ve written to them. There was another one yesterday, something about stats, that really annoyed me. Again I don’t remember what it was now; I just remember thinking that someone like Josh Lewin really ought to have at least a basic knowledge of modern statistical analysis. I don’t necessarily mean the deep stuff that people like me enjoy, but even the simpler measures like OPS. Another one annoyed me tonight on the radio on the way home: they said someone in the NL was having a bad year because he is something like 2-8, with a 4.20 ERA. Surely by now people realize that a pitcher’s record has little to do with his performance; it’s what happens around him. After all, if there were a guy on the Rangers with a 4.20 ERA they’d think he was a superstar (Padilla has a 4.13, and the next best among regular starters is Feldman’s 4.60).

I’m starting to lose patience with Salty. He can’t hit (82 OPS+) and he can’t field. Opponents are 28-5 stealing against him, he has 3 passed balls and 17 wild pitches. He doesn’t seem to be improving. That jackass Ron Washington said the other day he doesn’t care if his catchers hit, he wants them fielding and working with pitchers. First of all, I’ll take a guy who can hit over someone who can’t any day, the numbers prove that hitting is much more important. Second, Ron Washington would say that, because he’s a guy who couldn’t hit the side of a barn in his career, so of course he doesn’t care about numbers. And third, why not give Max Ramirez the same deal that Laird/Salty had, splitting time? Right now, as far as I’m concerned, when Laird comes back, Salty should go to the farm.

Aren’t we glad we didn’t trade Laird during the spring?

Remember how the Rangers lost patience with Jason Botts, because in all those cameos where he barely got to string two games together, he didn’t hit much? Salty is barely hitting better than Botts did, and he’s had about 50% more playing time than Botts so far in their careers. Of course, Botts didn’t have the Teixeira trade on his side, whereas they will keep putting Salty out there until they can justify the deal.

I love Chris Davis. My natural nickname for him is CD. I’d drink some for him, if I could get it here.  Washington said that Davis will go back to the minors when Blalock comes back, no matter what.  I hope Davis can hit about .500 with 20 home runs between now and then, just to make it more difficult for them.  Davis should be playing first every day, and Max Ramirez shouldn’t be there at all.

And Blalock, well, how insulting is it to the other players that he says he’s going to switch to first to help the team?  It was Catalanotto and Shelton at the time.  Thankfully Blalock can’t get healthy (I don’t mean to insinuate anything, but don’t they say getting injured like this all the time is a sign of steroid use?).  The Rangers began to play well when he got injured, and the longer he stays out, the harder it will be for him to mess things up, like sending CD back down.

Now see, it took me an hour to write all that.  That’s why I haven’t been able to do it much lately.  Not only will I be tired in the morning, but I didn’t get a chance to play any games tonight.  So, no promises when the next one will be, but I hope it won’t be another six weeks.

Correct somewhere between 0 and 100% of the time

May 8, 2008

Padilla is dealing. If my math is right, he went from 20 quality starts in 33 starts in 2006, to 7 in 23 in 2007, and now 5 out of 7 in 2008. He’s on a pace for 20 wins, and for his best ERA+ since 2002. Can he keep it up? As long as I don’t jinx him again he will.
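For reference, a quality start is at least six innings pitched with no more than three earned runs. Here is a small sketch of that check and of the rates behind the counts above. The `Start` record and its fields are hypothetical. Innings are stored as outs so that partial innings don't need fractions.

```python
from dataclasses import dataclass

@dataclass
class Start:
    outs: int          # outs recorded; 6 innings = 18 outs
    earned_runs: int

def is_quality_start(start):
    """At least 6 innings and at most 3 earned runs."""
    return start.outs >= 18 and start.earned_runs <= 3

def qs_rate(quality_starts, starts):
    return quality_starts / starts

# Season totals quoted above: (season, quality starts, games started).
for season, qs, gs in [(2006, 20, 33), (2007, 7, 23), (2008, 5, 7)]:
    print(season, f"{qs_rate(qs, gs):.0%}")
```

That works out to roughly 61%, 30% and 71%, which is the swing the paragraph is describing.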

Ponson is dealing. After his second start the media was all over the Sir Sidney thing again, which I thought was pretty silly, since he’d had two starts: in one he gave up five runs (only one earned, but still), and in the other he’d beaten Kansas City. Then yesterday he did it to Seattle. It’s a case of the more he does it, the more confident he will be, and maybe he can stick with it. You have to wait and see what adversity will do though, like with Padilla – he had a bad start but came back strong next time. Will Ponson do that too? Or is he a flash in the pan, the Sammy Sosa of 2008? Better yet, can he keep it going through July, so we can trade him to some desperate playoff bound team?

I was looking at Ponson’s Pitchf/x for his first couple of starts. I wanted to compare them to last year, to see if there was a difference from when he was going bad, but unfortunately he pitched so little none of his games were recorded by Pitchf/x. Shame.

Brandon Boggs is only here for a short time. Know how I know? Because they didn’t even bother to get him a batting helmet that fits. Come on, the guy has to push it out of his eyes after every pitch. After his 4-4 career start, he’s now 6-28, which means pretty much any day now they’re going to give up on him. How come on Opening Day we had about ten outfielders in line, and this guy is suddenly a starter? They have some odd priorities, I tell you.

What can I say about Jason Botts? In the last few days we saw that the Mariners brass have, well, brass ones, but the Rangers brass have none. The Mariners took the bull by the horns and released Brad Wilkerson, eating his contract, as they should have. The Rangers did the stupid thing, by moving out Botts. They could have gotten rid of Broussard, who hasn’t done anything. Instead down goes Botts and up comes Shelton, who has also done nothing. It’s so bad at first they’re trying Frankie Cat there again. Come on guys, basic sabermetrics says if you have two players who are similar (and I’m talking Botts and Broussard, although they’re not similar), keep the young one. Instead they bring in a never-will-be like Shelton. They should have dumped Broussard (who was a dumb pick-up in the first place) and given Botts the full-time job. Let him have half a season doing it every day, and prove whether he can hit in the big leagues or not. Tell Ron Washington to get stuffed when he tries not to play him.

Actually I don’t think the Rangers sent Botts down. I think Shelton ate him.

Last time I said to fire Ron Washington. I still say that. He’s become a little more animated since the Rangers have won a few games, almost lifelike lately. That’s a bad thing, because it kind of proves he was sinking into a deep morass, pretty much just waiting for the axe to fall. It’s still hanging over his head, just a little further away. Recent wins against the Royals (terrible team), and the A’s and Mariners (bad teams both, but not if you believe the media) aren’t really a good indicator of improvement, it’s just the pendulum swinging the other way.

I agree with the recent poster who said to fire Jon Daniels with him. The front office needs a clean sweep, from the owner on down. The owner especially, but where are you going to find a billionaire to buy the team? Mark Cuban wants the Cubs, or the Pirates. (Great Jay Leno joke the other day: “Miley Cyrus is now the richest child in the world. Except for Mark Cuban”). Not too many other rich folk around here that want to suffer the indignity of owning the Rangers.

Pleased to tell you I have a new job, starting next week, doing stuff I really enjoy doing. Can’t wait.

I’m starting to like Josh Hamilton, although it’s still early. Player of the month is good. He won’t win the MVP though, if A-Rod couldn’t because the team sucked, he won’t. Maybe in a few years, if we’re lucky. The way things are going though, he’ll be an MVP contender and Edinson Volquez will win a Cy Young. Is that a good trade? You’d probably say yes… but in the back of your mind will be that nagging feeling that the Rangers haven’t had a good starter since, ummm… (fill in the blank).

Surprised that Hit Tracker only gave the Hamilton home run into the restaurant area yesterday a distance of 422 feet. I would have said 450 easily, but they seem to think it was the wind.

Final thought is on the umpires (again). Here’s a chart:
Strike Zone 5-7-08

What this shows is the strike zone from today’s game, showing just the balls and called strikes (i.e. the pitches that the umpire was involved in calling). Josh Lewin kept going on today about how the umpires had released some kind of stats or study showing that they were right 95% of the time. Personally I automatically think that means they’re wrong one in twenty times, or given today’s 290 total pitches, they were wrong about fifteen times (but then, I’m a glass half empty kind of person).

The problem is that I don’t think they’re counting just the close decisions, I think they’re counting all of them. A pitch that’s two feet outside, and they call it a ball – does that count as a correct decision, or a duh decision? Or one that’s right down the middle of the strike zone?

Watching today’s game, with umpire Mark Wegner in charge, it was interesting to note that he threw out the Mariners’ manager for arguing balls and strikes, and had a lot of complaints from players. Do you think he said “but I’m right 95% of the time”?

Look at the chart above. The box is the strike zone I use as a default, over a group of players. One foot either side of home, and from 1.8 feet to 3.3 feet in height. It’s a rough analog for a major league strike zone, as shown by Pitchf/x studies. Red dots should be inside the box, blue dots outside. Look at the bottom left corner, you see several blue dots inside. Compare it to the red dots outside on the left and below, and you have to ask how those blue dots were called balls – they obviously were closer to the center of the plate than the red ones to the left, and had enough height compared to the red ones below.

When I count this chart, I get seven reds outside and eight blues inside – coincidentally for a total of fifteen “wrong” calls, exactly what we guessed at above. But in reality, that math is not correct. 15 over 290 is close enough to 5%. But they didn’t make the call on all the balls that were hit, or fouled, or swung at (except on appeals on swings going around), so you shouldn’t count all those. Counting just the balls and called strikes, there were 157 pitches, which means we’re closer to 10% (15 over 157).

But that also counts a lot of balls that were nowhere near the strike zone. Let’s narrow it down to a six inch area around the zone: in Pitchf/x terms, from -1.5 to 1.5 feet horizontally (px) and 1.3 to 3.8 feet vertically (pz). Six inches seems reasonably close; if you’re outside by that much in any direction, you’re hardly going to get much argument unless you really blow the call.

In that narrowed area around the strike zone, there were 111 pitches called ball or strike. We’re now at 13.5% of calls on balls and strikes that could have been considered bad today. That’s about one in every seven and a half pitches that they have to call. Granted, over the course of a game, it’s still only 15 pitches, about one every half-inning or so, but it’s still nowhere near the utopia those guys like to present.
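The counting above can be sketched in a few lines. This uses the default box from the text: one foot either side of the plate, 1.8 to 3.3 feet high, plus six inches of margin for the "near the zone" version. The location fields are named after Pitchf/x's px/pz plate-location values. The call labels ("ball", "called_strike", "other") are hypothetical stand-ins rather than the raw Gameday codes. The sketch also ignores the size of the ball and batter-specific zone heights, so treat it as rough.

```python
from dataclasses import dataclass

# Default zone from the text, in feet; NEAR_MARGIN adds six inches all round.
ZONE_X = (-1.0, 1.0)
ZONE_Z = (1.8, 3.3)
NEAR_MARGIN = 0.5

@dataclass
class Pitch:
    px: float   # horizontal location at the plate, feet (0 = centre)
    pz: float   # height at the plate, feet
    call: str   # hypothetical labels: "ball", "called_strike", or "other"

def in_box(p, margin=0.0):
    """True if the pitch lies inside the default zone expanded by `margin` feet."""
    return (ZONE_X[0] - margin <= p.px <= ZONE_X[1] + margin
            and ZONE_Z[0] - margin <= p.pz <= ZONE_Z[1] + margin)

def is_wrong(p):
    """A called strike outside the box, or a ball inside it."""
    if p.call == "called_strike":
        return not in_box(p)
    if p.call == "ball":
        return in_box(p)
    return False

def count_wrong(pitches, near_only=False):
    """Return (questionable calls, pitches the umpire actually had to call)."""
    called = [p for p in pitches if p.call in ("ball", "called_strike")]
    if near_only:
        called = [p for p in called if in_box(p, NEAR_MARGIN)]
    return sum(is_wrong(p) for p in called), len(called)

# The three denominators used in the text, each with 15 questionable calls.
for label, denominator in [("all pitches", 290),
                           ("balls and called strikes", 157),
                           ("within six inches of the zone", 111)]:
    print(f"{label}: {15 / denominator:.1%}")
```

The numerator stays at 15 throughout. What changes is which pitches you admit into the denominator, and that choice is where a "95% right" figure gets its shine.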

I’ve said it too many times to repeat it, but I will. Baseball should use technology. Instant replay at the very least – and there are arguments against, mostly due to the outcome of a particular play, but those can be dealt with. It is more important to get the call right than to feed the egos of the men in black.

Finding Padilla

April 21, 2008

Vicente Padilla has been outstanding so far in 2008. This may be my attempt to jinx him, but he has been effective at the “bend, don’t break” strategy the Rangers have been working on (personally I think it should be “don’t bend or break”, because that leads to fewer collapses like the ones we’ve had the last few days in Boston). Today, I want to look at what he was doing last year and compare it to this year, using Pitchf/x data.

Through mid-June 2007 he was 3-8, 6.69, and went on the DL. He came back for eight starts in August and September and was 3-2, 3.86. Those numbers surprised me when I checked them; I did not think he had pitched that well in the second half. Combine that with his early numbers in 2008, and it strongly suggests that he was injured for the first half of 2007, which caused his bad numbers.

After four starts in 2007 he was 0-3 with a 6.00 ERA, and stinking up the place. His last four starts in 2007 were 2-1, 2.12. This year, with four starts down, he is currently sitting at 2-1 with a 3.12 ERA. I decided to use these three groups of four starts to look at what he has changed in that time, to get clues to his improvement. His pitch counts for the groups ranged from 284 to 335, so they are all of similar size, which should help us spot differences visually.

Note that I wrote about the Rangers rotation back in June 07, just before he went on the DL, and those charts include the first half of last year’s data for him. The data I am working with now is the same, but utilizes several techniques that Pitchf/x researchers have developed more recently. In particular, the spin charts shown were first used (I think) by Alan Nathan and then Mike Fast, and it was a spreadsheet posted by Mike (created by Tangotiger, I believe) that contained the spin chart that I have adjusted for my use here. Much credit to all of them for their work on this.

Starting with the early 2007 chart:
Padilla early 2007 Spin

First, this polar plot maps pitch speed against spin angle. The angle is shown simply by its rotation around the circle, and the speed by distance from the center. In this case (and the others in this blog post) the center of the circle is 50mph, and it increases by 10mph at each step away from the center, until you get to the outer ring, which is 100mph.
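The layout of these charts can be sketched as a polar-to-Cartesian mapping. This is a sketch under an assumption: it takes the spin angle as the direction of the pitch's movement in the pfx_x/pfx_z plane. That is one simple convention, and the spreadsheet credited above may define the angle differently. The radius follows the text: 50mph at the center and 100mph on the outer ring. The function names are made up.

```python
import math

CENTER_MPH = 50.0   # speed plotted at the center of the chart
OUTER_MPH = 100.0   # speed plotted on the outer ring

def spin_angle_deg(pfx_x, pfx_z):
    """Direction of the movement vector in degrees, 0-360, counterclockwise from +x.

    Assumption: this stands in for the spin angle; real spin-direction
    conventions may rotate or mirror it.
    """
    return math.degrees(math.atan2(pfx_z, pfx_x)) % 360

def chart_xy(speed_mph, angle_deg):
    """Chart position: radius 0 at 50mph, 1 at 100mph (each 10mph ring = 0.2)."""
    r = (speed_mph - CENTER_MPH) / (OUTER_MPH - CENTER_MPH)
    theta = math.radians(angle_deg)
    return r * math.cos(theta), r * math.sin(theta)

# A 92mph pitch with mostly "rising" movement, and a 60mph curve breaking down.
print(chart_xy(92.0, spin_angle_deg(-6.0, 10.0)))
print(chart_xy(60.0, spin_angle_deg(4.0, -8.0)))
```

Plotting speed as the radius is what lets a slow curve sit near the middle and fastballs ride the outer rings. That is why the pitch types separate into distinct clumps on these charts.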

In my review last year, linked above, I said that Padilla was showing a fastball, curve, and slider, although the results were so mixed that it was hard to tell what was what. Reportedly he has a changeup as well, but I could not pick it out from this data. In fact, here you can see I have split the fastballs into 2 and 4 seam versions, that is something I was not able to do until working with the polar plot and some other tools.

The late 2007 chart shows some differences:
Padilla late 2007 Spin

Not only can you see much more distinction between the groups of pitches, you see the numbers have clearly changed. The groupings are much tighter than before, and the sliders (red) are clearly differentiated from the fastballs. The one thing I do have trouble with here is the curve, it appears that there are two distinct groups, suggesting one is a different pitch. None of the other charts I have show this difference, so I have left it all as a curve, but at two different speeds (the group closest to the center is around 60mph, the other is between 70 and 80 mph, which is kind of fast for a curve so is more likely to be a different pitch).

Let’s look at the early 2008 chart, then compare the three:
Padilla early 2008 Spin

Here you see a chart much more similar to the late 2007 than early 2007 one. If anything, the groupings are even more compressed this time around, and the distinction in the curve has pretty much disappeared. Of course, the curve has almost disappeared too.

So, let’s look at the differences:

Much tighter groupings of pitches as time goes on. The suggestion I would have for this would be mastery of his pitches – as he has learned to throw each one, he’s gotten better at it and is more consistent. This would be more believable if he hadn’t been pitching in the majors for ten years – if he was 22 or 23, I might find this a convincing argument.

Ratio of pitch types has changed considerably. The slider has stayed almost constant at 12-14% of pitches, the fastball (both kinds) has grown from 73% to 81%, and the curve has nearly disappeared, from 13 and 14% last year to 5% in 2008. The ratio of 4-seam to 2-seam fastballs has stayed roughly 2-1 in favor of the 4-seam, although late last year he threw more 2-seamers than in the other time periods.
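The mix percentages above are shares of a pitch-type tally. Here is a minimal sketch. It borrows MLB-style labels (FF four-seam, FT two-seam, SL slider, CU curve) and uses invented counts chosen only to reproduce the 81/14/5 split and the 2-1 seam ratio. It is not Padilla's actual data.

```python
from collections import Counter

def pitch_mix(pitch_types):
    """Share of each pitch type in a list of classified pitches."""
    counts = Counter(pitch_types)
    total = sum(counts.values())
    return {ptype: n / total for ptype, n in counts.items()}

def four_to_two_seam(mix):
    return mix["FF"] / mix["FT"]

# Invented sample that mirrors the early-2008 percentages described above.
sample = ["FF"] * 54 + ["FT"] * 27 + ["SL"] * 14 + ["CU"] * 5
mix = pitch_mix(sample)
print({t: f"{share:.0%}" for t, share in mix.items()})
print("fastballs:", f"{mix['FF'] + mix['FT']:.0%}", "4-seam/2-seam:", four_to_two_seam(mix))
```

Running the same tally on each group of four starts is enough to see the curve fading out.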

Fastball speed is down. In late 07 and early 08, his top speed was 96.5mph, second best was 95.8. He beat 95.8mph 31 times in early 07, peaking at 98.6. This drop of peak speed is about 2mph, which curiously is not reflected in the overall average for the fastballs, which stayed just about the same for all groups, in the 91-92 range. He had a much bigger spread for the fastballs, but the average remained the same. Again, this leans toward someone learning how to throw more consistently.
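The point that the peak dropped while the average held steady is a statement about spread. This sketch shows the summary numbers side by side on two invented samples that share a 92mph average. The samples are illustrative only, not Padilla's pitches. The 95.8mph threshold is the one used above.

```python
import statistics

def fastball_summary(speeds, threshold):
    """Peak, count above a threshold, mean and spread (population std dev), in mph."""
    return {
        "peak": max(speeds),
        "over_threshold": sum(s > threshold for s in speeds),
        "mean": statistics.fmean(speeds),
        "spread": statistics.pstdev(speeds),
    }

# Invented samples: same average, very different consistency.
wide = [88.0, 90.0, 92.0, 94.0, 96.0]
tight = [91.0, 91.5, 92.0, 92.5, 93.0]
for name, sample in (("wide", wide), ("tight", tight)):
    print(name, fastball_summary(sample, threshold=95.8))
```

Both samples average 92mph, but only the wide one ever tops 95.8. A shrinking standard deviation with a flat mean is the "more consistent" pattern described above.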

The slider was more focused around the 80mph mark, instead of being spread around in the mid 80s. Like the fastball, he got better when he threw more consistently.

If I give you this link, you can go back and check out his release points for the first half of 2007. It contains a very ugly image, but the gist is that he was releasing over a very wide area, well over a foot square, and on a slope from bottom-left to top-right.

Compare it with this one, for the first part of 2008:
Padilla early 2008 release points

You see here a much cleaner release area. About 3/4 of a foot wide by a little over a half a foot tall. It is also very circular in pattern, compared to the angled release last year.
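"Wide and sloped" versus "small and round" can be put into numbers. Take the width and height of the release area, plus the correlation between horizontal and vertical release position. A value near +1 means the points run bottom-left to top-right, as they did in early 2007. This is a sketch over (x, z) release coordinates in feet. The two point sets are invented for illustration.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation; near +/-1 means the points lie along a slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def release_spread(points):
    """Width and height (feet) of the release area, plus its x/z correlation."""
    xs = [x for x, _ in points]
    zs = [z for _, z in points]
    return max(xs) - min(xs), max(zs) - min(zs), pearson(xs, zs)

# Invented release points (feet): one set on a slope, one compact and round.
sloped = [(-1.0, 5.5), (-0.5, 5.8), (0.0, 6.1), (0.5, 6.4)]
compact = [(-0.3, 6.0), (0.3, 6.0), (0.0, 6.25), (0.0, 5.75)]
print(release_spread(sloped))
print(release_spread(compact))
```

A bounding box is crude, since one stray pitch inflates it. A standard deviation per axis would be a sturdier measure of "release area" over a full start.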

My conclusion is simple: Padilla was hurt early last year, and after a couple of months on the DL in the middle of the year came back and pitched much better at the end, and much better to begin 2008. He is releasing the pitch in a smaller area, suggesting he is not trying to compensate for where it hurts to move his arm. He is producing much clearer patterns of pitches, suggesting he has more confidence in what he is throwing, and better ability to throw it how and where he wants.

All-in-all, the much-maligned (especially by me) Vicente Padilla is probably a poster child for pitching healthy. I remember a lot of talk about how he wanted to pitch even when he was hurt, because he felt that was what he needed to do to be part of the team. This study shows that when pitching hurt, he (and presumably other pitchers) is not as effective, to the extent that he is hurting his team as much as himself. Repairing the damage is a better option than pitching through it. As I recall, there were a number of complaints from his teammates about how slow he was at pitching, which disrupted their own rhythms of being in the game. Could it simply have been a case of him having to step off the mound after each pitch for long enough that his arm would stop hurting enough so he could throw again?

I’m still not convinced about Padilla’s ability, because every time he pitches I’m still expecting the roof to fall in at any moment. It will take a while for me to lose that fear, if ever. But at least when things are going wrong for him these days, I have some hope that it’s explainable and that he can work through it, rather than it being just because he’s injured. This use of Pitch f/x was very helpful to me in understanding why something happened, instead of just accepting my own opinion that Padilla sucks.

Coming soon, I’m going to give the same treatment to Millwood, who has shown a surprisingly similar pattern to Padilla from last year to this. It will be interesting to see if we can discern something that explains his struggles and improvement as well as we could for Padilla.

Pitch f/x Confidence levels

April 17, 2008

One of the things I’ve been promising for a while now is a look at the confidence levels in the Pitch f/x data this year. If you don’t already know, MLB added a pitch type to the data, and a confidence in that type. So, for example, you’ll see it say FA for Fastball, and 0.86 for the confidence.

I thought this was a pretty neat thing for them to have added, so I’ve been crunching some numbers on it. Others have looked at the pitch type and not been greatly impressed with the results, while I’ve been working on the confidence levels themselves.

While doing this work (and the reason it has been so delayed), I discovered, to my chagrin, that I shouldn’t concentrate so much on the Rangers pitchers (and their opponents), but rather I should be looking more league-wide when I do things like this. Sometimes it doesn’t matter, when, for example, you’re directly comparing two pitchers in a game. In that case, even if the values are off for some reason, they’re likely to be the same for both pitchers, in which case comparisons can still be made. But if you start to stretch things out, by, say, looking at all of Kevin Millwood’s starts this year, you suddenly discover how your analysis doesn’t scale.

You see, for the first part of this I had manually downloaded the xml files for the Rangers pitchers and their opponents (in the opening Seattle series), and then a scattering of others, principally Millwood’s starts. Once I expanded, I discovered one critical change: the scale of the confidence level. In the initial files I looked at, the confidence levels ranged from about .4 up to about .98 or so, which I took to mean that they were basing their confidence in the call of a pitch type on a scale of 0 to 1, 1 being completely confident they had it right, and lower numbers indicating how unsure they were.

When I looked at Millwood’s later starts, I suddenly discovered that he was getting confidence levels over 1, ranging up to almost 1.5. At first I thought it might be a mathematical problem, and worked on that a little, but then I realized that no, the values had been multiplied by almost exactly 1.5 compared to his initial start. That was oddly suspicious, so this past weekend, for the first time this season, I ran my automated process for downloading the files, and my program to install them into SQL Server. I then started crunching numbers.

Here’s a picture of a waterfall for you to enjoy:
Confidence Maximums

Okay, so really it is the confidence values from all games through 4/15. Sorting them by date and then by home team separates each game into a distinct (in some places) vertical line. If you can read the numbers at the bottom, they are simply the sequential pitch number from my spreadsheet. They don’t mean anything other than a rough representation of time.

What we see is that, for the first half of the numbers (through about pitch 30,000, which happens to be around April 8), the vast majority of games had a maximum confidence level of about 1. This maps perfectly with what I discovered in the first few files from the Rangers starts in Seattle. Oddly, though, there are about twenty or so games in that early part of the season where they spike up to 1.5. This also coincides with what I saw happening. There is a single odd game where it spikes to 2 (which happens to be Cleveland at Anaheim on 4-8), and then right around that 4-8 date all of a sudden every game goes to a 1.5 maximum.

So what happened? Take a look at this chart, which although colorful is not nearly as pretty as the other one:
Confidence Max by Team and Day

As clear as mud? This is a table showing the maximum confidence level for each game, based on the home team (horizontally) and date (vertically). The red shows days where the confidence was below 1, the yellow is around 1.5, and the sole green one is the 2. What does all this mean?
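You can reproduce the per-game maximums in the chart and table by grouping pitches by date and home team. This is a minimal sketch. It assumes the pitches have already been loaded (for example, out of SQL Server) into (date, home_team, confidence) tuples. The cut points for labeling a game's scale are my own assumption, set halfway between the observed maximums of 1, 1.5 and 2.

```python
from collections import defaultdict


def max_confidence_by_game(pitches):
    """Return {(date, home_team): highest pitch-type confidence in that game}."""
    best = defaultdict(float)
    for date, home_team, confidence in pitches:
        key = (date, home_team)
        best[key] = max(best[key], confidence)
    return dict(best)


def scale_label(max_confidence):
    """Guess which confidence scale a game used from its maximum value.

    Assumed cut points sit halfway between the observed maximums (1, 1.5, 2).
    """
    if max_confidence < 1.25:
        return "1.0 scale"
    if max_confidence < 1.75:
        return "1.5 scale"
    return "2.0 scale"


# Hypothetical rows; real ones would come from the downloaded XML files.
pitches = [
    ("2008-03-31", "SEA", 0.97),
    ("2008-03-31", "SEA", 0.55),
    ("2008-04-10", "TEX", 1.46),
]
for game, top in sorted(max_confidence_by_game(pitches).items()):
    print(game, top, scale_label(top))
```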

I believe that MLB changed their algorithm, and released it publicly on 4/8 or 4/9. Everything after that date is using the 1.5 maximum; everything prior is using the 1.0. “No it’s not,” I hear you say, “prior to 4/9 it is mixed!” This is true, but why? I believe that the yellow days on the left side are days when they were testing their new algorithm – and not necessarily live, but with old data. It is interesting to me that the three days in the middle where the data is all red are a weekend (Saturday-Monday, to be precise), which is exactly when a programmer may not be working on the changes. I think perhaps some of these games were processed live with the 1.5 maximum, but for some of them they went back and reran the data later. Why do I believe that? Because I had manually downloaded some of those games, and can compare what I got on the day of the game to what I downloaded this past weekend, two weeks after the games were played. There would be no reason for them to go back and change those games – except if they were testing a new algorithm. Although why it would then go into a live directory instead of a test one, I don’t know.

So, what changes were made? Let’s take a look at the ones I manually downloaded and see what happens:

On 3-31 I got the xml files for Millwood and Bedard, who opened the season in Seattle. My manual file has a maximum around 1, but if you look really closely in the chart above you will see the file I downloaded later has a 1.5 maximum (it’s the only yellow one in the first column).

What changed for the two pitchers? With one exception, for both pitchers the pitch type stayed exactly the same, and the pitch confidence was multiplied by exactly 1.5. That sole exception was the one FS pitch (a splitter in their nomenclature) thrown by Millwood, where the pitch stayed as an FS and the confidence stayed the same.

Okay, so that is evidence their algorithm increased the confidence of some or most pitches by a factor of 1.5. Not necessarily a big deal, right?

On 4-1 I got Felix Hernandez and Vicente Padilla’s files manually, and later automatically. They were identical.

On 4-2 I got Jason Jennings’ and Brian Burres’ files, and they showed radical changes. Many of the pitches simply showed the 1.5 increase. But – and this is the most important finding, I think – a number of the pitches (18, to be precise) changed their confidence by different values, and also changed the pitch type. CH, CU and FC pitches (changeups, curves and cut fastballs) changed to fastballs, sliders and splitters, and in no direct relationship (e.g. the changeups didn’t all change to fastballs).
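Comparing the manual and automatic downloads comes down to matching each pitch across the two files and putting it in a bucket. This is a sketch, not the exact process used here. It assumes each download has been parsed into a dict keyed by a pitch identifier (hypothetical), mapping to (pitch_type, confidence). The 0.01 tolerance is an assumed allowance for rounding in the XML.

```python
def compare_downloads(early, late, factor=1.5, tol=0.01):
    """Count how each pitch changed between two downloads of the same game.

    `early` and `late` map a pitch id to a (pitch_type, confidence) pair.
    """
    counts = {"rescaled": 0, "unchanged": 0, "reclassified": 0, "other": 0}
    for pitch_id, (old_type, old_conf) in early.items():
        new_type, new_conf = late[pitch_id]
        if new_type != old_type:
            counts["reclassified"] += 1  # the pitch type itself changed
        elif abs(new_conf - old_conf * factor) <= tol:
            counts["rescaled"] += 1  # same type, confidence times 1.5
        elif abs(new_conf - old_conf) <= tol:
            counts["unchanged"] += 1  # like Millwood's lone FS pitch
        else:
            counts["other"] += 1  # same type, confidence moved some other way
    return counts


# Hypothetical pitches from two downloads of one game.
early = {1: ("FA", 0.80), 2: ("FS", 0.70), 3: ("CH", 0.60), 4: ("SL", 0.50)}
late = {1: ("FA", 1.20), 2: ("FS", 0.70), 3: ("FA", 0.90), 4: ("SL", 0.62)}
print(compare_downloads(early, late))
```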

I would keep going, but I think you get the point: in the first few weeks of the season, MLB changed the values not only of the confidence levels, but also the pitch types they were showing. This was probably in part influenced by articles about the lack of quality in the pitch types, such as the one I linked above, which suggested they were only getting about 70% right. I’m sure it was also influenced by their own continual desire to improve the data, and the results they are giving. For all I know, they may have been intending to do this, as they got more and more data.

I have not attempted to find which games Mike Fast analyzed in his Hardball Times article, but it is reasonable to assume that some of them might have been affected by this problem. With 18 out of 61 games in the first five days, that’s about 30%, and if Mike had downloaded the night of the game, he might not have seen these changes (it took me two weeks to discover them). Thus, it is possible that at least some of the pitches may have changed with the new algorithm. It would be interesting to see how many were affected. Given that the first Millwood game had a single pitch change, and the Jennings game had 18 pitches changed, there probably would not be a big change in Mike’s results, perhaps only a percentage point or two.

This has all been interesting for me because it underscores the real-time and transient nature of Pitch f/x. It is exciting to be looking at this stuff, it is like being on the frontier of a new science (or at least of a new way of measuring things). As time goes by it will only get better and better, and we can answer so many questions we have now. Recognizing these changes and riding along with them will be very interesting. Not as interesting as watching the Rangers win a 14 inning thriller after leaving 20 men on base, but interesting nonetheless. I hope to introduce more study of the pitch type and confidence in the near future.

Stacking up the Runs Created

December 11, 2007

One of the things I keep harping on about is the Rangers pitching, and how they need to upgrade there a lot more than they need to work on the hitting. In a recent post, I showed how the hitting has been fairly decent for the 2000s, but the pitching has dragged the team down. I calculated that the hitting has been worth wins in the mid to high 80s each year, but the pitching has been worth anywhere from the low 60s to 82 at best. Clearly the pitching is the trouble spot. Clearly I am not the first to say this, and I won’t be the last. But clearly management doesn’t seem to be listening. Any improvements they make, and almost all the talk they talk, has been about the hitting side of the equation. Okay, so the Rangers sank to their worst hitting level in the 2000s this year, with 83 hitting wins, but as noted above, their best pitching was only 82 wins (and in 2007 it was 75). Pitching is all that counts for the Rangers.

But, since they only care about hitters, I thought I’d look at hitters too. This study is team by team and position by position Runs Created in 2007. It uses the simplest version of Runs Created, (Hits + Walks) * Total Bases / (At-Bats + Walks), because that is the data easily available and easily calculable. Thus it ignores things like baserunning, the different variables in hitting, and also park factors, but it is still comparing apples to apples so it is relatively useful. What I am trying to look at is the Runs Created for each team at each position, and see how the Rangers compare at each position, and thus where they need to improve. Note that these are team totals, not for any specific player.
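For reference, here is the basic formula written as code. I'm assuming the team-at-position totals in the tables come from adding up the raw counts of everyone who played the position and then applying the formula once. That aggregation step and the sample numbers are assumptions for illustration only.

```python
def runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def position_runs_created(lines):
    """Runs Created for a position from the summed stats of all its players.

    Each line is a dict with keys "h", "bb", "tb" and "ab".
    """
    totals = {key: sum(line[key] for line in lines) for key in ("h", "bb", "tb", "ab")}
    return runs_created(totals["h"], totals["bb"], totals["tb"], totals["ab"])


# Hypothetical season lines for two players sharing a position.
starter = {"h": 120, "bb": 40, "tb": 200, "ab": 420}
backup = {"h": 30, "bb": 10, "tb": 50, "ab": 130}
print(round(position_runs_created([starter, backup]), 1))
```

The formula multiplies two rates, so it is not linear. Adding up each player's own Runs Created instead would give a slightly different total.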

Okay, this post is going to contain my first try at tables, so let’s find out how they will look. We’ll start with catcher:

Team  Runs Created
NYY   119
CLE   110
MIN   85
SEA   81
DET   79
BOS   75
TEX   62
BAL   60
TOR   58
LAA   58
CHW   58
OAK   57
TB    55
KC    54

No big surprises there. The Yankees with Posada and the Indians with Martinez dominate. If you play fantasy baseball, you know that the rule is grab one of the top few catchers, or don’t bother because the rest are pretty much all the same. The surprise really is that the Rangers are right in the middle, despite the poor year with the bat that Laird had. The good news is that if the Rangers hand the reins over to Salty, as expected, they’ll probably improve by 10 runs based on this year’s numbers. On to first base:

Team  Runs Created
TB    138
BAL   98
MIN   98
OAK   95
BOS   95
CHW   94
CLE   93
DET   90
LAA   90
TEX   87
KC    81
NYY   81
TOR   77
SEA   70

Everyone knows the monster season that Carlos Pena had. This illustrates it very well: he was far and away the best first baseman in the AL in 2007. Once again, Texas was right in the middle, with just 11 runs separating them from second place. But this is an illusion. The Mark Teixeira factor was huge. Tex in fact had 58 of those runs, despite having fewer than half the at-bats. The rest of the first basemen for the Rangers were useless. Put it this way: if Tex had had all the at-bats, he would have created 125 runs, putting him closer to Pena than to anyone else. If Tex had had none of the at-bats, the rest of the first basemen would have put up 58 RC, significantly worse than anyone else. Yes, that is how much of a difference Teixeira made. In 2008 they’re again going with a bunch of stiffs at first, so expect to see them looking up at the rest of the league. This is one position where the Rangers need a huge upgrade before beginning to contend.
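The "if Tex had had all the at-bats" figures are a proration: keep a player's Runs Created per at-bat fixed and scale it up to the position's full share of at-bats. This is a minimal sketch assuming simple linear scaling by at-bats. The exact method isn't stated, and the formula itself isn't perfectly linear. All the inputs below are hypothetical, not Teixeira's real at-bats.

```python
def prorate(runs_created, at_bats, full_at_bats):
    """Scale Runs Created to a full allotment of at-bats at the same per-AB rate."""
    return runs_created * full_at_bats / at_bats


def with_and_without(position_rc, player_rc, player_ab, position_ab):
    """Project the position if one player had every at-bat, or none of them."""
    with_player = prorate(player_rc, player_ab, position_ab)
    without_player = prorate(position_rc - player_rc, position_ab - player_ab, position_ab)
    return with_player, without_player


# Hypothetical: a position worth 87 RC over 600 AB, one player with 40 RC in 250 AB.
print(with_and_without(87, 40, 250, 600))
```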

On to second base:

Team  Runs Created
DET   111
NYY   105
TB    103
BOS   102
BAL   101
TOR   95
LAA   92
KC    91
OAK   89
TEX   81
CHW   73
MIN   65
SEA   61
CLE   60

I had the impression that Ian Kinsler was so good that I was surprised to see the Rangers’ second basemen rank so poorly. In truth, some of the dreck they put out there (Hairston, Desi Freaking Relaford) was so bad that it dragged things down. I’m not worried about this position: as Kinsler ages he is going to get better, and I could see him improving by 10 RC next year, which would put him on the fringes of the top players at second.

Third base:

Team  Runs Created
NYY   150
BOS   119
SEA   102
TB    92
OAK   87
BAL   86
LAA   86
TOR   84
CLE   80
KC    76
TEX   69
DET   64
CHW   60
MIN   57

I like Hank. If only he could put it together for a full season, like we continue to think he could. Third was just like first, except it was occupied by bad players because of injury, not trade. Again, if Blalock had played the whole season there, he’d have put up 110 RC, putting him in the top three. If he hadn’t had a single at-bat, Vazquez and Metcalf would have gotten 48 RC, by far the worst in the league. At least in this case, Hank will be back at third so we don’t need someone to fill in. Oh yeah, there was some rich jerk who did pretty well for the Yankees at this position.

On to short, and Michael Young:

Team  Runs Created
NYY   111
BAL   102
TEX   99
DET   95
CLE   87
LAA   87
SEA   72
TB    71
BOS   64
OAK   63
MIN   63
CHW   60
KC    56
TOR   47

Best position on the team, but as I’ve stated before, he’s only going to get worse as he ages. Still, he has enough room to decline and still be useful to the team. He’ll probably be above average for the position for the next couple of years, which is hopefully the point by which he will have been traded, when his whining about contending reaches fever pitch.

Left was a sore spot:

Team  Runs Created
NYY   118
BOS   114
SEA   113
LAA   109
TB    108
OAK   96
TEX   89
TOR   84
CLE   77
CHW   75
DET   68
MIN   67
KC    64
BAL   58

But it wasn’t as sore as they seemed to think. Right in the middle of things. Given the number of players who trundled through, they were not as badly served as Ron Washington’s comments made out, when he seemed to dismiss all the outfielders at the end of the year, and suggest someone new be brought in. It’s probably going to be Murphy and Byrd out there in 2008 most of the time, can they continue their success? And for center:

Team  Runs Created
DET   128
SEA   111
CLE   109
MIN   105
TEX   94
TB    90
NYY   85
LAA   80
KC    79
BOS   79
TOR   78
BAL   78
OAK   77
CHW   62

In 2008 Milton Bradley will be getting the call here. Can he do a Lofton job for us? Once again, the Rangers are surprisingly high in the charts despite the mixing and matching of players, so looking for Bradley to provide a much bigger boost is not necessarily the best tack to take. Over in right, a position which contained the true MVP:

Team  Runs Created
DET   169
TOR   117
LAA   113
BAL   112
SEA   103
NYY   103
OAK   102
CHW   98
KC    95
BOS   93
MIN   88
TEX   87
CLE   85
TB    85

Gone are the days of Juan Gone. The guys that filled this spot were actually only bad compared to other right fielders. Put them in most other positions and we’d have been happy with their performances. Of course, they’re right fielders, so we’re not happy. Much room for improvement here. Maybe the better move would be for the Rangers to put Bradley in right, and let Murphy and Byrd occupy the other two slots in the outfield. That would probably give the best chance for all three spots to improve.

Finally, the DH:

Team  Runs Created
BOS   154
CHW   108
TOR   102
DET   99
CLE   98
OAK   96
TEX   96
SEA   85
NYY   84
MIN   75
BAL   75
KC    73
LAA   72
TB    70

Maybe Sosa was more effective than he appeared to be?  I still believe that a full-time Botts would produce as well or better than Sosa, and should be given a chance.

So, of all positions, first and right were the weakest points.  And right now, those are positions the Rangers haven’t done much with.  Oh, they got Shelton for first, but how much of Teixeira can he replace?  Especially if they are filling in with Catalanotto et al?  In right, nothing has happened yet.  The other positions, they will all shake themselves out in the end.  Some will get better, some will get worse, but they should be okay.  Like I said at the start, it’s not the hitting we have to worry about.  Anyway, there are two years until it’s time to contend, plenty of time to fill the holes and improve the positions they need to.

Next time, I’ll see what I can do with pitching.  If anything.  The Rangers never have, so why should I be any better at it?

How to measure the relative effectiveness of hitting and pitching

October 27, 2007

I played with some numbers this week, came up with some fairly fancy stuff, then decided to throw it all away and simplify.  My conclusions weren’t unreasonable, I was just digging too far for what I was trying to prove.  After all, once you’ve decided that 2 + 2 = 4, you don’t really need to break it down to (1 + 1) + (1 + 1) = 4, do you?

What I was working on was seeing where the Rangers need to improve to get into contention.  A fairly easy answer, you’d think – pitching.  And you’d be right.  But there’s a little more to it than that, so here goes:

First of all, a little note about what I’m looking at.  I’m analyzing the American League from 2000-2007.  That means there are 112 teams, and 32 of them made the playoffs.  Any numbers I reference will apply to that dataset, unless I say otherwise.

Wins:  Overall, for the decade so far, the Rangers rank 10th in wins, with the obvious suspects (Detroit, Baltimore, Kansas City and Tampa) behind them.  At 610, they’re a long way behind the leading Yankees with 773.  Sadly, the 2nd, 4th and 5th spots are held by Oakland, Anaheim and Seattle, showing just how much they’ve left the Rangers behind.  The Rangers are in a bit of a gap right now, 34 behind Toronto and 41 ahead of Detroit.  It will take quite a bit for that to change in the next couple of years – could they finish 17 games ahead of TOR, twice in a row (doubtful), or 21 games behind DET twice in a row (actually, quite possible now, this year they were 13 back of DET, and you can only assume they’re going in opposite directions).  But as it stands, you’d probably be pretty safe putting your money on the Rangers finishing 10th for the decade, and that’s just about right given their performance.

Runs:  Given the win totals, you may be surprised to discover that the Rangers are third in runs scored (6783), behind the Yankees and Boston.  Obviously scoring hasn’t been a problem, although probably a lot of this is due to the Ballpark producing runs that other places don’t.  That, or that the Rangers have always been able to hit pretty well (more on that later).  The rivals in the division all scored in the middle of the pack.

Runs conceded:  Ahh, here we see why they’re 10th in wins.  Having given up 7073, they finish third-worst in runs allowed, not too far behind KC and Tampa for the worst pitching.  We also see why the division rivals did so well, with OAK, LAA and SEA allowing the fewest, second-fewest and fourth-fewest runs.

Run differential:  They’ve allowed 290 runs more than they scored, which puts them in 10th place.  Actually, the differential across the league almost exactly follows wins, validating Pythag (not that it needed it).  In fact, OAK and BOS swapped places, and CLE dropped two spots in their wins compared to differential, but otherwise everyone was where they were supposed to be.  The Yankees scored 1049 runs more than they allowed to lead.  The D-Rays allowed 1349 more than they scored (presumably not all to the Yankees).
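Pythag here means Bill James's Pythagorean expectation, which estimates winning percentage from runs scored and runs allowed. This is a minimal sketch using the classic exponent of 2 (refinements such as 1.83 exist). The 8 × 162 game count is an approximation, since teams occasionally play 161 or 163 games in a season.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^k / (RS^k + RA^k)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Rangers 2000-07 totals quoted above: 6783 scored, 7073 allowed.
pct = pythag_win_pct(6783, 7073)
print(f"{pct:.3f} win pct, about {pct * 8 * 162:.0f} wins over 8 x 162 games")
```

Compare the result with the 610 wins quoted above.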

Okay, those are some pretty simple numbers to look at.  In summary, the Rangers are 10th out of 14 teams, and they pretty much deserve to be there.  It’s all about the pitching, too.

Baltimore, Kansas City, Tampa, Texas and Toronto were the five teams who never made the playoffs in that timeframe.  BAL, TOR and TB have an excuse, playing in the AL East where they have no chance of competing with the money men in Boston and NY.  KC and TEX?  Doormats for their divisions.

Looking at how teams made the playoffs, we can see some patterns.  The obvious ones are “score more runs” and “allow fewer runs”.

Ranking all 112 teams by runs scored, 16 of the top 21 teams made the playoffs.  The rest were scattered throughout, but skewed towards the higher-scoring teams, with the 05 White Sox, ranked 84th, being the lowest-scoring team to make it.  If I were to put a dividing line, though, it would be after the 54th team: 26 of the top 54 made the playoffs, which is close enough to 50%.  The run total at that line is exactly 800, meaning if you score 800 or more runs in a season, you’ve got about a 50% chance of making the playoffs.
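Where to draw the dividing line is a judgment call. One way to make it mechanical is to rank the teams and find the deepest cutoff where at least half the teams at or above it reached the playoffs. This is a sketch of that rule, not necessarily how the 800 and 790 lines were drawn, and the sample teams are made up. For runs allowed, rank from fewest to most by passing higher_is_better=False.

```python
def break_even_cutoff(teams, share=0.5, higher_is_better=True):
    """Return (rank, runs) for the deepest cutoff where at least `share`
    of the teams ranked at or above it made the playoffs.

    `teams` is a list of (runs, made_playoffs) pairs.
    """
    ranked = sorted(teams, key=lambda team: team[0], reverse=higher_is_better)
    made = 0
    cutoff = None
    for rank, (runs, made_playoffs) in enumerate(ranked, start=1):
        made += made_playoffs  # True counts as 1, False as 0
        if made / rank >= share:
            cutoff = (rank, runs)
    return cutoff


# Hypothetical (runs scored, made playoffs) pairs.
teams = [(900, True), (850, True), (820, False), (800, True),
         (780, False), (760, False), (700, False)]
print(break_even_cutoff(teams))
```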

The problem with that though is that the Rangers have scored over 800 every year, with a low of 816 this year.  Obviously the pitching is weighing them down more than we thought.  Either that or the Ballpark is really inflating those numbers.  Unluckiest Rangers?  2001, scored 890 to rank 15th.

So let’s turn to pitching.  Similar pattern:  19 of the 32 teams that allowed the fewest runs made it.  Lowest ranked to make it was again the White Sox, 2000 this time, who allowed 839 runs.  This time I’m going to put the break-point at 790, where 25 of the 52 teams that allowed the fewest runs got to the playoffs, or once again just about 50% of teams.

The Rangers had a year with 784 (2006) and a year with 794 (2004), and then you jump down to 844.  As you keep going down and down, you discover the bottom of the list, where the Rangers fill three of the four worst pitching spots (interrupted by KC at #2).  Worst was the 2000 team, with 974 runs allowed.  Yep, it’s the pitching.

Okay, so the hitting for the Rangers has been good, every year they’ve been above that break-even point for making the playoffs.  The pitching has been bad, just one year barely above the break-even, and most years way down in the dumps.  Can’t argue with the numbers:  it’s the pitching.

But it’s the differential that counts.  When I calculated the R-squared values for runs, runs allowed, and differential against wins, the differential was easily the most reliable predictor of wins.  Runs gives an R-squared of .487, and RA gives .513 – both reasonable values, showing they’re pretty useful predictors.  But differential gives an R-squared of .900, a very strong fit.  Look at differential and you can almost exactly predict the wins.
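If you want to reproduce R-squared values like these, a least-squares fit of wins on a single predictor takes only a few lines. This is a minimal sketch in plain Python with no statistics package. The toy numbers are invented and happen to fall exactly on a line.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # square of the correlation coefficient
    return slope, intercept, r_squared


# Hypothetical run differentials and wins.
diffs = [-100, 0, 100]
wins = [71, 81, 91]
print(linear_fit(diffs, wins))
```

Running it with run differential, runs scored and runs allowed in turn as the x values gives the three R-squared values compared above.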

I then tried something a little interesting, and came up with a result that was a lot interesting.  I took the formula for the linear trendline for differential (y = 0.100x + 80.81, which pretty much matches the theory of 10 runs equaling one win by giving an 0.1 on the x value, and if you’re wondering why the last number is not 81, it’s because interleague play means there are not the same number of wins as losses), and plugged it in against both the runs scored and the runs allowed.  That should show where the wins are coming from.

Here’s an example, the 2004 Rangers.  They scored 860 runs.  Plug in the formula and that equals 88.48 wins as a result of batting.  They allowed 794, which results in 81.32 pitching wins.  The batting was good but the pitching was barely average.  How did they then win 89 games?  Well, I discovered that if I multiply the batting wins by the pitching wins, then divide by 81, I get 88.83, which rounds to 89.  This makes sense, because what I think I am saying is that the hitting won 88.48 against average teams, and the pitching won 81.32 against average teams, and if you combine them they get slightly better.  Anyway, by doing this math against all the teams, I found a highly accurate predictor of wins, in fact the two numbers (wins and this combined result) correlate at an R-squared of .894.  I think I can use this to show just how much a team’s hitting or pitching was worth.
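The batting-wins, pitching-wins and combined-wins arithmetic can be written out as below. This sketch rests on an assumption the text leaves open: that each side's run total goes into the trendline as its edge over a league-average baseline (runs scored minus average, or average minus runs allowed). The 2004 figures are the ones quoted above.

```python
SLOPE = 0.100      # wins per run, from the differential trendline
INTERCEPT = 80.81  # wins at a zero run edge


def side_wins(run_edge):
    """Wins credited to one side of the ball, given its edge in runs.

    Assumption: batting edge = runs scored - league-average runs, and
    pitching edge = league-average runs - runs allowed.
    """
    return SLOPE * run_edge + INTERCEPT


def combined_wins(batting_wins, pitching_wins):
    """Multiply the two sides and rescale by an average team's 81 wins."""
    return batting_wins * pitching_wins / 81


print(round(combined_wins(88.48, 81.32), 2))  # 2004 Rangers -> 88.83
```

When one side sits at exactly 81, the result is just the other side's value. That is why an average pitching staff lets the hitting's value pass straight through, as in 2004.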

What’s interesting is how multiplying the two sides (batting and pitching) and then dividing by 81 pushes the result towards the actual wins.  You saw just above how the good hitting and average pitching combined for the Rangers in 2004.  If you get one of the numbers very close to 81, the wins will be very close to the other number.  If you push them both in the same direction, they compound each other.  Take the team with the best record, the 2001 Mariners who won 116 games.  By this method, their batting was worth 96 and their pitching 99, combining them we get 118 wins.  In other words, the whole is greater than the sum of the parts.  Flip to the other end, the 2003 Tigers, hitting of 57 and pitching of 66 combined to 47 wins (compared to the actual 43).  With each side being bad, they push each other down even further.  If they’d only had average pitching, they would have won 57, but they didn’t, so they fell even further.  If I was naming it, I would call this rubber-banding, where the two sides are bouncing against each other, and if one stretches in one direction the other gets pulled that way too.

So what is the use of this tool?  Look at the Rangers from 2000-07.  They won between 71 and 80 games every year except one, when they won 89.  How did they do it?  Good hitting and bad pitching, right?  But how good was the hitting?  How bad was the pitching?  Would you say their 76 win average was due to 80 hitting wins and 72 pitching wins, or something similar?  And that exceptional year, was it a huge change to the system, or just random variation pushing them up for a year?  Let’s calculate:

The hitting side was remarkably consistent, running from a 91.9 in 2001 to 83.4 in 2007 (proof that the 07 team were the worst hitters?).  Just minor variation in the big scheme of things.  The pitching varied much more though.  In 2000 they scored a 61.5, while in 2006 it was 82.4.  Three years in the 60s, three in the 70s and two in the 80s.  Wild variation, and not even a consistent flow, as they jumped up and down like water on a hot griddle.  And what was the result?  Their actual wins were dampened by the hitting, so they floated in the 70s most of the time.  2004, the 89 win year, was simply a case of the hitting staying on the high side of where it was (the 88.5 was the third highest hitting score) and the pitching coming up to average (81.3, one of only two times it went above average).  There was no big breakthrough, there was simply pitching being barely adequate.

And where does this leave the Rangers?  With the knowledge, if they didn’t know it already, that their pitching sucks.  More precisely, a quantitative tool they can use to see just what they need to do to improve.  If they hold the hitting steady, just how much do they have to improve the pitching to get to be contenders?  After all, by keeping their hitting at the same level in 2004, and just getting the pitching to average, they were in contention until the last week of the season.  Take another step, push those pitchers just slightly above average, and they could contend for some time.   Going back to the earlier stuff on run counts, they’re doing fine in the offensive numbers, but they need to get their runs allowed down – in 2007 they allowed 844 runs, and the break-even point is 790.  Where can they gain 54 runs in pitching?  Oh yeah, and even if they do that, it just puts them on the very edge of the competition, not deep into the playoff zone.
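You can answer how much the pitching must improve, with the hitting held steady, by running the combined formula backwards. This is a minimal sketch. The 90-win target is a hypothetical goal, the 83.4 hitting and 75 pitching figures are the 2007 values quoted above, and the 10 runs per win comes from the 0.1 slope of the trendline.

```python
RUNS_PER_WIN = 10  # inverse of the 0.1 trendline slope


def pitching_wins_needed(target_wins, batting_wins):
    """Solve batting * pitching / 81 = target for the pitching wins required."""
    return target_wins * 81 / batting_wins


def runs_to_prevent(current_pitching_wins, needed_pitching_wins):
    """Extra runs the staff must save to move from current to needed."""
    return (needed_pitching_wins - current_pitching_wins) * RUNS_PER_WIN


needed = pitching_wins_needed(90, 83.4)  # hypothetical 90-win target
print(round(needed, 1), round(runs_to_prevent(75, needed)))
```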

The Young strike zone, part 3

September 16, 2007

Michael Young now needs 17 hits in 14 team games to get 200 for the fifth straight year. I would say that he will almost certainly do it, barring some kind of injury which keeps him out of a few games. I hope I didn’t just jinx him. He’s averaging about one and a quarter hits per game that he plays, which would give him 18 over the 14 games left. Now, if he slips for a few, he might be in trouble, but I think he’s just as likely to have a four hit game. What’s odd is that he had an 0-4 today, and the ESPN report said it broke his career high 14 game hitting streak. I am very surprised he hasn’t had a hitting streak much longer than that; in fact I would have expected at least a 20. I don’t know whether that means he is less streaky and more consistent, or something else.

There’s a few interesting things to point out about his shot at 200 hits. First of all, only 20 players have ever had five or more seasons of 200 hits, and only six of them had five consecutive seasons. Five of the six consecutive are in the Hall of Fame (Wade Boggs, Charlie Gehringer, Willie Keeler, Chuck Klein, Al Simmons), the only exception being Ichiro, who will probably be headed there one day. That’s interesting company. Many of the rest of the 20 are also in the Hall, in fact so are many of the 15 who have four seasons of 200. Is Michael Young bound for the Hall of Fame? By the way, of active players, Jeter is also 17 hits from 200, which would be his 6th season of 200. Juan Pierre (177) and Vlad Guerrero (176) are both trying for their 5th seasons, but they might be a bit too far away. Ichiro already reached his 7th, the only player with 200 hits so far this year.  The consecutive season record is 8, by Wee Willie Keeler, then Boggs and Ichiro at 7.  The record for 200 hit seasons (not consecutive) is held by Pete Rose with 10, then Ty Cobb has 9.

A while back I reported on his strike zone, and showed that I thought that it didn’t seem to matter where in the strike zone a pitch was thrown, the rate of result was the same, except for the down and away pitch. I looked at several batters and saw the same pattern, and didn’t know what was going on. I argued that the percentage of fouls, of hits, of outs, and of swinging strikes was roughly the same across all zones for the batter. This did not make sense, because everyone knows that batters have hot and cold zones. I am still perplexed by these results. I haven’t been able to do any kind of breakdown that would show something different, based on all the pitches. If this is the case, then it seems that random pitches thrown in random locations would give the same result.

I finally found a hitting zone chart for Michael Young online. Problem is, I can’t match it, for the simple reason that I’m using Gameday data and it’s not complete for Michael Young, or anyone else for that matter. I tried tracking them for the last week or so, but even that was impossible, because I could not make my pitch data match their strike zone. The strike zone is presumably a personal preference, or at least a programmatic preference, because I was able to follow some of theirs where a hit was recorded in one zone but an out in the same spot was recorded in another. But their chart is still interesting, in that it shows down and away to be his only weak spot, like mine did. That at least tells me I’m on the right track.

Without putting all the numbers in, I can tell you that the highest average I have for him is .583, in the up and in zone. Dead center I have him at .415, whereas down and away I have him at a miserable .136. In all these cases, by the way, small sample sizes are very evident, as my largest zone, center, has only 41 results (hits or outs). Of course, outside the zone it gets even worse.
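To show how thin those samples are, this sketch computes each zone's average along with a rough binomial standard error, sqrt(p(1 - p)/n). It assumes each result has already been tagged with a zone and reduced to a hit or an out. The tiny sample is invented.

```python
from collections import Counter
from math import sqrt


def zone_averages(results):
    """Return {zone: (average, chances, standard_error)}.

    `results` is a list of (zone, is_hit) pairs, one per hit or out.
    """
    hits = Counter()
    chances = Counter()
    for zone, is_hit in results:
        chances[zone] += 1
        hits[zone] += is_hit  # True counts as 1
    table = {}
    for zone, n in chances.items():
        avg = hits[zone] / n
        table[zone] = (avg, n, sqrt(avg * (1 - avg) / n))
    return table


sample = [("center", True), ("center", False), ("down-away", False),
          ("down-away", False), ("down-away", True), ("down-away", False)]
for zone, (avg, n, se) in zone_averages(sample).items():
    print(f"{zone}: {avg:.3f} in {n} chances (+/- {se:.3f})")
```

Plugging in the center zone's .415 over 41 chances gives a standard error of roughly .077, so differences between zones should be read loosely.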

I defined my zone horizontally as starting from the 0 point, dead center of the plate, taking three inches either side (technically, from -0.250 to 0.250 in the Gameday data, which is measured in feet) to make a six-inch area. I then stepped out six inches on either side to make the other zones. Vertically, I went from 1.8 as the bottom line, up 0.567 each zone until I got to 3.5 as the top. The zones above and below also used that height. So each zone is 0.500 wide horizontally, and a little more than that vertically. That means my plate measures 18 inches wide, which is close enough to the real 17 inches for me.
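The zone grid described above can be sketched as a small lookup. This is a minimal Python sketch assuming pitch locations arrive as horizontal and vertical coordinates in feet (px and pz in the Gameday pitch data); it only covers the nine zones inside the strike zone, and the function name is hypothetical.

```python
import bisect

# Horizontal edges in feet: a 6-inch center column plus 6-inch columns either side.
X_EDGES = [-0.75, -0.25, 0.25, 0.75]
# Vertical edges in feet: three equal rows from 1.8 up to 3.5 (about 0.567 each).
Z_BOTTOM, Z_TOP = 1.8, 3.5
Z_STEP = (Z_TOP - Z_BOTTOM) / 3
Z_EDGES = [Z_BOTTOM, Z_BOTTOM + Z_STEP, Z_BOTTOM + 2 * Z_STEP, Z_TOP]


def zone_for_pitch(px, pz):
    """Return (row, col) for a pitch in the 3x3 grid, or None if it is outside.

    row 0 is the bottom third and col 0 is the most negative px column.
    Each zone includes its lower edge but not its upper edge.
    """
    if not (X_EDGES[0] <= px < X_EDGES[-1] and Z_EDGES[0] <= pz < Z_EDGES[-1]):
        return None
    col = bisect.bisect_right(X_EDGES, px) - 1
    row = bisect.bisect_right(Z_EDGES, pz) - 1
    return row, col


print(zone_for_pitch(0.0, 2.6))  # (1, 1), dead center
print(zone_for_pitch(0.6, 1.9))  # (0, 2), a low corner
print(zone_for_pitch(1.0, 2.5))  # None, off the plate
```

Which low corner counts as "down and away" depends on which side the batter hits from. The outer edges span 1.5 feet, which is where the 18-inch plate comes from.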

Now I know you’re wondering how I got 1.8 and 3.5. The Gameday data varies a little, as many people have shown, but in a recent game (9-9), the data had his strike zone from 1.531 to 3.502 (with a couple of variations, but these were the dominant measures). That suggests I’m good at the top, but I’m cutting off about three inches at the bottom (1.8 - 1.531 = 0.269 feet). Why? Here’s why:

Michael Young balls and strikes

This is a chart of the balls and called strikes Michael Young has received this year. Since these are the calls the umpire actually made, they should give a fairly accurate picture of his real strike zone. There is some variation, of course, and some of the dots are probably Gameday errors rather than real calls. For example, I doubt an umpire called a strike on that pitch that’s two feet outside (although most umpires are blind, but that’s another story).

What you see is a strike zone that is very well defined at the top, at 3.5 feet. At the bottom, the boundary works out to about 1.8 feet, with a couple of strikes below that, but also some balls above. Interestingly, the width of his strike zone runs from about -1 to +1, a total of 24 inches, a little wider than you’d expect.
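One way to turn "a couple of strikes below, some balls above" into a single number is to try candidate heights for the bottom edge and keep the one the umpire’s calls disagree with least. Below is a minimal Python sketch of that idea. It is a swapped-in technique, not necessarily how the 1.8 above was chosen. It assumes the called pitches have already been filtered to those over the plate horizontally, and the pitch list is made up.

```python
def estimate_bottom_edge(called_pitches, candidates, ceiling=2.6):
    """Pick the bottom-edge height (feet) that best matches the umpire's calls.

    called_pitches: list of (pz, was_called_strike) for pitches over the plate.
    Only pitches below `ceiling` are used, so balls above the top of the zone
    don't count against a low edge.
    """
    low = [(pz, strike) for pz, strike in called_pitches if pz < ceiling]

    def disagreements(edge):
        # A pitch at or above the edge "should" be a strike; anything lower a ball.
        return sum((pz >= edge) != strike for pz, strike in low)

    return min(candidates, key=disagreements)


# Made-up called pitches: one low strike and one ball in the zone, as in the chart.
pitches = [(1.2, False), (1.5, False), (1.55, True), (1.65, False), (1.75, False),
           (1.82, True), (1.9, True), (2.0, False), (2.1, True), (2.3, True), (3.2, True)]
print(estimate_bottom_edge(pitches, [1.5, 1.6, 1.7, 1.8, 1.9]))  # 1.8
```

The stray low strike and the stray ball in the zone each cost one disagreement at any reasonable edge, so they don’t drag the estimate around. That is the same reason the eye settles on a line despite a few odd dots.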

What you also see is how many pitches are on the right side and down (both balls and strikes). There are a huge number down and away, which shows that other teams know his weakness too. I said before that his average in the down-and-away zone is .136; if you also include the pitches outside the strike zone down and away, it falls to .093. Shouldn’t this be something Rudy Jaramillo is working on? Or maybe they don’t worry about it, since he’s approaching 200 hits anyway?

In the near future, I’m going to expand my look at who hits what where. Showing Michael Young’s strike zone results doesn’t answer the questions I posed in the earlier post, namely why hitters seem to hit about the same anywhere in the strike zone. After reading this post that may seem contradictory, but I am talking about two different things: batting average here, versus the rates of strikes, fouls, and so on in the other post. How did I get that result before, and was it valid? How do I reconcile it with this one? With luck, and a little hard work, maybe I’ll have an answer before Michael Young reaches 200 hits.

Switching on a lightbulb

September 8, 2007

Before you get carried away by Edinson Volquez’s two games this year (and yes, it is Edinson with an N in the middle, not Edison as half the media outlets would have you believe), remember that it is just two starts, just eleven innings. Having said that, he’s had an excellent 2007, working his way back to the bigs after being demoted to A ball. There was one minor disciplinary misstep a few weeks ago, but otherwise he’s doing well. A few weeks ago I talked about the Rangers’ 2008 rotation, and in it I put Volquez in the “not yet” category, saying he was a year or more away. Well, I don’t mind admitting I was wrong: with the caveat of just eleven major league innings this year, I think he might be a candidate for the last slot in the rotation. There are other, stronger contenders, but if he goes on to have a good September, does everything well in spring training, and doesn’t have any more troubles off the field, he might just make it. At worst he’ll start next year in AAA, waiting for whichever member of the rotation breaks down first. He could also be a long man in the bullpen, but you’d think someone like Rheinecker would be ahead of him for that job too.

Even with just the two starts, and two wins this year, I thought I’d take a look at how he’s been pitching in Gameday. For a guy with a career 1-10 record (who was rushed to the majors and potentially ruined two years ago), he’s been doing some interesting stuff. Here’s how he charts, using all the pitches he threw in both games (9-1 at Anaheim and 9-7 vs Oakland):

Volquez Speed HV

Three pitches. With the fastball (in blue), I was on the verge of deciding there were actually two pitches in there when I looked at the other charts my program generates, but I finally decided to leave it at one (at least until I have more evidence). The three are the fastball (blue), curve (green), and change (red). There is a clear, distinct gap in speed between the fastball and the others. His fastball runs from 90 to 96 mph, averaging about 93, with slightly high vertical movement of about 9.2. The other two pitches are very close together in speed, from 77 to 85 mph, but note that the change breaks left while the curve breaks right (the horizontal break is the darker color of each pair).
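The chart separates the three pitches with just two numbers: speed splits the fastball from the rest, and break direction splits the change from the curve. Here is a minimal Python sketch of that rule. The 88 mph cutoff is an assumption placed in the gap between 85 and 90. Treating a negative horizontal break as "breaks left" is also an assumption about the chart’s sign convention. The sample pitches are hypothetical.

```python
def classify_pitch(speed_mph, horizontal_break, fastball_floor=88.0):
    """Rough three-way split: fastball by speed, then change vs curve by break direction.

    Assumes a negative horizontal_break means "breaks left" on the chart.
    """
    if speed_mph >= fastball_floor:
        return "fastball"
    return "changeup" if horizontal_break < 0 else "curveball"


for speed, brk in [(94.1, -6.0), (83.0, -7.5), (78.5, 5.0)]:
    print(speed, classify_pitch(speed, brk))
# 94.1 fastball
# 83.0 changeup
# 78.5 curveball
```

A rule this simple only works because the speed gap is so clean. It would not catch the possible second fastball, which is exactly the kind of thing that needs more evidence.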

Overall from Gameday I have 107 fastballs, 52 changeups and 21 curves, so he’s not trusting the curve very much at the moment. The fastball is of course his bread-and-butter pitch, about 60% of everything he throws, which you’d expect when he can throw it 95. In the first game he threw 94 pitches, 60 for strikes; in the second game he threw 87 pitches, 52 for strikes. That works out to about 64% and 60% strikes. The pleasing thing is that with 94 pitches in Anaheim he got through five innings, but with seven fewer pitches today, he got an extra inning. He was apparently pulled because of a blister on his thumb, which the Rangers also dealt with earlier this year with McCarthy. And (I have to get a dig in here) perhaps with a better pitching coach we wouldn’t have to deal with that kind of thing, because the pitchers would be better prepared.
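The arithmetic in that paragraph is easy to check. Here is a small Python sketch that computes the pitch mix and the per-game rates from the counts above. The helper names are hypothetical. Six innings for the Oakland start is inferred from "an extra inning" and the eleven-inning total.

```python
def pitch_mix(counts):
    """Share of each pitch type, from raw counts."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def game_rates(pitches, strikes, innings):
    """Strike percentage and pitches per inning for one start."""
    return strikes / pitches, pitches / innings


mix = pitch_mix({"fastball": 107, "changeup": 52, "curveball": 21})
print({name: f"{share:.1%}" for name, share in mix.items()})
# {'fastball': '59.4%', 'changeup': '28.9%', 'curveball': '11.7%'}

for label, (p, s, ip) in {"at Anaheim": (94, 60, 5), "vs Oakland": (87, 52, 6)}.items():
    strike_pct, per_inning = game_rates(p, s, ip)
    print(f"{label}: {strike_pct:.1%} strikes, {per_inning:.1f} pitches per inning")
# at Anaheim: 63.8% strikes, 18.8 pitches per inning
# vs Oakland: 59.8% strikes, 14.5 pitches per inning
```

Pitches per inning is the more telling number here: he dropped from almost 19 an inning to about 14.5.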

There is one little worry that popped up in the charts I ran. Take a look at his release points:

Volquez Release Point

We saw this pattern once before, with Millwood when he was being pounded earlier in the year (I haven’t checked recently to see if that has changed at all). Here, Volquez is throwing the changeup from a release point about half a foot below the other two pitches, and that is something major league scouts and hitters will pick up on. Again, this is something for a pitching coach to work on, or at least to be aware of. It will be interesting to see how this changes over time.
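To check a release-point gap like this without eyeballing the chart, you can average the release height for each pitch type and compare. Here is a minimal Python sketch. It assumes you already have one release height in feet per pitch, under whatever field name your data source uses. The function name and sample heights are hypothetical.

```python
from statistics import mean


def release_height_gaps(pitches):
    """Average release height per pitch type, and how far each sits below the highest.

    pitches: iterable of (pitch_type, release_height_feet).
    Returns {pitch_type: gap_in_inches}; the highest-released type gets 0.
    """
    heights = {}
    for kind, z in pitches:
        heights.setdefault(kind, []).append(z)
    averages = {kind: mean(zs) for kind, zs in heights.items()}
    top = max(averages.values())
    return {kind: round((top - avg) * 12, 1) for kind, avg in averages.items()}


# Made-up release heights in feet, shaped like the chart: changeup about 6 inches lower.
sample = [("fastball", 6.0), ("fastball", 6.1), ("curveball", 6.05), ("curveball", 6.05),
          ("changeup", 5.5), ("changeup", 5.6)]
print(release_height_gaps(sample))
```

Run start by start, a shrinking changeup gap would be an easy way to see whether the release point is being fixed over time.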

So another nice start and another win for Volquez. He was helped in a big way by Frankie Cat, who was pulled in a very surprising move by Ron Washington. Postgame quote from big Ron: “I was only concerned about winning the ballgame.” Ron, you have to look at the big picture. Your team has a 3-run lead in the bottom of the 8th. You bring out Sammy Sosa to lead off, the thought being that a lefty was pitching, and Cat doesn’t hit against lefties (10 at-bats this year) while Sosa does. But that’s not the big picture. The big picture is that you had 22,000 fans in the ballpark tonight, the best crowd for a while (I believe yesterday it was 17,000, the second-worst crowd since 2000), and you could have made them very happy, as well as all the folks like me watching on TV. You could have gotten a little exposure for the team on the news, instead of us having to watch 10 minutes of high school football and something about some Cowboy being injured.

And more importantly, to steal an idea from Gregg Easterbrook, you’re a 66-74 team! You’re not going to make the playoffs! Do something that you wouldn’t normally do! Remember back when Scott Sheldon got to play all nine positions, because Johnny Oates thought it would be something fun for the team? Yes, wins are nice, but Cat could have entered the exclusive club whose membership is just Oddibe McDowell and Mark Teixeira, and given this team something good to talk about. C’mon, Ron, break the mold, stop being a push-button manager and think about what you’re doing, and what your position is.

As it was, CJ tried to throw it away again. I’ve been saying that he should get the one-run games and Benoit should get the rest, because CJ is too intense to pitch with a multi-run lead (in this case five runs). Now I think Jack should be the closer, and CJ should be used in the earlier innings when the game is on the line. There’s a lot of debate online about when your best pitcher should be used: in the 9th to end the game, or earlier, say in the 7th when there’s trouble. CJ would be a perfect guy to try this on: use him when he’s most needed, and save steady Jack for the 9th.

Finally, I reported less than a month ago that this blog had hit 1,000 page views (not counting feed readers), and I was happy about that. I’m even happier to report I just hit 2,000, doubling the views in under a month. A large part of that was due to a mention in an article in Slate, but it’s very gratifying to know that people are reading. There’s an old saying that it’s better for people to think bad of you than not to think of you at all. I’m pleased you all are reading, and hopefully you don’t think bad of me. Either way, I’ll keep writing it, so you can keep reading it.