Baseball has long been a sport of statistics.  I think there are a variety of reasons for this.  The first is: it’s a slow game.  Because of that, there’s time to base decisions on whatever you want, e.g., numbers.  Some managers, I guess, prefer using numbers and others are more instinctive, going with their gut.

The slowness affects announcing too.  More stories are told in baseball commentary than in any other sport, to fill the air time while people wait.  I believe some announcers, who knew some math, began adding stats into the commentary, and this created a loop where stat guys recorded numbers, crunched the results, and gave you odd stats like how a particular hitter does against left-handed pitchers at night when the moon is full.

So even if you couldn’t be a star athlete, you could be a commentator or a stats guy if you knew your numbers.

Another factor was what I’ll call “fantasy baseball”, which started off with a name like rotisserie.  When you have to pick players here and there, you want some reason to pick good players based on something besides your gut.  Thus, stat books came out to help you make that choice.  Indeed, fantasy any-sport often thrives on as much data as you can get your hands on.

Another key was a book called Moneyball, which advocated some lesser-known statistics to help teams find diamonds in the rough for bargain prices.  Apparently, the two stats of importance were on-base percentage and slugging percentage.  Baseball is one of those sports in the US where the haves (the Yankees, the Red Sox) can spend as much money as they want, and the have-nots struggle to afford good players.  This was one way to make the buck go a bit further.

Nowadays, you can pick up quite a few stats for American football and basketball.

But one sport lags far behind.

Tennis.

Tennis has always had a minimal amount of stats.  The typical ones are: first serve percentage, aces, winners, unforced errors, winning percentage on first serves, winning percentage on second serves, break points made vs break point opportunities, average first serve speed, average second serve speed, and double faults.  Most times, you get a small fraction of these stats.

Go to the ATP site and look at their stats for matches.  They’re awful.  You don’t even get unforced errors or winners.  You get aces, double faults, first and second serve percentages, winning percentage on first and second serves, breaks made vs break opportunities, and percentage of service/return points won.  Perhaps the most useless statistic is total points won by each player.  By and large, these numbers are very close, and tell you pretty much nothing.  It just says the winners win the key points rather than win a lot more points.  Even in a blowout match, the winning player might win only 15% more points or so.

What kind of stats would be useful? First, there should be a concept known as “break games”.  These are games where at least one break point occurs.  Then, you should have break games converted vs. total break games.  This tells you in how many games a receiver had a chance to break, and how many of those he actually broke.  Sometimes, you get a single game with lots and lots of break points.  The usual stat doesn’t tell you in how many different games there was a chance to break.  Was it a lot?  Or just a few?
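To make the difference concrete, here is a minimal sketch of how break games could be counted next to the usual break-point stat.  The data format is my own assumption, not anything the ATP publishes: each service game is a string of point winners, "S" for the server and "R" for the returner, and the function names are made up for illustration.

```python
def break_points_faced(points):
    """Count break points in one service game ("S"/"R" per point, in order)."""
    s = r = bps = 0
    for winner in points:
        # Before this point, the returner is one point from winning the game.
        if r >= 3 and r > s:
            bps += 1
        if winner == "S":
            s += 1
        else:
            r += 1
    return bps


def was_broken(points):
    """In a completed game, the side with more points won it."""
    return points.count("R") > points.count("S")


def break_summary(service_games):
    """Return the usual break-point stat alongside the break-games stat."""
    break_games = [g for g in service_games if break_points_faced(g) > 0]
    converted = sum(was_broken(g) for g in break_games)
    total_bps = sum(break_points_faced(g) for g in break_games)
    return {
        "break_points": f"{converted}/{total_bps}",
        "break_games": f"{converted}/{len(break_games)}",
    }


# Hypothetical service games for one player (not real match data).
games = ["SSSS", "RRRSSSSS", "SRRRSR", "SSRSS"]
summary = break_summary(games)
print(summary)
```

In this made-up sample the returner had five break points, but they came in only two games, and he broke in one of them.  The usual stat alone hides that.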

The usual stat is mostly important if a player shows an incredible ability to hold.  A great example is Nadal’s win over Federer in the 2008 Wimbledon final.  Federer had some outrageous number of chances to break, but broke like once or twice, and relied on tiebreaks to stay even with Nadal.  That’s where the stat is useful.

Another stat I’d like to see: key holds.  A key hold is a service game where the server faces at least one break point and still holds.  This should be combined with the break points saved in those key holds.  Something like 4/8 would mean 4 key games held with 8 break points saved.  This would give you a sense of how good a holder someone is, combined with the last stat (break games).  I’d also like to see extreme key holds.  These are holds where the server is down double or triple break point (i.e., 15-40 or 0-40).  The best players are good at these kinds of holds.  Sure, they get in a hole, but they get out of it too.
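Here is a rough sketch of how key holds and extreme key holds could be tallied from point-by-point data.  I’m assuming a made-up format where each service game is a string of point winners ("S" for server, "R" for returner); the function name is hypothetical too.

```python
def key_hold_stats(service_games):
    """Tally key holds for one server from "S"/"R" point strings."""
    key_holds = saved = extreme = 0
    for points in service_games:
        s = r = bps = 0
        deep_hole = False  # server trailed 0-40 or 15-40 at some point
        for winner in points:
            if r >= 3 and r > s:
                bps += 1          # this point is a break point
            if r == 3 and s <= 1:
                deep_hole = True  # score is 0-40 or 15-40
            if winner == "S":
                s += 1
            else:
                r += 1
        held = s > r
        if held and bps > 0:
            key_holds += 1
            saved += bps          # every break point in a hold was saved
            extreme += deep_hole
    return {
        "key_holds": f"{key_holds}/{saved}",
        "extreme_key_holds": extreme,
    }


# Hypothetical service games (not real match data).
games = ["SSSS", "RRRSSSSS", "SRRSRSSS", "SRRRSR"]
stats = key_hold_stats(games)
print(stats)
```

With this sample, the server has two key holds with four break points saved (2/4), and one of those holds came from 0-40 down.  The game where he was broken doesn’t count.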

Second, winners and unforced errors are way too general a category.  You get a player like John Isner who might hit 30 aces in a match, and most of those go to the winner category (I think).  That makes it difficult to tease out where the winners or unforced errors come from.  I think winners should be split as follows: aces, unreturned serves (serves the returner gets a racket on but can’t put back in the court, as opposed to an ace, which the returner doesn’t touch), forehand winners, backhand winners, volley winners, and specialty winners (overheads, drop shots, and weird shots like net balls that dribble over).  Unforced errors could also be divided this way: forehand errors, backhand errors, volley errors, and specialty errors (missed overheads, missed drop shots).  These winners and errors should not include serves.

Unreturned serves is a better stat than aces alone, because a serve that forces a bad return tells you how well the server is hitting just as much as a pure ace does.  Sure, it mixes in errors made by returners trying for a big shot, but at least it’s less subjective.

Along with key holds, I’d also like to see easy holds.  An easy hold is one where the server does not face break point.  There are two reasons to look at easy holds.  First, it shows the dominance of the server.  Second, it can show when a returner has “given up”.  Sampras used to be famous for this.  He’d get a break up, and then his opponents would get a number of easy holds because he didn’t want to break again.  He wanted to conserve energy.

So, a related stat is “easy holds by opponents after being up a break”.  This shows who wants to win every single game, and who basically relies on their serve to win games, and conserves energy.
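As a sketch of how this could be computed, assume each game in a set is recorded as (server, winner, break points faced).  That format and the function name are my own invention.  A player counts as “up a break” while he has broken more often than he has been broken in the set.

```python
def easy_holds_when_up_a_break(games, player):
    """Count the opponent's easy holds while `player` leads by a break.

    games: list of (server, winner, break_points_faced) tuples, in order.
    Returns (easy_holds, opponent_service_games_while_up_a_break).
    """
    break_lead = 0  # breaks won by `player` minus breaks won against him
    chances = easy = 0
    for server, winner, bps in games:
        if server != player and break_lead > 0:
            chances += 1
            if winner == server and bps == 0:
                easy += 1
        if winner != server:  # this game was a break of serve
            break_lead += 1 if winner == player else -1
    return easy, chances


# Hypothetical 6-3 set between players "A" and "B" (not real match data).
set_games = [
    ("A", "A", 0), ("B", "A", 2), ("A", "A", 0),
    ("B", "B", 0), ("A", "A", 0), ("B", "B", 1),
    ("A", "A", 0), ("B", "B", 0), ("A", "A", 0),
]
result = easy_holds_when_up_a_break(set_games, "A")
print(result)
```

Here A breaks early, and after that B gets two easy holds out of three service games.  A high ratio over many matches would suggest a player who coasts on return once he’s ahead.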

I’d like to know first serve percentage on break points.  I have a sense that Federer’s first serve percentage on break points is awful.  I’d like to see percentage of unreturned serves on break chances.  This would tell you how well a server gets out of a hole by serving.
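Given point-level data, both numbers are easy to compute.  Here is a minimal sketch; the ServePoint record and its fields are hypothetical, and I’m assuming a double fault is recorded with unreturned set to False.

```python
from dataclasses import dataclass


@dataclass
class ServePoint:
    break_point: bool     # server was facing break point
    first_serve_in: bool  # first serve landed in
    unreturned: bool      # serve landed in and never came back (aces included)


def percent(part, whole):
    """Percentage rounded to one decimal, or None when there is no data."""
    return round(100 * part / whole, 1) if whole else None


def serving_under_pressure(points):
    """Serve stats restricted to the break points a server faced."""
    bps = [p for p in points if p.break_point]
    return {
        "break_points_faced": len(bps),
        "first_serve_pct": percent(sum(p.first_serve_in for p in bps), len(bps)),
        "unreturned_pct": percent(sum(p.unreturned for p in bps), len(bps)),
    }


# Hypothetical service points (not real match data).
points = [
    ServePoint(True, True, True),
    ServePoint(True, False, False),
    ServePoint(False, True, False),
    ServePoint(True, False, True),
    ServePoint(True, False, False),
]
stats = serving_under_pressure(points)
print(stats)
```

In this sample the server faced four break points, made his first serve on only one of them, and still got two unreturned serves, one of them on a second serve.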

When I look at a score that is 7-5, 7-5, I have no information.  Maybe one guy was up 5-0 and let his opponent get back in the match.  This happened to Andy Murray.  He was down an early break to Xavier Malisse, but then broke back twice to take the set.  7-5 doesn’t tell you that.

One way to explain what happened is to indicate when breaks occurred.  This could be done like:

Set 1: 0-2*, *1-2, *6-5.  The game scores are shown with the person who ultimately won the set as the first number.  The asterisk indicates who broke.  Thus 0-2* might mean Malisse broke, then *1-2 means Murray broke back immediately, and *6-5 means Murray broke again before serving out the set at 7-5.  The score shown is the game score right after each break occurs.  This would also indicate how many breaks occurred.
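This notation could be generated automatically from a game log.  A minimal sketch, assuming each game is recorded as a (server, winner) pair; the function name and the sample set are made up:

```python
def break_notation(games, set_winner):
    """List the score after each break, set winner's games first.

    games: list of (server, winner) pairs in the order played.
    The asterisk sits on the side of the player who broke.
    """
    won = lost = 0  # games for the set winner and for the set loser
    tokens = []
    for server, winner in games:
        if winner == set_winner:
            won += 1
        else:
            lost += 1
        if winner != server:  # a break of serve
            if winner == set_winner:
                tokens.append(f"*{won}-{lost}")
            else:
                tokens.append(f"{won}-{lost}*")
    return ", ".join(tokens)


# Hypothetical 6-3 set won by "A" (not real match data).
set_games = [
    ("A", "A"), ("B", "B"), ("A", "B"), ("B", "A"), ("A", "A"),
    ("B", "B"), ("A", "A"), ("B", "A"), ("A", "A"),
]
line = break_notation(set_games, "A")
print("Set 1:", line)
```

For this made-up set, A is broken at 1-1, breaks straight back, then breaks again at 4-3 and serves it out, which the notation captures in three tokens.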

Many of these stats have a chance to be collected automatically.

How?

Hawkeye.

Indeed, with Hawkeye, you can have other stats that might not be so visible to the naked eye.  On TV, you now see average distance behind the baseline, or percentage of balls taken behind the baseline and in no-man’s land, numbers that would show, say, that Murray plays way behind the baseline but Davydenko has a much more aggressive position, or that Nadal on grass plays more aggressively than Nadal on clay.  Indeed, these stats are part of tennistv.com (the official online broadcast of ATP events).

Already, tennistv.com is providing stats for replaying individual points with trajectories and ball-speed.  All because Hawkeye can keep track of this kind of information.  (It does look like it might need some human help since it has to figure out forehand and backhand, but maybe it can track the players too).

Ideally, this information would even be available for computer programs to query, so a person could do statistical studies and see if there’s anything interesting in all the data, such as average rally length for each player when they win a point (thus, Nadal probably wins most points that go longer than 10 shots).
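To give a flavor of what such a query might look like, here is a small sketch.  No such public data feed exists that I know of, so the format is assumed: each point is a (winner, rally length in shots) pair, and the function names are hypothetical.

```python
from collections import defaultdict


def avg_rally_length_on_wins(points):
    """Average rally length of the points each player won."""
    shots_total = defaultdict(int)
    points_won = defaultdict(int)
    for winner, shots in points:
        shots_total[winner] += shots
        points_won[winner] += 1
    return {p: shots_total[p] / points_won[p] for p in points_won}


def long_rally_win_share(points, player, min_shots=10):
    """Share of rallies longer than `min_shots` that `player` won."""
    long_rallies = [w for w, shots in points if shots > min_shots]
    if not long_rallies:
        return None
    return long_rallies.count(player) / len(long_rallies)


# Hypothetical points between players "A" and "B" (not real match data).
points = [("A", 12), ("B", 3), ("A", 14), ("B", 5), ("B", 11), ("A", 4)]
averages = avg_rally_length_on_wins(points)
share = long_rally_win_share(points, "A")
print(averages, share)
```

In this sample, A’s winning points average 10 shots against a bit over 6 for B, and A takes two of the three rallies longer than 10 shots.  That’s the kind of pattern you’d want to test on real data.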

These stats help you get beyond the basic score of a match and let you know what a player was struggling with or what they did well.  Right now, ATP stats lack unforced errors and winners, so you get no sense of how sloppy the match was or how aggressively it was played.  The Slams all have this stat on their websites, presumably because ITF manages the Slams and they coordinate with IBM to get that information.

It’s time the ATP started devising more and better match stats for public consumption.