Here's a post about balance, scoring volatility and a little home cookin.
Ben was lauding the balanced attack in the Detroit game after Shaq's early exit. trevor was concerned about the volatile tradeoff between SHAQ and STAT performances. When one was hot, the other seemed to be stone cold (in a bad non-Steve Austin way). However, I see this tradeoff as a possible positive sign of balance and diversification.
Meanwhile, I have been musing about one of the great mysteries of this season (aside from why there are no term limits on commissioner David Stern): the 32-1 record of the Utah Jazz at Energy Solutions Arena. The Energy Solutions phenomenon is one of the greatest mysteries in basketball ever, more amazing than a team winning 70 games. To win 70 games, one simply needs a good team. To go 32-1 at home and be miserable every place else, how can that happen?
As a preview, I show:
- SHAQ and STAT foul trouble is highly correlated, possibly due to the game's referee crew or matchup problems. When we have foul trouble on the front line, expect more trouble.
- Although home court is worth 3.5 points/game on average, it is only worth one more win per team per year. All the home court talk is rubbish. 3.5 points sounds like a lot, but because of variation in scoring and across teams, it's not as strong an effect as you think. Therefore, since home court advantage is nearly mythical, absolutely nothing explains Utah's 32-1 home record.
- Adjusted Scoring margins explain 95% of wins.
- Teams with higher scoring averages and more scoring volatility given the same scoring margin win a few games less.
- Eastern conference teams win one game less on average, but one more game once controlling for scoring margins - Eastern Conference teams are better at close games.
So, let's look at scoring volatility and home cookin'.
Scoring Volatility: Let me give you a quick example.
Not nearly enough sports research has focused on the volatility and correlations of the stats, not simply the averages. +/- is a move in that direction by looking at correlations. Here's an example of how volatility affects the game.
Team A has the Mailman and the Paperboy. You know the Mailman always delivers. The paperboy almost always delivers. And then there's Team B. Team B has Eddie House and Leandro Barbosa. You know how Eddie House and Leandro Barbosa are. They can break a game open and get you that George W. Or more often they can lay a goose.
Imagine the two teams play a few games.
Every game, the Mailman gets you 20 and the paperboy delivers 10. Your "A Team" averages 30. Every game, every time. Wind, rain, sleet, snow.
Now, Eddie House and LB each average 10 points. Your B team kinda sucks. If the two teams play, Team A thrashes Team B 30 to 20 every game. However, you also know Eddie House and LB are volatile. They'll either score 20 or zilch with equal probability. If they get hot and cold together, you win 50% of the time (40 to 30 half the time and 0 to 30 half the time). You have the sucky team and you can still win half the time!! If they are perfectly uncorrelated, you win 25% of the time (40 to 30 one game, 20 to 30 for 2 games, 0 to 30 one game). Not bad, but it won't win you a playoff series. Now, if one gets hot only when the other is cold you have a problem. You lose all the games, 20 to 30.
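For anyone who wants to check the arithmetic, here is a minimal sketch that enumerates the three cases above. The names (`win_probability`, `together`, `independent`, `opposite`) are mine and purely illustrative; the only inputs are the numbers from the example.

```python
from fractions import Fraction
from itertools import product

TEAM_A_SCORE = 30      # Mailman 20 + Paperboy 10, every game
OUTCOMES = (0, 20)     # each Team B scorer gets 0 or 20 with equal odds


def win_probability(joint):
    """Chance Team B outscores Team A, given joint odds {(house, lb): prob}."""
    return sum(p for (house, lb), p in joint.items()
               if house + lb > TEAM_A_SCORE)


half, quarter = Fraction(1, 2), Fraction(1, 4)

# Hot and cold together: both score 20 or both score 0.
together = {(20, 20): half, (0, 0): half}
# Perfectly uncorrelated: all four combinations are equally likely.
independent = {pair: quarter for pair in product(OUTCOMES, repeat=2)}
# One is hot only when the other is cold: Team B always scores exactly 20.
opposite = {(20, 0): half, (0, 20): half}

for name, joint in [("together", together),
                    ("independent", independent),
                    ("opposite", opposite)]:
    print(f"{name:12s} Team B wins {float(win_probability(joint)):.0%}")
```

Team B averages 20 points in all three cases. Only the correlation between the two scorers changes, and that alone moves the win rate from 50% to 25% to 0%.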
Real teams have an A squad and B squad on the same team. The B squad is the lesser team comprised of volatile scorers who may help you get lucky while your better, more consistent team is getting a rest. As you can see, there was no coincidence in the choice of A and B team members in my example.
In general the lesson is that if you are the better team (higher scoring margin), you want more consistent scoring. If you are the lesser team, not only do you want highly volatile individual scorers, but the hard part is you want them to all get hot in all the same games. That's where coaching comes in, but we save that for another day. We simply accept that perfectly negative correlation in hot hands for streaky players is impossible, so volatile scorers are helpful for bad teams and bench squads to win a few games.
Let's revisit trevor's SHAQ-STAT analysis, but use some statistics to help us. We can look at simple correlations between SHAQ and STAT and then look at them combined. If I'm the coach or the fan, I only care about the combination, not the individual numbers. The correlation could go either way in some cases, or should be expected to go one way in other cases.
When you expect a star to make other players better, you suggest that the star and other players are positively correlated performers. If a player is hot, they may draw defensive attention and open up opportunities for other players, again creating a positive correlation between them. However, if you have two stars that can perform without much aid, the correlation need not be positive. In fact, the correlation could be negative simply because there are a limited number of possessions and rebounds.
So what do we have with Shaquille O'Neal and Amar'e Stoudemire?
What do these numbers say? All of the numbers are suspect after only 11 games, but the bolded ones are likely to be not zero with 80% confidence.
- The numbers suggest when one takes more shots, the other takes fewer, as you might expect when the game has limited possessions. The one taking more shots is also making proportionally many more, so that's what we would want. Feed the hot hand.
- Although FTA is uncorrelated as expected, we see a huge negative correlation between STAT and SHAQ for FTM and FT%. Let that be a lesson to you about statistics. There should be NO CORRELATION between FT%. Take these numbers with a molecule of NaCl. The FTM correlation is driven by FT% correlation.
- Rebounds are mildly negatively correlated, as is everything else except personal fouls. The correlated foul trouble may be due to the Zebras.
- Points are highly negatively correlated, driven directly by FGA and FGM.
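If you want to run this kind of check yourself, here is a rough sketch of a correlation plus an "is it likely non-zero at 80% confidence" test for an 11-game sample. Two loud assumptions: the game logs below are made up for illustration (they are NOT the real SHAQ and STAT numbers), and I am assuming a standard two-sided t-test on the Pearson correlation, which may not be exactly the test behind the bolded numbers.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def t_statistic(r, n):
    """t-statistic for testing whether a correlation r over n games is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two-sided 80% critical value of Student's t with 11 - 2 = 9 degrees of freedom.
T_CRIT_80_DF9 = 1.383

# Smallest |r| that clears the 80% bar with 11 games.
min_r = T_CRIT_80_DF9 / math.sqrt(T_CRIT_80_DF9 ** 2 + 9)

# HYPOTHETICAL points per game for two players over the same 11 games.
shaq_pts = [12, 18, 9, 20, 7, 15, 22, 10, 16, 8, 14]
stat_pts = [30, 22, 33, 19, 35, 24, 18, 29, 23, 31, 26]

r = pearson(shaq_pts, stat_pts)
t = t_statistic(r, len(shaq_pts))
print(f"r = {r:.2f}, t = {t:.2f}, non-zero at 80%: {abs(t) > T_CRIT_80_DF9}")
print(f"with 11 games, |r| must exceed about {min_r:.2f}")
```

The second printed line is the useful one: with only 11 games, a correlation has to be bigger than roughly 0.42 in absolute value before you can take it even semi-seriously.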
Obviously, we would simply prefer to have combined plus/minus numbers for STAT and SHAQ together - player pairs data like this at 82games. But without that, how can we see how they work together?
Let's look at their individual and combined production averages and standard deviations (or volatilities). A useful summary statistic is the Sharpe Ratio, or average divided by standard deviation. For Sharpe Ratios, higher numbers are better: they mean higher averages, lower volatilities, or both.
As you might hope to see, the combined output is more consistent than each player's individual numbers. Separately, you get noisy numbers. Together you get 38 and 17 with no more variability than the individual players, so the Sharpe Ratio almost doubles.
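Here is a small sketch of the Sharpe Ratio calculation for two players and their combined output. The game logs are invented for illustration (not the real numbers), and I deliberately made them strongly negatively correlated, so the effect is exaggerated compared with the real data.

```python
from statistics import mean, stdev


def sharpe(values):
    """Sharpe Ratio as used here: average divided by (sample) standard deviation."""
    return mean(values) / stdev(values)


# HYPOTHETICAL points per game for two players over the same 11 games.
player_one = [12, 18, 9, 20, 7, 15, 22, 10, 16, 8, 14]
player_two = [30, 22, 33, 19, 35, 24, 18, 29, 23, 31, 26]

# Game-by-game combined output, which is what the coach or fan cares about.
combined = [a + b for a, b in zip(player_one, player_two)]

for label, series in [("player one", player_one),
                      ("player two", player_two),
                      ("combined", combined)]:
    print(f"{label:10s} avg {mean(series):5.1f}  sd {stdev(series):4.1f}  "
          f"Sharpe {sharpe(series):5.1f}")
```

The averages simply add, but the volatilities do not: when one player's good nights offset the other's bad ones, the combined standard deviation can be smaller than either individual one, which is what pushes the combined Sharpe Ratio up.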
So, STAT and SHAQ appear FINE together, as long as the refs don't call the game tight.
So the question arises, how does team scoring volatility weigh in on wins? Do consistent teams win more? Do volatile teams win less?
About the Data:
I got this data for the 2007-2008 season from basketball-reference. Thanks guys! I use the overall wins data and the game-by-game data for each team.
How to measure Home Cookin & Scoring Volatility:
 If you care, take the logarithm of points. This reduces the skewness of points. Points, theoretically, can be unlimited in the infinite-OT game. However, they are bounded from below at zero. Taking logs fixes this little statistical non-normality problem.
However, even easier is to simply throw out the OT games and this is what I've done here. Mainly, the positive skew is not a big problem and log(points) is hard to interpret without computing back into points (exp(log(points))). I found it didn't make much difference as long as you throw out the OT games.
The most well-known determinant of score levels is home court advantage. Some teams depend a lot on home court and others don't. I don't want scoring volatility clouded by homecourt advantage effects. I want to look at both homecourt dependence and scoring volatility as separate factors.
You could look at average scores at home and away and compute an average level and volatility for each, but the easy way is with a quick regression for each team of points on a dummy that equals 1 if the team is at home and zero otherwise. This gives you
[a] an intercept (the baseline scoring average; strictly, with a 0/1 home dummy this is the average in away games),
[b] a coefficient on homecourt advantage (how many more points the team scores at home) and
[c] an error term (its standard deviation is the adjusted scoring volatility, net of home court effects).
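A minimal sketch of that per-team regression, in plain Python so you can see the moving parts. The function name `home_dummy_regression` and the eight-game sample are hypothetical; a real run would use each team's full game log with the OT games thrown out.

```python
import math


def home_dummy_regression(points, is_home):
    """OLS of points on a 0/1 home dummy.

    Returns (intercept, home_coefficient, residual_sd).
    """
    n = len(points)
    mean_x = sum(is_home) / n
    mean_y = sum(points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(is_home, points))
    sxx = sum((x - mean_x) ** 2 for x in is_home)
    slope = sxy / sxx                       # [b] extra points scored at home
    intercept = mean_y - slope * mean_x     # [a] baseline (away) scoring level
    residuals = [y - (intercept + slope * x) for x, y in zip(is_home, points)]
    # [c] volatility net of home court: residual sd with n - 2 degrees of freedom
    resid_sd = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return intercept, slope, resid_sd


# HYPOTHETICAL game log: points scored and whether the game was at home.
points = [104, 98, 110, 95, 101, 92, 107, 99]
is_home = [1, 0, 1, 0, 1, 0, 1, 0]

intercept, home_coef, volatility = home_dummy_regression(points, is_home)
print(f"intercept {intercept:.1f}, home dummy {home_coef:+.1f}, "
      f"adjusted volatility {volatility:.1f}")
```

With a single 0/1 regressor, this is the same as comparing the home and away averages; the regression just hands you the leftover game-to-game noise in the same step.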
A look at Home Cookin:
As others have pointed out, home teams scored an average of 3.5 more points per home game last year. You would think that should translate into a lot more wins, but in the later analysis, I'll show you it only adds one win in favor of the home team.
How is that possible? Two obvious ways. First, perhaps blowouts are more likely in the favor of the home team. More home team blowouts increase the points at home, but not the wins. I'm too lazy to check this theory out right now but let's say that's something worth considering. Second, there can be cross-sectional differences in home court scoring differentials. If good teams with high point differentials (scoring margins) are also good home teams, then higher scoring at home does not necessarily mean more wins, since the team was likely to win anyway, albeit at a lower margin.
Home Dummy is the increase in points scored for each home team. To compute Home_Cookin' I take Home Dummy and scale (divide) it by the adjusted scoring average. Obviously, 3.5 more points added to a scoring average of 90 matter more than to a team with a 110 scoring average. This doesn't have a huge effect, but it does reverse the order for the Utah Jazz and the Atlanta Hawks.
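The scaling is a one-liner, sketched here with the 90 and 110 scoring averages from the text. The function name `home_cookin` is mine; the inputs are assumed to be the per-team Home Dummy coefficient and that team's adjusted scoring average.

```python
def home_cookin(home_dummy, scoring_average):
    """Home court boost as a fraction of the team's scoring average."""
    return home_dummy / scoring_average


# The same 3.5-point home boost for a slow team and a fast team.
for scoring_average in (90, 110):
    share = home_cookin(3.5, scoring_average)
    print(f"scoring average {scoring_average}: Home_Cookin' = {share:.2%}")
```

The same 3.5 points is about 3.9% of a 90-point average but only about 3.2% of a 110-point average, which is why two teams with similar raw home boosts can swap places after scaling.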
What we see is a large variation in home court advantage: from -3 points for the Clippers (ouch, Clippers Nation!), to zero effect for the Boston Celtics, to an average level of 3.5 for the New Orleans Hornets, the Phoenix Suns and the LA Lakers, topped off by a mix of teams including the Utah Jazz, San Antonio Spurs and Dallas Mavericks with a shocking 7 point differential.
However, interspersed with the good teams are other middling, average and bad teams. There does not seem to be any obvious pattern, and this will help explain the rather lukewarm home court advantage results I show next.
Wins, Home Cookin & Scoring Volatility
John Hollinger believes wins are predicted by point differentials. Had I known how strong the results were, I would not have even bothered with this research question. The data are sick.
Wins and SRS (point differentials scaled by strength of schedule provided by basketball-reference) are 97% correlated. Even if I turn up anything statistically significant, it just won't make a large difference in real terms.
Eastern Conference teams were -16% correlated with wins, i.e. they won less by a lot.
Home court dependence (Home_Cookin') is 22% correlated with wins.
As in my example above, teams with more volatile scoring game-to-game were less likely to win. The correlation is -13.5%.
Conditional Regression Results
If we put this into a regression, we can get the conditional results controlling for all of the other factors.
- Average number of wins per team is 46.4 +/- 14 games (1 standard deviation).
- SRS is the dominant effect and the rest are secondary. Every one-point increase in the adjusted point differential is worth 2.6 wins.
- Eastern conference teams appear better at winning close games. Controlling for SRS, Home Court advantage, Scoring Average (pace) and Adjusted Scoring Volatility, Eastern Conference teams actually win 1 more game than Western Conference teams.
- Higher scoring teams that win by average SRS margins win less. Teams that score 10 more points/game but win by equal SRS win 1.25 fewer games than their counterparts.
- Home teams win more, but not by much. This explains 0.9 wins of 46 average wins.
- Scoring volatility can help a bit, but not much.
By request, after slogging through those numbers I provide this coverage of considerably more important sports competition around the world: