In essence, the main premise of Michael Lewis's book Moneyball was to examine how the Oakland A's did so well with one of the lowest payrolls in Major League Baseball. Additionally, as we state in The Wages of Wins, team payroll does not explain much of the variation in team performance. How do we back up this statement statistically? We analyzed team performance and relative team payroll data (relative payroll accounts for overall payrolls increasing over multiple seasons) and calculated the coefficient of determination, also called r-squared or R^2. We use R^2 since we are interested in the proportion of variance that is in common between NBA team payroll and NBA team performance. Since R^2 is between zero and one, it can be read as the percentage of the variance that is in common between NBA team payroll and NBA team performance. What we find is that the proportion of variance that is in common between NBA team performance and NBA team payroll is rather small.

Some have argued - incorrectly - that we use the wrong statistical measure. They say the true measure is the correlation coefficient, also called r. Why is this incorrect? As I explained in this post on The Wages of Wins Journal, the correlation coefficient does not measure how much of the variation between NBA team payroll and NBA team performance is in common, but rather whether NBA payroll and NBA performance change together or change oppositely.

Sometimes correlations can lead us astray. For example, I blogged about there being a high positive correlation between vocabulary and corporate success. If we use correlation as our guide to the importance that one variable has on another, we would conclude that studying the dictionary (or watching The Daily Show) will allow us to climb higher on the corporate ladder. While I do not have the data, my guess is that the R^2 is rather low, since the amount of variation that is common between these two variables is most likely tiny. These cases where you get very high correlations (positive or negative) are referred to as spurious correlation.

So with the stats stuff briefly discussed, let me show you why I disagree with USA Today's inferences about NBA payroll and team performance. If we calculate the coefficient of determination (R^2) for NBA team payroll and team performance - using USA Today's NBA salary database and the NBA's final season team performance - the R^2 is 0.041. What this means is that the proportion of variance that is common between NBA team payroll and NBA team performance is 4.1%. Just to be clear, the correlation coefficient is 0.202.
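As a rough illustration of how the two figures above relate, here is a minimal Python sketch. The helper names `pearson_r` and `r_squared` are my own for illustration, and the sketch does not use the USA Today data; it only shows how the correlation coefficient is computed and checks that the reported r of 0.202 squares to the reported R^2 of 0.041.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Coefficient of determination for a two-variable (bivariate) relationship."""
    return pearson_r(xs, ys) ** 2


# Figures quoted in the post (not recomputed from the salary database).
r_reported = 0.202
r2_from_r = round(r_reported ** 2, 3)
print(r2_from_r)  # 0.041
```

With real data, `r_squared(payrolls, win_percentages)` would be called on two lists of 30 team values.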

Not only that, but I also tested whether this past year's NBA team payroll and team performance were related. Using the test statistic ((n-2)*R^2)/(1-R^2) for 1 degree of freedom and 30 degrees of freedom, I found that the calculated test statistic was less than the critical value at the 5% probability level in the F distribution, so we would accept the null hypothesis, which is that the two variables (NBA payroll and NBA performance) are unrelated. So not only is the proportion of variance that is common between the two tiny, but here I am able to show that the correlation coefficient between the two populations (NBA payroll and NBA performance) for the 2008-2009 season is statistically zero.
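The test above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: n = 30 teams, the reported R^2 of 0.041, and 5% critical values copied from a standard F table rather than computed. The post quotes 1 and 30 degrees of freedom; with 30 teams, the n-2 in the formula gives 28, so both critical values are shown. The conclusion is the same either way.

```python
def f_statistic(r_squared, n):
    """F statistic for testing a zero correlation: ((n - 2) * R^2) / (1 - R^2)."""
    return (n - 2) * r_squared / (1 - r_squared)


N_TEAMS = 30        # assumption: 30 NBA teams in 2008-09
R_SQUARED = 0.041   # value reported in the post

# 5% critical values taken from a standard F table (assumed, not computed here).
F_CRIT_1_28 = 4.20
F_CRIT_1_30 = 4.17

f_stat = f_statistic(R_SQUARED, N_TEAMS)
print(f"F = {f_stat:.2f}")  # F = 1.20
print("below 5% critical value:", f_stat < min(F_CRIT_1_28, F_CRIT_1_30))
```

Because the statistic is well below either critical value, the test fails to reject the hypothesis of no linear relationship.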

Now, since I am only looking at the 2008-09 NBA season, I did not calculate relative payroll as we did in The Wages of Wins. If I were to calculate relative payroll, we would get the same answer, since relative payroll is a monotonic transformation of total payroll.
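A quick way to check this claim is to rescale some payrolls and recompute the correlation. The sketch below assumes relative payroll means team payroll divided by the league average for that season, and the payroll and win figures are made up purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical single-season data (payroll in $ millions, wins per team).
payrolls = [55.0, 62.5, 68.0, 71.2, 79.9, 90.1]
wins = [30, 41, 25, 50, 44, 39]

# Assumed definition: relative payroll = team payroll / league average payroll.
league_average = sum(payrolls) / len(payrolls)
relative_payrolls = [p / league_average for p in payrolls]

r_total = pearson_r(payrolls, wins)
r_relative = pearson_r(relative_payrolls, wins)
print(math.isclose(r_total, r_relative))  # True
```

Within one season the league average is a constant, so the rescaling is linear and r (and therefore R^2) is unchanged. A monotonic transformation that is not linear, such as a logarithm, would not in general leave r exactly the same.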

Earlier this year, an unnamed NHL executive and I looked at NHL payroll (using their data) and NHL team performance, and we found in essence the exact same result - which was a surprise to him, but not to me.

Bottom line: team payrolls are poor gauges of team performance.

Click here for more information on correlation.

## 6 comments:

OK, a few points of clarification. First of all, it is a simple mathematical fact that R^2 is r, squared. There is a very straightforward relationship between the correlation and the coefficient of determination: as long as the relationship in question is bivariate, the coefficient of determination contains exactly the same information as the correlation, except that it doesn’t tell you whether the linear relationship in question is positive or negative. So, for the vocabulary and corporate success example, if r is very high, then R^2 mathematically has to be very high, as well. So when Brook says, “the correlation coefficient does not measure how much of the variation between NBA team payroll and NBA team performance is in common,” that’s not really true. Take the absolute value of the correlation coefficient, and you have a straightforward, if nonlinearly transformed, measure of the shared variation between NBA team payroll and NBA team performance.
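The relationship described in this comment can be checked numerically. The following is a minimal sketch with made-up data and hypothetical helper names: it fits a one-variable least-squares line, computes R^2 from the residuals, and compares it with the square of the correlation coefficient. It also flips the sign of one variable to show that r changes sign while R^2 does not.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def regression_r_squared(xs, ys):
    """R^2 of a simple least-squares line: 1 - SS_residual / SS_total."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 1.9, 3.8, 3.5, 5.2, 4.9]
y_flipped = [-value for value in y]

r = pearson_r(x, y)
r2 = regression_r_squared(x, y)
print(math.isclose(r ** 2, r2, abs_tol=1e-9))            # True
print(math.isclose(pearson_r(x, y_flipped), -r))          # True: sign flips
print(math.isclose(regression_r_squared(x, y_flipped), r2, abs_tol=1e-9))  # True
```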

Another point: both r and R^2 measure linear relationships only. Language about “shared variation” can be made to sound more general than this, but it is not. In the present context, the crucial question about linearity would be whether there are diminishing returns to scale in spending.

Also, when Brook says, “here I am able to show that the correlation coefficient between the two populations (NBA payroll and NBA performance) for the 2008-2009 season is statistically zero,” this is an abuse of hypothesis testing. To say that the correlation is statistically zero is to say that we can generate a confidence interval for the correlation coefficient that is centered at zero and has width zero. Obviously, this is not the case. What is true is that we are unable to show that, if the probability model is right, a true population with a correlation coefficient of zero would generate fewer than 5 yearly payroll/victory combinations out of every 100 samples with sample correlation coefficients as big or bigger than the one we observed. This is routinely glossed with the phrase “statistically indistinguishable from zero,” but not “statistically zero.”

Finally, “spurious correlation” is about bivariate correlations that disappear when theoretically appropriate covariates are added to create a multivariate version of the model. It has nothing to do with the difference between r and R^2. In the present context, one might be tempted to ask whether there might be something of a spurious non-correlation between spending and performance. I don’t know if this is the case, but one plausible alternative hypothesis might be that underperforming teams tend to add payroll more quickly than other teams in the effort to improve their situation…

Figures lie and liars figure!

I agree with what RoastedTomatoes said, even with what little I've retained from the stats classes I took oh so many years ago.

But I also remember that the "coefficient of correlation", r squared, allows one to deflate the spurious claims that some people like to make about low correlations...and we're talking about a low correlation here!

The facts clearly are that you don't necessarily get what you pay for if you're an NBA owner. There just isn't much of a causal or any other relationship between what NBA teams shell out and team success.

But I would submit that there is a simple reason for that.

Teams feel compelled to spend all the money they are permitted to spend.

The losers in the 2010 LeBron-Bosh-etc. sweepstakes who are sitting there with $20 million or so that the top free agents spurned won't put most of it back in their pockets and sign lesser players for what they're worth.

No, they'll blow the whole wad on somebody who isn't worth the money. That's why teams have such bizarre salary structures as Ben Wallace making more than LeBron James, or my own Hornets paying $11 million to Tyson Chandler for an occasional alley-oop dunk or an even rarer blocked shot.

They will spend the money!

They need the salary cap to protect themselves from their own irrational exuberance!

The article didn't claim there was a strong relationship between total team payroll and wins, but rather between having very highly paid players and wins. Given the complexity of the NBA salary cap and deferred salaries, and the rather narrow range of total payrolls, it's certainly possible that one is true but not the other. Why not engage the actual argument being made?

Let's look at the top 30 players by salary (all over $14M). 15 of them are on teams that had over a .600 win% (50+ wins), while just 3 of them are on sub-.400 teams. In other words, the most successful teams employed 1.67 of these elite players on average, while the worst teams employed just 0.375 elite players. That's a ratio of over 4:1 -- sure looks like a relationship.
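The arithmetic in this comment can be reproduced as follows. The team counts (9 teams above .600 and 8 teams below .400) are not stated in the comment; they are assumptions inferred from the quoted averages of 1.67 and 0.375.

```python
# Counts of top-30-salary players, as given in the comment.
elite_on_top_teams = 15
elite_on_bottom_teams = 3

# Assumed team counts, inferred from the quoted per-team averages.
top_teams = 9      # 15 / 9 is about 1.67
bottom_teams = 8   # 3 / 8 = 0.375

elite_per_top_team = elite_on_top_teams / top_teams
elite_per_bottom_team = elite_on_bottom_teams / bottom_teams
ratio = elite_per_top_team / elite_per_bottom_team

print(round(elite_per_top_team, 2))   # 1.67
print(elite_per_bottom_team)          # 0.375
print(round(ratio, 2))                # 4.44
```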

I think you should look at the correlation of highly paid players and team wins, and tell us if there is in fact a relationship as the article implies.

Almost by definition there should be a correlation between the highest paid players and winning teams, so I certainly agree with David there.

The way the NBA contracts are structured you almost have to be a genuine superstar to qualify for top dollar.

The only exceptions probably occur when someone has been resting on their laurels and some team is foolish enough to give them one more max contract, à la Shaq.

I think those USA Today salary figures aren't right -- for instance, they have all of Jermaine O'Neal, Jamario Moon, Marcus Banks, and Shawn Marion on the Raptors' payroll even though the first two were traded to the Heat for the last two.

If you use numbers from here ...

http://thehoopdoctors.com/online2/2009/02/2008-2009-team-payrolls/

... you get an r-squared of .2561, which is fairly high.

