We use r² because we are interested not only in whether the two variables move together but in the proportion of variance the two variables have in common, which is what r² reveals. To let readers see why we think the correlation coefficient (r) is not sufficient, and the coefficient of determination (r²) is, for evaluating Bob Costas’s statement that payroll is the single biggest determinant of MLB performance, I have decided to re-post my earlier blog entry. Here it is.
-------------------------------------------------------------------------------------------------
In The Wages of Wins we use just a little bit of statistics. Most of the book is friendly, easy-to-follow words. But every once in a while we let it slip that, as professors of economics who have all taught econometrics, we are basing our conclusions on our statistical analysis of the data.
One of the conclusions that we reach is that money cannot buy love in baseball (or football, or basketball, or hockey). This conclusion is based on the ability of relative payroll to explain the variation in winning percentage. In simple words, payroll doesn’t seem to tell us much about wins.
Recently, some individuals who claim to have knowledge about statistics have questioned this conclusion. Specifically – and this is where this post gets a little technical – people have questioned the use of the coefficient of determination – otherwise referred to as r². These individuals have suggested that using the correlation coefficient – otherwise known as r – is a more “real-life” statistic to use in looking at how payroll and wins are related in Major League Baseball. As you can guess, we disagree. Here’s why.
Let’s begin with the evidence. From 1988 to 2006, the correlation coefficient between relative team payroll and winning percentage is 0.43. In The Wages of Wins we chose to report the coefficient of determination – the correlation coefficient squared – which is 0.18. Which statistic gives us the best picture of this relationship?
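For readers who want to see how the two numbers are connected, here is a minimal sketch in Python. The helper name `pearson_r` is hypothetical, and the tiny input lists are made up for illustration only; they are not the payroll data behind the figures above. The last line simply squares the reported correlation of 0.43 to recover the 0.18 we report.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up example: a perfect positive linear relationship gives r = 1
print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0

# The coefficient of determination is just r squared
r = 0.43                                 # correlation reported above for 1988-2006
print(round(r ** 2, 2))                  # 0.18
```

Squaring is the whole difference between the two statistics, so the choice between them is a question of interpretation, not of computation.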
There is actually a problem with drawing conclusions from the correlation coefficient. In the words of R.J. Rummel, who provides an excellent tutorial entitled Understanding Correlation:
“As a matter of routine it is the squared correlations that should be interpreted. This is because the correlation coefficient is misleading in suggesting the existence of more covariation than exists, and this problem gets worse as the correlation approaches zero.” (emphasis added).
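In essence, because a correlation coefficient can never exceed 1 in magnitude, it is always at least as large as its square, the coefficient of determination, and so it exaggerates the relationship between any two variables. That is why we employ the coefficient of determination.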
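One way to see Rummel’s point is to compare r with r² for a range of correlations. The sketch below is a hypothetical illustration; the helper name `shared_variance_table` and the list of r values (other than the 0.43 reported above) are our own choices, and only positive correlations are used for simplicity. The ratio r / r² equals 1 / r, so it grows as the correlation approaches zero.

```python
def shared_variance_table(rs):
    """Return (r, r**2, r / r**2) for each nonzero correlation r (hypothetical helper)."""
    rows = []
    for r in rs:
        if r == 0:
            raise ValueError("r must be nonzero to form the ratio r / r**2")
        r2 = r * r
        # r / r**2 == 1 / r: how many times larger r looks than the shared variance
        rows.append((r, r2, r / r2))
    return rows

# Illustrative correlations, plus the 0.43 reported for MLB payroll and wins
for r, r2, ratio in shared_variance_table([0.9, 0.7, 0.5, 0.43, 0.3, 0.1]):
    print(f"r = {r:.2f}   r^2 = {r2:.2f}   r / r^2 = {ratio:.1f}")
```

For r = 0.9 the two numbers differ by a factor of only about 1.1. For r = 0.43 the correlation looks roughly 2.3 times as large as the shared variance, and for r = 0.1 it looks about ten times as large.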
It is important to understand how this statistic is interpreted. An r² of 0.18 means that, across our sample, 18% of the variance in the two variables is in common and 82% is not. Put differently, relative payroll explains 18% of the variation in winning percentage, leaving 82% unexplained.
From this we conclude that the relationship between relative payroll and wins is quite weak. Of course, we admit that there is no universally accepted threshold for explanatory power. In other words, one could come back and say that 18% is quite large, and relative to the NFL, where the explanatory power of payroll is less than 5%, maybe it is. Still, we do not think 18% is anything to hang your hat on. Simply put, payroll does not explain much of the variation in wins in Major League Baseball, and therefore the evidence tells us that teams cannot simply buy wins in baseball.