Back in February, ESPN (The Magazine) ran a cover story about the controversial 2012 MLB AL MVP race between Miguel Cabrera and Mike Trout, which somehow turned into a proxy battle between "old school" baseball men who used their eyes and guts to evaluate players and the new age sabermetrician nerds who could only see what was quantifiable. Qualitative evaluation versus quantitative evaluation. Clint Eastwood in Trouble with the Curve versus Brad Pitt in Moneyball.
For those who didn't pay any attention to the controversy, the narrative went something like this. Detroit's Miguel Cabrera won the Triple Crown, leading the league in batting average, home runs and RBIs, the first player to do so in major league baseball since Carl Yastrzemski for the Red Sox in 1967. Cabrera also led his team to the pennant. The old school guys were decidedly in the Miguel Cabrera camp.
On the other hand, Mike Trout, center fielder for the California Los Angeles Anaheim Orange County Angels of Anaheim, California, USA, had what many statisticians argued was one of the best seasons in baseball history. Compared to Cabrera, an average fielder at a less critical position (third base) and a slow baserunner, Trout was the textbook definition of the "5-tool" player -- hitting for average, hitting for power, speed, a strong arm, strong fielding. Trout, the statisticians claimed, was just as strong as Cabrera offensively, if not stronger due to his speed, and light years ahead of him defensively while playing what is considered a more critical position. WAR, or Wins Above Replacement, an inscrutable statistic designed to evaluate players on all aspects of their game, not just their work at the plate, had Trout earning his team 11 wins over some schmoe called up from Triple-A, while Cabrera was good for roughly 7 wins. Trout, the nu skool claimed, was the MVP.
The Angels also greatly underperformed as a team, finishing 5 games back in a division led by a team with a small fraction of their payroll.
I'll refer you to the linked ESPN article to read more about the details behind the arguments for each player, but one statement by the author (Sam Miller, I believe) sums up the absurdity of this whole argument and touches on a greater systemic problem in our culture, the fear of the analytical:
Here, I'm going to resist the strong urge to digress down a more socio-political path in the interest of staying focused. But Miller makes the case that the proxy battle being fought here really goes beyond baseball or sports. It reflects a general ambivalence, if not animosity, toward the misunderstood and the analytical.
Neither side is wrong. Both players had historically epic seasons. Miguel Cabrera did something that hadn't been done in almost half a century. Mike Trout emerged as potentially the best player of a new generation.
The primary difference between the old school and the new school is which, how many, and in what way data is used to assess player excellence.
Enter the Computer Poll
The "Computer Poll" was introduced into the college football lexicon in 1998 with the advent of the Bowl Championship Series. And immediately, fans hated it. Computers don't watch games! They can't appreciate the one-handed grab of a pass over the middle in traffic or the way a good linebacker can scrape off of a blocking guard and take out the pitch man on an option. It distills football down into numbers and numbers are boring. Why are these nerds trying to ruin football with all this math?!?!
And so, as the controversies of each season passed, the "formula" for the "Computer Polls" was tweaked. Margin of victory, despite being inextricably tied to winning (a win is, by definition, a positive margin, and wins have always been the traditional measure of which teams are best), was continually diminished in importance and finally eliminated in 2004. The computer polls were added to the BCS equation in an effort to eliminate human biases in evaluating football teams, yet their weight was continually reduced whenever the results of the "unbiased" computer polls didn't confirm the biases of the "human" polls.
In other words, over the course of the BCS era, the methodology by which one might use data to evaluate the relative strength of football teams became something that no statistician would ever devise or agree to. And whatever biases those models sought to reduce were forced right back into them by people who clearly did not understand the advantages (and disadvantages) of such analytics.
A few key points to understand about the Computer Polls:
1) "Computer Polls" are really models
Models, if done well, seek to emulate some physical system. Models are built to emulate pumps in a submarine, electronics in a computer, unseen subatomic interactions during the Big Bang. And to paraphrase the notorious scientist/activist James Hansen, not a one of them is right. And that is because the whole point of building a model is to boil an intractably complex thing, be it the climate of the Earth's atmosphere or the interactions of 124 college football teams, down into a small enough number of essential and understandable elements to be computationally feasible, while remaining representative enough of the intractably complex system to glean meaning from it. No model is going to be able to predict the movement of every atom in the atmosphere, nor will it be able to predict the proper alignment of the billions of butterfly effects that had to occur in just the right way for TCU to beat Kansas in basketball last year.
So the first thing to understand is that models are not the systems themselves. They deviate from reality every time a variable that affects the system isn't considered (and in complex systems there are countless variables), every time a simplifying assumption is made, every time the modeler picks one piece of data to put into it and leaves another out.
But their advantage, especially in the era of computers, lies in the ability to include many more variables than the human brain can process at one time. We've all heard the Transitive Theory of Football before -- in 2012, TCU beat Baylor by 28 and Baylor beat Kansas State by 28, so TCU should beat Kansas State by 56! The reality is that the performance of college football teams is noisy, and how a team plays one week is likely not at all representative of how it plays in subsequent weeks. In reality, TCU not only didn't beat Kansas State by 56, they lost by 13.
But the number of results analyzed in the Transitive Theory of Football (three) is about as much information as the normal human brain can hold at one time. A computer model, while far from capable of evaluating every single variable in a system, can at least evaluate more than a cursory glance over a schedule of results by a human brain.
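The failure of the Transitive Theory is easy to demonstrate with a toy simulation. In the sketch below, the team strengths and the week-to-week noise level are invented for illustration, not real ratings; the point is only that when single games are noisy, the chained margins and the head-to-head result disagree a meaningful fraction of the time:

```python
import random

random.seed(1)

# Hypothetical team strengths (points better than an average team).
# These numbers are made up for illustration, not real ratings.
strengths = {"TCU": 10, "Baylor": 5, "Kansas State": 8}

def simulate_margin(team_a, team_b, noise=14):
    """One game's margin: true strength gap plus week-to-week noise."""
    return strengths[team_a] - strengths[team_b] + random.gauss(0, noise)

# The Transitive Theory predicts margin(A, C) = margin(A, B) + margin(B, C),
# but with noisy single-game samples the chain breaks down routinely.
flips = 0
trials = 10_000
for _ in range(trials):
    ab = simulate_margin("TCU", "Baylor")
    bc = simulate_margin("Baylor", "Kansas State")
    ac = simulate_margin("TCU", "Kansas State")
    # Chain says TCU should crush K-State, yet the direct game is a loss.
    if ab > 0 and bc > 0 and ac < 0:
        flips += 1

print(f"Chain held but direct game flipped: {flips / trials:.1%} of trials")
```

Even with TCU given the highest "true" strength, a nontrivial share of simulated seasons reproduce exactly what happened in 2012: TCU beats Baylor, Baylor beats Kansas State, and Kansas State beats TCU anyway.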
2) Computer polls are subject to human bias too
Computers just crunch the numbers. Aside from HAL, how they crunch the numbers is dependent on the humans who tell them what to do. And how the numbers are crunched is where the human bias of the programmers enters into the model.
As discussed in a previous entry, DUSHEE makes a number of assumptions, simplifications, and subjective assessments, as all models do. All of those assumptions and simplifications can be justified to varying degrees of rigor, but nonetheless I as the modeler have determined what information I think is important, how much more important some information is than others, and even how much information I'm willing to consider given time limitations. My personal bias goes into every one of those decisions.
The same is true for Sagarin's model or Billingsley's model or any other model used by the BCS system. Add to that the aforementioned biases that the BCS forces into those models to make them give the answers it wants to be given.
However, a "computer" model, if allowed, does eliminate historical bias; e.g., Texas must be good this year because they've always been good and they always have highly ranked recruiting classes, thus I will rank them highly at the beginning of the season and punish them less severely for losses than I might a "non-traditional" power. Fan loyalty bias is reduced, as is preference for a particular style of play. Not eliminated, since the modeler could skew the analysis of the stats in favor of, say, a stronger rushing game, but at a minimum, the actual numbers on the field are what dictate the analysis.
3) Nothing prevents "human" poll voters from using a quantitative analysis or model to determine their own rankings
Frank Windegger, who once was a voter in the Harris Poll, could have used DUSHEE, or his own model, to establish how he votes. So could anybody else. In that regard, the "computer poll" is nothing more than a really analytical "human poll." There is nothing inherently "un"human about using data to guide an assessment. Good assessments use data.
The "old school" baseball fans were using data to support their assessment that Cabrera was the MVP, namely the "Triple Crown" stats. Nothing inherently wrong with them. They provide insight into important parts of a player's game. Old school pollsters are using data too. They're using the Transitive Theory, they're using strength of schedule, winning percentage, total offense and defense, all kinds of data, both quantitative and pseudo-quantitative.
So in light of this, even with its continually diminishing clout during the BCS era, the computer poll still carries an extremely disproportionate weight in the BCS calculation when viewed on a per-voter basis. There are 115 Harris Poll voters and 60 voters in the Coaches' Poll, together accounting for 2/3 of the BCS standings. There are 6 computer polls (ignoring that the biggest outlier gets thrown out) accounting for the other 1/3. So each Harris voter contributes 0.29% to the BCS standings, each coach contributes 0.56%, and each computer poll contributes 5.6%. The computer polls are weighted 10 times more heavily than each individual coach and almost 20 times more heavily than each individual Harris voter.
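The per-voter arithmetic works out as follows, using the voter counts above (each of the three components carries 1/3 of the BCS standings):

```python
# Per-voter weight in the BCS standings, using the counts from the text.
harris_voters = 115
coaches_voters = 60
computer_polls = 6   # ignoring the high/low throwouts

# Each component (Harris Poll, Coaches' Poll, computer average) is 1/3.
harris_share = (1 / 3) / harris_voters
coach_share = (1 / 3) / coaches_voters
computer_share = (1 / 3) / computer_polls

print(f"Harris voter:  {harris_share:.2%}")    # 0.29%
print(f"Coach:         {coach_share:.2%}")     # 0.56%
print(f"Computer poll: {computer_share:.2%}")  # 5.56%
print(f"Computer vs coach:  {computer_share / coach_share:.0f}x")   # 10x
print(f"Computer vs Harris: {computer_share / harris_share:.0f}x")  # 19x
```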
Did anybody put any thought into this? If we accept the argument that the computer polls are superior because they eliminate more qualitative bias, why include the human polls at all? But if we instead force some of that qualitative bias back into the computer polls (by eliminating margin of victory and dictating how the modelers use the numbers, so the results stay more in keeping with the human polls), then why weight them as heavily as they are? The computers were added because we can't trust Lane Kiffin not to vote USC number one when he knows other teams in the country are better, but then when the computers tell us some team other than the one the humans voted in is Number 1, we decide we can't trust the computers either.
4) Despite what you hear, college football's regular season tells you the LEAST of any major sports league about the relative dominance of its teams
FBS college football has 124 teams (I think, unless more were added this offseason that I forgot) who play a 12-game season. On the other extreme, Major League Baseball has 30 teams who play a 162-game schedule. While one can, and many have, argued that the importance of any one individual game is almost nil in baseball and the opposite is true in college football, those making that argument often ignore the flip side of that coin. If we eliminate the inter-league games, at the end of a 144-game season, an MLB team has played each of the other 14 teams in its 15-team league (starting this year) at least 8 times. There should be no doubt who the best team in the league is. Playoffs, from the perspective of determining the best team in the league, have no purpose in baseball. They could go back to the pre-expansion-era system of the best record in each league playing in the World Series, and no one would have an argument that their team deserved an opportunity. You had 162 games (now 144 within your league in the inter-league era) to prove otherwise.
In college football, not only do you play each opponent just once, you play only about one-tenth of the teams in your "league," meaning, in this case, the FBS. Yes, each game is very important, but the season is statistically insufficient to definitively determine which team or teams are the best in the country. Undefeated, one-loss, even two-loss teams could have, and have had, a case that they were the best in the country.
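A back-of-the-envelope comparison of the two sample sizes, using the schedule figures above:

```python
# Rough sampling comparison, using the figures cited in the text.
# MLB: ~144 intra-league games spread across 14 league opponents.
mlb_intraleague_games = 144
mlb_opponents = 14   # a 15-team league, post-2013
games_per_opponent = mlb_intraleague_games / mlb_opponents
print(f"MLB: ~{games_per_opponent:.0f} games against each league opponent")

# FBS: a 12-game season drawn from a pool of 123 possible opponents.
fbs_games = 12
fbs_possible_opponents = 124 - 1
coverage = fbs_games / fbs_possible_opponents
print(f"FBS: each team plays {coverage:.0%} of its possible opponents, once each")
```

Roughly ten head-to-head samples against every opponent in one league, versus a single sample against a tenth of the opponents in the other.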
So in college football, more than in any other major American sport, the need to coax as much information as possible out of such a meager statistical sample is critical if your goal is to determine which teams are the best. That is why analytical and statistical approaches should be emphasized, and why, I would argue, the season needs to be extended into a playoff. Because for all the ballyhooed nonsense about how "every game is a playoff game" (like Alabama getting knocked out of the "playoff" by A&M last year, right?), the reality is the regular season doesn't give us nearly enough information to determine who the best teams are based on wins and losses alone. If it did, there wouldn't be controversy every single year about who should be playing in the MNC.
DUSHEE versus the BCS
If computer polls were truly unbiased arbiters of college football excellence, they'd all arrive at the same dispassionate result. But they don't. In fact, their results can vary as wildly as those of the voters in any of the human polls. Last year, Billingsley had Northern Illinois at 12 while Sagarin had them unranked. The computers had Florida State anywhere from 14 to 24. Clemson, ranked 13 in both human polls, was also unranked in Sagarin's.
Part of the reason why the computer polls are so universally reviled is because all of them except Colley keep their methodology at least partially obscure. Massey, on his website, compiles the output of 124 different computer polls (not including DUSHEE!), which probably gives the best true estimate of college football team rankings.
So how does DUSHEE compare to the computer polls used to determine the BCS MNC participants? I compared the 2012 DUSHEE results to 120 of the 124 computer ranking models (eliminating the ones that did not rank all 124 FBS teams) cataloged on Massey's website, using a simple R-squared correlation measure. An R-squared equal to 1 means that two data sets are perfectly correlated, i.e., the rankings of the two systems from 1 to 124 are identical. The closer the number is to 1, the more closely correlated the two data sets.
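For two systems that rank the same set of teams, this comparison amounts to the squared Pearson correlation of the two rank orderings. A minimal sketch, with made-up rankings (the `poll_a`/`poll_b` values are hypothetical, not real poll data):

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

# Identical rankings give an R-squared of exactly 1.
assert r_squared([1, 2, 3, 4], [1, 2, 3, 4]) == 1.0

# Two hypothetical polls that mostly agree but swap a few teams:
# the ith entry is where each poll ranks team i.
poll_a = [1, 2, 3, 4, 5, 6]
poll_b = [1, 3, 2, 4, 6, 5]
print(f"R-squared: {r_squared(poll_a, poll_b):.3f}")
```

With rankings (rather than raw scores) as inputs, this is equivalent to squaring Spearman's rank correlation, which is why it works as a like-for-like measure across systems that publish nothing but an ordering.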
The systems to which DUSHEE correlates most closely are:
Bias-Free (link broken) 0.982
Not all of these systems detail their methodology (although the Margin-Aware and Maurer methods do describe a clear emphasis on MOV, much like DUSHEE), but it is probably safe to assume that these five methods most closely resemble DUSHEE. Perhaps not surprisingly, DUSHEE doesn't compare as well with the BCS systems since they expressly do not use margin-of-victory in their formulations:
If you take the compiled standings of all 124 polls on the Massey site, DUSHEE correlates to that composite ranking with an R-squared of 0.957. Not bad for a system that only looks at two statistics, if I do say so myself.