A few years ago, on some random college football Saturday, I decided that it wasn't enough for me to just waste the day on the couch watching football ... that I needed to incorporate my two loves, college football and Excel spreadsheets, into one glorious symbiotic whole. And on that day, the Duquesne Ultimate Statistical Heuristic Evaluation of Excellence (I don't remember if that was the original nonsense acronym or not, just go with it) was invented.
I did it for a few reasons. First, I love putzing around with numbers. It makes me happy. Second, amid the hullabaloo about BCS computer polls, I wanted to better understand how they might be used to analyze performance and what their value was: using just a few simple and basic performance measures, could I create a ranking system that paralleled both the "human" polls and the other computer ranking systems? Third, being a SABR nerd, I'm fascinated by comparative techniques that allow you to compare teams of different years and eras.
And I think DUSHEE actually does a pretty fair job of doing it. In the following weeks, in the lead-up to the next season, I'll go into some of the interesting things the numbers tell you.
Before we go there, though, I thought it best to start off with a description of the methodology. Many on here have seen and understand it (not that it's all that complicated) but this will serve as a reference for future discussions. Plus, one of the reasons why so many people have a negative predisposition to "computer polls" is that the authors of those polls don't reveal their methodology. I know why they don't; their efforts get them column inches and salaries and name recognition in USA Today and other places and if they told you how they got their numbers, no one would need them to do it. But as we may discuss down the road, I'm not too far off of Sagarin and many of the other polls. He tends to have many of the same "Whaaa?!?!" teams that show up in DUSHEE as appearing to be way better than anyone else thinks they are. Which means either they are way better, or Jeff and I have the same GIGO problems with our models. That's for you to decide.
You can criticize the methodology, you can take it and tweak it, you can print it out and use it to line the cat's litter box. I'll take it all into consideration although I've invested way too much time in this to greatly change the way I'm doing it and go back to tweak all the stuff I've already done. So feel free to comment, but don't expect a massive revision of the method. Not until USA Today starts sending me a check too.
Keep It Simple, Stupid
DUSHEE looks at two numbers for each game between two FBS opponents: margin of victory (i.e., point margin, PM) and margin in total yardage (i.e., yardage margin, YM). Each team's numbers in a game are then compared to how that opponent's other opponents did against the same opponent.
For example, let's look at last year's TCU-Baylor game. Why that game? Because we beat their ass.
TCU won the game 49-21 (PM = 28) and outgained Baylor 508 to 431 (YM = 77). Baylor, over the course of their season and removing the TCU game from their totals, beat their FBS opponents by an average of 8.7 points and outgained their opponents by an average of 78.5 yards. The difference between how Team A does against Team B in a particular game and how Team B's other opponents did, on average, against Team B in their other 11 (or however many) games is what I'm defining as the Point Differential (PD) and the Yardage Differential (YD).
So in this particular game, TCU scored a PD = 28 + 8.7 = 36.7 and a YD = 77 + 78.5 = 155.5. (Baylor's other opponents were outscored by 8.7 points and outgained by 78.5 yards on average, so subtracting their negative baseline turns into an addition.)
On the other side of the coin, we can look at how Baylor fared in this game.
Outside of the Baylor game, TCU was outscored by their FBS opponents by 0.9 points per game but outgained those opponents by an average of 29.8 yards per game. So Baylor's PD for the game was (-28 - 0.9 = -28.9) and their YD was (-77 + 29.8 = -47.2).
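Since the opponent's other foes did the *negative* of the opponent's average margin, each differential boils down to a single addition. Here's a minimal sketch in Python (the function name and argument names are mine, not anything from the actual spreadsheet):

```python
def differentials(point_margin, yard_margin, opp_avg_pm, opp_avg_ym):
    """Point and yardage differentials for one team in one game.

    point_margin / yard_margin: this team's margins in the game.
    opp_avg_pm / opp_avg_ym: the opponent's average margins in their
    OTHER FBS games (this game excluded). The opponent's other foes
    did -opp_avg_pm against them on average, so subtracting that
    baseline is the same as adding the opponent's average margin.
    """
    pd = point_margin + opp_avg_pm
    yd = yard_margin + opp_avg_ym
    return pd, yd

# TCU's side of the Baylor game: won by 28, outgained Baylor by 77
pd, yd = differentials(28, 77, 8.7, 78.5)
print(round(pd, 1), round(yd, 1))    # 36.7 155.5

# Baylor's side: lost by 28 to a TCU outscored by 0.9/game elsewhere
pd, yd = differentials(-28, -77, -0.9, 29.8)
print(round(pd, 1), round(yd, 1))    # -28.9 -47.2
```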
This calculation is made each week for each game, and the total PD and YD for each team are summed over the course of the season and averaged. TCU's totals for last year looked like this:
Opp:  Kan    UVa     SMU    | ISU     Bay     TTech  | OkSt    WVU    KSU   | Tex    OU    MichSt
PD:  -7.40   12.00   9.09   | -17.64  36.73   0.00   | -10.91  0.00   1.82  | 14.58  2.27  2.92
YD:  -9.00   119.10  -93.64 | -5.27   155.45  258.18 | -43.45  82.73  40.18 | 29.67  3.91  156.42
Table 1: TCU's point differential (PD) and yardage differential (YD) for each game during the 2013 season
TCU's average PD and YD per game for the year were 3.6 and 57.8, meaning we did 3.6 points better against our opponents than their other opponents did against them. And we outgained them by 57.8 more yards.
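Those season numbers are just the row averages of Table 1, which makes for an easy sanity check (a toy recomputation, not the actual spreadsheet):

```python
# TCU's per-game differentials, copied from Table 1
pd = [-7.40, 12.00, 9.09, -17.64, 36.73, 0.00,
      -10.91, 0.00, 1.82, 14.58, 2.27, 2.92]
yd = [-9.00, 119.10, -93.64, -5.27, 155.45, 258.18,
      -43.45, 82.73, 40.18, 29.67, 3.91, 156.42]

avg_pd = sum(pd) / len(pd)   # about 3.62
avg_yd = sum(yd) / len(yd)   # about 57.86
```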
The same thing is done for each of the other (now 124) FBS teams.
To this point, everything is a completely objective, and really very simple, statistical analysis. If the NCAA wanted to, they could report a team's PD and YD on their website.
But then we get to the slightly subjective part ...
There's Room for Me, Sagarin
The team with the highest PD and YD (not necessarily the same team) is used as a reference and every other team's PD and YD is "normalized" relative to it. This is a fancy way of saying that you divide each team's score by the highest team's score. Thus the team with the highest PD will get a "normalized" PD of 1. And it follows that the team with the highest YD ... you get the picture.
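In code, the normalization is one division per team. A sketch with made-up numbers (the team names and values are hypothetical):

```python
def normalize(scores):
    """Divide every team's differential by the league-best value,
    so the leader gets exactly 1.0."""
    best = max(scores.values())
    return {team: s / best for team, s in scores.items()}

# Hypothetical season-average PDs for a three-team league
avg_pd = {"A": 20.0, "B": 10.0, "C": -5.0}
print(normalize(avg_pd))  # {'A': 1.0, 'B': 0.5, 'C': -0.25}
```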
The subjective part comes in when establishing the relative importance of PD vs. YD in the ranking metric. Some would argue that YD is meaningless and PD is the only thing that matters. That winning-is-all-that-matters mindset is basically what basketball and baseball embrace with the RPI calculation, which ignores margins entirely.
As I'll argue in a follow-up entry at some point, I think margin of victory is made "noisier" by the randomness of turnovers and other intangible factors that aren't necessarily controlled by a team's performance. Looking at TCU's numbers from last year in Table 1 shows just how "noisy" a team's performance can be from week to week. Without that noise, a slightly above average team like last year's TCU should play other slightly above average teams to a near tie, lose to better teams by a margin that grows steadily as the opponent gets better, and beat worse teams by a margin that grows as the opponent gets weaker. That is rarely the case, but that is not the subject of this entry.
So I view yardage margin as a damper on the noise of college football performance. Touchdowns are scored on freak plays, but good teams are going to outgain bad teams on a more consistent basis than they outscore them. I'm not sure the numbers support this theory entirely (hey, a subject for a future post!) but it's my story and I'm sticking to it.
So to rank teams, I take the normalized PD and multiply it by 67 and I take the normalized YD and multiply it by 33. This weights points over yards 2:1 and gives a team who leads FBS in both PD and YD a perfect score of 100. I chose 100 because it's 100, the most kickass of all round numbers. And I chose a 2:1 weighting largely because it used to be 3:2 and Uniballer kvetched and moaned that DUSHEE was slandering his poor Kansas State team who had a great PD and a not-so-great YD one year. Completely arbitrary and subjective, but there it is, the basis for the DUSHEE ranking system.
The best team in all the land will have a DUSHEE score at or near 100. A middle-of-the-road, mediocre team will be at or near 0, and while I do not normalize the low end to force a team who would be worst in the country in both categories to a score of -100, that is typically pretty close to where the worst team ends up.
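Putting the weighting together with the normalized differentials, the whole scoring formula fits in one line (a sketch, assuming the normalized PD and YD are already in hand; the function name is my own shorthand):

```python
def dushee_score(norm_pd, norm_yd):
    """67/33 weighting: points count roughly twice as much as yards,
    and a team leading FBS in both categories scores exactly 100."""
    return 67 * norm_pd + 33 * norm_yd

print(dushee_score(1.0, 1.0))  # 100.0 -- best in the land in both
print(dushee_score(0.0, 0.0))  # 0.0   -- dead-average team
```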
An Ass out of U and me
There are many underlying assumptions that DUSHEE makes as a model, but probably the most important one is that the teams have to be interconnected enough to make this comparison valid. Ideally, the method takes strength of schedule into account implicitly. If you play a bad team and barely beat them, you will get a bad score. If you beat a bad team about as badly as another team has, you will get roughly the same score as that team. And DUSHEE will reward "moral" victories: lose closely to a good team and you will have a high PD and YD.
But for a "bad" or "good" team to be established, a team has to play enough teams over some range of oppositional "quality" to make the evaluation. Over the full course of a season, that should be the case. Even a team that plays in a weak conference should have enough basis for comparison if you establish their comparative performance to teams separated by one degree from them.
By this method, a team is not only linked to their opponents, but also to their opponents' opponents as a basis for evaluation. So if every team plays 10 FBS opponents and all of those opponents have played 10 opponents, each team is getting compared to 100 teams (minus, of course, any repeat teams in the opponents' opponents schedules). That should represent enough connectedness that even teams with really weak or strong schedules are evaluated fairly.
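The size of that comparison pool is easy to sketch from a schedule table (a toy four-team league here, not real schedules):

```python
def comparison_pool(schedules, team):
    """All of a team's opponents plus their opponents' opponents,
    with repeats collapsed into a set."""
    pool = set(schedules[team])
    for opp in schedules[team]:
        pool.update(schedules[opp])
    pool.discard(team)  # a team isn't compared to itself
    return pool

# Hypothetical mini-league: everyone plays two games
schedules = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
print(sorted(comparison_pool(schedules, "A")))  # ['B', 'C', 'D']
```

Even though A never plays D, the one-degree link through B and C pulls D into A's evaluation, which is exactly the interconnectedness the paragraph above is counting on.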
If All That Doesn't Have You Positively Orgasmic ...
That's probably enough minutiae to digest for now, but to tease future entries ... I've run DUSHEE for every season dating back to 2000. Before 2000, the centralized storage of box scores on a single intraweb site seems to vanish. A few months ago, I discovered that the NCAA.org website mysteriously has hand-written official box scores saved as PDFs from the 1982 season, but nothing between then and 2000. So I ran DUSHEE for 1982. DUSHEE also generates a strength-of-schedule number and a conference strength number, can be used to pick out the best and worst single-game performances in a particular season or week, can be used to evaluate the historical strength of teams and conferences (well, "historically" back to 2000, at least), and all kinds of other cool stuff that I know will keep this audience in rapt attention for the months ahead.
So we'll discuss all that, what "computer" polls really tell you, and what the numbers tell you about the performance of college football teams as a whole. And then when the new season rolls around, we'll start to look at those numbers as well.