I've been struggling to come up with blog ideas to pass the time until the breathlessly anticipated first DUSHEE ranking comes out, marking the official beginning of college football season. This morning, the Google News aggregator passed this article from the Detroit News before my eyes: the audaciously named "Top three things to know about college football analytics" by one Ed Feng, a fellow engineer.
In his article, he presents some interesting data and reaches a number of conclusions, some of which appear to be justified by the data and others not so much. First, in his introduction, he makes a point that I've made on these pages numerous times, namely that the college football season provides one of the least rich data sets of any popular American sport:
The other piece of this is that, in addition to the small number of games each team plays, there is a glut of teams all nominally counted in the same population. There are 125 teams playing in the FBS. Each plays only roughly 10% of the other teams in the league.
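To put a rough number on that 10% figure, here is a back-of-the-envelope sketch. The 12-game regular season is my assumption (the schedule length isn't stated above), and the function name is made up for illustration. It also treats every opponent as a distinct FBS team, so it is an upper bound.

```python
FBS_TEAMS = 125
GAMES_PER_SEASON = 12  # assumption: typical regular-season length


def schedule_coverage(n_teams: int, n_games: int) -> float:
    """Largest possible fraction of the other teams that one team can face."""
    return n_games / (n_teams - 1)


coverage = schedule_coverage(FBS_TEAMS, GAMES_PER_SEASON)
possible_pairings = FBS_TEAMS * (FBS_TEAMS - 1) // 2

print(f"Each team sees at most {coverage:.1%} of the other teams")
print(f"Distinct pairings possible among {FBS_TEAMS} teams: {possible_pairings}")
```

Under those assumptions the coverage comes out just under 10%, and the number of possible pairings dwarfs the number of games actually played.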
Now we get to the three things. I'm not sure why these are the "top" three things we should know about college football analytics, but we'll blame the headline writer for that.
Thing 1: Last Year Matters
Here Feng really makes two assertions: 1) there is a vast disparity in resources among the 125 teams in the FBS, and 2) because of that disparity, past performance is a predictor of future results. To me, the obvious place to look for evidence in support of these assertions would be a correlation between resources (athletic budgets and revenue, alumni bases, attendance, etc.) and performance over time.
Instead he focuses on assertion #2 and attempts to correlate team performance from 2013 to 2014. We've discussed here the striking lack of consistency that a team of college football players shows from game-to-game within a season (http://www.thefroghorn.com/index.php/blog/2/entry-39-blind-squirrels-finding-acorns/). The data he shows indicates to me that, not surprisingly, the consistency doesn't improve when you add the variables of player and leadership turnover from season-to-season into the mix:
Feng looks at this data set and sees evidence of "strong persistence" from season-to-season in team performance. However, take away the diagonal line, which biases the mind toward seeing a correlation in a blob of data points where, at best, only a very weak correlation exists, and I think you'd be hard pressed to argue that the data set "strongly" shows anything. "Strong" correlation would show a concentration of data points on or near that diagonal line instead of a blob of points that is ever-so-slightly skewed in the direction of correlation. The fact that the data set IS such a random scattering tells me that the correlation of the previous season's results to the upcoming season's results is very weak and that the 70% prediction rate claimed is due to the high count of "body bag" games where many teams in the top right portion of that chart play teams in the bottom left.
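The "body bag" argument can be illustrated with a toy simulation. To be clear, this is a sketch on synthetic ratings, not Feng's data: the season-to-season correlation of 0.4 and the gap thresholds are arbitrary values I picked, and each game is simply awarded to the team with the better current-season rating, with no game-day noise.

```python
import math
import random

RHO = 0.4        # assumed season-to-season correlation (illustrative only)
N_GAMES = 20000  # number of synthetic matchups
rng = random.Random(0)


def next_season(prev: float) -> float:
    """Draw this season's rating so that its correlation with last season's is RHO."""
    return RHO * prev + math.sqrt(1 - RHO**2) * rng.gauss(0, 1)


def hit_rates() -> dict:
    """Share of games where last season's better team is also this season's better team."""
    tallies = {"mismatch": [0, 0], "close": [0, 0]}  # [correct picks, games]
    for _ in range(N_GAMES):
        a_prev, b_prev = rng.gauss(0, 1), rng.gauss(0, 1)
        a_next, b_next = next_season(a_prev), next_season(b_prev)
        gap = abs(a_prev - b_prev)
        if gap > 1.5:    # arbitrary threshold for a "body bag" game
            kind = "mismatch"
        elif gap < 0.5:  # arbitrary threshold for an evenly matched game
            kind = "close"
        else:
            continue
        tallies[kind][0] += (a_prev > b_prev) == (a_next > b_next)
        tallies[kind][1] += 1
    return {kind: correct / games for kind, (correct, games) in tallies.items()}


rates = hit_rates()
for kind, rate in rates.items():
    print(f"{kind}: {rate:.0%} of picks based on last season were right")
```

With these made-up settings, picks based on last season hold up much better in the mismatches than in the close games, where they are barely better than a coin flip. So the more mismatches there are on the schedule, the better the overall prediction rate looks, even when the underlying correlation is modest.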
Thing 2: Predicting Turnovers
Nothing will split the nu-skool college football analytics advocate from the old-school football fan/writer more than the randomness of the turnover. Both camps agree on the importance of the turnover to the outcome of the game (http://www.footballstudyhall.com/2013/8/23/4649718/college-football-turnover-margin-winning-percentage), but the two sides strongly disagree on the ability a team has to generate turnovers (and prevent their own) by either sheer will or strength of play. Analytics will tell you without equivocation that turnover margin is one of the most random metrics from season-to-season you can find (http://grantland.com/features/nfl-stats-predicting-success/, http://www.advancedfootballanalytics.com/index.php/home/research/general/79-examining-luck-in-nfl-turnovers). Teams, even those in the NFL, which possess far more season-to-season roster cohesiveness than college teams, do not have the ability to maintain high turnover margins consistently.
(The one exception to this has been the recent vintage Patriots, who have consistently avoided fumbles at a rate far better than any other team in the league, evidence used by many that they have, at minimum, uniquely discovered some competitive advantage, and most probably that they are cheating. http://www.wsj.com/articles/patriots-always-keep-a-tight-grip-on-the-ball-1422054846)
Feng here presents one of the few pieces of evidence I've seen for some modicum of predictability in turnovers, in the form of a correlation between quarterback accuracy and interception rate:
Even here, the correlation isn't rock-solid, but it is weakly there, and as Feng states:
The randomness and unpredictability of turnovers is why I don't use turnover margin explicitly in DUSHEE, although it is there in its impact on point margin. There is no question that turnovers have a strong impact on point margin. And it is, in part, why you see so much variability in a team's performance from game-to-game and season-to-season.
Thing 3: Efficiency, Efficiency, Efficiency
While in most respects I fall into the nu-skool analytics camp, the focus on efficiency, or looking at statistics on a per-play basis, is not one with which I am fully on board. First and foremost, the game is not played on a per-play basis; it is played on a per-game basis. If you are a successful grind-it-out, huddle-up, milk-the-clock kind of offense and can keep the opponent's offense off the field, you will be viewed poorly on an efficiency basis. In my mind, if you put up 500 yards of total offense against an opponent, not only does it not matter whether it took you 50 or 80 plays to do it, it might actually be preferable in some cases to do it in 80 plays (i.e., less "efficiently") because you are helping out your defense more by keeping them off the field and giving the opposing offense fewer opportunities to score.
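The arithmetic behind that 500-yard example is simple enough to spell out. The helper function and the two offense labels below are just names I made up for illustration.

```python
def yards_per_play(total_yards: float, plays: int) -> float:
    """Per-play efficiency: total yards divided by plays run."""
    return total_yards / plays


TOTAL_YARDS = 500  # same per-game output for both hypothetical offenses

for label, plays in (("quick-strike", 50), ("grind-it-out", 80)):
    ypp = yards_per_play(TOTAL_YARDS, plays)
    print(f"{label}: {TOTAL_YARDS} yards per game, {ypp:.2f} yards per play")
```

On a per-game basis the two offenses are identical. On a per-play basis the 50-play offense grades out at 10.00 yards per play against 6.25 for the 80-play offense, even though the latter ran 30 more plays that its own defense spent on the sideline.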
This was always the failing of the run-and-shoot offenses of the Jerry Glanville era. Yes, they moved the ball and scored lots of points quickly (read: efficiently), but they also kept their defenses on the field for lots of plays.
This was the raison d'être for Gary Patterson offenses prior to the Meacham/Cumbie era and a pretty successful one ... grind it out, score points in a methodical fashion, don't put your defense in bad situations.
So, from a performance evaluation standpoint, I'm still not convinced that the per-play basis is more indicative of a team's quality than the per-game basis.
In the end, Dr. Feng hasn't convinced me that any of this calls for changes to DUSHEE for the upcoming season. However, one significant factor that DUSHEE should account for but doesn't is the effect of home field. As it currently stands, DUSHEE does not distinguish between performance at home versus away, though pretty much any analytical approach shows a clear advantage (usually 3-5 points) to playing at home. As of this writing, I still haven't formulated how I want to change DUSHEE to account for it, but I have decided I need to come up with some basis for doing so. Stay tuned ...
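For the sake of illustration only, the simplest adjustment I can think of is a flat correction to point margin. This is not how DUSHEE works today and not necessarily what it will end up doing. The 3-point constant is an assumption taken from the low end of the 3-5 point range above, and the function name is hypothetical.

```python
HOME_EDGE = 3.0  # assumed flat home-field advantage, in points


def neutral_margin(margin: float, site: str, home_edge: float = HOME_EDGE) -> float:
    """Restate a team's point margin as if the game were played on a neutral field."""
    if site == "home":
        return margin - home_edge  # discount the boost from playing at home
    if site == "away":
        return margin + home_edge  # credit the handicap of playing on the road
    if site == "neutral":
        return margin
    raise ValueError(f"unknown site: {site!r}")


print(neutral_margin(10, "home"))  # a 10-point home win counts as 7.0
print(neutral_margin(10, "away"))  # a 10-point road win counts as 13.0
```

A flat constant ignores the possibility that some venues are tougher than others, which is part of why this is a placeholder rather than a plan.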