Much ado was made during the 2016 presidential election about the analytical odds placed on Trump or Clinton winning the White House. In particular, the oddsmakers had Clinton as the universal favorite on election day, at odds (as memory serves) anywhere from 60 to 80%. And when Trump won, many took the opportunity to slam the "inaccuracy" of the analytics.
But what does it mean when one claims that someone's odds of winning are 60%? Where does that number come from? If an analysis puts the odds of something happening at 70% and that something doesn't happen, does it mean that the analysis was wrong (spoiler: almost certainly, but probably not in the way you think)? Or did something improbable simply happen (spoiler: also almost certainly)?
To get at the crux of this topic, we first have to get comfortable with the concept of uncertainty. Those in the science/statistics game are trained to get comfortable with it, but it is not a concept humans necessarily have a natural affinity for. In fact, uncertainty makes most people really uncomfortable. We don't like to think of forces that act on us as being in any way random. "Everything happens for a reason," we love to tell ourselves.
But whether we like uncertainty or not, every halfway complex interaction that occurs is fraught with uncertainty and randomness. The variables at play are too many to account for with the time and hardware available to us in the moment. Did that man in Pennsylvania see an ad right before the election that made him change his vote? Did that woman in Wisconsin have a child get sick the morning of the election, which caused her not to vote when she otherwise would have?
Now don't worry ... I'm not going to talk politics here. We're going to use college football for our example, but the principles apply to the political example too.
So how does someone determine the odds of a team winning?
First, you need some data. For this example, we'll use DUSHEE's modified Point Differential metric. And let's take two teams, totally at random ... say TCU and Oklahoma.
As I write this, TCU has played 11 DUSHEE-recognized games, Oklahoma has played 12. The week-to-week DUSHEE score for each team is as follows:
In theory, the DUSHEE score should represent how many points better (or worse) the team performed than an average team would have performed against that team. You will note that there is a fair amount of variability in how each team has performed game-to-game. TCU has bounced between 1.27 (against WVU) and 29.01 (Oklahoma State) for an average of 15.27. Oklahoma has played between -0.38 (against Baylor) and 48.65 (Ohio State) for an average of 25.43.
On average, Oklahoma has been about 10 points better than the Frogs. However, the Sooners' performance has also been much more inconsistent -- their worst performance was worse than TCU's and their best performance was (much) better. The range (best minus worst) of Oklahoma's performance (49.0) has been much greater than TCU's (27.7).
So while you are more likely to get a 40+ point differential performance from OU (they've done it twice, TCU hasn't even hit 30), OU is also more likely to give a negative point differential performance than TCU is.
You Are Such a Standard Deviant
The most common method for measuring this spread is standard deviation. I'm going to assume that most of you have heard the term, and that the majority either know what it means or don't care enough to spend time going into it here. I talked a bit about it in a blog post several years ago on the Monte Carlo method.
But I do need to make a few points about standard deviation before we proceed. First, using standard deviation (or any assessment of spread) is a bit dicey when talking about as few data points as we have here (11-12 games). Second, interpreting standard deviation in terms of probabilities relies on the assumption that the data being analyzed has a "normal" or "bell-curve" distribution; i.e., a team is more likely to perform at or near its average than far away from it.
Go back to the game-to-game data for each team above and you can see how well that second assumption holds up. TCU, in particular, has played only one game near their average (last week, against Tech), and has instead shown more of a bimodal distribution, with five games grouped closely around 5 and five games grouped around 25.
TCU's is not a distribution that looks particularly bell-curvy. More bathtubby. And Oklahoma's distribution doesn't exactly look like a bell curve either. But let's use our lack of data points as an excuse to ignore our lack of bell-curviness. We'll assume the non-normal distribution is an artifact of too few data points, and that as the Frogs played more hypothetical games, they'd turn in more "average" performances and their bathtub would start to look more like a bell. There is no reason to expect that teams would not approach a normal distribution given enough data.
So if you do the calculation, TCU's mean performance is currently 15.3 with a standard deviation of 10.6. Under the normality assumption, that means there is roughly a 68% probability that TCU's DUSHEE score in any one game will fall between 4.7 and 25.9 (15.3 +/- 10.6).
Oklahoma's mean performance is currently 25.4 with a standard deviation of 15.5, meaning we should expect about 68% of Oklahoma's games to fall between 9.9 and 40.9.
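That one-standard-deviation probability is a property of any normal distribution, and it works out to about 68%. A quick sketch with Python's standard library (the author worked in Excel; the means and standard deviations are the ones quoted above):

```python
from statistics import NormalDist

# Means and standard deviations quoted in the text
tcu = NormalDist(mu=15.3, sigma=10.6)
ou = NormalDist(mu=25.4, sigma=15.5)

# Probability that a single-game DUSHEE score lands within one
# standard deviation of the mean -- ~68% for any normal distribution
p_tcu = tcu.cdf(15.3 + 10.6) - tcu.cdf(15.3 - 10.6)
p_ou = ou.cdf(25.4 + 15.5) - ou.cdf(25.4 - 15.5)
print(round(p_tcu, 3), round(p_ou, 3))  # 0.683 0.683
```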
So, while Oklahoma has been better, on average, than the Frogs, they've also been more inconsistent.
Revisiting the French Riviera
So we can use these numbers to run what you often hear called "simulated" games. There are any number of ways you can go about "simulating" a game. You could play a bunch of games on your Xbox and assume the programmers have accurately accounted for each player's ability and matchups with opposing players and coaching and play-calling and all those other intangibles that make up the outcome of a game. You could go far deeper into the analytics than I do with DUSHEE and build a far more complex multivariate model of the outcome.
Perhaps the simplest way to "simulate" many games between two teams is to assume that each team's performance over many games will follow a normal distribution based on the limited set of data we talked about above. So if we assume that the average and spread of performance the teams have demonstrated so far are typical (and they might not be ... for instance, Oklahoma's apparent inconsistency relative to the Frogs might be overstated given only 12 games of data), then we can begin to predict how the teams would perform against each other over many, many games.
So if we take TCU's and Oklahoma's mean and standard deviations that we calculated above and use them to determine how the teams will perform over a 10,000 game season, the distribution in the figure above begins to look like this:
Now there are some bell curves.
Facing Baker Mayfield 10,000 Times
So we've made a lot of assumptions at this point. Many of them are questionable, but not totally unreasonable.
Now we bring randomness into play (which is where the "Monte Carlo" comes in). Using a random number generator, you map each random number onto the performance bell curve for each team (this is done with the "NORMINV" function in Excel, for anyone playing along at home). Thus, outcomes near the mean of the bell curve are more likely than outcomes far from the mean. A random number (between 0 and 1) is generated for TCU and another for Oklahoma; if the random number is near 0.5, the team performs at its mean. As the random number approaches 0, the team's performance moves out into the left tail of the bell curve; approaching 1, the team is in the right tail.
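For anyone playing along outside Excel: NORMINV is the inverse of the cumulative distribution function, which Python's stdlib exposes as `NormalDist.inv_cdf`. A minimal sketch using TCU's numbers from above:

```python
from statistics import NormalDist

tcu = NormalDist(mu=15.3, sigma=10.6)  # TCU's DUSHEE mean/std from above

# The inverse CDF (Excel's NORMINV) maps a uniform random number in
# (0, 1) to a point on the bell curve.
print(tcu.inv_cdf(0.5))   # 15.3 -> the mean
print(tcu.inv_cdf(0.05))  # ~ -2.1 -> deep in the left tail
print(tcu.inv_cdf(0.95))  # ~ 32.7 -> deep in the right tail
```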
So to simulate TCU and Oklahoma facing off 10,000 times (DUSHEE predicts the probability that Mayfield would make an obscene gesture toward the TCU sideline after taking out an Arlington police officer with a warmup pass at 3%), you generate 10,000 random numbers for TCU and 10,000 for Oklahoma and pair them up. If the random number for Oklahoma lands at a location on its bell curve further to the right than TCU's, Oklahoma wins. You can see from the bell curves above that Oklahoma has much more real estate in the 30 to 80 range than TCU does ... so when Oklahoma lands in this range, they are very likely to win.
If you play this simulation out, on a neutral field (i.e., not giving any advantage to the home team), Oklahoma wins somewhere between 70-72% of the time. The distribution of outcomes looks like this:
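The whole 10,000-game simulation fits in a few lines of Python (the author did this in Excel; the means and standard deviations are the ones calculated above, and the seed is arbitrary):

```python
import random
from statistics import NormalDist

tcu = NormalDist(mu=15.3, sigma=10.6)
ou = NormalDist(mu=25.4, sigma=15.5)

rng = random.Random(2017)  # fixed seed so the run is reproducible
N = 10_000

# Draw one performance per team per simulated game via the inverse CDF;
# the team with the higher DUSHEE score wins that game.
ou_wins = sum(
    ou.inv_cdf(rng.random()) > tcu.inv_cdf(rng.random())
    for _ in range(N)
)
print(ou_wins / N)  # lands in roughly the 0.70-0.72 range
```

Note that `rng.random()` returns values in [0, 1), so in principle it could hand `inv_cdf` an exact 0 (which is undefined); over 10,000 draws that is a practical non-issue.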
So, when you begin to look at these games in a probabilistic manner, you can convince yourself pretty easily that the head-to-head argument as the be-all, end-all is a very incomplete picture. No one thinks Syracuse is a better team than Clemson. Yet, on October 13, Syracuse beat Clemson 27-24 in the Carrier Dome. If you run the same simulation I just showed for TCU-Oklahoma and give Syracuse the DUSHEE-calculated 3.4 points for home field advantage, Clemson beats Syracuse 87% of the time by an average margin of 23 points. But 13% of the time, Syracuse wins -- just a little worse than the odds of rolling a six on a single die.
Closing Asides and Thoughts
As stated above, home field confers, on average, a 3.4-point advantage to the home team. When a team's DUSHEE score is boosted by that amount, its chances of winning improve by about 3-5%.
And if you are skeptical about whether Oklahoma's inconsistency relative to TCU's is real (as I am), note that if you give Oklahoma the same standard deviation as TCU, Oklahoma's chances of winning go up to 75%. By tightening Oklahoma's bell curve, Oklahoma becomes less likely to blow TCU out, but since that portion of Oklahoma's right tail extends beyond TCU's right tail, Oklahoma was going to win those games anyway. Conversely, it also becomes less likely that Oklahoma will play a stinker (the left tail of the bell curve), where TCU actually has a chance to win. So we kinda need to hope that Oklahoma's relative inconsistency (compared to the Frogs) over 12 games is real.
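You can check that 75% figure without simulating at all: the difference of two independent normal performances is itself normal, so the win probability falls straight out of one CDF evaluation. A sketch (means and standard deviations are the ones from above; the function name is mine):

```python
import math
from statistics import NormalDist

def ou_win_prob(mu_ou, sd_ou, mu_tcu=15.3, sd_tcu=10.6):
    # OU wins when (OU score - TCU score) > 0. The difference of two
    # independent normals is normal, with variances adding.
    diff = NormalDist(mu_ou - mu_tcu, math.hypot(sd_ou, sd_tcu))
    return 1 - diff.cdf(0)

print(round(ou_win_prob(25.4, 15.5), 2))  # 0.7  -> OU's actual spread
print(round(ou_win_prob(25.4, 10.6), 2))  # 0.75 -> TCU's tighter spread
```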
So take it easy on Nate Silver and his ilk over "missing" on the 2016 election. Their predictions are only as good as the data they have and Quinnipiac polls (as measures of how people are going to vote) are probably worse data sources than college football games. The uncertainty in those polls is why they were still giving Trump a 20-40% chance to win. Basically the odds of flipping two coins and getting two heads. About the odds we have against Oklahoma.
So you're telling me there's a chance ...