One of my greatest sources of UGA-related joy is a group Twitter DM I have going with Nick Toomey and Dawg_Stats. (Seriously, follow those two, Chapel Bell Curve, and Dawg Sports, and you’ll be one very informed Dawg fan.) After my last column, both of my stats-focused brethren pointed out that I would do well to address two concepts before we go any further: predictions and small sample size. So, with that in mind, let’s get down to brass tacks (full disclosure, I don’t know what that means), and think about What Advanced Stats Tell Us About UGA’s Matchup Against Arkansas State.
The single most challenging aspect of advanced statistical analysis, at least when it comes to college football, is that the sample size is inherently small. Teams play - at most - 15 games, which gives us very little data to work with. (As a side note, this is why play-by-play data is so vital, as it deepens your data set.) Given that our numbers do not (yet) include adjustments for last year’s data, at this point we don’t just have a small sample size; we have so little data that it is foolish to exclusively use these numbers to make predictions. So, what, you may be asking yourself, is the point of these columns? Well, I would reply, you’ll notice that the adverb “exclusively” is doing a lot of work in that last sentence. I think our numbers can, and should, inform your expectations for the coming game. There is, however, a large difference between expectations and predictions.
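To put a rough number on the small-sample problem, here is a minimal sketch of how much uncertainty sits around a success rate when you only have a handful of plays. It uses the Wilson score interval, which is my choice for illustration and not something our numbers currently do; the function name `wilson_interval` and the play counts are hypothetical.

```python
import math

def wilson_interval(successes, attempts, z=1.96):
    """Approximate 95% Wilson score interval for a success rate."""
    if attempts == 0:
        raise ValueError("need at least one attempt")
    p = successes / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2)) / denom
    return center - half, center + half

# Hypothetical counts: the same 37% rate seen over 30 plays vs. 300 plays.
for successes, attempts in [(11, 30), (111, 300)]:
    low, high = wilson_interval(successes, attempts)
    print(f"{successes}/{attempts}: {low:.2f} to {high:.2f}")
```

The range around the 30-play rate comes out far wider than the range around the 300-play rate, which is the whole argument for treating early-season numbers as expectations and not predictions.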
We talked last time about using stats to help paint the narrative of a game, and to call BS on what we see when it doesn’t match up to results on the field. Similarly, when we’re preparing for a game, the best way to use statistics (for non-gamblers, which I am) is to let them inform us of the priors and fundamentals involved in a game.
Priors and fundamentals are terms I’m borrowing here from political analysis. (I’m not a political scientist, FYI, so if I botch these definitions, I apologize. These are just the heuristics I use.) By priors, I mean the historical and contextual information we can generate about the state of play going into a contest. In politics, these may include previous election data, early polling data, and candidate performance in previous campaigns. In football terms, I think of these as personality stats (Havoc Rate, SR, Exp. Rate, Run/Pass Ratio), and previous results (YPP, PPP, etc.). These numbers offer us points of comparison between two teams, and give us a rough outline of what we can expect each team to attempt in terms of a game plan.
Conversely, by fundamentals, I mean the underlying factors that affect our baseline. One of the core fundamentals in presidential elections, for instance, is the performance of the economy. We can use fundamental data to predict, in general, the outcome of a contest. To use our political example, incumbent presidents who enter the election with a strong economy have a better chance of being re-elected. In football terms, I think the strongest fundamental is something like Bud Elliott’s Blue Chip Ratio. Generally speaking, the team with more talent wins (duh), but more importantly, only teams with at least 50% of their roster composed of blue chips have a realistic shot at winning the CFP.
Using these two sets of data, especially with a limited sample size like we’re dealing with, it’s more helpful to establish expectations for each game, as opposed to making predictions. Sure, we can predict that one team will win, in general, because they have more talent, a better offense, etc. But that doesn’t really tell us anything about what the game in question will look like on a play-by-play basis. So, in my estimation, it’s better to look at each team’s data side-by-side, and look at the margins between related numbers. By doing this, we can arrive at a set of expectations for how the game will look.
So what does this type of analysis look like in action? Let’s check out the comparative - i.e. side-by-side - stats for UGA’s next game against Arkansas State.
When Arkansas State has the ball, one of the first stats that jumps off the page to me is 3rd Down Success Rate. At this point in the year, UGA is surrendering a stingy 30% SR. Arkansas State, on the other hand, has a below-average SR of 37% on 3rd down. These data points don’t tell me what should or will happen. Rather, they let me know that, if Arkansas State is consistently converting 3rd downs, something in the previous data is non-predictive of the game. This doesn’t mean that UGA is doing something wrong. Instead, it just lets me know that I need to pay attention to play selection and execution on the next 3rd down so that I can try to understand the root of the disconnect.
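If you like to see that thought process written down, here is a rough sketch of the 3rd-down check. The 37% and 30% figures come from the comparison above; the function `third_down_check`, the idea of treating the two season rates as a band, and the in-game line are all assumptions of mine for illustration, not a model we actually run.

```python
def third_down_check(offense_sr, defense_sr_allowed, conversions, attempts):
    """Compare an in-game 3rd-down rate with the pre-game expectation band."""
    # The band is simply the range between the two season rates (an assumption).
    low, high = sorted((offense_sr, defense_sr_allowed))
    observed = conversions / attempts
    if observed > high:
        note = "above expectations: watch play selection and execution"
    elif observed < low:
        note = "below expectations"
    else:
        note = "in line with expectations"
    return observed, note

# Season numbers from the column: Arkansas State converts 37%, UGA allows 30%.
# The in-game line (6 of 10) is made up purely for illustration.
observed, note = third_down_check(0.37, 0.30, conversions=6, attempts=10)
print(f"margin: {0.37 - 0.30:.2f}, observed: {observed:.2f}, {note}")
```

The flag doesn’t say anyone is doing something wrong. It only says the game has drifted away from what the season numbers led us to expect, which is the cue to start watching the next 3rd down more closely.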
Another interesting data point in the chart above is Arkansas State’s run rate. On the latest episode of Chapel Bell Curve, I started my analysis of Arkansas State with the assumption that Gus Malzahn’s fingerprints were so intrinsic to the program that they must be a team that ran more than they passed. As I looked through the data, however, I saw that Arkansas State actually passes 58% of the time. While this was a pretty embarrassing moment for me personally, I asked Justin to leave it in the episode because I think it’s representative of the kind of analysis we should be doing. Statistics will almost always provide us a more clear-eyed set of expectations than the eye test or reputation will. We shouldn’t be embarrassed when objective numbers correct our expectations, but excited that we now have a more accurate picture of the situation. While these expectations can always be subverted, knowing the general statistical lay of the land will still let us know, as the game is played, what is going differently than expected. And the core of intelligent analysis, I think, is the ability to trace those differences mid-game, and determine their source.
I’ll catch you Saturday in the Classic City. Until then, Go Dawgs!
Also, here’s some more stats for the degenerates who made it this far.