Grab Bag Analytics Is Worse Than Your Gut
When we do analytics, it is important that we always focus on the goal of our analysis. There is a place for describing the here and now. But more often than not, we want to answer questions about the future – or at least shed some light on them.
Who is going to win the next game? The championship?
We can then use these ideas to inform other questions. For instance, here at Wolohan Analytics, we use P(100) to evaluate quarterbacks. It’s a metric that measures how likely a quarterback is to play well enough to win the Super Bowl.
We use it to evaluate quarterbacks because we care more about winning the Super Bowl than we do about putting up big yardage totals or highlight-reel plays.
It’s not necessarily about the metric, though. It’s about the order of thinking.
First we think about what the phenomenon is;
then we think about how to measure it;
then we look at those measurements;
and then we make an assessment.
There’s another version of analytic-like thinking that looks something like this:
Notice something;
Go look for a bunch of measurements that confirm that thing.
This – no matter which measurements you use – is worse than just going on gut feel.
Confirmation Bias, P-Hacking, and Analytics
We have a problem in 21st-century sports: we have access to a tremendous amount of data. This lets lots of well-intentioned people come up with Candy Analytics: analytics that sound nice but don’t do anything to better inform us about the nature of the sport underway.
However, we also have another problem. And this is one that is especially bad during the NBA season. You can find stats to support almost any idea.
And if you start with the idea – and then go looking for stats as if they’re evidence – you will find those stats. But unfortunately, the stats you find – especially if you’re a member of the media – are not evidence. They’re chance in action.
In the sciences, there’s a phenomenon called P-Hacking, where people run many statistical tests on their data in an attempt to find something to report as important.
The problem is not that you can’t run multiple tests on the same dataset. There are many valid reasons you might want to. The problem is that every time you look for something, you risk finding something that isn’t there, purely by chance. And that needs to be accounted for.
Every time you open up ESPN, or Basketball Reference, or Pro Football Reference, or FanDuel, or what have you – you risk finding a trend that isn’t there.
And so if you are just browsing around, looking at numbers and stats, you are bound to find something that is both wrong and supportive of your conclusion.
Of course, that doesn’t mean your conclusion was necessarily wrong. Just because a piece of evidence gets thrown out doesn’t mean the defendant is innocent.
But it does mean we need to be skeptical of statistics we stumble across that just happen to conform to our view of something.
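To put a number on how easy it is to fool yourself this way, here is a small simulation – a sketch in Python on made-up data (no real NFL or NBA numbers; the 32-team, 20-category setup is purely an illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Invent 20 purely random "stat categories" for 32 teams.
# By construction, there is no real signal anywhere in this data.
n_teams, n_categories = 32, 20
fake_stats = rng.normal(size=(n_teams, n_categories))

# Our pet theory: "the first 16 teams are better than the last 16."
group_a = fake_stats[:16]
group_b = fake_stats[16:]

# Go browsing: test every category and report anything "significant."
for category in range(n_categories):
    result = stats.ttest_ind(group_a[:, category], group_b[:, category])
    if result.pvalue < 0.05:
        print(f"Category {category}: p = {result.pvalue:.3f}  <-- looks like evidence, isn't")

# With 20 independent looks at pure noise, the chance of at least
# one false positive is 1 - 0.95**20, or roughly 64%.
print(f"Chance of at least one spurious finding: {1 - 0.95**20:.0%}")

On average, roughly two times out of three, a loop like that hands you a talking point that is pure noise. Accounting for it is what statisticians mean by correcting for multiple comparisons; the crudest fix is to demand something like p < 0.05/20 once you’ve taken 20 looks.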
How do we do it right?
So you want to do it right? You have to work backwards. What is the problem at hand, and what evidence do you want to be able to bring to the table? Are there established ways to look at this problem?
For example, say you want to look at NFL team strength. Well, there are two really powerful ways to do that:
Elo ratings (a quick sketch follows below)
Weighted EPA
You can pick one or both.
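If you go the Elo route, the machinery is pleasantly small: every team carries a rating, and after each game the winner takes rating points from the loser in proportion to how surprising the result was. Here’s a minimal sketch in Python; the K-factor and home-field bonus below are illustrative placeholders, not the parameters of any particular published NFL Elo system.

def expected_score(rating_a, rating_b):
    """Probability that team A beats team B, given their current ratings."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, a_won, k=20.0, home_field=48.0):
    """Update both ratings after one game, with team A at home.

    k and home_field are placeholder values, chosen for illustration only.
    """
    expected_a = expected_score(rating_a + home_field, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta

# Toy example: a 1500-rated home team upsets a 1600-rated visitor.
home, away = update_elo(1500.0, 1600.0, a_won=True)
print(round(home, 1), round(away, 1))  # the home team gains exactly what the visitor loses

Feed every game of a season through update_elo and the end-of-season ratings are your strength rankings. The point is the commitment: once you’ve picked the formula, you take the ordering it gives you.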
Or you could decide on some other formula, so long as you can defend your choice.
Maybe total yards of offense and points surrendered. Sensible enough.
But then you have to live with your decision.
If you picked EPA, the Rams are the #1 team by EPA this year. Don’t like the Rams? That’s OK. But they’re your #1 team. You don’t get to decide you like the Broncos better because they’ve won more games.
If you picked passing yards and points surrendered, and you like the Bills (#3 by EPA), too bad. They have the 17th worst defense and 13th worst pass offense.
Accept it when the numbers don’t come out the way you want.
Measure twice. Cut once.
Except measuring is thinking about what measurement to use. And cutting is measuring.




