Behind the Numbers: VCU & Butler Prove the Limits of Analytics
Posted by KCarpenter on April 7th, 2011
Continuing on: there is not a single analytic, logical or evidence-based approach that would have predicted VCU in the Final Four. Let’s be perfectly clear about this. In basketball analytics, most systems aim to predict likely future performance based on past performance and from that data calculate the most likely outcome. VCU in the Final Four was not a likely outcome by anyone’s reckoning. Sure, a few brackets had VCU in the Final Four, but that wasn’t because of rigorous analysis of match-ups or quantum wavelength formulas that are beyond us. Anyone who put VCU in the Final Four knew that it was an unlikely outcome. Maybe they put the Rams in because they were alums. Maybe their aunt lives in Richmond. Maybe they just think Shaka Smart is a handsome man (he is!). Maybe they picked the Rams because they knew few people would. All of these people who did actually pick VCU knew that it was a longshot as opposed to something that would probably happen.
This is smart. This is how you make brackets. Remember this. When there are quintillions of possible permutations, the most likely outcome is still pretty unlikely. An all-chalk bracket seems much more likely than any number of brackets in recent years, but it has still never happened. Hell, we’ve only had one year of all four number one seeds making the Final Four. On a gut level, do you feel that there is a significant difference between 1,000,000-to-1 odds and 1,500,000-to-1 odds? At the level of the infinitesimally unlikely, even big differences don’t seem to matter that much. I say this not as anti-mathematical nihilism, but to bring a sense of perspective to unlikely events. So here’s what I’m saying: when the most likely outcome is still incredibly unlikely to turn up, how surprising is it when something extremely unlikely happens? There is a real math answer if we gave these outcomes values, but the important answer, the one that we feel in our gut, is that, no, it’s not really any more surprising than the other infinite variations of weirdness that the tournament spits at us every March.
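For a rough sense of how unlikely even the "most likely" bracket is, here is a minimal sketch rather than a real model. It assumes a 63-game bracket, treats every game as independent, and gives the favorite the same made-up win probability in every game. The function name `chalk_probability` and the sample probabilities are illustrative assumptions, not figures from any actual rating system.

```python
def chalk_probability(p_favorite: float, n_games: int = 63) -> float:
    """Chance that the favorite wins every one of n_games independent games.

    Assumption: one constant win probability for all favorites, which is
    a simplification; real matchups vary widely from game to game.
    """
    return p_favorite ** n_games


# Hypothetical per-game win rates for favorites, chosen only for illustration.
for p in (0.65, 0.75, 0.85):
    prob = chalk_probability(p)
    print(f"favorite wins {p:.0%} of games -> all-chalk chance about 1 in {1 / prob:,.0f}")
```

Even with generous assumptions about how often favorites win, the all-chalk bracket comes out as a longshot, which is the point: the single most likely outcome is still one you should not expect to see.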
Every bracket is a longshot prediction at a perfect bracket, which is such a rare and magnificent beast that not a single one was spotted this year (or any year, for that matter). In the ESPN bracket challenge, only two submissions out of 9.5 million even got the Final Four right. Long odds to get the bracket right, but of course, the odds that the teams themselves faced were not insignificant. Far smarter minds than mine have looked at the unlikeliness of the overall composition of this Final Four, the incredible journey of VCU to the Final Four, and the surprise of Butler in two back-to-back Championship Games; and while the supposed rarity and oddity of each of these accomplishments is interesting, it’s important not to lose sight of the big picture question: How did all these supposedly unlikely things happen and no one see any of them coming?
I have an answer, underwhelming and honest, but you probably aren’t going to like it. Human beings want to know what is going to happen in the future. They want predictions, and when they ask for predictions, they expect to receive predictions. Basketball analytics don’t dispense omens or visions of the future. APBRmetrics do not peel back the veil between this world and the unknown. Basketball analytics, at their best, generate a set of probabilities and a set of tendencies. Most statistics-based analysts, if asked in early March who was going to win the National Championship, would have said, “Probably Ohio State.” Were these analysts wrong? No. Ohio State probably had the best chance of winning, but the most likely outcome doesn’t always happen. A probability isn’t a prediction. It’s a tendency, and in an arena where even the most likely of outcomes is rare, predictions based on analytic probabilities may seem little better than blind guessing.
This is the impasse of basketball analytics. We can tell you what is likely to happen, but we can’t tell you what will happen without a crystal ball. Is this a cop-out? Maybe, but here’s what I will say: VCU, for all but one critical stretch in their NCAA tournament run, shot 10 percentage points better from beyond the arc than they had all season. Any team that shoots a decent amount of threes and makes them at a 45% clip will win games. The Rams, one of the worst defensive teams coming into the tournament, played tougher defense than anything they had previously demonstrated. Analytics use the past to predict future performance, and there wasn’t much in VCU’s past to demonstrate that these performances were even possible. Yet they happened.
Butler didn’t play far above its head like VCU did, but they rose to the occasion and enjoyed a lucky streak that carried them through their mistakes. Whether it was silly touch fouls from Pittsburgh or poor late shot selection from Florida, Butler managed to win enough in a row to play last Monday night. What does this mean? Does it mean something when you flip a coin and it turns up heads five times in a row? Not really. It’s unlikely for a given set of five throws, but over an increasing number of coin flips, it becomes not just more likely, but all but inevitable. Am I saying that Butler got lucky on the road to the Final Four? Absolutely, but I am also saying that they were unlucky against Connecticut.
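The coin-flip claim can be checked directly. The sketch below computes the exact chance of seeing at least one run of five heads somewhere in a longer sequence of flips, assuming a fair coin and independent flips. The function name `prob_streak` and the sample sequence lengths are my own illustrative choices.

```python
def prob_streak(n_flips: int, streak_len: int = 5, p_heads: float = 0.5) -> float:
    """Probability of at least one run of `streak_len` heads in `n_flips` flips."""
    # state[r] = probability that no full streak has happened yet and the
    # current run of consecutive heads has length r (0 <= r < streak_len).
    state = [0.0] * streak_len
    state[0] = 1.0
    for _ in range(n_flips):
        nxt = [0.0] * streak_len
        for run, prob in enumerate(state):
            nxt[0] += prob * (1 - p_heads)      # tails resets the run
            if run + 1 < streak_len:
                nxt[run + 1] += prob * p_heads  # heads extends the run
            # otherwise the streak is completed and that mass leaves `state`
        state = nxt
    return 1.0 - sum(state)


# Sample sequence lengths are arbitrary, chosen only to show the trend.
for n in (5, 20, 100, 500):
    print(f"{n:>3} flips: chance of five straight heads = {prob_streak(n):.4f}")
```

For exactly five flips the chance is 1 in 32. As the number of flips grows, the probability climbs toward 1 without ever quite reaching it, which is the sense in which a streak like Butler's stops being surprising across enough games and enough tournaments.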
I don’t mean to detract from the Huskies’ legendary, record-setting defensive dominance in the title game, but lots of guys in a Bulldog uniform missed open shots. Sometimes you flip a coin ten times in a row and it comes up heads the first five times. Then the next five times it comes up tails. Shooting averages regress to the mean, and offensive and defensive efficiency, while they certainly fluctuate, tend to drift towards particular means too. Did Butler have a sudden, nasty case of regression to the mean? Probably.
A series of unlikely events led to this wonderfully bizarre and fascinating Final Four, and I don’t mean to take away from the extraordinary coaching job done by all four of the coaches by suggesting that luck played a major role in how events turned out. I don’t mean to make excuses for why people got any particular predictions wrong. I want to instead highlight and celebrate the limits of analytics: there is a wonderful beauty in being able to accurately predict outcomes through statistics and models, but there is a greater beauty in a world that joyfully resists these same predictions, a world that allows peculiar and unlikely events to unfold in spite of their own unlikeliness.