Can prediction markets be right too often?

Some people are puzzling over the fact that TradeSports simultaneously called every individual state’s Senate election correctly, yet failed to call the Democratic takeover of the Senate. Lance explains why this is not a puzzle at all, and a commenter on Marginal Revolution provides a good analogy:

… the Indianapolis Colts are favored to win in every game remaining on their schedule, but they are not expected to finish the season undefeated.

Because the Democrats essentially had to go “undefeated” across three or four states, each with a tight race that was leaning only slightly Democratic, the individual and overall predictions were perfectly consistent.
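To make the arithmetic concrete, here is a back-of-the-envelope sketch. The numbers are assumptions chosen for illustration (a 60% price in each of four must-win races, treated as independent), not actual TradeSports quotes.

```python
# Illustrative numbers only: the 60% price and the four races are assumptions,
# not actual TradeSports quotes.
p_each = 0.60   # assumed chance of winning each individual race
n_races = 4     # assumed number of races that all had to be won

# If the races were independent, the chance of a sweep is the product
# of the individual chances.
p_sweep = p_each ** n_races

print(f"Favored in each race at {p_each:.0%}; "
      f"chance of sweeping all {n_races}: {p_sweep:.1%}")
```

Under these assumptions the party is favored in every single race, yet a sweep is well below even odds, so a market can lean one way race by race and the other way on the overall outcome without contradiction.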

But this raises another question: didn’t TradeSports call too many states correctly? If the markets were correctly calibrated, then among all of the Republican candidates who were given a 40% chance of winning, shouldn’t roughly 40% of them have actually won? The same “problem” occurred in the 2004 election, when TradeSports correctly called all fifty states in the Bush-Kerry presidential election. Again, both the close calls and the blowouts turned out exactly the way each market was leaning, however slightly. In other words, 0% of the candidates who were given a 40% chance of winning actually won, and 100% of the candidates who were given a 60% chance of winning actually won. Does that mean that those marginal underdogs were way overpriced, and the marginal favorites were way underpriced?
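The calibration check described above can be sketched in a few lines: group forecasts by their stated probability and compare each group's observed win rate to that probability. The function name `calibration_table` and the ten data points are hypothetical, made up to mimic the pattern described, not real TradeSports data.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (stated_probability, won) pairs by stated probability and
    return {stated_probability: (observed_win_rate, count)}."""
    buckets = defaultdict(list)
    for prob, won in forecasts:
        buckets[round(prob, 1)].append(1 if won else 0)
    return {p: (sum(w) / len(w), len(w)) for p, w in sorted(buckets.items())}

# Hypothetical data: every 40% underdog lost, every 60% favorite won.
forecasts = [(0.4, False)] * 5 + [(0.6, True)] * 5
table = calibration_table(forecasts)

for prob, (rate, count) in table.items():
    print(f"stated {prob:.0%}: observed {rate:.0%} over {count} contracts")
```

A perfectly calibrated forecaster with many independent contracts would show observed rates close to the stated ones. A table that reads 0% and 100% looks badly miscalibrated, but only if the contracts in each bucket really are independent trials.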

Not necessarily. The reason is that state election outcomes are very far from statistically independent events. Because common factors (news, scandals, economic conditions, etc.) affect all states simultaneously, the elections are far more likely to break in unison than would be expected of the same number of completely independent events. In the Indianapolis Colts analogy, it’s fairly reasonable to assume that each game is roughly statistically independent, making sixteen wins without a loss incredibly unlikely even for the best team. But for individual states’ elections held on the same day, it’s much more likely for one party to string together an “undefeated” series of wins than if the elections were truly independent. So the fact that TradeSports correctly called 33 Senate elections and 50 electoral college outcomes should not be counted as 83 independent pieces of evidence about how well TradeSports is calibrated, but rather as some much smaller amount of evidence of (mis)calibration. The bottom line is we need more data across many elections to truly test TradeSports’s accuracy and calibration.
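A small simulation can illustrate how much a shared national factor matters. This is a sketch under my own assumptions, not a model of the actual elections: each race gets a latent score that mixes one national shock with a local shock, the mixing weight `rho` is a hypothetical correlation parameter, and the 60% price and four races are illustrative numbers. The favorite's chance in each individual race is held at 60% no matter what `rho` is, so only the dependence changes.

```python
import random
from statistics import NormalDist

def sweep_probability(p_favorite, n_races, rho, trials=20000, seed=0):
    """Estimate the chance that the favorite wins every race.

    Each race has a latent score sqrt(rho)*national + sqrt(1-rho)*local,
    with national and local drawn as standard normals. The favorite wins
    a race when its score is below a threshold chosen so that each race
    is won with probability p_favorite regardless of rho.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(p_favorite)
    sweeps = 0
    for _ in range(trials):
        national = rng.gauss(0.0, 1.0)  # shock shared by all races
        if all(
            (rho ** 0.5) * national
            + ((1 - rho) ** 0.5) * rng.gauss(0.0, 1.0) < threshold
            for _ in range(n_races)
        ):
            sweeps += 1
    return sweeps / trials

# Same 60% price in each of four races; only the dependence differs.
independent = sweep_probability(0.6, 4, rho=0.0)
correlated = sweep_probability(0.6, 4, rho=0.8)

print(f"sweep chance, independent races: {independent:.2f}")
print(f"sweep chance, correlated races:  {correlated:.2f}")
```

With `rho=0.0` the estimate should land near the simple product of the individual probabilities, while a large `rho` makes a clean sweep far more common even though no individual price has changed. That is the sense in which a run of correct calls is weaker evidence about calibration than it first appears.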

If TradeSports had offered combinatorial markets, we could have explicitly seen the strong dependence between states’ election outcomes. In fact, in the 2004 election, TradeSports did list some combinatorial contracts, like “Ohio + Florida”, which revealed very strong statistical dependence.

Although TradeSports’s individual state predictions and overall Senate prediction were entirely consistent, one might argue that traders underestimated the degree of dependence (correlation) among states’ elections. In fact, I made a few bucks selling the “GOP Senate control” contract on TradeSports using exactly that reasoning. The truth is, I probably just got lucky, and it’s nearly impossible to say whether TradeSports underestimated or overestimated much of anything based on a single election. Such is part of the difficulty of evaluating probabilistic forecasts.