
A toast to the number 303: A redemptive election night for science, and The Signal

The night of February 15, 2012, was an uncomfortable one for me. Not a natural talker, I was out of my element at a press dinner organized by Yahoo! with journalists from the New York Times, Fast Company, MIT Tech Review, Forbes, SF Chronicle, WIRED, Reuters, and several more [1]. Even worse, the reporters kept leading with, “wow, this must be a big night for you, huh? You just called the election.”

We were there to promote The Signal, a partnership between Yahoo! Research and Yahoo! News to put a quantitative lens on the election and beyond. The Signal was our data-driven antidote to two media extremes: the pundits who commit to statements without evidence; and some journalists who, in the name of balance, commit to nothing. As MIT Tech Review billed it, The Signal would be the “mother of all political prediction engines”. We like to joke that that quote undersold us: our aim was to be the mother of all prediction engines, period. The Signal was a broad project with many moving parts, featuring predictions, social media analysis, infographics, interactives, polls, and games. Led by David “Force-of-Nature” Rothschild, myself, and Chris Wilson, the full cast included over 30 researchers, engineers, and news editors [2]. We confirmed quickly that there’s a clear thirst for numeracy in news reporting: The Signal grew in 4 months to 2 million unique users per month [3].

On that night, though, the journalists kept coming back to the Yahoo! PR hook that brought them in the door: our insanely early election “call”. At that time in February, Romney hadn’t even been nominated.

No, we didn’t call the election, we predicted the election. That may sound like the same thing but, in scientific terms, there is a world of difference. We estimated the most likely outcome – Obama would win 303 Electoral College votes, more than enough to return him to the White House — and assigned a probability to it. Of less than one. Implying a probability of more than zero of being wrong. But that nuance is hard to explain to journalists and the public, and not nearly as exciting.

Although most of our predictions were based on markets and polls, the “303” prediction was not: it was a statistical model trained on historical data of past elections, authored by economists Patrick Hummel and David Rothschild. It doesn’t even care about the identities of the candidates.

I have to give Yahoo! enormous credit. It took a lot of guts to put faith in some number-crunching eggheads in their Research division and go to press with their conclusions. On February 16, Yahoo! went further. They put the 303 prediction front and center, literally, as an “Exclusive” banner item on Yahoo.com, a place that 300 million people call home every month.

The Signal 303 prediction "Exclusive" top banner item on Yahoo.com 2012-02-16

The firestorm was immediate and monstrous. Nearly a million people read the article and almost 40,000 left comments. Writing for Yahoo! News, I had grown used to the barrage of comments and emails, some comic, irrelevant, or snarky; others hateful or alert-the-FBI scary. But nothing could prepare us for that day. Responses ranged from skeptical to utterly outraged, mostly from people who read the headline or reactions but not the article itself. How dare Yahoo! call the election this far out?! (We didn’t.) Yahoo! is a mouthpiece for Obama! (The model is transparent and published: take it for what it’s worth.) Even Yahoo! News editor Chris Suellentrop grew uncomfortable, especially with the spin from Homepage (“Has Obama won?”) and PR (see “call” versus “predict”), and kept a tighter rein on us from then on. Plenty of other outlets “got it” and reported on it for what it was – a prediction with a solid scientific basis, and a margin for error.

This morning, with Florida still undecided, Obama had secured exactly 303 Electoral College votes.

New York Times 2012 election results Big Board, 2012-11-07

Just today Obama wrapped up Florida too, giving him 29 more EVs than we predicted. Still, Florida was the closest vote in the nation, and for all 50 other entities — 49 states plus Washington D.C. — we predicted the correct outcome back in February. The model was not 100% confident about every state of course, formally expecting to get 6.8 wrong, and rating Florida the most likely state to flip from red to blue. The Hummel-Rothschild model, based only on a handful of variables like approval rating and second-quarter economic trends, completely ignored everything else of note, including money, debates, bail outs, binders, third-quarter numbers, and more than 47% of all surreptitious recordings. Yet it came within 74,000 votes of sweeping the board. Think about that the next time you hear an “obvious” explanation for why Obama won (his data was biggi-er!) or why Romney failed (too much fundraising!).

Kudos to Nate Silver, Simon Jackman, Drew Linzer, and Sam Wang for predicting all 51 states correctly on election eve.

As Felix Salmon said, “The dominant narrative, the day after the presidential election, is the triumph of the quants.” Mashable’s Chris Taylor remarked, “here is the absolute, undoubted winner of this election: Nate Silver and his running mate, big data.” ReadWrite declared, “This is about the triumph of machines and software over gut instinct. The age of voodoo is over.” The new news quants “bring their own data” and represent a refreshing trend in media toward accountability at least, if not total objectivity, away from rhetoric and anecdote. We need more people like them. Whether you agree or not, their kind — our kind — will proliferate.

Congrats to David, Patrick, Chris, Yahoo! News, and the entire Signal team for going out on a limb, taking significant heat for it, and correctly predicting 50 out of 51 states and an Obama victory nearly nine months prior to the election.

Footnotes

[1] Here was the day-before guest list for the February 15 Yahoo! press dinner, though one or two didn’t make it:
-  New York Times, John Markoff
-  New York Times, David Corcoran
-  Fast Company, EB Boyd
-  Forbes, Tomio Geron
-  MIT Tech Review, Tom Simonite
-  New Scientist, Jim Giles
-  Scobleizer, Robert Scoble
-  WIRED, Cade Metz
-  Bloomberg/BusinessWeek, Doug MacMillan
-  Reuters, Alexei Oreskovic
-  San Francisco Chronicle, James Temple

[2] The extended Signal cast included Kim Farrell, Kim Capps-Tanaka, Sebastien Lahaie, Miro Dudik, Patrick Hummel, Alex Jaimes, Ingemar Weber, Ana-Maria Popescu, Peter Mika, Rob Barrett, Thomas Kelly, Chris Suellentrop, Hillary Frey, EJ Lao, Steve Enders, Grant Wong, Paula McMahon, Shirish Anand, Laura Davis, Mridul Muralidharan, Navneet Nair, Arun Kumar, Shrikant Naidu, and Sudar Muthu.

[3] Although I continue to be amazed at how much greener the grass is at Microsoft compared to Yahoo!, my one significant regret is not being able to see The Signal project through to its natural conclusion. Although The Signal blog was by no means the sole product of the project, it was certainly the hub. In the end, I wrote 22 articles and David Rothschild at least three times that many.

Raise your WiseQ to the 57th power

One of the few aspects of my job I enjoy more than designing a new market is actually building it. Turning some wild concept that sprang from the minds of a bunch of scientists into a working artifact is a huge rush, and I can only smile as people from around the world commence tinkering with the thing, often in ways I never expected. The “build it” phase of a research project, besides being a ton of fun, inevitably sheds important light back on the original design in a virtuous cycle.

In that vein, I am thrilled to announce the beta launch of PredictWiseQ, a fully operational example of our latest combinatorial prediction market design: “A tractable combinatorial market maker using constraint generation”, published in the 2012 ACM Conference on Electronic Commerce.

You read the paper.1  Now play the game.2 Help us close the loop.

PredictWiseQ Make-a-Prediction screenshot October 2012

PredictWiseQ is our greedy attempt to scarf up as much information as is humanly possible and use it, wisely, to forecast nearly every possible detail about the upcoming US presidential election. For example, we can project how likely it is that Romney will win Colorado but lose the election (6.2%), or that the same party will win both Ohio and Pennsylvania (77.6%), or that Obama will paint a path of blue from Canada to Mexico (99.5%). But don’t just window shop, go ahead and customize and buy a prediction or ten for yourself. Your actions help inform the odds of your own predictions and, crucially, thousands of other related predictions at the same time.

For example, a bet on Obama to win both Ohio and Florida can automatically raise his odds of winning Ohio alone. That’s because our market maker knows and enforces the fact that Obama winning OH and FL can never be more likely than him winning OH. After every trade, we find and fix thousands of these logical inconsistencies. In other words, our market maker identifies and cleans up arbitrage wherever it finds it. But there’s a limit to how fastidious our market maker can be. It’s effectively impossible to rid the system of all arbitrage: doing so is NP-hard, or computationally intractable. So we clean up a good bit of arbitrage, but there should be plenty left.
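To make that concrete, here is a toy sketch in Python of the kind of logical rule the market maker maintains. This is not the constraint-generation algorithm from the paper; the prices and the crude split-the-difference repair below are invented purely for illustration.

```python
# Toy illustration of the consistency constraints PredictWiseQ enforces.
# NOT the constraint-generation algorithm from the EC'12 paper -- just the
# logical rule it maintains: if event A implies event B (e.g. "Obama wins
# OH and FL" implies "Obama wins OH"), then price(A) <= price(B).

prices = {
    "obama_OH_and_FL": 0.55,   # hypothetical quoted prices (probabilities)
    "obama_OH": 0.50,
}

# (narrower event, broader event) pairs: the first implies the second
implications = [("obama_OH_and_FL", "obama_OH")]

def find_arbitrage(prices, implications):
    """Return every implication constraint the current prices violate."""
    return [(a, b) for a, b in implications if prices[a] > prices[b]]

def crude_repair(prices, violations):
    """Split the difference on each violated pair -- a crude stand-in for
    projecting prices back onto a coherent set."""
    for a, b in violations:
        mid = (prices[a] + prices[b]) / 2.0
        prices[a], prices[b] = mid, mid

violations = find_arbitrage(prices, implications)
print("violated constraints:", violations)   # [('obama_OH_and_FL', 'obama_OH')]
crude_repair(prices, violations)
print("after repair:", prices)
```

The production system handles vastly more constraints after every trade and repairs prices in a far more principled way; the snippet only shows the shape of the check.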

So here’s a reader’s challenge: try to identify arbitrage on PredictWiseQ that we did not. Go ahead and profit from it and, when you’re ready, please let me and others know about it in the comments. I’ll award kudos to the reader who finds the simplest arbitrage.

Why not leave all of the arbitrage for our traders to profit from themselves? That’s what nearly every other market does, from Ireland-based Intrade, to Las Vegas bookmakers, to the Chicago Board Options Exchange. The reason is, we’re operating a prediction market. Our goal is to elicit information. Even a completely uninformed trader can profit from arbitrage via a mechanical plug-and-chug process. We should reserve the spoils for people who provide good information, not those armed (solely) with fast or clever algorithms. Moreover, we want every little crumb of information that we get, in whatever form we get it, to immediately impact as many of the thousands or millions of predictions that it relates to as possible. We don’t want to wait around for traders to perform this propagation on their own and, besides, it’s a waste of their brain cells: it’s a job much better suited for a computer anyway.

Intrade offers an impressive array of predictions about the election, including who will win in all fifty states. In a sense, PredictWiseQ is Intrade to the 57th power. In a combinatorial market, a prediction can be any (Boolean) function of the state outcomes, an ungodly degree of flexibility. Let’s do some counting. In the election, there are actually 57 “states”: 48 winner-takes-all states, Washington DC, and two proportional states — Nebraska and Maine — that can split their electoral votes in 5 and 3 unique ways, respectively. Ignoring independent candidates, all 57 base “states” can end up colored Democratic blue or Republican red. So that’s 2 to the power 57, or 144 quadrillion possible maps that newscasters might show us after the votes are tallied on November 6th. A prediction, like “Romney wins Ohio”, is the set of all outcomes where the prediction is true, in this case all 72 quadrillion maps where Ohio is red. The number of possible predictions is the number of sets of outcomes, or 2 to the power 144 quadrillion. That’s more than a googol, though less than a googolplex (maybe next year). To get a sense of how big that is, if today’s fastest supercomputer had started counting at the instant of the big bang, it still wouldn’t be anywhere close to reaching a googol yet.
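If you want to check that arithmetic, Python’s big integers make it a couple of lines (the only outside fact used is that a googol is 10 to the power 100):

```python
import math

# Back-of-the-envelope arithmetic for the numbers above.
outcomes = 2 ** 57                  # possible electoral maps over the 57 "states"
print(outcomes)                     # 144,115,188,075,855,872  (~144 quadrillion)
print(2 ** 56)                      # maps where Ohio is red: ~72 quadrillion

# The number of distinct predictions is the number of sets of outcomes:
# 2 ** (2 ** 57).  Even its digit count dwarfs a googol's 101 digits.
digits = outcomes * math.log10(2)   # log10 of 2 ** (2 ** 57)
print(f"2^(2^57) has about {digits:.3g} decimal digits")  # ~4.3e16 digits
```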

Create your own league to compare your political WiseQ among friends. If you tell us how much each player is in for, we’ll tell you how to divvy things up at the end. Or join the “Friends Of Dave” (FOD) league. If you finish ahead of me in my league, I’ll buy you a beer (or beverage of your choice) the next time I see you, or I’ll paypal you $5 if we don’t cross paths.

PredictWiseQ is part of PredictWise, a fascinating startup of its own. Founded by my colleague David Rothschild, PredictWise is the place to go for thousands of accurate, real-time predictions on politics, sports, finance, and entertainment, aggregated and curated from around the web. The PredictWiseQ Game is a joint effort among David, Miro, Sebastien, Clinton, and myself.

The academic paper that PredictWiseQ is based on is one of my favorites — owing in large part to my coauthors Miro and Sebastien, two incredible sciengineers. As is often the case, the theory looks bulletproof on paper. But I’ve learned the hard way many times that you don’t really know if a design is good until you try it. Or more accurately, until you build it and let a crowd of other people try it.

So, dear crowd, please try it! Bang on it. Break it. (Though please tell me how you did, so we might fix it.) Tell me what you like and what is horribly wrong. Mostly, have fun playing a market that I believe represents the future of markets in the post-CDA era, a.k.a. the digital age.

__________
1 Or not.
2 Or not.

2011 ACM Conference on Electronic Commerce and fifteen other CS conferences in San Jose

If you’re in the Bay Area, come join us at the 2011 ACM Conference on Electronic Commerce, June 5-9 in San Jose, CA, one of sixteen conferences that comprise the ACM Federated Computing Research Conference, the closest thing we have to a unified computer research conference.

The main EC’11 conference includes talks on prediction markets, crowdsourcing, auctions, game theory, finance, lending, and advertising. The papers span a spectrum from theoretical to applied. If you want evidence of the latter, look no further than the roster of corporate sponsors: eBay, Facebook, Google, Microsoft, and Yahoo!.

There are also a number of interesting workshops and tutorials in conjunction with EC’11 this year, including:

Workshops:

  • 7th Ad Auction Workshop
  • Workshop on Bayesian Mechanism Design
  • Workshop on Social Computing and User Generated Content
  • 6th Workshop on Economics of Networks, Systems, and Computation
  • Workshop on Implementation Theory

Tutorials:

  • Bayesian Mechanism Design
  • Conducting Behavioral Research Using Amazon’s Mechanical Turk
  • Matching and Market Design
  • Outside Options in Mechanism Design
  • Measuring Online Advertising Effectiveness

The umbrella FCRC conference includes talks by 2010 Turing Award winner Leslie G. Valiant, IBM Watson creator David A. Ferrucci, and CMU professor, CAPTCHA co-inventor, and Games With a Purpose founder Luis von Ahn.

Hope to see many of you there!

There’s a new oracle in town

Cantor Gaming mobile device for in-running betting

Last January, a few friends and I visited the sportsbook at the M Casino in Las Vegas, one of several sportsbooks now run by Cantor Gaming, a division of Wall Street powerhouse Cantor Fitzgerald. Traditional sportsbooks stop taking bets when the sporting event in question begins. In contrast, Cantor allows “in-running betting”, a clunky phrase that means you can bet during the event: as touchdowns are scored, interceptions are made, home runs are stolen, or buzzers are beaten. Cantor went a step further and built a mobile device you can carry around with you anywhere in the casino to place your bets while watching games on TV, drink in hand. (Cantor also runs spread-betting operations in the UK and bought the venerable Hollywood Stock Exchange prediction market with the goal of turning it into a real financial exchange; they nearly succeeded, obtaining the green light from the CFTC before being shut down by lobbyists, er, Congress.)

Back to the device. It’s pretty awesome: a Windows tablet computer with Cantor’s custom software — well designed, considering this is a financial firm. You can bet on the winner, against the spread, or on one-off propositions like whether the offensive team in an NFL game will get a first down, or whether the current drive will end with a punt, touchdown, field goal, or turnover. The interface is simple: you select the type of bet you want, see the current odds, and choose how much you want to bet from a menu of common options: $5, $10, $50, etc. You can’t bet during certain moments in the game, like right before and during a play in football. When I was there, only one game was available for in-running betting. Still, it’s instantly gratifying and — I hate to use this word — addictive. Once my friend saw the device in action, he immediately said “I’m getting one of those”.

When I first heard of Cantor’s foray into sports betting, I assumed they would build “betfair indoors”, meaning an exchange that simply matches bettors with each other and takes no risk of its own. I was wrong. Cantor’s mechanism is pretty clearly an intelligent automated market maker that mixes prior knowledge and market forces, much like my own beloved Predictalot minus the combinatorial aspect. Together with their claim to welcome sharps, employing a market maker means that Cantor is taking a serious risk that no one will outperform their prior “too much”, but the end result is a highly usable and impressively fun application. Kudos to Cantor.
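For the curious, here is a minimal sketch of one standard way an automated market maker can blend a prior with market forces: a logarithmic market scoring rule (LMSR) seeded so that its opening prices equal the prior. To be clear, this is a generic textbook construction with invented numbers, not Cantor’s actual mechanism.

```python
import math

class LMSRMarketMaker:
    """A logarithmic market scoring rule (LMSR) maker over mutually
    exclusive outcomes, seeded with a prior.  One standard way to mix
    prior knowledge and market forces; not Cantor's actual system."""

    def __init__(self, prior, b=100.0):
        # Seed the share vector so the opening prices equal the prior.
        self.b = b
        self.q = {k: b * math.log(p) for k, p in prior.items()}

    def cost(self, q):
        return self.b * math.log(sum(math.exp(v / self.b) for v in q.values()))

    def price(self, outcome):
        z = sum(math.exp(v / self.b) for v in self.q.values())
        return math.exp(self.q[outcome] / self.b) / z

    def buy(self, outcome, shares):
        """Charge the trader the cost difference for `shares` of `outcome`."""
        before = self.cost(self.q)
        self.q[outcome] += shares
        return self.cost(self.q) - before

# Prior: the home team is a 70/30 favorite before any bets arrive.
mm = LMSRMarketMaker({"home": 0.70, "away": 0.30})
print(round(mm.price("home"), 3))                  # 0.7
paid = mm.buy("away", 50)                          # a bettor backs the underdog
print(round(paid, 2), round(mm.price("away"), 3))  # price moves toward 'away'
```

Seeding the share vector at b·log(prior) is what lets the house open at its own estimate while still letting bettors move the price.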


P.S. Cantor affectionately dubbed the oracle-like algorithm that computes their prior “Midas”, proving this guy has a knack for thingnaming.

Predictopus in the Times of India

Today, Yahoo! placed two full-page ads on the back cover of the Times of India, the largest English-language daily in the world, to promote Yahoo! Cricket, a site that reaches 13.4 percent of everyone online in India and serves as the official website of the ICC Cricket World Cup.

Take a look at the middle right of the second page: it says “Play exciting games and win big” and features… Predictopus! That’s the Indian spinoff of Predictalot, the combinatorial prediction game I helped invent.

Page 1 of two full-page Yahoo! Cricket ads in the Times of India, p. 31, 2011/03/30

Predictopus on Page 2 of two full-page Yahoo! Cricket ads in the Times of India, p. 32, 2011/03/30

Predictopus has nearly 70,000 users and counting, and this ad certainly won’t hurt.

Yahoo!!!

BTW, I grabbed these images from an amazing site called Press Display, which I discovered via the New York Public Library.

Times of India Mumbai edition
30 Mar 2011

Also, congrats India, and thanks! I nearly doubled my virtual bet with the victory:

Dave's Predictopus prediction: India will advance further than Pakistan, 3/2011

We’re baaack: Predictalot is here for March Madness 2011

March Madness is upon us and Predictalot, the crazy game that I and others at Yahoo! Labs invented, is live again and taking your (virtual) bets. Filling out brackets is so 2009. On Predictalot, you can compose your own wild prediction, like there will be exactly seven upsets in the opening round, or neither Duke, Kentucky, Kansas, nor Pittsburgh will make the Final Four. You’ll want your laptop out and ready as you watch the games — you can buy and sell your predictions anytime, like stocks, as the on-court action moves for or against you.

Predictalot v0.3 is easier to play. We whittled down the ‘Make Prediction’ process from four steps to just two. Even if you don’t want to wager, one click will show you the projected odds of nearly any crazy eventuality you can dream up.

Please connect to Facebook and/or Twitter to share your prediction prowess with your friends and followers. You’ll earn bonus points and my eternal gratitude.

The odds start off at our own prior estimate based on seeds and (new this year) the current scores of ongoing games, but ultimately settle to values set by “the crowd” — that means you — as predictions are bought and sold.

Yahoo! Labs Predictalot version 0.3 overview tab screenshot

For the math geeks, Predictalot is a combinatorial prediction market with over 9 quintillion outcomes. Prices are computed using an importance sampling approximation of a #P-hard problem.
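Here is a cartoon of that sampling idea, with invented per-game upset probabilities and a stand-in for the traders’ aggregate positions. It is nothing like the production code, but it shows how sampling from the prior and reweighting can price an event that is hopeless to enumerate exactly.

```python
import random, math

# Cartoon of pricing-by-importance-sampling (not Predictalot's actual code).
# Outcome = which team wins each of the 32 first-round games.  We want the
# price (probability) of "exactly seven first-round upsets" under the market
# distribution p(x) proportional to prior(x) * exp(score(x)/b), where score(x)
# stands in for the shares traders currently hold.  Enumerating all 2^32
# outcomes is hopeless, so sample from the prior and reweight.

random.seed(0)
b = 100.0
upset_prob = [0.10 + 0.01 * i for i in range(32)]   # made-up per-game prior

def sample_outcome():
    return [random.random() < p for p in upset_prob]  # True = upset

def score(outcome):
    # Stand-in for the traders' aggregate payout on this outcome.
    return 5.0 * sum(outcome)

def is_event(outcome):
    return sum(outcome) == 7        # exactly seven upsets

num, den = 0.0, 0.0
for _ in range(200_000):
    x = sample_outcome()
    w = math.exp(score(x) / b)      # importance weight: market tilt vs. prior
    num += w * is_event(x)
    den += w
print(f"estimated price of 'exactly 7 upsets': {num / den:.3f}")
```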

What kind of information can we collect that a standard prediction market cannot? A standard market will say that Texas A&M is unlikely to win the tournament. Our market can say more. Yes, A&M is unlikely to reach the Final Four and even more unlikely to win a priori, but given that they somehow make it to the semifinals in Houston, less than a two-hour drive from A&M’s campus, their relative odds may increase due to a home court advantage.

Here’s another advantage of the combinatorial setup. A standard bookmaker would never dare to offer the same millions of bets as Predictalot — they would face nearly unlimited possible losses because, by tradition, each bet is managed independently. By combining every bet into a single unified marketplace, we are able to limit the worst-case (virtual) loss of our market maker to a known fixed constant.
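The arithmetic behind that fixed constant is simple if the market maker is, for example, an LMSR with liquidity parameter b (the value of b below is invented):

```python
import math

# If the market maker is an LMSR with liquidity parameter b (one standard
# choice with exactly this bounded-loss property), its worst-case loss is
# b * ln(number of outcomes) -- a constant known before any bet is placed.
b = 100.0                      # hypothetical liquidity parameter
outcomes = 2 ** 63             # 63 bracket games, ~9.2 quintillion outcomes
print(b * math.log(outcomes))  # ~4367, regardless of how the games go
```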

Predictalot goes East: Introducing Predictopus for the ICC Cricket World Cup

Yahoo! India Predictopus logo

I’m thrilled to report that Predictalot had an Indian makeover, launching as Predictopus* for the ICC Cricket World Cup. The Yahoo! India team did an incredible job, leveraging the idea and some of the code base of Predictalot, yet making it their own. Predictopus is not a YAP — it lives right on the Yahoo! Cricket website, the official homepage for the ICC Cricket World Cup. They’re also giving away Rs 10 lakhs — or about $22,000 if my calculations are correct — in prizes. Everything is bigger in India, including the crowds and the wisdom thereof. It will be great to see the game played out on a scale that dwarfs our college basketball silliness in the US.

The Y! India team reused some of the backend code but redid the frontend almost entirely. To adapt the game to cricket, among other chores, we had to modify our simulation code to estimate the starting probabilities that any team would win against any other team, even in the middle of a game. (How likely is it for India to come back at home from down 100 runs with 10 overs left and 5 wickets lost? About 25%, we think.) These starting probabilities are then refined further by the game-playing crowds.
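For a flavor of what such a simulation can look like, here is a toy over-by-over Monte Carlo of that exact scenario. The scoring and wicket rates are invented, so the printed estimate is illustrative only; the real model is considerably more careful.

```python
import random

# Toy mid-game win-probability simulation (not the Predictopus model).
# Scenario: the chasing team needs 100 more runs in 10 overs with 5 wickets
# in hand.  Simulate the rest of the innings over-by-over with made-up
# per-over scoring and wicket rates.

random.seed(1)

def chase_succeeds(runs_needed=100, overs_left=10, wickets_left=5,
                   runs_per_over=(5, 7, 9, 12, 15), wicket_prob=0.2):
    for _ in range(overs_left):
        if runs_needed <= 0:
            return True
        runs_needed -= random.choice(runs_per_over)      # runs this over
        if random.random() < wicket_prob:                # lose a wicket?
            wickets_left -= 1
            if wickets_left == 0:
                return False                             # all out
    return runs_needed <= 0

trials = 100_000
wins = sum(chase_succeeds() for _ in range(trials))
print(f"estimated win probability: {wins / trials:.2f}")
```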

It’s great to see an experiment from Labs grow into a full-fledged product run by a real product team in Yahoo!, a prime example of technology transfer at its best. In the meantime, we (Labs) are still gunning for a relaunch of Predictalot itself for March Madness 2011, the second year in a row. Stay tuned.

2011/02/24 Update: An eye-catching India-wide ad campaign for predictopus is live, including homepage, finance, movies, OMG, answers, mail, everywhere! Oh, and one of the prizes is a Hyundai.

predictopus ad on Yahoo! India homepage 2011/02/24


* Yes, that’s a reference to legendary Paul the Octopus, RIP.

wise.gov: NSF and IARPA funding for collective intelligence

The US National Science Foundation’s Small Business Innovation Research program provides grants to small businesses to fund “state-of-the-art, high-risk, high-potential innovation research proposals”.

In their current call for proposals, they explicitly ask for “I2b. Tools for facilitating collective intelligence”.

These are grants of up to US$150,000, with the opportunity for more funding later, I believe. The deadline is December 3, 2010! Good luck and (not so) happy Thanksgiving to anyone working on one of these proposals. I’m glad to help if I can.


The deadline for another US government program has passed, but should yield interesting results and may lead to future opportunities. In August, the Intelligence Advanced Research Projects Activity (IARPA, the intelligence community’s DARPA), which “invests in high-risk/high-payoff research programs” in military intelligence, solicited proposals for Aggregative Contingent Estimation, or what might be called wisdom-of-crowds methods for prediction:

The ACE Program seeks technical innovations in the following areas:

  • Efficient elicitation of probabilistic judgments, including conditional probabilities for contingent events.
  • Mathematical aggregation of judgments by many individuals, based on factors that may include past performance, expertise, cognitive style, metaknowledge, and other attributes predictive of accuracy.
  • Effective representation of aggregated probabilistic forecasts and their distributions.

The full announcement is clear, detailed, and well thought out. I was impressed with the solicitors’ grasp of research in the field, an impression no doubt bolstered by the fact that some of my own papers are cited 😉 . Huge hat tip to Dan Goldstein for collating these excerpts:

The accuracy of two such methods, unweighted linear opinion pools and conventional prediction markets, has proven difficult to beat across a variety of domains.2 However, recent research suggests that it is possible to outperform these methods by using data about forecasters to weight their judgments. Some methods that have shown promise include weighting forecasters’ judgments by their level of risk aversion, cognitive style, variance in judgment, past performance, and predictions of other forecasters’ knowledge.3 Other data about forecasters may be predictive of aggregate accuracy, such as their education, experience, and cognitive diversity. To date, however, no research has optimized aggregation methods using detailed data about large numbers of forecasters and their judgments. In addition, little research has tested methods for generating conditional forecasts.

2 See, e.g., Tetlock PE, Expert Political Judgment (Princeton, NJ: Princeton University Press, 2005), 164-88; Armstrong JS, “Combining Forecasts,” in JS Armstrong, ed., Principles of Forecasting (Norwell, MA: Kluwer, 2001), 417-39; Arrow KJ, et al., “The Promise of Prediction Markets,” Science 2008; 320: 877-8; Chen Y, et al., “Information Markets Vs. Opinion Pools: An Empirical Comparison,” Proceedings of the 6th ACM Conference on Electronic Commerce, Vancouver BC, Canada, 2005.

3 See, e.g., Dani V, et al., “An empirical comparison of algorithms for aggregating expert predictions,” Proc. 22nd Conference on Uncertainty in Artificial Intelligence, UAI, 2006; Cooke RM, ElSaadany S, Huang X, “On the performance of social network and likelihood-based expert weighting schemes,” Reliability Engineering and System Safety 2008; 93:745-756; Ranjan R, Gneiting T, “Combining probability forecasts,” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2010; 72(1): 71-91.

[Examples:]

  • Will the incumbent party win the next presidential election in Country X?
  • When will Country X hold its next parliamentary elections?
  • How many cell phones will be in use globally by 12/31/11?
  • By how much will the GDP of Country X increase from 1/1/11 to 12/31/11?
  • Will Country X default on its sovereign debt in 2011?
  • If Country X defaults on its sovereign debt in 2011, what will be the growth rate in the Eurozone in 2012?

Elicitation – Advances Sought
The ACE Program seeks methods to elicit judgments from individual forecasters on:

  • Whether an event will or will not occur
  • When an event will occur
  • The magnitude of an event
  • All of the above, conditioned on another set of events or actions
  • The confidence or likelihood a forecaster assigns to his or her judgment
  • The forecaster’s rationale for his or her judgment, as well as links to background information or evidence, expressed in no more than a couple of lines of text
  • The forecaster’s updated judgments and rationale

The elicitation methods should allow prioritization of elicitations, continuous updating of forecaster judgments and rationales, and asynchronous elicitation of judgments from more than 1,000 geographically-dispersed forecasters. While aggregation methods, detailed below, should be capable of generating probabilities, the judgments elicited from forecasters can but need not include probabilities.

Challenges include:

  • Some forecasters will be unaccustomed to providing probabilistic judgments
  • There has been virtually no research on methods to elicit conditional forecasts
  • Elicitation should require a minimum of time and effort from forecasters; elicitation should require no more than a few minutes per elicitation per forecaster
  • Training time for forecasters will be limited, and all training must be delivered within the software
  • Rewards for participation, accuracy, and reasoning must be non-monetary and of negligible face value (e.g., certificates, medals, pins)
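As a concrete, if simplistic, illustration of the kind of “mathematical aggregation of judgments” the solicitation describes, here is a minimal performance-weighted linear opinion pool in Python. Weighting each forecaster by the inverse of his or her historical Brier score is just one illustrative choice, and the data below are made up.

```python
# Minimal sketch of a performance-weighted linear opinion pool -- one of the
# simplest aggregation schemes in the family the ACE solicitation describes.
# Weighting by inverse historical Brier score is just an illustrative choice.

def brier(past_forecasts):
    """Mean squared error of past probability forecasts (lower is better).
    past_forecasts: list of (predicted probability, actual outcome 0/1)."""
    return sum((p - y) ** 2 for p, y in past_forecasts) / len(past_forecasts)

def weighted_pool(current_forecasts, history):
    """Aggregate forecasters' probabilities, weighting each by 1/Brier."""
    weights = {name: 1.0 / max(brier(h), 1e-6) for name, h in history.items()}
    total = sum(weights.values())
    return sum(weights[name] * p for name, p in current_forecasts.items()) / total

history = {   # each forecaster's (probability, outcome) track record
    "alice": [(0.8, 1), (0.3, 0), (0.9, 1)],
    "bob":   [(0.5, 1), (0.5, 0), (0.5, 1)],
}
current = {"alice": 0.7, "bob": 0.4}   # today's judgments on one question
print(f"pooled probability: {weighted_pool(current, history):.2f}")
```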

Book of Odds is serious fun

In the Book of Odds, you can find everything from the odds an astronaut is divorced (1 in 15.54) to the odds of dying in a freak vending machine accident (1 in 112,000,000).

Book of Odds is, in their own words, “the missing dictionary, one filled not with words, but with numbers – the odds of everyday life.”

I use their words because, frankly, I can’t say it better. The creators are serious wordsmiths. Their name itself is no exception. “Book of Odds” strikes the perfect chord: memorable and descriptive with a balance of authority and levity. On the site you can find plenty of amusing odds about sex, sports, and death, but also odds about health and life that make you think, as you compare the relative odds of various outcomes. Serious yet fun, in the grand tradition of the web.

I love their mission statement. They seek both to change the world — by establishing a reliable, trustworthy, and enduring new reference source — and to improve the world — by educating the public about probability, uncertainty, and decision making.

By “odds”, they do not mean predictions.

Book of Odds is not in the business of predicting the future. We are far too humble for that…

Odds Statements are based on recorded past occurrences among a large group of people. They do not pretend to describe the specific risk to a particular individual, and as such cannot be used to make personal predictions.

In other words, they report how often some property occurs among a group of people, for example the fraction of all deaths caused by vending machines, not how likely you, or anyone in particular, are to die at the hands of a vending machine. Presumably if you don’t grow enraged at uncooperative vending machines or shake them wildly, you’re safer than the 1 in 112,000,000 stated odds. A less ambiguous (but clunky) name for the site would be “Book of Frequencies”.

Sometimes the site’s original articles are careful about this distinction between frequencies and predictions but other times less so. For example, this article says that your odds of becoming the next American Idol are 1 in 103,000. But of course the raw frequency (1/number-of-contestants) isn’t the right measure: your true odds depend on whether you can sing.
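To see why the raw frequency misleads, here is a back-of-the-envelope conditioning exercise with invented numbers: if, say, only 5% of contestants can really sing and the winner always comes from that group, then knowing you can sing already makes your odds twenty times better than the headline figure.

```python
# Base rate vs. conditional probability, in the American Idol spirit.
# The 5% figure is invented purely for illustration.
p_win = 1 / 103_000          # raw frequency: winners per contestant
p_can_sing = 0.05            # suppose 5% of contestants can really sing
p_win_given_sing = p_win / p_can_sing   # valid if only singers ever win
print(f"{p_win_given_sing:.6f}  (~1 in {round(1 / p_win_given_sing):,})")
```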

Their statement of What Book of Odds isn’t is refreshing:

Book of Odds is not a search-engine, decision-engine, knowledge-engine, or any other kind of engine…so please don’t compare us to Google™. We did consider the term “probability engine” for about 25 seconds, before coming to our senses…

Book of Odds is never finished. Every day new questions are asked that we cannot yet answer…

A major question is whether consumers want frequencies, or if they want predictions. If I had to guess, I’d (predictably) say predictions — witness Nate Silver and Paul the Octopus. (I’ve mused about using *.oddhead.com to aggregate predictions from around the web.)

The site seems in need of some SEO. The odds landing pages, like this one, don’t seem to be comprehensively indexed in Bing or Google. I believe this is because there is no natural way for users (and thus spiders) to browse (crawl) them. (Is this a conscious choice to protect their data? I don’t think so: the landing pages have great SEO-friendly URLs and titles.) The problem is exacerbated because Book of Odds’ own custom search is respectable but, inevitably, weaker than what we’ve become accustomed to from the major search engines.

Book of Odds launched in 2009 with a group of talented and well pedigreed founders and a surprisingly large staff. They’ve made impressive strides since, adding polls, a Yahoo! Application, an iGoogle gadget, regular original content, and a cool visual browser that, like all visual browsers, is fun but not terribly useful. They’ve won a number of awards already, including “most likely company to be a household name in five years”. That’s a low-frequency event, though Book of Odds may beat the odds. Or have some serious fun trying.

The most prescient footnote ever

Back in 2004, who could have imagined Apple’s astonishing rise to overtake Microsoft as the most valuable tech company in the world?

At least one person.

Paul Graham wins the award for the most prescient parenthetical statement inside a footnote ever.

In footnote 14 of Chapter 5 (p. 228) of his classic Hackers and Painters, published in 2004, Graham asks “If the Mac was so great why did it lose?”. His explanation ends with this caveat, in parentheses:

And it hasn’t lost yet. If Apple were to grow the iPod into a cell phone with a web browser, Microsoft would be in big trouble.

Wow. Count me impressed.

To find this quote, search inside the book for “ipod cell”.

To get into a 2004 mindset, look here and here.