Countdown to web sentience

In 2003, we wrote a paper titled “1 billion pages = 1 million dollars? Mining the web to play Who Wants to be a Millionaire?” We trained a computer to answer questions from the then-hit game show by querying Google. We combined words from the questions with words from each answer in mildly clever ways, picking the question-answer pair with the most search results. For the most part (see below), it worked.
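For readers who want the gist in code, here is a minimal sketch of the hit-counting idea. It is not the method from the paper: the “mildly clever” query construction is left out, a tiny in-memory corpus stands in for Google, and the names `TOY_CORPUS`, `toy_hit_count`, and `pick_answer` are hypothetical, made up for this illustration.

```python
from typing import Callable, Optional, Sequence

# Toy stand-in for the web: a few sentences, purely illustrative.
TOY_CORPUS = [
    "Wolfgang Amadeus Mozart was born in 1756 in Salzburg.",
    "Mozart, born 1756, composed more than 600 works.",
    "Beethoven was born in 1770 in Bonn.",
]


def toy_hit_count(query: str) -> int:
    """Hypothetical search engine: count sentences containing every query word."""
    words = query.lower().split()
    return sum(all(w in doc.lower() for w in words) for doc in TOY_CORPUS)


def pick_answer(
    question_keywords: str,
    answers: Sequence[str],
    hit_count: Callable[[str], int],
) -> Optional[str]:
    """Pair the question words with each candidate answer and keep the
    candidate whose combined query gets the most hits.
    Returns None when no candidate gets any hits at all."""
    counts = {a: hit_count(f"{question_keywords} {a}") for a in answers}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


if __name__ == "__main__":
    print(pick_answer("Mozart born", ["1685", "1756", "1770", "1791"], toy_hit_count))
    print(pick_answer("fish legs", ["0", "2", "4", "6"], toy_hit_count))
```

Against a real search engine, `hit_count` would be whatever call reports the number of results for a query. The sketch only assumes it takes a string and returns an integer.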

It was a classic example of “big data, shallow reasoning” and a sign of the times. Call it Google’s Law. With enough data nothing fancy can be done, but more importantly nothing fancy need be done: even simple algorithms can look brilliant. When it comes to, say, identifying synonyms, simple pattern matching across an enormous corpus of sentences beats the most sophisticated language models developed meticulously over decades of research.

Our Millionaire player was great at answering obscure and specific questions: the high-dollar questions toward the end of the show that people find difficult. It failed mostly on the warm-up questions that people find easy: the truly trivial trivia. The reason is simple. Factual answers like the year that Mozart was born appear all over the web. Statements capturing common sense for the most part do not. Big data can only go so far.*

That was 2003.

In the paper, our clearest example of a question that we could not answer was How many legs does a fish have?. No one on the web would actually bother to write down the answer to that. Or would they?

I was recently explaining all this to a colleague. To make my point, we Googled that question. Lo and behold, there it was: asked and answered — verbatim — on Yahoo! Answers. How many legs does a fish have? Zero. Apparently Yahoo! Answers also knows the number of legs of a crayfish, rabbit, dog, starfish, mosquito, caterpillar, crab, mealworm, and “about 133,000” more.

Today, there are way more than 1 billion web pages: maybe closer to 1 trillion.

What’s the new lesson? Given enough time, everything will be on the web, including the fact that hungry poets blink (✓). Ok, not everything, but far more than anyone ever imagined.

It would be fun to try our Millionaire experiment again now that the web is bigger and search engines are smarter. Is there some kind of Moore’s Law for artificial intelligence as the web grows? Can sentience be far behind? 🙂

__________
* Lance agreed, predicting that IBM’s quest to build a Jeopardy-playing computer would succeed but not tell us much.

11 thoughts on “Countdown to web sentience”

Pingback: Notional Slurry » links for 2010-03-07

Pingback: Fish leg counts: What the web knows and doesn’t know « Knowledge Problem

That’s “lo and behold”. The “lo” is a contraction of “look”.

Bart: Thanks. Fixed.

How many legs does a fish have?
Zero. That was 2010 🙂
If I Google that question in 50 years, the answer may not be the same if BP arranges something like the Gulf of Mexico again 🙁
___
Regards, Dave

You are right David, the AI is very “clever” these days. But..
In my opinion there is still a big gap between human intelligence and what a machine can produce.
Even if it’s a giant like Google or Yahoo

Interesting point of view. But I think the artificial intelligence that Google has will never be the same as real intelligence 🙂

Soon enough we won’t be able to blink without it registering on the web somewhere.

WWW: World Wide War (against chopping wood and carrying water) No more Zen like lives because of the www. May we please get off the net and into the present moment?

And the Web said: Let there be Light! And there was Google….

Not sure if this is a good or bad thing. As we become a more web dependent society, we have more information available to us than ever before imaginable, yet we suffer from an impressive lack of intimacy among real people. As more children grow up connected to the internet, they also separate themselves from the real people around them. Not sure the solution to this problem lies anywhere the web can touch.
