Category Archives: artificial intelligence

Countdown to web sentience

In 2003, we wrote a paper titled 1 billion pages = 1 million dollars? Mining the web to play Who Wants to be a Millionaire?. We trained a computer to answer questions from the then-hit game show by querying Google. We combined words from the questions with words from each answer in mildly clever ways, picking the question-answer pair with the most search results. For the most part (see below), it worked.
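As a rough illustration of the "most search results" idea, here is a minimal sketch. It is not the method from the paper, which combined question and answer words in more ways than this. The names `pick_answer`, `hit_count`, and `FAKE_HITS` are hypothetical, and the counts are invented so the example runs without a search engine.

```python
def pick_answer(question, answers, hit_count):
    """Return the candidate answer whose combined query gets the most hits."""
    scores = {a: hit_count(f"{question} {a}") for a in answers}
    return max(scores, key=scores.get)


# Made-up hit counts standing in for a real search engine (assumption).
FAKE_HITS = {
    "Mozart born 1756": 900,
    "Mozart born 1770": 40,
    "Mozart born 1791": 300,
}


def fake_hit_count(query):
    """Hypothetical stand-in for 'number of search results for this query'."""
    return FAKE_HITS.get(query, 0)


best = pick_answer("Mozart born", ["1756", "1770", "1791"], fake_hit_count)
print(best)
```

Swapping `fake_hit_count` for a function that queries a real search engine would give the flavor of the experiment, though a bare count like this is exactly the kind of shallow reasoning that breaks down on common-sense questions.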

It was a classic example of “big data, shallow reasoning” and a sign of the times. Call it Google’s Law. With enough data nothing fancy can be done, but more importantly nothing fancy need be done: even simple algorithms can look brilliant. When it comes to, say, identifying synonyms, simple pattern matching across an enormous corpus of sentences beats the most sophisticated language models developed meticulously over decades of research.

Our Millionaire player was great at answering obscure and specific questions: the high-dollar questions toward the end of the show that people find difficult. It failed mostly on the warm-up questions that people find easy — the truly trivial trivia. The reason is simple. Factual answers like the year that Mozart was born appear all over the web. Statements capturing common sense for the most part do not. Big data can only go so far.*

That was 2003.

In the paper, our clearest example of a question that we could not answer was How many legs does a fish have?. No one on the web would actually bother to write down the answer to that. Or would they?

I was recently explaining all this to a colleague. To make my point, we Googled that question. Lo and behold, there it was: asked and answered — verbatim — on Yahoo! Answers. How many legs does a fish have? Zero. Apparently Yahoo! Answers also knows the number of legs of a crayfish, rabbit, dog, starfish, mosquito, caterpillar, crab, mealworm, and “about 133,000” more.

Today, there are way more than 1 billion web pages: maybe closer to 1 trillion.

What’s the new lesson? Given enough time, everything will be on the web, including the fact that hungry poets blink (✓). Ok, not everything, but far more than anyone ever imagined.

It would be fun to try our Millionaire experiment again now that the web is bigger and search engines are smarter. Is there some kind of Moore’s Law for artificial intelligence as the web grows? Can sentience be far behind? 🙂

__________
* Lance agreed, predicting that IBM’s quest to build a Jeopardy-playing computer would succeed but not tell us much.

Thank you Bangalore

Sunday I returned from a trip to Bangalore, India, where I gave a talk on “The Automated Economy” about how computers can and should take over the mechanical aspects of economic activity, optimizing and learning from data in the way people cannot, with detailed case studies in online advertising and prediction markets. You can read the abstract, watch archive video of the talk, view my talk slides, browse the official pictures of the event, or see my personal pictures of the trip.

Some say everything’s bigger in Texas (most vociferously Texans). They haven’t been to India. My talk is part of Yahoo!’s Big Thinkers India series — four talks a year from (so far) Yahoo! Research speakers. If the Thinking isn’t Big, the crowds certainly are — the events can draw close to 1000 attendees from, apparently, all over India. Duncan Watts says it’s the largest crowd he’s spoken to; me too. This time they barred Yahoo! employees from attending the main event and the hotel ballroom still filled to capacity.

Here is a linked-up version of my journal entry for the trip, a kind of windy and winded thank you letter to Bangalore. If you’re not interested in personal details, you might skip to Thoughts on Bangalore.

Getting there

The Philadelphia airport international terminal is dead empty. I breeze through security — the only one in line. I’m inside security two hours early thinking that either the recession is still in full force or traveling internationally on a Monday night out of Philadelphia is the best ever. Maybe not. Get on plane. Wait two hours on tarmac. Apparently a two hour layover isn’t enough leeway on international flights. Miss my connecting flight in Frankfurt by a few minutes. Team up with a fellow passenger in the same boat. We are rebooked via Dubai. Fly directly over Baghdad. Dubai is an impressive airport. Endless terminals lined with upscale shopping. Packed with Asians, Europeans at midnight and beyond. From there, Emirates to Bangalore. Only 9 hours behind schedule. Sneezing fits begin after 28 hours of airplane air.

Day 0: Yahoo! internal practice talk

Driver right there outside baggage claim, nice guy. Takes me to hotel. Over an hour. Traffic. Time for shower, NeilMed nasal rinses (bottled water), Sudafed, but not sleep. Call home. Yahoo! Messenger with Voice doesn’t roll off the tongue like ‘Skype’, but it rocks. Super clear and dirt cheap. Lauren and the girls are so sweet. Miss them. To Yahoo! office. Meet Anita, Mani. Time for Yahoo! internal version of Big Thinkers talk. Nose is still running. Drips and wipes during my talk. Talk goes well but I run out of time for prediction market section and this seems to be what people are most interested in. I’m glad I had the practice run to work out the kinks and rebalance the talk. Back to hotel. Call home again for a recharging dose of home. I missed Ashley’s graduation from pre-school: she did great: they sang six songs and she knew them all. She was dressed up in a yellow cap and gown. I’m upset I had to miss such an adorable milestone but am proud of my little girl (and dismayed she is rapidly becoming not so little!). More NeilMed. Room service. (Called “private dining” here — sounds illicit.) Sleep! For a few hours at least. Wake up in the middle of the night since it’s NY daytime. Finally get back to sleep again.

Day 1: Meetings

Hard to wake up at 9am = midnight. Shower. Feel 1000% better. Driver takes me to the Yahoo! office. It’s in a complex with Microsoft, Google, Target, Dell, and many other US brands. Once you’re inside it’s like every other Yahoo! office except the food — built essentially to corporate spec. Meet with Anita, Raghu, and Rajeev: go over PR angles and they brief me on the media interviews. These guys and gal are on top of things. Meet with Mani and her team: great group. Skip intern pizza talks because I can’t eat cheese, going for the cafeteria instead. Mistake. Order a veggie grill thinking that since it’s grilled, it’s cooked enough. I only take a few bites of this before thinking it’s too risky. I eat some bread and Indian mixtures. Not sure what the culprit is but something doesn’t sit well in my stomach. Give prediction markets portion of my talk to a few interested people in labs. Very sharp group. Meet with Dinesh and Sachin, their intern, and one other. Interesting work. Meet with Chid and Preeti on Webscope. Back to hotel. Call Lauren. Good to hear her voice. Ashley wants to say hi. She’s so adorable. She finds it hilarious that I am about to have dinner while she is eating breakfast. I can hear her laughing uncontrollably at the thought. Sarah says hi too and even ends our conversation without prompting with a “bye, love you”. I go down to the restaurant for dinner. Have a chicken Indian dish with paratha (is it lachha paratha?) bread. Spicy (sweat inducing) yet so delicious. The bread is fantastic — round white with flaky layers. Back to room. TV. CNN. CNBC. ESPN. Hard to sleep. There is an incredible thunderstorm with torrents of rain. I open my balcony door briefly to catch its power. I find out later that monsoon season is just beginning. I also find out that it rained so hard and so long that the roads flooded to the point of becoming impassable.
In fact, Anita, the Bangalore PR lead, had a near-disastrous experience in the rapidly flooding streets on her way home and had to turn back and check into a hotel before going home briefly in the morning and then back to Yahoo! for our am meeting. Finally get to sleep.

Day 2: My talk!

Hard to wake up at 8:30am too. Talk’s today! Nerves begin. Media interviews are first! Even worse. Turns out they went fine. Two nice/sharp reporters, especially the second one who really knows her stuff and spoke to us (Rajeev and me) for 1.5 hours. She’s especially interested in the prediction market stuff since that is something new. She may write two articles (for Business World India). Lunch, then a bit of time to rest and freshen up. Stomach is not doing well. Pepto to the rescue. Back down to lobby. They take my picture in the courtyard. Then into the ballroom. Miked. Soundchecked. They accept a final last minute change to my slides: hooray! Room starts filling. 100 people. 200. 300. Now 500. It’s time to start! Rajeev gives a very nice intro. I walk up the stairs onto the stage. I’m miked, in lights, speaking in front of 500 people expecting a Big Thinker. Here I go! “Four score and seven years…” Ha ha. Actually: “Thanks Rajeev, and thanks everyone for your time and attention. I am happy and honored to be here. I’m going to talk about trends in automation in the economy…”

David Pennock speaking at Yahoo! Big Thinkers India June 2009
Audience at Yahoo! Big Thinkers India June 2009

65 minutes later “Thank you very much.” Applause. I think it went well: one of my better talks. I covered everything, including the prediction market stuff. It turns out, like at Yahoo!, and like the journalists, the audience is more interested in prediction markets than advertising. Lots of questions. Some I follow, some I can’t parse the words, others I hear the words but just don’t understand. I do my best. Several people mention they follow my blog: gratifying. After the official Q&A session ends, there is a line up of folks with questions or comments and business cards. It’s the closest I’ll ever be to a rock star. A handful of people wait patiently around me while I try to get to everyone. Eventually the PR folks rescue me and take me to a “high tea” event with Yahoo! Bangalore execs and some recruiting targets. Relief and euphoria kick in. It’s over. I talk with a number of people. I make my exit. Private dining. Call home. Lauren has explained to Ashley that I am on the other side of the world, so when she has the sun, I have the moon. So I can hear Ashley asking in the background, “does Daddy have the moon?” I do. She can’t stop laughing. A repeat of game 6 of the Stanley Cup is on Ten Sports India. I watch it, getting psyched for Game 7. I check online for Ten Sports schedule. Game 7 will be on at 5:30am! I can’t miss that! Set my alarm. Try to sleep. Can’t sleep. Try to sleep. Can’t sleep. Try with TV on. Can’t sleep. Try with TV off. Can’t sleep. Finally fall asleep… Alarm!

Day 3a: Penguins win the Stanley Cup!

Really hard to wake up at 5:30am. Actually maybe not quite as hard since it’s 8pm in my head. Game on! Nerves are racked up. Can’t sit down: bad luck. Pacing. No score first period. Tons of commercials, all for Ten Sports programming: wrestling, cricket, tennis. Every commercial repeats three times. Is period two coming? Yes, it’s back on! Pens score first! Fist pumping and muted cheering. Can they really do this? No sitting rule in full effect. Pacing. Pens score again! Talbot second goal. Wow, is this real? Can it be? Don’t think about it yet. Don’t celebrate too soon. Plenty of time left. Period two ends at 2-0. Unbelievable. All the same commercials come back, three times each. Period three begins. Stand up. Pace. Clock ticks. Pens are playing too defensive: not taking shots, just throwing the puck out of their zone. This isn’t good. Detroit is getting tons of chances. Fleury is awesome. Five minutes left. I let myself think about winning the cup. Mistake! Detroit scores! It’s 2-1! Nerves are ratcheted up beyond ratcheting. I think about it all slipping away. How awful that would feel. If Detroit ties it up, imagine the let down, the blown opportunity. Clock ticks. More chances. More saves. More defense. It’s working! Detroit pulls their goalie. Pressure. Final seconds. Faceoff in our zone. Detroit wins control. Shot. Rebound. Right to a Red Wing — Nick Lidstrom — in perfect position. He shoots. Fleury swings around. He saves it! It’s over! Pens win the Cup! Super fist pumping, jumping around, dancing, muted cheering. They did it! How amazing it feels after last year’s loss to the same team. After falling behind 2-0 and 3-2 in the series. They came back! A delicious payback with the same but opposite script as last year: a two goal lead cut in half in the waning minutes, a flurry of attempts at the end including a few-inch miss of the tying goal in the last seconds. These guys are young and have the potential to rule hockey for several years if they’re lucky.
Mario Lemieux is on the ice. How sweet. Twice as player, now as owner, the one who saved hockey in Pittsburgh. What a year for Pittsburgh sports! Two nail biter games, two comebacks, two championships. City of Champions again. Too bad the Pirates have no shot to join them in a trifecta. Back to sleep.

Day 3b: Sightseeing

Phone rings at 11am — my driver is here. Off to do some whirlwind sightseeing. Everyone here who finds out I have a day off recommends I leave Bangalore — Bangalore is just not that nice, nothing really to see, they say. They all recommend Mysore, 3.5 hours away, but that is too far for my comfort level given that my flight is late tonight and it’s supposed to thunderstorm. We start with some souvenir shopping on “MG Road”. My driver takes me to a store and waits in the car outside. I walk in and instantly there are people greeting me and showing me things. One aggressive man takes over and remains my “tour guide” through the whole store. The fact that I reward his aggressiveness by following along and eventually buying stuff will only bolster him to do more of the same in the future. Annoying but clearly it works. I do negotiate him down, but I leave still feeling I didn’t bargain hard enough and with a bit of distaste in my mouth that I fueled and validated the pushy tactics. Next we drive past parliament and the courthouse. Impressive, large, old buildings. But I can just gaze and take photos from the car — can’t go inside. Next we drive past Cubbon Park — tree lined paths and flower gardens in center city. Next is ISKCON temple. But it’s closed. So one more round of shopping at a place called Cottage Industries. I’m wary given the last experience, but go anyway. This one is better. Again one person escorts me around but I feel less pressure. Plus I’m more prepared to say no and negotiate harder. I leave with what seems like a fair amount of value in goods. I recommend Cottage Industries to future visitors: more professional, more familiar (items have price tags), lower pressure, greater variety, and higher quality than at least the first shop I visited. Now we’ve killed enough time and the ISKCON temple is open. It’s a giant Hare Krishna temple. The parking lot is full. I tell the driver it’s ok — we don’t need to go. He says “you go, you go”. “Ok” I say.
We drive around again to the same full parking lot. The attendant waves at us to leave, blowing a whistle. My driver is talking to him. They are talking quite heatedly. The attendant in his official looking uniform is waving us on vigorously. Although I can’t understand the words, he is clearly telling us the lot is full and we must leave immediately — we are holding up traffic. My driver is getting more insistent. They are yelling back and forth. I have no idea what he says but it works. The guard lets us in. Meanwhile another car sees our success and tries to argue his way in too but to no avail. I ask my driver what he said: he simply replies “don’t talk”. Indeed once we’re in, there is an empty spot. We put all my bags in my suitcase in the trunk and cover my backpack. We take off our shoes and my driver leads me to the temple. He knows the back entrance and is guiding me to cut in front of lines everywhere. We walk past the main attraction: the altar with some people on the floor worshiping. Then the line weaves past a gift shop of course: I buy a crazy looking book (Easy Journey to Other Planets). We need to kill some time. We go to the gardens again to walk around. We walk into the public library. Most books are in English. Most seem old and worn. The attendant says the library is 110 years old. We start walking through the garden but I am paranoid about mosquitoes/malaria so we turn around early to return to the car. We go to UB City where I meet Rajeev. It’s a thoroughly modern office tower half owned by Kingfisher of Kingfisher Airlines. The building is full of high-end shopping like almost any upscale western mall with all the same brands. Here is the Apple Store. Here is Louis Vuitton. We have dinner at an Italian restaurant that could be anywhere in the western world, owned by an Italian expat. The only seating is outside and I remain worried about mosquitoes but don’t see any. The food is good and the conversation is good.

This place is the closest I’ve seen of the future of Bangalore. In the center of town, a gorgeous building filled with gleaming shops and tantalizing restaurants and bars, with apartments and condos within walking distance, and a palm-tree-lined street leading to the central town circle and the park. As Rajeev says, though, whereas New York has hundreds of similar scenes, Bangalore has one. For now.


Thoughts on Bangalore

Bangalore is a city of jarring contradictions, a hard-to-fathom mix of modernity and poverty. Signs with professional logos and familiar brands are set askew on dilapidated shacks and garages lining the road. While many live on dollars a day and others beg, the majority are smartly dressed (men invariably in button-down shirts), have mobile phones, and are intelligent and friendly. There are gleaming office towers indistinguishable from their western counterparts, yet a strong rain can flood the roads to the point of becoming impassable for hours and day-long blackouts aren’t uncommon. Many billboards are in English, sporting familiar brands and messages. Others, like sexy stars promoting a Bollywood film, are entirely familiar, English or not. Others are impenetrable. Still another advertises a phone number to learn why Obama quoted the Koran.

BMWs and Toyotas join bikes, motorcycles, pedestrians, aging trucks and buses, and colorful open-air motorized rickshaws in a sea of disorganized line-ignoring sign-ignoring traffic. People drive here the way New Yorkers walk sidewalks: weaving past one another in a noisy self-organized tangle that somehow — mostly — works. You can eat outside in a restaurant bar next to upscale shops, a fountain, and smiling yuppies, yet worry that a malaria-infected mosquito lurks nearby or that a washed vegetable will turn a western-coddled stomach deathly ill. When two people ride a motorcycle, as is common, only the driver wears a helmet — the passenger clinging on behind does not: new and old rules on display atop a single vehicle. And the traffic. Oh, the traffic. Roads are clogged nearly every hour of every day. My Saturday of sightseeing was as bad as or worse than weekday rush hour. The extent of congestion itself illustrates Bangalore’s two faces: so many people with youth (India is one of the youngest countries in the world), energy, purpose, and the means and intelligence to accomplish it overtaxing a primitive infrastructure. Buildings are going up according to western specs, but under old-time rules where corruption reigns and bribery is an accepted fact of life by even the western-educated aspirational class (about 20% and growing, according to Rajeev).

Thoughts on Yahoo! Labs Bangalore

The folks I met are impressive. Rajeev has done a great job hiring talented, driven folks. Mani‘s group of research engineers is fantastic. One is headed to Berkeley for grad school and asks great questions about CentMail. Another proposes an attack on Pictcha. Another (Rahul Agrawal) has read up deeply on prediction markets, including Hanson’s LMSR.

Thoughts on the Yahoo! Big Thinkers India program

The whole event was organized to precision. Anita, the PR lead, was incredible. I especially appreciated the extra “above and beyond” touches like having someone pick up Yahoo! India schwag for my family and send it to my hotel after I forgot: so nice. Raghu, who arranged the media interviews, is supremely organized and on top of his game. The fact that the event draws such a large crowd shows that there is great thirst for events like this in Bangalore. I’m not sure whose idea it is, but it’s a brilliant one: great marketing and great for recruiting.

Thank you Bangalore

In sum, thanks to the people of Bangalore for a fascinating and rewarding trip. Thanks to Rahul at the travel desk whose instant replies about the driver arrangements calmed my nerves on the stressful day of my departure. Thanks to the Yahoo! folks who arranged and organized my talk, and the Yahoo! Labs members for seeding an exceptional science organization. Thanks to my driver who got me everywhere — including into full parking lots, back entrances, and fronts of lines — with efficiency, safety, and a smile (when I tipped him, I tried to think wwsd and wwdd: what would Sharad or Dan do?). Thanks to those who attended my talk and whom I met afterward: it’s gratifying and invigorating to see your level of interest and enthusiasm (and your numbers). And thanks Bangalore chefs for keeping any stomach upset relatively mild and brief.

At the airport on the way out, the flight is overbooked and they are offering close to US$1000 plus hotel to leave tomorrow. Not a chance. It’s been fun and an adventure but my nerves are on high and I miss my family: it’s time to make the 20+ hour journey home.

Yahoo! Key Scientific Challenges student seed program

Yahoo! Research just published its list of key scientific challenges facing the Internet industry.

It’s a great resource for students to learn about the area and find meaty research problems. There’s also a chance for graduate students to earn $5000 in seed funding, work with Yahoo! Research scientists and data, and attend a summit of like-minded students and scientists.

The challenges cover search, machine learning, data management, information extraction, economics, social science, statistics, multimedia, and computational advertising.

Here’s the list of challenges from the algorithmic economics group, my group. We hope it provides a clear picture of the goals of our group and the areas where progress is most needed.

We look forward to supporting students who love a challenge and would like to join us in building the next-generation Internet.


Yahoo! Key Scientific Challenges Program 2009

Death in artificial intelligence

Until just reading about it in Wired, I knew little1 of the apparent suicide of Push Singh, a rising star in the field of artificial intelligence.

Singh seemed to have everything going for him: brilliant and driven, he became the protégé of his childhood hero Marvin Minsky, eventually earning a faculty position alongside him at MIT. Professionally, Singh earned praise from everyone from IEEE Intelligent Systems, who named Singh one of AI’s Ten to Watch (apparently revised), to Bill Gates, who asked Singh to keep him apprised of his latest publications. Singh’s social life seemed healthy and happy. The article struggles to uncover a hint of why Singh would take his own life, mentioning his excruciating chronic back pain (and linking it to a passage on the evolutionary explanation of pain as “programming bug” in Minsky’s new book, a book partly inspired by Singh).

The article weaves Push’s story with the remarkable parallel life and death of Chris McKinstry, a man with similar lofty goals of solving general AI, and even a similar approach of eliciting common sense facts from the public. (McKinstry’s Mindpixel predated Singh’s OpenMind initiative.) McKinstry’s path was less socially revered, and he seemed on a never ending and aching quest for credibility. The article muses whether there might be some direct or indirect correlation between the eerily similar suicides of the two men, even down to their methods.

For me, the story felt especially poignant, as growing up I was nourished on nearly the same computer geek diet as Singh: Vic 20, Apple II, Star Trek, D&D, HAL 9000, etc. In Singh I saw a smarter and more determined version of myself. Like many, I dreamt of solved AI, and of solving AI, even at one point wondering if a neural network trained on yes/no questions might suffice, the framework proposed by McKinstry. My Ph.D. is in artificial intelligence, though like most AI researchers my work is far removed from the quest for general AI. Over the years, I’ve become at once disillusioned with the dream2 and, hypocritically, upset that so many in the field have abandoned the dream in pursuit of a fractured set of niche problems with questionable relevance to the whole.

Increasingly, researchers are calling for a return to the grand challenge of general AI. It’s sad that Singh, one of the few people with a legitimate shot at leading the way, is now gone.

Push Singh Memorial Fund

1Apparently details about Singh’s death have been slow to emerge, with MIT staying mostly quiet, for example not discussing the cause of death and taking down a memorial wiki built for Singh.
2My colleague Fei Sha, a new father, put it nicely, saying he is “constantly amazed by the abilities of children to learn and adapt and is losing bit by bit his confidence in the romantic notion of artificial intelligence”.

Checkers bot can't lose… Ever

Mathematicians, third graders, and talkative defense department computers alike all know that there is an infallible way to play tic tac toe. A competent player can always force at least a tie against even the most savvy opponent.

In the July issue of Science, artificial intelligence researchers from the University of Alberta announced they had cracked the venerable game of checkers in the same way, identifying an infallible strategy that cannot lose.1

It doesn’t matter if the strategy is unleashed against a bumbling novice or a flawless grandmaster: it can always eke out at least a tie if not a win. In other words, any player adopting the strategy (a computer, say) makes for the most flawlessly grandmasterest checkers player of all time, period.

The proof of correctness is a computational proof that took six years to complete and was twenty-seven years in the making.

Tic tac toe and checkers are examples of deterministic games that do not involve dice, cards, or any other randomizing element, and so “leave nothing to chance”. In principle, every deterministic game, including chess, has a best possible guaranteed outcome2 and a strategy that will unfailingly obtain it. For chess, even though we know that an optimal strategy exists, the game is simply too complex for any kind of proof — by person or machine — to unearth it as of yet.
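To make "best possible guaranteed outcome" concrete, here is a minimal sketch that searches all of tic tac toe by brute force. It is only an illustration of the idea on a tiny game, not the UofA team's checkers method. The names `winner` and `value` are my own.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board stored as a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Best guaranteed outcome for X (+1 win, 0 tie, -1 loss), `player` to move."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if ' ' not in board:
        return 0
    nxt = 'O' if player == 'X' else 'X'
    results = [value(board[:i] + player + board[i + 1:], nxt)
               for i, cell in enumerate(board) if cell == ' ']
    # X picks the move that is best for X; O picks the move that is worst for X.
    return max(results) if player == 'X' else min(results)


print(value(' ' * 9, 'X'))  # prints 0: with best play on both sides, a tie
```

Because `value` can be asked about any position, not just the empty board, this corresponds to what footnote 1 calls a strong solution. The same recursion applies to checkers or chess in principle, but the number of positions makes a plain search like this hopeless there.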

The UofA team’s accomplishment is significant, marking a major milestone in artificial intelligence research. Checkers is probably the first serious, popular game with a centuries-long history of human play to be solved, and certainly the most complex game solved to date.

Next stop: Poker

Meanwhile, the UofA’s poker research group is building Poki, a computer player for Texas Hold’em poker. Because shuffling adds an element of chance, poker cannot be solved for an infallible strategy in the same way as chess or checkers, but it can in principle still be solved for an expected-best strategy. Although no one is anywhere near solving poker, Poki is probably the world’s best poker bot. (A CMU team is also making great strides.)

Poki’s legitimate commercial incarnation is Poker Academy, a software poker tutor. An unauthorized hack of Poker Academy [original site taken down; see 2006 archive.org copy] may live an underground life as a mechanical shark in online poker rooms. (Poki’s creators have pledged not to use their bot online unidentified.)

Poker web sites take great pains to weed out bots — or at least take great pains to appear to be weeding out bots. Then again, some bot runners take great pains to avoid detection. This is a battle the poker web sites cannot possibly win.

1Technically, tic tac toe is “strongly solved”, meaning that the best strategy is known starting from every game position, while the UofA team succeeded in “weakly solving” checkers, meaning that they found a best strategy starting from the initial game board configuration.
2The best possible guaranteed outcome is the best outcome that can always be assured, no matter how good the opponent.