Category Archives: yahoo

2 weeks, 2 geeks: My two new fearless leaders

Well, geeks are certainly inheriting my earth.

On January 13, my company named Carol Bartz, a self-avowed math nerd and former punch-card carrying member of her college computer club, as its CEO. In her own words:

I was a real nerd. I love, love, love, love math. Back in the late ’60s, math meant being a teacher if you were a woman. I wasn’t interested in teaching. Then I took my first computer course. It was crazy. It was like math, only more fun. I switched to computer science.

Exactly one week later, on January 20, my country turned over executive control to Barack Obama, a CrackBerry addicted comic book geek. In his inauguration speech, Obama vowed to “restore science to its rightful place”, “wield technology’s wonders”, and even addressed “non-believers” — wording that in any sane universe should be entirely unremarkable, yet in ours appears to represent an unprecedented milestone.

I can’t recall a two-week span filled with so much geek pride and cautious optimism.

Back to the Carol Bartz quote. Reading it brings a smile to my face. It also reminds me of my mom, who, convinced it was her only option, taught middle school for a few years before returning to medical school to pursue her passion, enjoying a successful career as one of the first women radiologists.

I highly recommend Bartz’s essay, which mixes biography with prescience and insight. Bartz describes how technology and the Internet are transforming collaboration and improving productivity, at the same time ushering in an era of information overload, email bankruptcy, and misuse of the extra time technology affords. Remarkably, she wrote about these things in 1997!

It’s amazing to think how things have changed since 1997. My own first web experience, courtesy of Mosaic, came in 1994, the same year Yahoo! was founded. In 1996, PayPal predecessor and public company First Virtual wrote their own keystroke-sniffing malware as a stunt to bolster their urgent call to “NEVER TYPE YOUR CREDIT CARD NUMBER INTO A COMPUTER”. eBay was founded in 1995, PayPal in 1998. In 1997, Friendster had neither come nor gone, and Facebook CEO Mark Zuckerberg was 13.

Yet Bartz’s words seem more relevant than ever today.

The "predict flu using search" study you didn't hear about

In October, Philip Polgreen, Yiling Chen, myself, and Forrest Nelson (representing University of Iowa, Harvard, and Yahoo!) published an article in the journal Clinical Infectious Diseases titled “Using Internet Searches for Influenza Surveillance”.

The paper describes how web search engines may be used to monitor and predict flu outbreaks. We studied four years of data from Yahoo! Search together with data on flu outbreaks and flu-related deaths in the United States. All three measures rise and fall as flu season progresses and dissipates, as you might expect. The surprising and promising finding is that web searches rise first, one to three weeks before confirmed flu cases, and five weeks before flu-related deaths. Thus web searches may serve as a valuable advance indicator for health officials to spot the onset of diseases like the flu, complementary to other indicators and forecasts.
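To make the "searches rise first" idea concrete, here is a minimal sketch of one way to estimate how far one weekly series leads another: slide the series against each other and pick the shift with the highest correlation. This is only an illustration, not the analysis from the paper. The data below is synthetic, and the names `pearson`, `best_lead`, and `max_lead` are made up for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def best_lead(searches, cases, max_lead=8):
    """Return (lead, r): the lead in weeks at which searches best match later cases.

    A lead of k pairs searches[t] with cases[t + k].
    """
    scores = {}
    for k in range(max_lead + 1):
        xs = searches[: len(searches) - k]  # drop the last k weeks of searches
        ys = cases[k:]                      # drop the first k weeks of cases
        scores[k] = pearson(xs, ys)
    lead = max(scores, key=scores.get)
    return lead, scores[lead]

# Synthetic weekly data (NOT from the paper): one seasonal bump, with
# cases constructed to trail searches by exactly two weeks.
weeks = range(52)
searches = [math.exp(-((t - 20) ** 2) / 30.0) for t in weeks]
cases = [math.exp(-((t - 22) ** 2) / 30.0) for t in weeks]

lead, r = best_lead(searches, cases)
print(lead, round(r, 3))
```

On this toy data the function recovers the two-week lead that was built in. Real search and surveillance data are far noisier, so a serious analysis would need much more care than a single best-shift correlation.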

On November 11, the New York Times broke a story about Google Flu Trends, along with an unusual announcement of a pending publication in the journal Nature.

I haven’t read the paper, but the article hints at nearly identical results:

Google … dug into its database, extracted five years of data on those queries and mapped it onto the C.D.C.’s reports of influenzalike illness. Google found a strong correlation between its data and the reports from the agency…

Tests of the new Web tool … suggest that it may be able to detect regional outbreaks of the flu a week to 10 days before they are reported by the Centers for Disease Control and Prevention.

To the reporter’s credit, he interviewed Philip and the article does mention our work in passing, though I can’t say I’m thrilled with the way it was framed:

The premise behind Google Flu Trends … has been validated by an unrelated study indicating that the data collected by Yahoo … can also help with early detection of the flu.

giving (grudging) credit to Yahoo! data rather than Yahoo! people.

The story slashdigged around the blogomediasphere quickly and thoroughly, at one point reaching #1 on the nytimes.com most-emailed list. Articles and comments praise how novel, innovative, and outside-of-the-box the idea is. The editor in chief of Nature praised the “exceptional public health implications of [the Google] paper.”

I’m thrilled to see the attention given to the topic, and the Google team deserves a huge amount of credit, especially for launching a live web site as a companion to their publication, a fantastic service of great social value. That’s an idea we had but did not pursue.

In the business world, being first often means little. However, in the world of science, being first means a great deal and can be the determining factor in whether a study gets published. The truth is, although the efforts were independent, ours was published first (and Clinical Infectious Diseases scooped Nature), a decent consolation prize amid the go-google din.

Update 2008/11/24: We spoke with the Google authors and the Nature editors and our paper is cited in the Google paper, which is now published, and given fair treatment in the associated Nature News item. One nice aspect of the Google study is that they identified relevant search terms automatically by regressing all of the 50 million most frequent search queries against the CDC flu data. Congratulations and many thanks to the Google/CDC authors and the Nature editors, and thanks everyone for your comments and encouragement.

NYCE Day: Thanks and thoughts

NYCE Day 2008 went very well, with over 100 attendees, great talks, and valuable discussion. Many thanks to the four plenary speakers — Costis, Asim, Susan, and Tuomas — and ten rump session speakers who came in from various NYC suburbs like Boston, Pittsburgh, and Palo Alto.

At dinner the night before,[1] the organizers agreed that we were nervous because we weren’t at all nervous. Karin and Renee from the New York Academy of Sciences had taken care of almost everything, leaving little for us to fret about. It turned out we were right to not worry and wrong to worry about not worrying: indeed Karin, Renee, and NYAS were absolutely fantastic, orchestrating every detail of the event flawlessly, from technology to catered breaks. The venue itself is gorgeous: a well laid-out space in a modern building in the World Trade Center complex with stunning views[2] and a number of nice touches, from an alcove with a computer station to check email to a subtle gradient in the wallpaper that slowly pixelates as your gaze moves from the center toward the side of the room. I came away incredibly impressed with NYAS and delighted to become a member.

Muthu provides an excellent summary of the event, divided into before and after lunch. Read that first and then come back here for my additional thoughts/notes:

  1. Costis gave us mostly bad news. He summarized some of his award-winning work with Christos Papadimitriou and Paul Goldberg proving that computing equilibrium behavior in almost any moderately complex game may be beyond the reach of our computers,[3] let alone our brains. As a saying goes, “if your laptop can’t find it, then neither can the market” [attribution: Kamal Jain?]. Still, all may not be lost. These results, as is the nature of computational complexity results, say only that some games are extremely hard to solve, not all games or even most games. Since nature is not adversarial (Murphy’s Law aside), it may be the case that among games that arise in the real world that we care about, a number of them can be solved for equilibrium. The problem is defining what “realistic” means in this context: an almost impossibly fuzzy task. Costis did end with some positive results, showing that anonymous games can be solved efficiently. Anonymous games crop up in realistic situations, for example in analyzing traffic, where only the quantity of cars near you matters and not the identity of the drivers inside.
  2. Asim described a sophisticated Bayesian model well suited for social network data that handles nonexistent links (meaning the lack of connection between two people, by far the most common situation) much better than previous approaches. The approach is good for digging deeply into a small data set but at least for now has difficulty with moderately large amounts of data. (To get results in a reasonable amount of time, Asim had to downsample his already fairly modest-sized corpus.) The talk didn’t help me overcome my bias that Bayesian methods à la UAI often don’t work well at Internet scales without modification.
  3. Susan gave a fantastic and energetic talk. She advocates economic models of online advertising that include more sophisticated users, as opposed to typical models that assume users scan from the top of the page down in a precise sequence. She went further to claim that users may actually choose their search engine based on the quality of the ads. Personally, I’m a bit skeptical about that, though I do agree that there is an indirect effect: search engines with better paying ads can afford to buy more traffic and improve their algorithmic search more. Susan highlighted the enormous shift in mindset required between economic theory and practice when just computing the mean of a data stream can take weeks (though this is changing with tools like Hadoop that can bring such computations down to hours or minutes as Sebastien confirms).
  4. Someone asked Tuomas why his expressive commerce company CombineNet uses first-price auctions instead of VCG pricing. He listed four of what he said were dozens of reasons on top of Rothkopf’s thirteen and Ausubel and Milgrom’s list. In fact he went further to say that as far as he knew no real auction anywhere in the world has ever used true VCG pricing for anything more complicated than selling a single good at a time.
  5. For those not familiar, a rump session is open to anyone to speak briefly on any relevant topic. As it turns out, in part because brevity forces clarity, and in part because editorial filtering overweights mediocrity, the rump session is often the most interesting part of a conference. The “NYCE rump” session was no exception, with topics spanning ad auctions, reputation, Internet routing, and user-generated content. Ivy Li proposed a clever scheme whereby eBay sellers are motivated to reward buyers for honest feedback. Sebastien presented work with Sihem and me on an expressive bidding language for online advertising with fast allocation and pricing algorithms, with the goal of moving the industry toward an open standard. Sampath Kannan, on leave at NSF, had encouraging news on the funding front, laying out his vision for CS theory funding with an explicit call for proposals at the boundary of CS and economics.
  6. I think we did a good job of attracting a diversity of speakers and participants, with talks ranging from computational complexity to Bayesian models of social networks, with academia and industry represented, and with CS, economics, and business backgrounds represented.
[1] We had dinner at Gobo, a fantastic restaurant Muthu recommended that truly opened my eyes in terms of the tastes and textures possible with a vegetarian menu. Delicious.
[2] Speaking of views, I had a stunning and fascinating one from my hotel the night before, looking straight down onto ground zero of the World Trade Center complex from a relatively high floor of the Millenium Hilton (apparently intentionally misspelled). I booked the room for $185 on Hotwire, and then found out why. Though the WTC site still looks nearly empty, builders appear to be making up for lost time with round-the-clock construction. Put it this way: the hotel kindly provided complimentary earplugs. All in all though the room and view were well worth the cost in dollars and sounds.
[3] Specifically, computing a Nash equilibrium is PPAD-complete for most games. In terms of complexity classes, PPAD sits between P and NP (more precisely, between their function-problem counterparts FP and FNP). Almost surely there is no polynomial-time algorithm, though the problem is not believed to be quite as hard as the classic NP-complete problems like traveling salesman.

Yahoo! Open Hack Day Sunnyvale, Sept 12-13, 2008

As Jed points out, “an idea is only the first step in innovation, and it’s by far the easiest step”.

Yahoo! Hack Day was created precisely to summon and celebrate the hard step of innovation: the build-it step. The goal is simple: take an idea and make it real in 24 hours. Spend all day and all night coding until a working, usable, if brittle prototype of your idea emerges. Then show it off.

Hack Day is a religion inside Yahoo!, but on September 12-13, 2008, Yahoo! will open up its Sunnyvale campus, inviting any developer who feels like it to join the geek-out frenzy. Sign up here.

Schedule

8am-6pm PT Friday: Over 20 workshops covering YUI and the newest API offerings from Yahoo!, including BOSS, SearchMonkey, Fire Eagle, and more, and previews of what’s next.

8pm Friday: A surprise musical guest takes the stage (it’s not 2006 guest Beck, but apparently the lyrics are “hacker-friendly” and “may not be appropriate for young children”). Hacking continues all night.

2pm Saturday: Judging, including a special hack-off for the winners of the University Hack Days.

Saturday evening: Awards.

History, thoughts, and notes

At the first Open Hack Day in 2006 in Sunnyvale (see photos), 400 developers fueled by 500 pizzas and a live Beck performance cranked out 54 hacks. At Open Hack Day London lightning struck twice and it rained indoors. Bangalore followed.

If you’re a student, Yahoo! Hack Week may be coming to a campus near you. We’ve held Hack Weeks at Georgia Tech, CMU, UIUC, UC Berkeley, and Stanford, and I believe Waterloo is next. Here’s a quote describing these Hack U events: “Computer science students fueled by fast food, ultra-caffeinated beverages, and alternative music, are free to let their imaginations run wild, tapping the Yahoo! library of APIs to create hacks that advance the Internet experience.”

Why Hack Day? Many an engineer joins Jed in lamenting how the PowerPointy set co-opted the term innovation, rendering it almost meaningless. Hack Day was created in part to reclaim innovation for the makers.

Why does it work? Hack Day is to programmers what NaNoWriMo is to writers: a timed artistic challenge that on its face is a ludicrous and artificial pretense for accomplishing a goal. Yet somehow the exercise induces a psychological state perfect for making progress on the initial “80% phase” of creation. The punishing deadline forces all meta processing aside (no critic, no perfectionist, no planner, no lazy dreamer) and encourages the raw energy embodied by Nike’s Kirk-beats-Spock slogan “just do it”.

A number of Yahoo! products, services, and features were born on a Hack Day (here are two). Yoopick was too.

Why open?

Openness is one of only three overarching goals for Yahoo!. The other two goals, starting point for users and must buy for advertisers, are in some sense incontrovertible, yet the openness goal reflects a riskier “if we build it they will come” stand that’s grounded in Yahoo!’s respect for and debt of gratitude to Internet culture. Open Hack Day, Hadoop, Pig, cloud computing, academic relations, publications, APIs, BOSS, SearchMonkey, YUI, Pipes, OpenSocial, and OpenID are among the many examples showing that Yahoo!’s commitment to openness is real. (Jeremy, RIP 2008, said it better.)

Update 2008/09/11: Blueprint mobile SDK and more Y! Open announcements (music, homepage, mail, ads)

Update 2008/09/26: CNET says: Yahoo Open: Finally, a real answer to Google. Also, Google spouse Kara Swisher gets defensive and rewrites history.

New Yahoo! News election dashboard

Cross-posted on midasoracle.org

The Yahoo! News Political Dashboard has re-launched for the general election stretch run of the 2008 US Presidential election.

Yahoo! News political dashboard for the 2008 US general Presidential election

From the main map you can see the status of the election in every state according to either polls or Intrade prediction market odds. Hover your mouse over a state to see current numbers or click on a state to see historical trends. On the side, you can see search trends, blogs, news, and demographic breakdowns at national and state levels.

You can also “create your own scenario” by picking who will win in every state. You can save and share your prediction and compare against markets, polls, history, or celebrities. More on ycorpblog.

In the markets view, states are colored either bright red or bright blue, regardless of how close the race is in that state. To see a visualization that blends colors to reflect the tightness of the race, see electoralmarkets.com.

Yahoo! News also offers a candidate badge that you can display on your blog declaring your choice. The badge features national-level polls, prediction markets, search buzz, and money raised.

New York Computer Science and Economics Day (NYCE Day) October 3, 2008

We invite participants to the first New York Computer Science and Economics Day (NYCE Day), October 3, 2008, at the New York Academy of Sciences, 7 World Trade Center.

NYCE Day is a gathering for people in the NYC metropolitan area with interests in auction algorithms, economics, game theory, e-commerce, marketing, and business to discuss common research problems and topics in a relaxed environment. The aim is to foster collaboration and the exchange of ideas.

The program features invited speakers Asim Ansari (Columbia), Susan Athey (Harvard), Constantinos Daskalakis (MIT), and Tuomas Sandholm (CMU), and a rump session with short contributed presentations.

You can indicate your interest in the event on upcoming.yahoo but official registration should go through NYAS.

Your participation and suggestions are greatly welcome. Please distribute this announcement to people and groups who may be interested.

Thanks,
NYCE Day Organizers
 Anindya Ghose, NYU
 S. Muthu Muthukrishnan, Google
 David Pennock, Yahoo!
 Sergei Vassilvitskii, Yahoo!

P.S. This is one week prior and in the same location as the Symposium on Machine Learning.

P.P.S. For those familiar, NYCE Day was inspired by BAGT and conceived as a Right Coast version of it.

P.P.P.S. The New York Academy of Sciences is a spectacular venue. See for yourself.

Pipes dream

If you haven’t played around with Yahoo! Pipes, I highly recommend it. It’s a usable and useful service that brings web mashups to the masses, making this favorite hacker pastime as easy as dragging objects around on the screen.

For example, it took me probably about ten minutes as a first-time user to create a map mashup showing Barack Obama’s upcoming campaign stops. I “piped” the output of Washington Post’s RSS feed to a location-extractor module that identifies and geo-codes place names and renders them on a map. Here’s a screenshot of the output:

Screen shot of Yahoo! Pipe: Barack Obama 2008 US Presidential Election Campaign Travel Map

The easiest way to get started is to find an existing Pipe, clone it, and modify it as your own. Using this feature, I cloned my Obama map and in about one minute had a McCain map too.

Pipes uses a visual programming interface. The idea of “programming by picture” (I recall playing with one such tool in the 1980s) never took hold as a mainstream tool. However, as a metaphor for mashups, where the goal is to chain together a number of sources and services, the visual approach seems exactly right. The implementation in a browser is a feat of ajaxian magic that I still find remarkable, even as Yahoo! and others are commoditizing the art. I imagine that even non-programmers should have little trouble constructing their own Pipes. Here is a screenshot of the source “code” for my Obama map:

Source code of Yahoo! Pipe: Barack Obama 2008 US Presidential Election Campaign Travel Map

Pipes has dozens of useful modules, including user input, Yahoo! Search, Flickr, and regular expressions.

You can embed the Pipe on your own website with a single line of JavaScript. I did this with my Obama and McCain campaign travel maps here. Or you can grab the output as an XML feed to use however you wish.

Pipes allows you to create human-readable URLs (e.g., http://pipes.yahoo.com/oddhead/obamatravelmap), a nice touch.

The icing on the cake for me is how Pipes — unlike so many other web sites, including some on Yahoo! — treats me and my Opera browser like adults:

Yahoo! Pipes treats me and my Opera browser like adults

(BTW, Pipes seems to work fine on Opera).

Unfortunately, Daniel Raffel, one of the key founders of Yahoo! Pipes, left Yahoo!. However, the team seems to be strong and continues to innovate, so I’m hopeful this fantastic service will continue to improve and thrive.

Predict Olympic medal counts on Yoopick

We just added a new feature to Yoopick designed especially for Frenchmen Chris and Emile and citizens of nineteen other countries to place their swagor* on how many Olympic medals they think their country will win.

We’ve argued that the Yoopick interface is useful for predicting almost any kind of number, and since medal count is indeed a number, we thought we’d give it a try.

Besides, Lance told us it would be a good idea.

Sign up, play, enjoy, and don’t forget to tell us what you think!

Thanks,
Sharad Goel
David Pennock
Dan Reeves

* Scientific wild-ass guess, on record

Yoopick: Olympic medal count: Select

Yoopick: Olympics medal count: France: Make pick

Yoopick: A sports prediction contest on Facebook with a research twist

I’m happy to announce the public beta launch of Yoopick, a sports prediction contest with a twist.

You pick any range you think the score difference or point spread of the game will fall into; for example, you might pick Pittsburgh to win by between 2 and 11 points.

Yoopick make your pick slider interface screenshot

The more your prediction is viewed as unlikely by others, and the more you’re willing to stake on your prediction, the more you stand to gain. Of course it’s all for fun: you win and lose bragging rights only.

You can play with and against your friends on Facebook.

You can settle a pick even before the game is over, much like selling a stock in the stock market. Depending on what other players have done in the interim, you may be left with a gain or loss. You gain if you were one of the first to pick a popular outcome.

If you run out of credit, you can “work off your debt” by helping to digitize old books via the recaptcha project.

Those are the highlights if you want to go play the game. If you’re interested in more details, read on…

Motivation, Design, and Research Goals

There are a great many sources of sports predictions, including expert communities, statistical number crunchers, bookmakers, and betting exchanges. Many of these sources are highly accurate; however, they typically focus on predicting the outright or spread-adjusted winner of the game. Our goal is to obtain more information about the final score, including the relative likelihood of each point spread. For example, if our system is working, on average there should be more weight put on point spreads of 3 and 7 in NFL games than on 2, 4, 6, or 8.

We chose sports as a test domain to tap into the avid fan base and the armies of arm chair (and Aeron chair) prognosticators out there. However, the same approach should translate well to any situation where you’d like to predict a number, for example, the vote share of a politician or the volume of sales of your company’s widget. In addition to giving you the expected value of the number, our approach gives you the confidence or variance of the prediction — in fact, it gives you the entire probability distribution, or the likelihood of every possible value of the number.

Under the hood, Yoopick is a type of combinatorial prediction market where the possible outcomes are the values of the point spread, and each pick is a purchase of a bundle of outcomes in a given interval. We use Hanson’s logarithmic market scoring rule (LMSR) market maker to price the picks, that is, to set the risk/reward ratio. This pricing mechanism also determines the gain or loss when picks are settled early.
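For the curious, here is a minimal sketch of how a logarithmic market scoring rule market maker can price an interval pick and an early settlement. It uses the standard LMSR cost function C(q) = b log Σ exp(q_i / b). Everything else is an assumption made for illustration and not a description of Yoopick’s actual code: the liquidity parameter `b`, the range of spreads, the share quantities, and the helper names `buy_interval` and `sell_interval`.

```python
import math

def lmsr_cost(q, b):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)), computed stably."""
    m = max(q)
    return m + b * math.log(sum(math.exp((x - m) / b) for x in q))

def lmsr_prices(q, b):
    """Instantaneous prices, which double as the market's probabilities."""
    m = max(q)
    w = [math.exp((x - m) / b) for x in q]
    total = sum(w)
    return [x / total for x in w]

def buy_interval(q, b, lo, hi, shares):
    """Buy `shares` of every outcome with index in [lo, hi].

    Returns (new_q, cost). Each share pays 1 if the final outcome
    lands inside the interval and 0 otherwise.
    """
    new_q = [x + shares if lo <= i <= hi else x for i, x in enumerate(q)]
    return new_q, lmsr_cost(new_q, b) - lmsr_cost(q, b)

def sell_interval(q, b, lo, hi, shares):
    """Settle early by selling the same bundle back. Returns (new_q, proceeds)."""
    new_q = [x - shares if lo <= i <= hi else x for i, x in enumerate(q)]
    return new_q, lmsr_cost(q, b) - lmsr_cost(new_q, b)

# Hypothetical setup: point spreads from -20 to +20 and liquidity b = 100.
spreads = list(range(-20, 21))
idx = {s: i for i, s in enumerate(spreads)}
b = 100.0
q = [0.0] * len(spreads)  # no shares sold yet, so all prices start equal

# An early player picks "wins by 2 to 11", others pile on, the early player settles.
q, cost_a = buy_interval(q, b, idx[2], idx[11], 10)
q, cost_b = buy_interval(q, b, idx[2], idx[11], 50)
q, proceeds_a = sell_interval(q, b, idx[2], idx[11], 10)
gain_a = proceeds_a - cost_a

print(round(cost_a, 3), round(proceeds_a, 3), round(gain_a, 3))
print(round(sum(lmsr_prices(q, b)), 6))
```

In this toy run the early player settles at a profit because later picks pushed the interval’s price up, and the prices over all spreads still sum to one, so money flowing into one interval automatically lowers the implied probability of every spread outside it.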

Wins and losses on Yoopick are measured in milliyootles, a social currency useful for expressing thanks.

Our market maker can — and we expect will — lose yootles on average. Stated another way, we expect players as a whole to gain on average. At the same time, we actively work to improve our market maker to limit its losses to control inflation in the game.

Because the outcomes of a game are tied together in a unified market, picks in one region automatically affect the price of picks in other regions in a logically consistent way. Players have considerable flexibility in how and what information they can inject into the market. In particular, players can replicate the standard picks like outright winner and spread-adjusted winner if they want, or they can go beyond to pick any interval of the point spread. No matter the form of the pick, all the information flows into a single market that aggregates everything in a unified prediction. In contrast, at venues from Wall Street to Churchill Downs to High Street to Las Vegas Boulevard, markets with many outcomes are usually split into independent one-dimensional markets.

Our goal is to test whether our market design is indeed able to elicit more information than traditional methods. We hope you have fun playing in our Petri dish.

Sharad Goel
David Pennock
Daniel Reeves
Prasenjit Sarkar
Cong Yu