
Famous for 15 tweets

TV era: $quote = "In the future, everyone will be world-famous for 15 minutes";
Search era: $quote =~ s/minutes/links/;
Social era: $quote =~ s/links/tweets/;

This month I’ve had five times more traffic than in any other month since I began blogging in Oct 2006, even during woblomo.

Why? I paid Paul Graham a compliment that struck a minor viral nerve, spreading through Twitter, Facebook, and blogs and sending over six thousand people my way on July 16 alone, according to Quantcast. Of course, most have since dispersed.

Oddhead Blog traffic according to Quantcast July 2010

Power on the web flows backward through referrals to the sites that people begin their day with, the sources of traffic. Referrals from social media, unpredictable and bursty though they may be, are inexorably on the rise. As they grow, power will shift away from search engines, today’s referral kings. Who knows, this may embolden publishers to take previously unthinkable steps like voluntary delisting, further eroding the value of search. This has all been said before, perhaps best by Mark Cuban starting in 2008. It would be a blow to openness and hurt users, but would spark a fascinating battle.

Another meta note: I installed a new WordPress theme, Suffusion. It’s fantastic: endlessly configurable, bug-free, fast, and well designed. I happened upon it by accident when WP 3.0 broke my old theme, and I couldn’t be happier. It was apparently written by a teenager, so I donated to his beer, er, coffee fund.

Psst: WeatherBill doesn’t know New Jersey is the new Florida: Place your bets now

Quantifying New York’s 2009 June gloom using WeatherBill and Wolfram|Alpha

In the northeastern United States, scars are slowly healing from a miserably rainy June — torturous, according to the New York Times. Status updates bemoaned “where’s the sun?”, “worst storm ever!”, “worst June ever!”. Torrential downpours came and went with Florida-like speed, turning gloom into doom: “here comes global warming”.

But how extreme was the month, really? Was our widespread misery justified quantitatively, or were we caught in our own self-indulgent Chris Harrisonism, “the most dramatic rose ceremony EVER!”?

This graphic shows that, as of June 20th, New York City was on track for near-record rainfall in inches. But that graphic, while pretty, is pretty static, and most people I heard complained about the number of days, not the volume of rain.

I wondered if I could use online tools to determine whether the number of rainy days in June was truly historic. My first thought was to try Wolfram|Alpha, a great excuse to play with the new math engine.

Wolfram|Alpha queries for “rain New Jersey June 200Y” are detailed and fascinating, showing temps, rain, cloud cover, humidity, and more, complete with graphs (hint: click “More”). But they don’t seem to directly answer how many days it rained at least some amount. The answer is displayed graphically but not numerically (the percentage and days of rain listed appear to be hours of rain divided by 24). Also, I didn’t see how to query multiple years at a time. So, in order to test whether 2009 was a record year, I would have to submit a separate query for each year (or bypass the web interface and use Mathematica directly). Still, Wolfram|Alpha does confirm that it rained 3.8 times as many hours in 2009 as in 2008, itself already one of the wetter months on record.

WeatherBill, an endlessly configurable weather insurance service, more directly provided what I was looking for on one page. I asked for a price quote for a contract paying me $100 for every day it rains at least 0.1 inches in Newark, NJ during June 2010. It instantly spat back a price: $694.17.



WeatherBill rainy day contract for June 2010 in Newark, NJ

It also reported how much the contract would have paid — the number of rainy days times $100 — every year from 1979 to 2008, on average $620 for 6.2 days. It said I could “expect” (meaning one standard deviation, or 68% confidence interval) between 3.9 and 8.5 days of rain in a typical year. (The difference between the average and the price is further confirmation that WeatherBill charges a 10% premium.)
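The arithmetic in that paragraph can be checked in a few lines. This is a minimal sketch using only the figures quoted above; the variable names are illustrative, and because the 6.2-day average is rounded, the implied markup is only approximate.

```python
# Figures quoted in the post; variable names are illustrative.
payout_per_day = 100.00          # dollars paid per rainy day
avg_rainy_days = 6.2             # reported historical average, 1979-2008 (rounded)
low_days, high_days = 3.9, 8.5   # the "expected" range reported by WeatherBill
quoted_price = 694.17            # price quoted for the June 2010 contract

expected_payout = payout_per_day * avg_rainy_days
# If the range is the mean plus or minus one standard deviation,
# half its width is the standard deviation.
std_dev_days = (high_days - low_days) / 2
# Markup of the quoted price over the historical average payout.
markup = quoted_price / expected_payout - 1

print(f"expected payout: ${expected_payout:.2f}")
print(f"standard deviation: {std_dev_days:.1f} days")
print(f"implied markup: {markup:.1%}")
```

With these rounded inputs the markup comes out at roughly 12%, in the same ballpark as the 10% premium mentioned above; the gap may simply reflect rounding in the reported average.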

Below is a plot of June rainy days in Newark, NJ from 1979 to 2009. (WeatherBill doesn’t yet report June 2009 data so I entered 12 as a conservative estimate based on info from Weather Underground.)


Number of rainy days in Newark, NJ from 1979-2009

Indeed, our gloominess was justified: it rained in Newark on more days in June 2009 than in any other June dating back to 1979.

Intriguingly, our doominess may have been justified too. You don’t have to be a chartist to see an upward trend in rainy days over the past decade.

WeatherBill seems to assume as a baseline that past years are independent unbiased estimates of future years — usually not a bad assumption when it comes to weather. Still, if you believe the trend of increasing rain is real, either due to global warming or something else, WeatherBill offers a temptingly good bet. At $694.17, the contract (paying $100 per rainy day) would have earned a profit in 7 of the last 7 years. The chance of that streak being a coincidence is less than 1%.
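One way to arrive at a figure like “less than 1%” is a simple coin-flip model. The sketch below assumes, as a simplification that is not stated in the post, that each year independently has a 50% chance of paying out more than the contract price; the function name is hypothetical.

```python
def streak_probability(p_profit: float, years: int) -> float:
    """Probability of a profit in every one of `years` independent years."""
    return p_profit ** years


# Assumption: each year is an independent coin flip (p = 0.5).
p = streak_probability(0.5, 7)
print(f"{p:.4f}")  # 0.0078, i.e. about 0.8%
```

If the true per-year chance of a profit is higher than 50%, a seven-year streak is correspondingly less surprising, so the conclusion is sensitive to that baseline.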

If anyone places this bet, let me know. I would love to, but as of now I’m roughly $10 million in net worth short of qualifying as a WeatherBill trader.

Where to find the Yahoo!-Google letter to the CFTC about prediction markets

At the Prediction Markets Summit[1] last Friday, April 24, 2009, I mentioned that Yahoo! and Google jointly wrote a letter to the U.S. Commodity Futures Trading Commission encouraging the legalization of small-stakes real-money prediction markets, and that Microsoft had recently written its own letter in support of the effort. (The CFTC maintains a list of all public comments responding to their request for advice on regulating prediction markets.)

I told the audience that they could learn more by searching for “cftc yahoo google” in their favorite search engine, showing the Yahoo! Search results with MidasOracle’s coverage at the top.[2]

It turns out that was poor advice. 63.7% of the audience probably won’t find what they’re looking for using that search.[3]


Yahoo! versus Google search for "cftc yahoo google"

If some search engines don’t surface the MidasOracle post, I’m hoping they’ll find this.

And back to the effort to guide the CFTC: I hope other people and companies will join. The CFTC’s request for help itself displays a clear understanding of the science and practice of prediction markets and a real willingness to listen. The more organizations that speak out in support, the greater chance we have of convincing the CFTC to take action and open the door to innovation and experimentation.

[1] Which I hesitated to attend and host a reception for, and now regret endorsing in any way.
[2] In September 2008, journalist Chris Masse uncovered the letter on the CFTC website before Google or Yahoo! had announced it. We should have known: Masse is extraordinarily skilled at finding anything relevant anywhere, and has been a tireless, invaluable (and unpaid) chronicler of all-things-prediction-markets for years now.
[3] Even Microsoft Live has the “right” result in position 3. Interestingly, Daniel Reeves got slightly different, presumably personalized, results in Google, which leaves even less excuse for not knowing what two MO junkies were looking for with that query.

Yahoo! Key Scientific Challenges student seed program

Yahoo! Research just published its list of key scientific challenges facing the Internet industry.

It’s a great resource for students to learn about the area and find meaty research problems. There’s also a chance for graduate students to earn $5000 in seed funding, work with Yahoo! Research scientists and data, and attend a summit of like-minded students and scientists.

The challenges cover search, machine learning, data management, information extraction, economics, social science, statistics, multimedia, and computational advertising.

Here’s the list of challenges from the algorithmic economics group, my group. We hope it provides a clear picture of the goals of our group and the areas where progress is most needed.

We look forward to supporting students who love a challenge and would like to join us in building the next-generation Internet.


Yahoo! Key Scientific Challenges Program 2009

The "predict flu using search" study you didn't hear about

In October, Philip Polgreen, Yiling Chen, Forrest Nelson, and I (representing the University of Iowa, Harvard, and Yahoo!) published an article in the journal Clinical Infectious Diseases titled “Using Internet Searches for Influenza Surveillance”.

The paper describes how web search engines may be used to monitor and predict flu outbreaks. We studied four years of data from Yahoo! Search together with data on flu outbreaks and flu-related deaths in the United States. All three measures rise and fall as flu season progresses and dissipates, as you might expect. The surprising and promising finding is that web searches rise first, one to three weeks before confirmed flu cases, and five weeks before flu-related deaths. Thus web searches may serve as a valuable advance indicator for health officials to spot the onset of diseases like the flu, complementary to other indicators and forecasts.
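To make the idea of a “lead” concrete, here is a minimal sketch of one common way to estimate it: shift one series against the other and keep the shift with the highest correlation. This is not the method or the data from the paper; the series below are synthetic, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_lead(searches, cases, max_lead):
    """Return the lead (in weeks) at which searches best correlate with later cases."""
    scores = {}
    for lead in range(max_lead + 1):
        # Pair searches in week t with cases in week t + lead.
        xs = searches[: len(searches) - lead]
        ys = cases[lead:]
        scores[lead] = pearson(xs, ys)
    return max(scores, key=scores.get)


# Synthetic data: a seasonal bump in searches, and the same bump two weeks later in cases.
weeks = range(30)
searches = [math.exp(-((w - 12) ** 2) / 18) for w in weeks]
cases = [math.exp(-((w - 14) ** 2) / 18) for w in weeks]

print(best_lead(searches, cases, max_lead=6))  # 2
```

Real surveillance data is noisier than this toy example, so in practice the best lead would come with uncertainty rather than a single clean number.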

On November 11, the New York Times broke a story about Google Flu Trends, along with an unusual announcement of a pending publication in the journal Nature.

I haven’t read the paper, but the article hints at nearly identical results:

Google … dug into its database, extracted five years of data on those queries and mapped it onto the C.D.C.’s reports of influenzalike illness. Google found a strong correlation between its data and the reports from the agency…

Tests of the new Web tool … suggest that it may be able to detect regional outbreaks of the flu a week to 10 days before they are reported by the Centers for Disease Control and Prevention.

To the reporter’s credit, he interviewed Philip and the article does mention our work in passing, though I can’t say I’m thrilled with the way it was framed:

The premise behind Google Flu Trends … has been validated by an unrelated study indicating that the data collected by Yahoo … can also help with early detection of the flu.

giving (grudging) credit to Yahoo! data rather than Yahoo! people.

The story slashdigged around the blogomediasphere quickly and thoroughly, at one point reaching #1 on the nytimes.com most-emailed list. Articles and comments praise how novel, innovative, and outside-of-the-box the idea is. The editor in chief of Nature praised the “exceptional public health implications of [the Google] paper.”

I’m thrilled to see the attention given to the topic, and the Google team deserves a huge amount of credit, especially for launching a live web site as a companion to their publication, a fantastic service of great social value. That’s an idea we had but did not pursue.

In the business world, being first often means little. However, in the world of science, being first means a great deal and can be the determining factor in whether a study gets published. The truth is, although the efforts were independent, ours was published first (and Clinical Infectious Diseases scooped Nature), a decent consolation prize amid the go-google din.

Update 2008/11/24: We spoke with the Google authors and the Nature editors and our paper is cited in the Google paper, which is now published, and given fair treatment in the associated Nature News item. One nice aspect of the Google study is that they identified relevant search terms automatically by regressing all of the 50 million most frequent search queries against the CDC flu data. Congratulations and many thanks to the Google/CDC authors and the Nature editors, and thanks everyone for your comments and encouragement.

Search engine futures!

I am happy to report that on my suggestion intrade has listed futures contracts for 2008 search engine market share.

Here is how they work:

A contract will expire according to the percentage share of internet searches conducted in the United States in 2008. For example, if 53.5% of searches conducted in the United States in 2008 are made using Google then the contract listed for Google will expire at 53.5…

…Expiry will be based on the United States search share rankings published by Nielson Online.
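As a rough illustration of the expiry rule quoted above, the sketch below computes a buyer's profit when a contract settles at the final market-share percentage. The dollar value per point, the entry price, and the function name are assumptions made for the example, not details taken from the contract rules.

```python
def settlement_profit(entry_price, final_share_pct, dollars_per_point, quantity=1):
    """Profit for a buyer who pays `entry_price` points and holds to expiry."""
    # Per the quoted rule, the contract expires at the share percentage itself.
    expiry_price = final_share_pct
    return (expiry_price - entry_price) * dollars_per_point * quantity


# Hypothetical trade: buy 10 Google contracts at 50.0; the share ends at 53.5.
# dollars_per_point is an assumed value, not taken from the contract rules.
profit = settlement_profit(entry_price=50.0, final_share_pct=53.5,
                           dollars_per_point=0.10, quantity=10)
print(f"${profit:.2f}")  # $3.50
```

A buyer who paid more than the final share would see a negative result from the same formula, which is what makes the contract usable as a hedge in either direction.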

I think this could be a fascinating market because:

  • Search engine market share is very important to these major companies, with dramatic effects on their share prices.
  • Search engine market share is fluid, so far with Google growing inexorably. However, Microsoft has cash, determination, Internet Explorer, and the willingness to experiment. Ask.com has erasers, 3D, ad budgets, and The Algorithm. Yahoo!, second in market share, often tests equal to or better than Google, and new features like Search Assist are impressive.
  • The media loves to write about it.
  • A major search company might use the market to hedge. Well, this seems far-fetched but you never know. Certainly, from an economic risk management standpoint it would seem to make a great deal of sense. (Here, as always on this blog, I speak on behalf of myself and not my company.)

Finally, I have to comment on how refreshingly easy the process was in working with intrade. They went from suggestion to implementation in a matter of days. It’s a shame that US-based companies are in contrast stuck in stultifying legal and regulatory mud.

Addendum 2008/01/26: Here are links to some market research reports:
Nielsen | ComScore | HitWise | Compete

(It seems that the Nielsen Netratings homepage is down; I’m getting a 404 error at the moment.)

Addendum 2008/03/07: If you prefer, you can now also bet on search share just for fun with virtual currency at play.intrade.com.

(The Nielsen Netratings homepage is still down, now for over a month. It’s even more ridiculous given that their own Nielsen Online website points to this page.)