Category Archives: computer science

Yahoo! Key Scientific Challenges: Applications due March 11

Applications for Yahoo!’s third annual Key Scientific Challenges Program are due March 11. Our goal is to support students working in areas we feel represent the future of the Internet. If you’re a Ph.D. student working in one of the areas below, please apply!

We are thrilled to announce Yahoo!’s third annual Key Scientific Challenges Program. This is your chance to get an inside look at — and help tackle — the big challenges that Yahoo! and the entire Internet industry are facing today. As part of the Key Scientific Challenges Program, you’ll gain access to Yahoo!’s world-class scientists and some of the richest and largest data repositories in the world, and you’ll have the potential to make a huge impact on the future of the Internet while driving your research forward.

THE CHALLENGE AREAS INCLUDE:

– Search Experiences
– Machine Learning
– Data Management
– Information Extraction
– Economics
– Statistics
– Multimedia
– Computational Advertising
– Social Sciences
– Green Computing
– Security
– Privacy

KEY SCIENTIFIC CHALLENGES AWARD RECIPIENTS RECEIVE:

– $5,000 in unrestricted research seed funding, which can be used for conference fees and travel, lab materials, professional society membership dues, etc.

– Access to select Yahoo! datasets

– The unique opportunity to collaborate with our industry-leading scientists

– An invitation to this summer’s exclusive Key Scientific Challenges Graduate Student Summit where you’ll join the top minds in academia and industry to present your work, discuss research trends and jointly develop revolutionary approaches to fundamental problems

CRITERIA: To be eligible, you must be currently enrolled in a Ph.D. program at any accredited institution.

We’re accepting applications from January 24th – March 11th, 2011, and winners will be announced by mid-April 2011.

To learn more about the program and how to apply, visit http://labs.yahoo.com/ksc.

Predictalot goes East: Introducing Predictopus for the ICC Cricket World Cup

Yahoo! India Predictopus logo

I’m thrilled to report that Predictalot had an Indian makeover, launching as Predictopus* for the ICC Cricket World Cup. The Yahoo! India team did an incredible job, leveraging the idea and some of the code base of Predictalot, yet making it their own. Predictopus is not a YAP — it lives right on the Yahoo! Cricket website, the official homepage for the ICC Cricket World Cup. They’re also giving away Rs 10 lakhs — or about $22,000 if my calculations are correct — in prizes. Everything is bigger in India, including the crowds and the wisdom thereof. It will be great to see the game played out on a scale that dwarfs our college basketball silliness in the US.

The Y! India team reused some of the backend code but redid the frontend almost entirely. To adapt the game to cricket, among other chores, we had to modify our simulation code to estimate the starting probabilities that any team would win against any other team, even in the middle of a game. (How likely is it for India to come back at home from down 100 runs with 10 overs left and 5 wickets lost? About 25%, we think.) These starting probabilities are then refined further by the game-playing crowds.
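For the curious, the flavor of such a simulation can be sketched in a few lines. This is a toy Monte Carlo model with made-up parameters (the per-ball dismissal rate and run distribution below are purely illustrative assumptions), not the actual Predictopus code:

```python
import random

def win_probability(target_runs, overs_left, wickets_lost,
                    trials=10_000, seed=0):
    """Toy Monte Carlo estimate of a chasing team's win probability.

    Assumptions (illustrative only): a 5% chance of dismissal on each
    ball, and runs per scoring ball drawn uniformly from a fixed list.
    """
    rng = random.Random(seed)
    balls = overs_left * 6          # 6 balls per over
    wickets = 10 - wickets_lost     # wickets remaining
    wins = 0
    for _ in range(trials):
        runs, w = 0, wickets
        for _ in range(balls):
            if w == 0:              # all out: chase fails
                break
            if rng.random() < 0.05:  # assumed dismissal rate per ball
                w -= 1
                continue
            runs += rng.choice([0, 0, 1, 1, 1, 2, 4, 6])  # assumed outcomes
            if runs >= target_runs:
                break
        if runs >= target_runs:
            wins += 1
    return wins / trials
```

Repeating the trials with team-specific parameters would give the kind of mid-game starting probabilities described above, which the crowd then refines.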

It’s great to see an experiment from Labs grow into a full-fledged product run by a real product team in Yahoo!, a prime example of technology transfer at its best. In the meantime, we (Labs) are still gunning for a relaunch of Predictalot itself for March Madness 2011, the second year in a row. Stay tuned.

2011/02/24 Update: An eye-catching India-wide ad campaign for predictopus is live, including homepage, finance, movies, OMG, answers, mail, everywhere! Oh, and one of the prizes is a Hyundai.

predictopus ad on Yahoo! India homepage 2011/02/24


* Yes, that’s a reference to legendary Paul the Octopus, RIP.

wise.gov: NSF and IARPA funding for collective intelligence

The US National Science Foundation’s Small Business Innovation Research program provides grants to small businesses to fund “state-of-the-art, high-risk, high-potential innovation research proposals”.

In their current call for proposals, they explicitly ask for “I2b. Tools for facilitating collective intelligence”.

These are grants of up to US$150,000, with the opportunity for more later, I believe. The deadline is December 3, 2010! Good luck and (not so) happy Thanksgiving to anyone working on one of these proposals. I’m glad to help if I can.


The deadline for another US government program has passed, but should yield interesting results and may lead to future opportunities. In August, the Intelligence Advanced Research Projects Activity (IARPA, the intelligence community’s DARPA), which “invests in high-risk/high-payoff research programs” in military intelligence, solicited proposals for Aggregative Contingent Estimation, or what might be called wisdom-of-crowds methods for prediction:

The ACE Program seeks technical innovations in the following areas:

  • Efficient elicitation of probabilistic judgments, including conditional probabilities for contingent events.
  • Mathematical aggregation of judgments by many individuals, based on factors that may include past performance, expertise, cognitive style, metaknowledge, and other attributes predictive of accuracy.
  • Effective representation of aggregated probabilistic forecasts and their distributions.

The full announcement is clear, detailed, and well thought out. I was impressed with the solicitors’ grasp of research in the field, an impression no doubt bolstered by the fact that some of my own papers are cited 😉 . Huge hat tip to Dan Goldstein for collating these excerpts:

The accuracy of two such methods, unweighted linear opinion pools and conventional prediction markets, has proven difficult to beat across a variety of domains.2 However, recent research suggests that it is possible to outperform these methods by using data about forecasters to weight their judgments. Some methods that have shown promise include weighting forecasters’ judgments by their level of risk aversion, cognitive style, variance in judgment, past performance, and predictions of other forecasters’ knowledge.3 Other data about forecasters may be predictive of aggregate accuracy, such as their education, experience, and cognitive diversity. To date, however, no research has optimized aggregation methods using detailed data about large numbers of forecasters and their judgments. In addition, little research has tested methods for generating conditional forecasts.

2 See, e.g., Tetlock PE, Expert Political Judgment (Princeton, NJ: Princeton University Press, 2005), 164-88; Armstrong JS, “Combining Forecasts,” in JS Armstrong, ed., Principles of Forecasting (Norwell, MA: Kluwer, 2001), 417-39; Arrow KJ, et al., “The Promise of Prediction Markets,” Science 2008; 320: 877-8; Chen Y, et al., “Information Markets Vs. Opinion Pools: An Empirical Comparison,” Proceedings of the 6th ACM Conference on Electronic Commerce, Vancouver BC, Canada, 2005.

3 See, e.g., Dani V, et al., “An empirical comparison of algorithms for aggregating expert predictions,” Proc. 22nd Conference on Uncertainty in Artificial Intelligence, UAI, 2006; Cooke RM, ElSaadany S, Huang X, “On the performance of social network and likelihood-based expert weighting schemes,” Reliability Engineering and System Safety 2008; 93:745-756; Ranjan R, Gneiting T, “Combining probability forecasts,” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2010; 72(1): 71-91.

[Examples:]

  • Will the incumbent party win the next presidential election in Country X?
  • When will Country X hold its next parliamentary elections?
  • How many cell phones will be in use globally by 12/31/11?
  • By how much will the GDP of Country X increase from 1/1/11 to 12/31/11?
  • Will Country X default on its sovereign debt in 2011?
  • If Country X defaults on its sovereign debt in 2011, what will be the growth rate in the Eurozone in 2012?

Elicitation – Advances Sought
The ACE Program seeks methods to elicit judgments from individual forecasters on:

  • Whether an event will or will not occur
  • When an event will occur
  • The magnitude of an event
  • All of the above, conditioned on another set of events or actions
  • The confidence or likelihood a forecaster assigns to his or her judgment
  • The forecaster’s rationale for his or her judgment, as well as links to background information or evidence, expressed in no more than a couple of lines of text
  • The forecaster’s updated judgments and rationale

The elicitation methods should allow prioritization of elicitations, continuous updating of forecaster judgments and rationales, and asynchronous elicitation of judgments from more than 1,000 geographically-dispersed forecasters. While aggregation methods, detailed below, should be capable of generating probabilities, the judgments elicited from forecasters can but need not include probabilities.

Challenges include:

  • Some forecasters will be unaccustomed to providing probabilistic judgments
  • There has been virtually no research on methods to elicit conditional forecasts
  • Elicitation should require a minimum of time and effort from forecasters; elicitation should require no more than a few minutes per elicitation per forecaster
  • Training time for forecasters will be limited, and all training must be delivered within the software
  • Rewards for participation, accuracy, and reasoning must be non-monetary and of negligible face value (e.g., certificates, medals, pins)
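The “unweighted linear opinion pool” that the solicitation’s excerpt compares against is simple to state in code. Here’s a minimal sketch: unweighted by default, or weighted by past performance (or any other attribute predictive of accuracy) if you pass weights. The weights themselves are exactly what ACE proposes to learn, so everything beyond the averaging is an assumption:

```python
def linear_opinion_pool(probabilities, weights=None):
    """Combine forecasters' probabilities for a binary event.

    With weights=None this is the unweighted linear opinion pool;
    passing per-forecaster weights (e.g., based on past accuracy)
    gives the weighted variant.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    # Weighted average of the individual probability judgments.
    return sum(w * p for w, p in zip(weights, probabilities)) / total
```

For example, pooling forecasts of 0.2 and 0.8 with weights 3 and 1 yields 0.35, pulling the aggregate toward the more trusted forecaster.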

Book of Odds is serious fun

In the Book of Odds, you can find everything from the odds an astronaut is divorced (1 in 15.54) to the odds of dying in a freak vending machine accident (1 in 112,000,000).

Book of Odds is, in their own words, “the missing dictionary, one filled not with words, but with numbers – the odds of everyday life.”

I use their words because, frankly, I can’t say it better. The creators are serious wordsmiths. Their name itself is no exception. “Book of Odds” strikes the perfect chord: memorable and descriptive with a balance of authority and levity. On the site you can find plenty of amusing odds about sex, sports, and death, but also odds about health and life that make you think, as you compare the relative odds of various outcomes. Serious yet fun, in the grand tradition of the web.

I love their mission statement. They seek both to change the world — by establishing a reliable, trustworthy, and enduring new reference source — and to improve the world — by educating the public about probability, uncertainty, and decision making.

By “odds”, they do not mean predictions.

Book of Odds is not in the business of predicting the future. We are far too humble for that…

Odds Statements are based on recorded past occurrences among a large group of people. They do not pretend to describe the specific risk to a particular individual, and as such cannot be used to make personal predictions.

In other words, they report how often some property occurs among a group of people, for example the fraction of all deaths caused by vending machines, not how likely you, or anyone in particular, are to die at the hands of a vending machine. Presumably if you don’t grow enraged at uncooperative vending machines or shake them wildly, you’re safer than the 1 in 112,000,000 stated odds. A less ambiguous (but clunky) name for the site would be “Book of Frequencies”.

Sometimes the site’s original articles are careful about this distinction between frequencies and predictions, but other times less so. For example, this article says that your odds of becoming the next American Idol are 1 in 103,000. But of course the raw frequency (1/number-of-contestants) isn’t the right measure: your true odds depend on whether you can sing.
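A toy calculation illustrates the gap between a raw frequency and a conditioned estimate. The numbers below are made up (the 1-in-50 “can really sing” rate is purely illustrative), only the contestant count comes from the article:

```python
def one_in(p):
    """Express a probability as a '1 in N' statement, Book of Odds style."""
    return f"1 in {1 / p:,.0f}"

# Raw frequency: one winner among roughly 103,000 contestants.
contestants = 103_000
base_rate = 1 / contestants

# A toy conditioning step: suppose (purely for illustration) only
# 1 in 50 contestants can really sing, and the winner always comes
# from that group. A singer's chances are then 50x the raw frequency.
p_if_singer = base_rate * 50
```

The base rate is a fact about the population; the conditioned number is a (hypothetical) prediction about an individual, and the two can differ by orders of magnitude.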

Their statement of What Book of Odds isn’t is refreshing:

Book of Odds is not a search-engine, decision-engine, knowledge-engine, or any other kind of engine…so please don’t compare us to Google™. We did consider the term “probability engine” for about 25 seconds, before coming to our senses…

Book of Odds is never finished. Every day new questions are asked that we cannot yet answer…

A major question is whether consumers want frequencies, or if they want predictions. If I had to guess, I’d (predictably) say predictions — witness Nate Silver and Paul the Octopus. (I’ve mused about using *.oddhead.com to aggregate predictions from around the web.)

The site seems in need of some SEO. The odds landing pages, like this one, don’t seem to be comprehensively indexed in Bing or Google. I believe this is because there is no natural way for users (and thus spiders) to browse (crawl) them. (Is this a conscious choice to protect their data? I don’t think so: the landing pages have great SEO-friendly URLs and titles.) The problem is exacerbated because Book of Odds’ own custom search is respectable but, inevitably, weaker than what we’ve become accustomed to from the major search engines.

Book of Odds launched in 2009 with a group of talented and well-pedigreed founders and a surprisingly large staff. They’ve made impressive strides since, adding polls, a Yahoo! Application, an iGoogle gadget, regular original content, and a cool visual browser that, like all visual browsers, is fun but not terribly useful. They’ve won a number of awards already, including “most likely company to be a household name in five years”. That’s a low-frequency event, though Book of Odds may beat the odds. Or have some serious fun trying.

Third Annual New York Computer Science and Economics Day

Join us at NYCE Day 2010 on Friday October 15 at the lovely New York Academy of Sciences, a gathering of “researchers in the larger New York metropolitan area with interests in Computer Science, Economics, Marketing and Business and a common focus in understanding and developing the economics of internet activity.”

If you’d like to speak in the rump session, submit your topic by Monday September 13: details on the meeting webpage. The rump session is a series of five-minute talks by a variety of speakers, including students, and is often one of the most interesting parts of the program.

Three Crowd-ed events this fall

Research and Analysis of Tail Phenomenon Symposium

August 20, 2010, Sunnyvale, CA

The last decade has witnessed the emergence of enormous scale artifacts resulting from the independent action of hundreds of millions of people; for example, web repositories, social networks, mobile communication patterns, and consumption in “limitless” stores… the first Research and Analysis of Tail phenomena Symposium (RATS)… will explore the different computational, statistical, and modeling problems related to tail phenomena… We are particularly encouraging summer interns in any of the Bay Area research centers to join us in the event.
We will start with a video welcome by Chris Anderson (Wired), followed by a series of invited talks by Michael Mitzenmacher (Harvard), Aaron Clauset (Univ. of Colorado), Neel Sundaresan (eBay), Sharad Goel (Yahoo! Research, NY) and Michael Schwarz (Yahoo! Research, CA).

We invite proposals for short (20 minute) talks from students and researchers working in the area.

CrowdCof2010: 1st Annual Conference on the Future of Distributed Work

October 4, 2010, San Francisco, CA

Were you crowdsourcing before it was cool? We want to hear about your projects.

We are inviting submissions on all topics regarding crowdsourcing, including:

  • Past, present, and future of crowdsourcing
  • Quality assurance and metrics
  • Social and economic implications of crowdsourcing
  • Task design/Worker incentives
  • Innovative projects, experiments, and applications
  • Submission Guidelines

Deadline: Sept. 1

CrowdConf will bring together researchers, technologists, outsourcing entrepreneurs, legal scholars, and artists for the first time to discuss how crowdsourcing is transforming human computation and the future of work.

Confirmed Speakers:
Sharon Chirella: Vice President, Amazon Mechanical Turk
Tim Ferriss: Author, The 4-Hour Work Week
David Alan Grier: Author, When Computers Were Human
Barney Pell: Partner, Search Strategist, and Evangelist, Microsoft
Maynard Webb: CEO, LiveOps
Jonathan Zittrain: Professor of Law and Computer Science, Harvard

Computational Social Science and the Wisdom of Crowds Workshop at NIPS 2010

December 10th or 11th, 2010, Whistler, Canada

We welcome contributions on theoretical models, empirical work, and everything in between, including but not limited to:

  • Automatic aggregation of opinions or knowledge
  • Prediction markets / information markets
  • Incentives in social computation (e.g., games with a purpose)
  • Studies of events and trends (e.g., in politics)
  • Analysis of and experiments on distributed collaboration and consensus-building, including crowdsourcing (e.g., Mechanical Turk) and peer-production systems (e.g., Wikipedia and Yahoo! Answers)
  • Group dynamics and decision-making
  • Modeling network interaction content (e.g., text analysis of blog posts, tweets, emails, chats, etc.)
  • Social networks

[Covers] computational social science… [and] social computing… with an emphasis on the role of
machine learning…

Deadline for submissions: Friday October 8, 2010

Where is the betting market for P=NP when you need it?

HP research scientist Vinay Deolalikar has constructed the most credible proof yet of the most important open question in computer science. If his proof is validated (and there are extremely confident skeptics as you’ll see) he proved that P≠NP, or loosely speaking that some of the most widespread computational problems — everything from finding a good layout of circuits on a chip to solving Sudoku puzzles to computing LMSR prices in a combinatorial market — cannot be solved efficiently. Most computer scientists believe that P≠NP, but after decades of some of the smartest people in the world trying, and despite the promise of worldwide accolades and a cool $1 million from the Clay Mathematics Institute, no one has been able to prove it, until possibly now.

Scott Aaronson is a skeptic, to say the least. He made an amazing public bet to demonstrate his confidence. He pledged that if Deolalikar wins the $1 million prize, Aaronson will top it off with $200,000 of his own money. Even more amazing: Aaronson made the bet without even reading the proof. [Update: I should have said “without reading the proof in detail”: see comments] (Perhaps more amazing still: a PC World journalist characterized Aaronson’s stance as “noncommittal” without a hint of sarcasm.) [Hat tip to Dan Reeves.]

As Aaronson explains:

The point is this: I really, really doubt that Deolalikar’s proof will stand. And while I haven’t studied his long, interesting paper and pinpointed the irreparable flaw… I have a way of stating my prediction that no reasonable person could hold against me: I’ve literally bet my house on it.

Aaronson is effectively offering infinite odds [Update: actually more like 2000/1 odds: see comments] that the question “P=NP?” will not be resolved in the near future. Kevin McCurley and Ron Fagin made a different (conditional) bet: Fagin offered 5/1 odds (at much lower stakes) that if the question is resolved in 2010, the answer will be P≠NP. Bill Gasarch says that he, like Aaronson, would bet that the proof is wrong… if only he were a betting man. Richard Lipton recounts a discussion about the odds of P=NP with Ken Steiglitz.
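As an aside, converting “X to 1” betting odds into the implied probability is one line of arithmetic. A sketch, using the standard odds-against convention (so Fagin’s 5/1 corresponds to about a one-in-six implied chance of the event he’s betting against):

```python
def implied_probability(odds_against):
    """Convert 'X to 1 against' betting odds into an implied probability.

    Odds of X to 1 against an event imply the event happens with
    probability 1 / (X + 1): you risk 1 unit to win X.
    """
    return 1.0 / (odds_against + 1.0)
```

At 2000/1, the implied probability is roughly 0.05%, which is about as close to “infinite odds” as real bets get.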

But beyond a few one-off bets and declarations, where is the central market where I can bet on P=NP? I don’t even necessarily want in on the action, I just want the odds. (Really!)

My first thought was the Foresight Exchange. It does list one related contract — Good 3SAT Algorithm by 2020 — which should presumably go to zero if Deolalikar’s proof is correct. It hasn’t budged much, consistent with skepticism (or with apathy). My second thought was the PopSci Predictions Exchange (PPX), though sadly it has retired. InklingMarkets has a poll about whether P=NP will be resolved before the other Clay Institute prize questions, but not about how it will be resolved or the odds of it happening. (The poll is one of several markets sponsored by the Woodrow Wilson Center’s Science and Technology Innovation Program — hat tip to Vince Conitzer.) I don’t see anything at longbets, and anyway longbets doesn’t provide odds despite its name.

In 1990 Robin Hanson provocatively asked: Could gambling save science? That question and his thoughtful answers inspired a number of people, including me, to study prediction markets. Indeed, the Foresight Exchange was built largely in his image. P=NP seems one of the most natural claims for any scitech prediction market.

All these years later, when I really need my fix, I can’t seem to get it!


2010/08/14 Update: Smarkets comes the closest: they have real-money betting on whether P=NP will be resolved before the other Clay Institute prize questions. They report a 53% chance as of 2010/08/14 (for the record, I would bet against that). What’s missing is when the award might happen and how the question might be resolved, P=NP or P≠NP. I also don’t see a graph to check whether Deolalikar’s proof had any effect.

If it wasn’t clear in my original post, I found Aaronson’s bet incredibly useful and I am thrilled he did it. I believe he should be commended: his bet was exactly what more scientists should do. Scientists should express their opinions, and betting is a clear, credible, and quantitative way to do so. It would be a shame if some of the negative reactions caused him or others not to make similar bets in the future.

I just wish there were a central place to make bets on scientific claims and follow the odds in the vision of Robin Hanson, rather than every scientist having to declare their bet on their own individual blogs.

Famous for 15 tweets

TV era: $quote = “In the future, everyone will be world-famous for 15 minutes”;
Search era: $quote =~ s/minutes/links/;
Social era: $quote =~ s/links/tweets/;

This month I’ve had five times more traffic than in any other month since I began blogging in Oct 2006, even during woblomo.

Why? I paid Paul Graham a compliment that struck a minor viral nerve, spreading through Twitter, Facebook, and blogs and sending over six thousand people my way on July 16 alone, according to Quantcast. Of course most have since dispersed.

Oddhead Blog traffic according to Quantcast July 2010

Power on the web flows backward through referrals to the sites that people begin their day with, the sources of traffic. Referrals from social media, unpredictable and bursty though they may be, are inexorably on the rise. As they grow, power will shift away from search engines, today’s referral kings. Who knows, this may embolden publishers to take previously unthinkable steps like voluntary delisting, further eroding the value of search. This has all been said before, perhaps best by Mark Cuban starting in 2008. It would be a blow to openness and hurt users, but would spark a fascinating battle.

Another meta note: I installed a new WordPress theme: Suffusion. It’s fantastic: endlessly configurable, bug-free, fast, and well designed. I happened upon it by accident when WP 3.0 broke my old theme, and I couldn’t be happier. It was apparently written by a teenager; I donated to his beer, er, coffee fund.

How high can high-level programming go?

Our first prototype of Predictalot was written mainly in Mathematica with a rudimentary web front end that Dan Reeves put together (with editable source code embedded right on the page via etherpad!). It proved the concept but was ugly and horribly slow.

Screenshot of pre-alpha Predictalot: Mathematica + etherpad + web

Dan and I built a second prototype in PHP. It was even uglier but about twice as fast and somewhat usable on a small scale (at least by users willing and able to formulate their own propositions in PHP). Yet it still wasn’t good enough to serve thousands of users accustomed to simplicity and speed.

Screenshot of alpha Predictalot: PHP + YAP

The final live version of Predictalot was not only pleasing to the eye — thanks to Sudar, Navneet, and Tom — but pleasingly fast, due almost entirely to the heroic efforts of Mridul M, who wrote a mini PHP parser inside of Java and baked in a number of database and caching optimizations.

Screenshot of live beta Predictalot: Java + Javascript + YAP

It seems that high-level programming languages haven’t climbed high enough. To field a fairly constrained web app that looks good and works well, we benefited greatly from having at least three specialists: one each for the app front end, the app back end, and the platform back end (Apache, security, etc.).

Here’s a challenge to the programming language community: anything I can whip up in Mathematica I should be able to run at web scale. Math majors should be able to create Predictalot. Dan and I can mock up the basic idea of Predictalot but it still takes tremendous talent, time, and effort to turn it into a professional looking and well behaved system.

The core market math of Predictalot — a combinatorial version of Hanson’s LMSR market maker — involves summing thousands of exponential terms. Here we are in the second decade of the new millennium, and in order for a sum of exponentials to execute quickly and without numeric overflow, we had to work out a transformation to conduct all our summations in log space. In other words, programming still requires me to think about how my machine represents my numbers. That shouldn’t qualify as “high level” thinking in 2010.
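The standard transformation, for anyone hitting the same wall, is the log-sum-exp trick: factor out the largest exponent before summing, so nothing overflows. A minimal sketch of the general technique (not our actual production code, which also had LMSR-specific bookkeeping):

```python
import math

def log_sum_exp(xs):
    """Compute log(sum(exp(x) for x in xs)) stably in log space.

    Factoring out the max term m keeps every exp() argument <= 0,
    so no intermediate value can overflow, yet the result is exact
    up to floating-point rounding.
    """
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```

Naively, `math.exp(1000)` overflows to infinity, while `log_sum_exp([1000, 1000])` returns 1000 + log 2 without trouble; LMSR cost functions of the form b·log Σ exp(qᵢ/b) are exactly this shape.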

I realize I may be naively asking too much. Solving the challenge fully is AI-complete. Still, while we’re making impressive strides in artificial intelligence, programming feels much the same today as it did twenty years ago. It still requires learning specialized tricks, arcane domain knowledge, and optimizations honed only over years of experience, and the most computationally intensive applications still require that extra compilation step (i.e., it’s still often necessary to use C or Java over PHP, Perl, Python, or Ruby).

Some developments hardly seem like progress. Straightforward HTML markup like border=2 has given way to unwieldy CSS like style=”border:2px solid black”. In some ways the need for specialized domain knowledge has gone up, not down.

Visual programming is an oft-tried, though so far largely unsuccessful way to lower the barrier to programming. Pipes was a great effort, but YQL proved more useful and popular. Google just announced new visual developer tools for Android in an attempt to bring mobile app creation to the masses. Content management systems are getting better and broader every day, allowing more and more complex websites to be built with less time touching source code.

I look forward to the day that computational thinking can suffice to create the majority of computational objects. I suspect that day is still fifteen to twenty years away.

The most prescient footnote ever

Back in 2004, who could have imagined Apple’s astonishing rise to overtake Microsoft as the most valuable tech company in the world?

At least one person.

Paul Graham wins the award for the most prescient parenthetical statement inside a footnote ever.

In footnote 14 of Chapter 5 (p. 228) of his classic Hackers and Painters, published in 2004, Graham asks, “If the Mac was so great, why did it lose?” His explanation ends with this caveat, in parentheses:

And it hasn’t lost yet. If Apple were to grow the iPod into a cell phone with a web browser, Microsoft would be in big trouble.

Wow. Count me impressed.

To find this quote, search inside the book for “ipod cell”.

To get into a 2004 mindset, look here and here.