Category Archives: artificial intelligence

Last call: Postdoc positions at Microsoft Research NYC

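Microsoft Research New York City seeks outstanding applicants for 2-year postdoctoral researcher positions. We welcome applicants with a strong academic record in one of the following areas:

  • Computational social science
  • Online experimental social science
  • Algorithmic economics and market design
  • Machine learning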

We will also consider applicants in other focus areas of the lab, including information retrieval, and behavioral & empirical economics. Additional information about these areas is included below. Please submit all application materials by January 11, 2013 for full consideration. Instructions are here.


COMPUTATIONAL SOCIAL SCIENCE

With an increasing amount of data on every aspect of our daily activities — from what we buy, to where we travel, to who we know — we are able to measure human behavior with precision largely thought impossible just a decade ago. Lying at the intersection of computer science, statistics and the social sciences, the emerging field of computational social science uses large-scale demographic, behavioral and network data to address longstanding questions in sociology, economics, politics, and beyond. We seek postdoc applicants with a diverse set of skills, including experience with large-scale data, scalable statistical and machine learning methods, and knowledge of a substantive social science field, such as sociology, economics, psychology, political science, or marketing.

ONLINE EXPERIMENTAL SOCIAL SCIENCE

Online experimental social science involves using the web, including crowdsourcing platforms such as Amazon’s Mechanical Turk, to study human behavior in “virtual lab” environments. Among other topics, virtual labs have been used to study the relationship between financial incentives and performance, the honesty of online workers, advertising impact as a function of exposure time, the implicit cost of “bad ads,” the testing of graphical user interfaces eliciting probabilistic information and also the relationship between network structure and social dynamics, related to social phenomena such as cooperation, learning, and collective problem solving. We seek postdoc applicants with a diverse mix of skills, including awareness of the theoretical and experimental social science literature, and experience with experimental design, as well as demonstrated statistical modeling and programming expertise. Specific experience running experiments on Amazon’s Mechanical Turk or related crowdsourcing websites, as well as managing virtual participant pools is also desirable, as is evidence of UI design ability.

ALGORITHMIC ECONOMICS AND MARKET DESIGN

Market design, the engineering arm of economics, benefits from an understanding of computation: complexity, algorithms, engineering practice, and data. Conversely, computer science in a networked world benefits from a solid foundation in economics: incentives and game theory. Scientists with hybrid expertise are crucial as social systems of all types move to electronic platforms, as people increasingly rely on programmatic trading aids, as market designers rely more on equilibrium simulations, and as optimization and machine learning algorithms become part of the inner loop of social and economic mechanisms. We seek applicants who embody a diverse mix of skills, including a background in computer science (e.g., artificial intelligence or theory) or related field, and knowledge of the theoretical and experimental economics literature. Experience building prototype systems, and a comfort level with modern programming paradigms (e.g., web programming and map-reduce) are also desirable.

MACHINE LEARNING

Machine learning is the discipline of designing efficient algorithms for making accurate predictions and optimal decisions in the face of uncertainty. It combines tools and techniques from computer science, signal processing, statistics and optimization. Microsoft offers a unique opportunity to work with extremely diverse data sources, both big and small, while also offering a very stimulating environment for cutting-edge theoretical research. We seek postdoc applicants who have demonstrated ability to do independent research, have a strong publication record at top research venues and thrive in a multidisciplinary environment.

Congratulations Pete Wurman and Kiva Systems, a bellwether of the automated economy

Congratulations to my academic sibling, friend, and Detroit Red Wings fan Pete Wurman, whose company Kiva Systems just became Amazon’s second largest acquisition ever.

In short, Kiva Systems designs, builds, and operates intelligent autonomous robots to pick and stow products in giant distribution centers for companies like Toys R Us, Walgreens, and Zappos. (The latter is an Amazon subsidiary.) The best way to understand Kiva Systems is to watch their robots in action: an amazing sight to see. Here is a clip from IEEE Spectrum:

In 2003, I remember sitting in the back seat of a car with Pete, him excitedly demo-ing the concept to me via an animated simulation on his laptop, little dots representing robots weaving in and out of each other on the screen. (Pete’s laptop was a Mac. In grad school, Pete was every bit the Apple fan I was and more. He and I programmed HyperCard and Newton together. Pete advocated for simplicity in design before it was cool. When I briefly switched to Windows, he never wavered.)

By 2006, the robots were real. Pete took me and our shared academic parent, Mike Wellman (who I believe also played an early role in the company), on a tour. Dots on a laptop had become squat orange robots receiving orders, fetching products, avoiding each other, seeking power, and otherwise navigating around a complex environment with computational minds of their own. The designs were inspired: for example, to lift a box, the robot spun underneath it to extend a corkscrew so that the product wouldn’t get jarred. They even added noise in the robots’ paths, so their wheels wouldn’t wear grooves in the floor (call it a floorsaver algorithm).

By coincidence, a few weeks ago, I was speaking to someone from Amazon who works on optimizing the way people (ha!) retrieve, store, and pack items in their distribution centers and I mentioned Pete’s company. He said “until that happens” he would focus on optimizing their current systems. Little did we (or at least I) know how quickly “until” would come.

Kiva Systems isn’t just an incredibly cool company run by amazing people. It’s a harbinger of things to come as the world moves inexorably toward an Automated Economy.

By the way, if you’re worried that robots will take jobs away from people, don’t. The world is a better place with mechanical devices doing mechanical tasks, leaving people to do more interesting and creative things, for example turning crazy ideas into companies. Remember that the purpose of jobs is to produce valuable things and improve the world. Despite political rhetoric, jobs are not an end in themselves. Otherwise, we should all be happy digging ditches and filling them back up, or pumping gas for people who would rather do it themselves. Think about where society should go in fifty or a hundred years when automation can handle more and more tasks. It would be a real shame if at that time people were still “working for a living” in jobs they don’t enjoy simply for the sake of keeping them occupied.

2011 ACM Conference on Electronic Commerce and fifteen other CS conferences in San Jose

If you’re in the Bay Area, come join us at the 2011 ACM Conference on Electronic Commerce, June 5-9 in San Jose, CA, one of sixteen conferences that comprise the ACM Federated Computing Research Conference, the closest thing we have to a unified computer research conference.

The main EC’11 conference includes talks on prediction markets, crowdsourcing, auctions, game theory, finance, lending, and advertising. The papers span a spectrum from theoretical to applied. If you want evidence of the latter, look no further than the roster of corporate sponsors: eBay, Facebook, Google, Microsoft, and Yahoo!.

There are also a number of interesting workshops and tutorials in conjunction with EC’11 this year, including:

Workshops:

  • 7th Ad Auction Workshop
  • Workshop on Bayesian Mechanism Design
  • Workshop on Social Computing and User Generated Content
  • 6th Workshop on Economics of Networks, Systems, and Computation
  • Workshop on Implementation Theory

Tutorials:

  • Bayesian Mechanism Design
  • Conducting Behavioral Research Using Amazon’s Mechanical Turk
  • Matching and Market Design
  • Outside Options in Mechanism Design
  • Measuring Online Advertising Effectiveness

The umbrella FCRC conference includes talks by 2010 Turing Award winner Leslie G. Valiant, IBM Watson creator David A. Ferrucci, and CMU professor, CAPTCHA co-inventor, and Games With a Purpose founder Luis von Ahn.

Hope to see many of you there!

There’s a new oracle in town

Cantor Gaming mobile device for in-running betting

Last January, a few friends and I visited the sportsbook at the M Casino in Las Vegas, one of several sportsbooks now run by Cantor Gaming, a division of Wall Street powerhouse Cantor Fitzgerald. Traditional sportsbooks stop taking bets when the sporting event in question begins. In contrast, Cantor allows “in-running betting”, a clunky phrase that means you can bet during the event: as touchdowns are scored, interceptions are made, home runs are stolen, or buzzers are beaten. Cantor went a step further and built a mobile device you can carry around with you anywhere in the casino to place your bets while watching games on TV, drink in hand. (Cantor also runs spread-betting operations in the UK and bought the venerable Hollywood Stock Exchange prediction market with the goal of turning it into a real financial exchange; they nearly succeeded, obtaining the green light from the CFTC before being shut down by lobbyists, er, Congress.)

Back to the device. It’s pretty awesome. It’s a Windows tablet computer with Cantor’s custom software — pretty well designed considering this is a financial firm. You can bet on the winner, against the spread, or on one-off propositions like whether the offensive team in an NFL game will get a first down, or whether the current drive will end with a punt, touchdown, field goal, or turnover. The interface is pretty nice. You select the type of bet you want, see the current odds, and choose how much you want to bet from a menu of common options: $5, $10, $50, etc. You can’t bet during certain moments in the game, like right before and during a play in football. When I was there only one game was available for in-running betting. Still, it’s instantly gratifying and — I hate to use this word — addictive. Once my friend saw the device in action, he instantly said “I’m getting one of those”.

When I first heard of Cantor’s foray into sports betting, I assumed they would build “Betfair indoors”, meaning an exchange that simply matches bettors with each other and takes no risk of its own. I was wrong. Cantor’s mechanism is pretty clearly an intelligent automated market maker that mixes prior knowledge and market forces, much like my own beloved Predictalot minus the combinatorial aspect. Together with their claim to welcome sharps, employing a market maker means that Cantor is taking a serious gamble that no one will outperform their prior “too much”, but the end result is a highly usable and impressively fun application. Kudos to Cantor.


P.S. Cantor affectionately dubbed their oracle-like algorithm for computing their prior as “Midas”, proving this guy has a knack for thingnaming.

We’re baaack: Predictalot is here for March Madness 2011

March Madness is upon us and Predictalot, the crazy game that I and others at Yahoo! Labs invented, is live again and taking your (virtual) bets. Filling out brackets is so 2009. On Predictalot, you can compose your own wild prediction, like there will be exactly seven upsets in the opening round, or neither Duke, Kentucky, Kansas, nor Pittsburgh will make the Final Four. You’ll want your laptop out and ready as you watch the games — you can buy and sell your predictions anytime, like stocks, as the on-court action moves for or against you.

Predictalot v0.3 is easier to play. We whittled down the ‘Make Prediction’ process from four steps to just two. Even if you don’t want to wager, come check out, with one click, the projected odds of nearly any crazy eventuality you can dream up.

Please connect to Facebook and/or Twitter to share your prediction prowess with your friends and followers. You’ll earn bonus points and my eternal gratitude.

The odds start off at our own prior estimate based on seeds and (new this year) the current scores of ongoing games, but ultimately settle to values set by “the crowd” — that means you — as predictions are bought and sold.

Yahoo! Labs Predictalot version 0.3 overview tab screenshot

For the math geeks, Predictalot is a combinatorial prediction market with over 9 quintillion outcomes. Prices are computed using an importance sampling approximation of a #P-hard problem.
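For readers who want to see the shape of that computation, here is a minimal sketch, not the actual Predictalot code. It assumes that "over 9 quintillion" means 2^63 (63 games with two possible results each), that the market maker is the LMSR named in a later section of this page, and that samples are drawn from per-game win probabilities. The function names, the liquidity parameter b, and the toy market are all made up for illustration.

```python
import itertools
import math
import random

# Assumption: "over 9 quintillion" is 2**63, i.e. 63 games with two possible
# results each, as in a 64-team single-elimination bracket.
NUM_GAMES = 63
TOTAL_OUTCOMES = 2 ** NUM_GAMES  # 9223372036854775808


def proposal_density(outcome, win_probs):
    """Probability of drawing `outcome` when game i yields 1 with win_probs[i]."""
    density = 1.0
    for result, p in zip(outcome, win_probs):
        density *= p if result == 1 else 1.0 - p
    return density


def estimate_price(shares_owed, holds, win_probs, b, num_samples, rng):
    """Self-normalized importance-sampling estimate of an LMSR-style price.

    shares_owed(outcome) is the payout owed if `outcome` occurs, holds(outcome)
    says whether the proposition is true there, and b is the liquidity parameter.
    """
    hit_weight = 0.0
    total_weight = 0.0
    for _ in range(num_samples):
        outcome = tuple(1 if rng.random() < p else 0 for p in win_probs)
        # Target weight exp(q / b), corrected for how likely the draw was.
        weight = math.exp(shares_owed(outcome) / b) / proposal_density(outcome, win_probs)
        total_weight += weight
        if holds(outcome):
            hit_weight += weight
    return hit_weight / total_weight


def exact_price(shares_owed, holds, num_games, b):
    """Brute-force price by full enumeration; only feasible for a few games."""
    hit = 0.0
    total = 0.0
    for outcome in itertools.product((0, 1), repeat=num_games):
        weight = math.exp(shares_owed(outcome) / b)
        total += weight
        if holds(outcome):
            hit += weight
    return hit / total


# Hypothetical toy market: 3 games, with 10 shares sold on "side 1 wins game 0".
def toy_shares(outcome):
    return 10.0 if outcome[0] == 1 else 0.0


def toy_holds(outcome):
    return outcome[0] == 1


rng = random.Random(0)
approx = estimate_price(toy_shares, toy_holds, [0.7, 0.5, 0.5], 10.0, 20000, rng)
exact = exact_price(toy_shares, toy_holds, 3, 10.0)
print(TOTAL_OUTCOMES, exact, approx)
```

With three games the exact sum is trivial, which is what makes it possible to check the sampled estimate against it. With 63 games full enumeration is out of reach, so only the sampled estimate remains practical.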

What kind of information can we collect that a standard prediction market cannot? A standard market will say that Texas A&M is unlikely to win the tournament. Our market can say more. Yes, A&M is unlikely to reach the Final Four and even more unlikely to win a priori, but given that they somehow make it to the semifinals in Houston, less than a two-hour drive from A&M’s campus, their relative odds may increase due to a home court advantage.

Here’s another advantage of the combinatorial setup. A standard bookmaker would never dare to offer the same millions of bets as Predictalot — they would face nearly unlimited possible losses because, by tradition, each bet is managed independently. By combining every bet into a single unified marketplace, we are able to limit the worst-case (virtual) loss of our market maker to a known fixed constant.
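As a hedged aside on where such a fixed constant can come from: a later section of this page describes the market math as a combinatorial version of Hanson's LMSR market maker. Assuming the textbook LMSR with a liquidity parameter b (a symbol introduced here, not in the original post), N mutually exclusive outcomes, and uniform starting prices, the cost function, prices, and standard loss bound are:

```latex
C(\mathbf{q}) = b \ln \sum_{o=1}^{N} e^{q_o / b},
\qquad
p_o(\mathbf{q}) = \frac{e^{q_o / b}}{\sum_{o'=1}^{N} e^{q_{o'} / b}},
\qquad
\text{worst-case loss} \le b \ln N .
```

If N is taken to be 2^63 (an assumption consistent with "over 9 quintillion outcomes"), the bound is 63 b ln 2, or roughly 43.7 b. Predictalot starts from a seed-based prior rather than uniform prices, so its actual constant would differ; the point is only that it depends on b and the starting prices, not on how many bets are placed.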

Yahoo! Key Scientific Challenges: Applications due March 11

Applications for Yahoo!’s third annual Key Scientific Challenges Program are due March 11. Our goal is to support students working in areas we feel represent the future of the Internet. If you’re a Ph.D. student working in one of the areas below, please apply!

We are thrilled to announce Yahoo!’s third annual Key Scientific Challenges Program. This is your chance to get an inside look at — and help tackle — the big challenges that Yahoo! and the entire Internet industry are facing today. As part of the Key Scientific Challenges Program you’ll gain access to Yahoo!’s world-class scientists, some of the richest and largest data repositories in the world, and have the potential to make a huge impact on the future of the Internet while driving your research forward.

THE CHALLENGE AREAS INCLUDE:

– Search Experiences
– Machine Learning
– Data Management
– Information Extraction
– Economics
– Statistics
– Multimedia
– Computational Advertising
– Social Sciences
– Green Computing
– Security
– Privacy

KEY SCIENTIFIC CHALLENGES AWARD RECIPIENTS RECEIVE:

– $5,000 unrestricted research seed funding which can be used for conference fees and travel, lab materials, professional society membership dues, etc.

– Access to select Yahoo! datasets

– The unique opportunity to collaborate with our industry-leading scientists

– An invitation to this summer’s exclusive Key Scientific Challenges Graduate Student Summit where you’ll join the top minds in academia and industry to present your work, discuss research trends and jointly develop revolutionary approaches to fundamental problems

CRITERIA: To be eligible, you must be currently enrolled in a PhD program at any accredited institution.

We’re accepting applications from January 24th – March 11th, 2011 and winners will be announced by mid April 2011.

To learn more about the program and how to apply, visit http://labs.yahoo.com/ksc.

Predictalot goes East: Introducing Predictopus for the ICC Cricket World Cup

Yahoo! India Predictopus logo

I’m thrilled to report that Predictalot had an Indian makeover, launching as Predictopus* for the ICC Cricket World Cup. The Yahoo! India team did an incredible job, leveraging the idea and some of the code base of Predictalot, yet making it their own. Predictopus is not a YAP — it lives right on the Yahoo! Cricket website, the official homepage for the ICC Cricket World Cup. They’re also giving away Rs 10 lakhs — or about $22,000 if my calculations are correct — in prizes. Everything is bigger in India, including the crowds and the wisdom thereof. It will be great to see the game played out on a scale that dwarfs our college basketball silliness in the US.

The Y! India team reused some of the backend code but redid the frontend almost entirely. To adapt the game to cricket, among other chores, we had to modify our simulation code to estimate the starting probabilities that any team would win against any other team, even in the middle of a game. (How likely is it for India to come back at home from down 100 runs with 10 overs left and 5 wickets lost? About 25%, we think.) These starting probabilities are then refined further by the game-playing crowds.

It’s great to see an experiment from Labs grow into a full-fledged product run by a real product team in Yahoo!, a prime example of technology transfer at its best. In the meantime, we (Labs) are still gunning for a relaunch of Predictalot itself for March Madness 2011, the second year in a row. Stay tuned.

2011/02/24 Update: An eye-catching India-wide ad campaign for Predictopus is live, including homepage, finance, movies, OMG, answers, mail, everywhere! Oh, and one of the prizes is a Hyundai.

predictopus ad on Yahoo! India homepage 2011/02/24


* Yes, that’s a reference to legendary Paul the Octopus, RIP.

wise.gov: NSF and IARPA funding for collective intelligence

The US National Science Foundation’s Small Business Innovation Research program provides grants to small businesses to fund “state-of-the-art, high-risk, high-potential innovation research proposals”.

In their current call for proposals, they explicitly ask for “I2b. Tools for facilitating collective intelligence”.

These are grants of up to US$150,000, with the opportunity for more later, I believe. The deadline is December 3, 2010! Good luck and (not so) happy Thanksgiving to anyone working on one of these proposals. I’m glad to help if I can.


The deadline for another US government program has passed, but should yield interesting results and may lead to future opportunities. In August, the Intelligence Advanced Research Projects Activity (IARPA, the intelligence community’s DARPA), which “invests in high-risk/high-payoff research programs” in military intelligence, solicited proposals for Aggregative Contingent Estimation, or what might be called wisdom-of-crowds methods for prediction:

The ACE Program seeks technical innovations in the following areas:

  • Efficient elicitation of probabilistic judgments, including conditional probabilities for contingent events.
  • Mathematical aggregation of judgments by many individuals, based on factors that may include past performance, expertise, cognitive style, metaknowledge, and other attributes predictive of accuracy.
  • Effective representation of aggregated probabilistic forecasts and their distributions.

The full announcement is clear, detailed, and well thought out. I was impressed with the solicitors’ grasp of research in the field, an impression no doubt bolstered by the fact that some of my own papers are cited 😉 . Huge hat tip to Dan Goldstein for collating these excerpts:

The accuracy of two such methods, unweighted linear opinion pools and conventional prediction markets, has proven difficult to beat across a variety of domains. [2] However, recent research suggests that it is possible to outperform these methods by using data about forecasters to weight their judgments. Some methods that have shown promise include weighting forecasters’ judgments by their level of risk aversion, cognitive style, variance in judgment, past performance, and predictions of other forecasters’ knowledge. [3] Other data about forecasters may be predictive of aggregate accuracy, such as their education, experience, and cognitive diversity. To date, however, no research has optimized aggregation methods using detailed data about large numbers of forecasters and their judgments. In addition, little research has tested methods for generating conditional forecasts.

[2] See, e.g., Tetlock PE, Expert Political Judgment (Princeton, NJ: Princeton University Press, 2005), 164-88; Armstrong JS, “Combining Forecasts,” in JS Armstrong, ed., Principles of Forecasting (Norwell, MA: Kluwer, 2001), 417-39; Arrow KJ, et al., “The Promise of Prediction Markets,” Science 2008; 320: 877-8; Chen Y, et al., “Information Markets Vs. Opinion Pools: An Empirical Comparison,” Proceedings of the 6th ACM Conference on Electronic Commerce, Vancouver BC, Canada, 2005.

[3] See, e.g., Dani V, et al., “An empirical comparison of algorithms for aggregating expert predictions,” Proc. 22nd Conference on Uncertainty in Artificial Intelligence, UAI, 2006; Cooke RM, ElSaadany S, Huang X, “On the performance of social network and likelihood-based expert weighting schemes,” Reliability Engineering and System Safety 2008; 93:745-756; Ranjan R, Gneiting T, “Combining probability forecasts,” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2010; 72(1): 71-91.
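(An aside of mine, not part of the announcement: for anyone unfamiliar with the term, a linear opinion pool is just an average of the forecasters' probabilities, and the weighted variants the excerpt mentions replace the plain average with a weighted one. A minimal sketch follows; the forecasts and the weights are made-up numbers, and how weights would actually be derived from past performance is left open.)

```python
def linear_opinion_pool(probabilities, weights=None):
    """Weighted average of forecasters' probabilities for one yes/no event.

    With weights=None every forecaster counts equally (the unweighted pool).
    """
    if not probabilities:
        raise ValueError("need at least one forecast")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("one weight per forecaster is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * p for w, p in zip(weights, probabilities)) / total


# Hypothetical forecasts that Country X defaults on its debt in 2011.
forecasts = [0.9, 0.6, 0.2]
# Hypothetical weights, e.g. reflecting each forecaster's track record.
track_record_weights = [3.0, 1.0, 1.0]

unweighted = linear_opinion_pool(forecasts)
weighted = linear_opinion_pool(forecasts, track_record_weights)
print(round(unweighted, 3), round(weighted, 3))
```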

[Examples:]

  • Will the incumbent party win the next presidential election in Country X?
  • When will Country X hold its next parliamentary elections?
  • How many cell phones will be in use globally by 12/31/11?
  • By how much will the GDP of Country X increase from 1/1/11 to 12/31/11?
  • Will Country X default on its sovereign debt in 2011?
  • If Country X defaults on its sovereign debt in 2011, what will be the growth rate in the Eurozone in 2012?

Elicitation – Advances Sought
The ACE Program seeks methods to elicit judgments from individual forecasters on:

  • Whether an event will or will not occur
  • When an event will occur
  • The magnitude of an event
  • All of the above, conditioned on another set of events or actions
  • The confidence or likelihood a forecaster assigns to his or her judgment
  • The forecaster’s rationale for his or her judgment, as well as links to background information or evidence, expressed in no more than a couple of lines of text
  • The forecaster’s updated judgments and rationale

The elicitation methods should allow prioritization of elicitations, continuous updating of forecaster judgments and rationales, and asynchronous elicitation of judgments from more than 1,000 geographically-dispersed forecasters. While aggregation methods, detailed below, should be capable of generating probabilities, the judgments elicited from forecasters can but need not include probabilities.

Challenges include:

  • Some forecasters will be unaccustomed to providing probabilistic judgments
  • There has been virtually no research on methods to elicit conditional forecasts
  • Elicitation should require a minimum of time and effort from forecasters; elicitation should require no more than a few minutes per elicitation per forecaster
  • Training time for forecasters will be limited, and all training must be delivered within the software
  • Rewards for participation, accuracy, and reasoning must be non-monetary and of negligible face value (e.g., certificates, medals, pins)

How high can high-level programming go?

Our first prototype of Predictalot was written mainly in Mathematica with a rudimentary web front end that Dan Reeves put together (with editable source code embedded right on the page via etherpad!). It proved the concept but was ugly and horribly slow.

Screenshot of pre-alpha Predictalot: Mathematica + etherpad + web

Dan and I built a second prototype in PHP. It was even uglier but about twice as fast and somewhat usable on a small scale (at least by users willing/able to formulate their own propositions in PHP). Yet it still wasn’t good enough to serve thousands of users accustomed to simplicity and speed.

Screenshot of alpha Predictalot: PHP + YAP

The final live version of Predictalot was not only pleasing to the eye — thanks to Sudar, Navneet, and Tom — but pleasingly fast, due almost entirely to the heroic efforts of Mridul M who wrote a mini PHP parser inside of Java and baked in a number of database and caching optimizations.

Screenshot of live beta Predictalot: Java + Javascript + YAP

It seems that high-level programming languages haven’t climbed high enough. To field a fairly constrained web app that looks good and works well, we benefit greatly from having at least three specialists, for the app front end, the app back end, and the platform back end (apache, security, etc.).

Here’s a challenge to the programming language community: anything I can whip up in Mathematica I should be able to run at web scale. Math majors should be able to create Predictalot. Dan and I can mock up the basic idea of Predictalot but it still takes tremendous talent, time, and effort to turn it into a professional looking and well behaved system.

The core market math of Predictalot — a combinatorial version of Hanson’s LMSR market maker — involves summing thousands of e^x terms. Here we are in the second decade of the new millennium and in order for a sum of exponentials to execute quickly and without numeric overflow, we had to work out a transformation to conduct all our summations in log space. In other words, programming still requires me to think about how my machine represents my number. That shouldn’t qualify as “high level” thinking in 2010.
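For the curious, here is a minimal sketch of the kind of log-space transformation described above, using the standard log-sum-exp identity. It is not necessarily the transformation we worked out for Predictalot; the function names, the share counts, and the liquidity parameter b are assumptions introduced for illustration.

```python
import math


def log_sum_exp(values):
    """Return log(sum(exp(v) for v in values)) without overflow."""
    peak = max(values)
    # Factor out the largest term so every remaining exponent is <= 0.
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b))."""
    return b * log_sum_exp([q / b for q in quantities])


def lmsr_prices(quantities, b):
    """LMSR prices exp(q_i / b) / sum_j exp(q_j / b), computed in log space."""
    scaled = [q / b for q in quantities]
    log_total = log_sum_exp(scaled)
    return [math.exp(s - log_total) for s in scaled]


# Hypothetical share counts, large enough that exp(q / b) overflows a float.
quantities = [100000.0, 99990.0, 0.0]
b = 10.0

try:
    naive_total = sum(math.exp(q / b) for q in quantities)
except OverflowError:
    naive_total = None  # the direct sum cannot be represented as a float

prices = lmsr_prices(quantities, b)
print(naive_total, [round(p, 4) for p in prices])
```

The direct sum fails because exp(10000) is far beyond the range of a double, while the shifted version only ever exponentiates non-positive numbers. Terms that are negligible next to the largest one underflow harmlessly to zero.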

I realize I may be naively asking too much. Solving the challenge fully is AI-complete. Still, while we’re making impressive strides in artificial intelligence, programming feels much the same today as it did twenty years ago. It still requires learning specialized tricks, arcane domain knowledge, and optimizations honed only over years of experience, and the most computationally intensive applications still require that extra compilation step (i.e., it’s still often necessary to use C or Java over PHP, Perl, Python, or Ruby).

Some developments hardly seem like progress. Straightforward HTML markup like border=2 has given way to unwieldy CSS like style="border:2px solid black". In some ways the need for specialized domain knowledge has gone up, not down.

Visual programming is an oft-tried, though so far largely unsuccessful way to lower the barrier to programming. Pipes was a great effort, but YQL proved more useful and popular. Google just announced new visual developer tools for Android in an attempt to bring mobile app creation to the masses. Content management systems are getting better and broader every day, allowing more and more complex websites to be built with less time touching source code.

I look forward to the day that computational thinking can suffice to create the majority of computational objects. I suspect that day is still fifteen to twenty years away.

It’s official: More people are playing Predictalot than Mafia Wars

It’s true.

More people are playing Predictalot today than Mafia Wars or Zynga Poker… On Yahoo!, that is.

In fact, Predictalot is the #1 game app on Yahoo! Apps by daily count. By monthly count, we are 5th and rising.

A prediction is being made about every three minutes.

Come join the fun.

predictalot most popular game app on yahoo 2010-06-12