All posts by David Pennock

CS ∩ Econ news

Here are some news items about the field with no name (at least not yet, see below) that lies at the intersection of computer science and economics.

  1. The Sixth Workshop on Ad Auctions is soliciting papers. The workshop will be held June 8, 2010, in Cambridge, MA, in conjunction with the ACM Conference on Electronic Commerce (EC’10). There is a terrific organizing committee this year spanning industry and academia, CS and business schools.
  2. The EC’10 list of accepted papers is out and looks great.
  3. The first-ever Behavioral and Quantitative Game Theory Conference on Future Directions will be held May 14-16 in Newport Beach, CA. The program looks fantastic.
  4. Last fall, the University of Pennsylvania announced the first-ever undergraduate degree program in Market and Social Systems Engineering. Kudos to UPenn: the move shows impressive vision and leadership.
  5. The NSF is funding research in the CS-Econ area. They support efforts to “explore the emerging interface between computer science and economics, including algorithmic game theory, automated mechanism design, computational tractability of basic economic problems, and the role of information, trust, and reputation in markets” (page 7).
  6. The NBER Market Design working group is soliciting papers for a workshop October 8-9, 2010 in Cambridge, MA.
  7. We are now reviewing some amazing submissions to Yahoo!’s 2010 Key Scientific Challenges program. Read the challenges for the area we call Algorithmic Economics.
  8. Members of Yahoo! Labs can submit proposals to fund collaborative research with academic colleagues through the Yahoo! Faculty Research and Engagement program. If you’re interested, contact a Yahoo! Labs employee.

What should be the name? CS ∩ Econ is accurate but cryptic. At Yahoo!, we call it Algorithmic Economics. At Google, they call it Market Algorithms. The ACM Special Interest Group in this area calls it Electronic Commerce, causing complaints every year. I’ve heard people suggest Economics and Computation. The name Algorithmic Game Theory has emerged as something of a standard within the CS theory community. [Update: Noam suggests Algorithmic Game Theory and Economics and even renamed his blog accordingly.] The phrase Computational Economics makes sense but is already in use by a different field. A fun suggestion is Economatics (or Autonomics), meant to invoke a mashup of economics and automation.

Prediction markets had a similar naming/identity crisis. They’ve been called information markets, idea markets, securities markets, event markets, binary options, market in uncertainty, and more. But now almost everyone has settled on prediction markets. I’ve come to like the name and I think it’s helped establish the field in it’s own right. I hope we can settle on a good name for CS ∩ Econ in part so we can create the Journal of PerfectNameForCSEcon, an outlet sorely missing from the field.

Update 2011/10/11: The journal now exists! Called the ACM Transactions on Economics and Computation, it circumvented the naming issue.

A beautiful model (of the stock market)

Here is a histogram of the daily changes of the S&P 500 from 1950 to 2009. The x-axis is the daily log difference* and the y-axis is the number of times that difference occurred.

Histogram of log differences of S&P500 from 1950 to 2009

It turns out that a Laplace distribution is a pretty good model of the stock market. The Laplace distribution is parametrized by its median u and its average absolute difference from median b. I computed these two parameters for the S&P 500 data and plugged them into a Laplace distribution in Mathematica, then used that to generate 59 years worth of random simulated S&P 500 prices. Here is the resulting histogram.

Histogram of log differences of simulated S&P500

This is not a best fit: this is simply the same u and b computed on the data and then plugged into the distribution. Here are the two histograms on top of each other.

Histogram of log differences of S&P500 from 1950 to 2009 and simulated S&P 500

At the aggregate level the stock market is well behaved: it’s randomness is remarkably predictable. It’s amazing that this social construct — created by people for people, and itself often personified — behaves so much like a physical process, more so than any other man-made entity I can think of.

The graphs also hint at the futility of attaching reasons to price movements every single day. Today, “the prospects for continued low borrowing costs buoyed investors’ hopes for the U.S. economy”. Tomorrow, there may be “profit taking”.** Why do reporters feel obligated to explain why, or more to the point why do readers demand to be told? Why must we infer reasons, see momentum where there may be none, and assign labels like bull and bear? Why can’t we accept randomness?***

Update: Anthony Towns conducted a fascinating and puzzling follow-up analysis showing that the volatility of the stock market has been going up steadily over the years, even though the mean return has not.
__________
* The log difference is the log of the price on day d minus the log of the price on day d-1, or equivalently ln(d/(d-1)).
** I’d love to see a controlled experiment where financial reporters are given randomized reports about the Dow and watch them manufacture explanations, I imagine occasionally invoking the same cause for both ups and downs… (a) The prospect of higher interest rates spooked investors today; or (b) the Fed’s willingness to raise rates signals a recovering economy — investors rejoiced.
*** Let me clarify. The efficient market hypothesis implies both randomness and rationality: Randomness in the sense of statistical independence (no momentum), and rationality in the sense that price changes reflect new information. So I’m not saying that prices change for no reason, just that there is rarely an easily-stated single explanation for the overall market. (BTW, “profit taking” is never a valid explanation under EMH.)

Let the madness begin

Sixty-five men’s college basketball teams have been selected. Tomorrow there will be sixty-four. Half of the remaining teams will be eliminated twice every weekend for the next three weekends until only one team remains.

On April 5th, we will know who is champion. In the meantime, it’s anybody’s guess: any of 9.2 quintillion things could in principle happen.

At Predictalot it’s your guess. Make almost any prediction you can think of, like Duke will win go further than both Kansas and Kentucky, or the Atlantic Coast will lose more games than the Big East. There’s even the alphabet challenge: you pick six letters that include among them the first letters of all four final-four teams.

Following Selection Sunday yesterday, the full range of prediction types are now enabled in Predictalot encompassing hundreds of millions of predictions about your favorite teams, conferences, and regions. Check it out. Place a prediction or just lurk to see whether the crowd thinks St. Mary’s is this year’s Cinderella.

Come join our mad science experiment where crowd wisdom meets basketball madness. We’ve had many ups and down already — for example sampling is way trickier than I naively assumed initially — and I’m sure there is more to come, but that’s part of what makes building things based on unsolved scientific questions fun. Read more about the technical details in my previous posts and on the Yahoo! Research website.

And for the best general-audience description of the game, see the Yahoo! corporate blog.

Update: Read about us on the New York Times and VentureBeat.

You can even get your fix on Safari on iPhone!

Dave playing Predictalot on iPhone

Below is a graph of our exponential user growth over the last couple days. Come join the stampede!

graph of YAP installs for Predictalot

Revisiting predictions: Google and Avatar

In September 2007 I predicted that “Google [will buy] a TV ad for Google.com aimed at mass consumers”… before September 2008. I only missed it by a year and a half.

In December, before Avatar was released and while some still thought it more likely to sink than swim, Slate journalist Josh Levin asked us to predict its opening weekend box office earnings using our models. We projected between $65 and $84 million. The actual number? $77 million.

Confession: In this post I am guilty of exactly the sins I’ve complained about in the past: cherry picking positive outcomes in hindsight, and measuring probabilistic predictions categorically and in isolation. Oops.

Computer science = STEAM

At a recent meeting of the Association for Computing Machinery, the main computer science association, the CEO of ACM John White reported on efforts to increase the visibility and understanding of computer science as a discipline. He asked “Where is the C in STEM?” (STEM stands for Science, Technology, Engineering, and Math, and there are many policy efforts to promote teaching and learning in these areas.) He argued that computer science is not just the “T” in “STEM”, as many might assume. Computer science deserves attention of its own from policy makers, teachers, and students.

I agree, but if computer science is not the “T”, then what is it? It’s funny. Computer science seems to span all the letters of STEM. It’s part science, part technology, part engineering, and part math. (Ironically, even though it’s called computer science, the “S” may be the least defensible.*)

The interdisciplinary nature of computer science can be seen throughout the university system: no one knows quite where CS departments belong. At some universities they are part of engineering schools, at others they belong to schools of arts and sciences, and at still others they have moved from one school to another. That’s not to mention the information schools and business schools with heavy computer science focus. At some universities, computer science is its own school with its own Dean. (This may be the best solution.)

Actually, I’d go one step further and say that computer science also involves a good deal of “A”, or art, as Paul Graham popularized in his wonderful book Hackers and Painters, and as seen most clearly in places like the MIT Media Lab and the NYU Interactive Telecommunications Program.

So where is the C in STEM? Everywhere. Plus A. Computer science = STEAM.**

__________
* It seems that those fields who feel compelled to append the word “science” to their names (social science, political science, library science) are not particularly scientific.
** Thanks to Lance Fortnow for contributing ideas for this post, including the acronym STEAM.

Why doesn’t Pittsburgh have a Silicon Hill?

I grew up in Pittsburgh. I love Pittsburgh. I still run into people who believe Pittsburgh is a steel town. Pittsburgh is not that — the steel industry cleared out (and the air cleared up) before I moved there at age 10 in 1981 — though driving through its streets it sometimes feels like one: gritty row houses, dive bars, old-growth neighborhoods, and independent shops, worn and welcoming.

Then what is Pittsburgh?

A sports town, no doubt, but that doesn’t count.

A hospital town, perhaps. The University of Pittsburgh Medical Center is a sprawling conglomerate of hospitals, doctors, researchers, and medical school, growing organically and through acquisition. Several other private hospitals and networks dot the city.

But with one the the top five computer science departments in the world at Carnegie Mellon University churning out grads at all levels, you might think Pittsburgh would have the seeds of a high-tech ecosystem. Yet there are few major technology companies, startups, or venture capital firms to nurture them locally. (Two exceptions I can think of: Google and CombineNet. Update: Also Intel Labs Pittsburgh.) Instead, CMU students tend to flee for the coasts after graduation.

Could Pittsburgh develop a startup row, a mini Silicon Valley? Pittsburghers have been hoping for and heralding such a transformation for decades. Given the city’s famously steep (SF-worthy) gradients, there’s even a perfect name for it: Silicon Hill.

In selecting Pittsburgh for the G-20 summit, the Obama administration cited Pittsburgh as a post-industrial success story with “renewed industries that are creating the jobs of the future”. But that seems very glass half full as (paraphrasing) one of my Pittsburgh friends noted on Facebook.

Paul Graham wrote a terrific essay (as Paul Graham is wont to do) about how a city might go about buying their own Silicon Valley.* He concludes that it may be possible. “For the price of a football stadium, any town that was decent to live in could make itself one of the biggest startup hubs in the world.” His main conjecture is that the money would fund a large number of good local startups in their infancy but without forcing them to stay — the best startups simply won’t take money that constrains their future options. The funding would have to be rich enough and the environment nice enough that they simply would not want to leave.

Is Graham right and, if so, could Pittsburgh pull it off?

__________
* See also Graham’s older and longer essay How to be Silicon Valley.

Countdown to web sentience

In 2003, we wrote a paper titled 1 billion pages = 1 million dollars? Mining the web to play Who Wants to be a Millionaire?. We trained a computer to answer questions from the then-hit game show by querying Google. We combined words from the questions with words from each answer in mildly clever ways, picking the question-answer pair with the most search results. For the most part (see below), it worked.

It was a classic example of “big data, shallow reasoning” and a sign of the times. Call it Google’s Law. With enough data nothing fancy can be done, but more importantly nothing fancy need be done: even simple algorithms can look brilliant. When in comes to, say, identifying synonyms, simple pattern matching across an enormous corpus of sentences beats the most sophisticated language models developed meticulously over decades of research.

Our Millionaire player was great at answering obscure and specific questions: the high-dollar questions toward the end of the show that people find difficult. It failed mostly on the warm-up questions that people find easy — the truly trivial trivia. The reason is simple. Factual answers like the year that Mozart was born appear all over web. Statements capturing common sense for the most part do not. Big data can only go so far.*

That was 2003.

In the paper, our clearest example of a question that we could not answer was How many legs does a fish have?. No one on the web would actually bother to write down the answer to that. Or would they?

I was recently explaining all this to a colleague. To make my point, we Googled that question. Lo and behold, there it was: asked and answered — verbatim — on Yahoo! Answers. How many legs does a fish have? Zero. Apparently Yahoo! Answers also knows the number of legs of a crayfish, rabbit, dog, starfish, mosquito, caterpillar, crab, mealworm, and “about 133,000” more.

Today, there are way more than 1 billion web pages: maybe closer to 1 trillion.

What’s the new lesson? Given enough time, everything will be on the web, including the fact that hungry poets blink (✓). Ok, not everything, but far more than anyone ever imagined.

It would be fun to try our Millionaire experiment again now that the web is bigger and search engines are smarter. Is there some kind of Moore’s Law for artificial intelligence as the web grows? Can sentience be far behind? 🙂

__________
* Lance agreed, predicting that IBM’s quest to build a Jeopardy-playing computer would succeed but not tell us much.

Predictalot! (And we mean alot)

I’m thrilled to announce the launch of Predictalot, a combinatorial prediction market for the NCAA Men’s Basketball playoffs. Predict almost anything you can think of, like Duke will advance further than UNC, or Every final four team name will start with U. Check the odds and invest points on your favorites. Sell your predictions anytime, even as you follow the basketball games live.

The basic game play is simple: select a prediction type, customize it, and invest points on it. Yet you’ll never run out of odds to explore: there are hundreds of millions of predictions you can make. The odds on each update continuously based on other players’ predictions and the on-court action.

Predictalot is a Yahoo! App, so you can play it at apps.yahoo.com or you can add it to your Yahoo! home page. I have to admit, it’s an incredible feeling to play a game I helped design right on the Yahoo! home page.

Predicalot app on the Yahoo! home page

That’s all you need to get started. If you’re curious and would like a peek under the hood, read on: there’s some interesting technology hidden in the engine.

Background and Details

Predictalot is a true combinatorial prediction market of the sort academics like us and Robin Hanson have been dreaming about since early in the decade. We built the first version during an internal Yahoo! Hack Day. Finally, we leveraged the Yahoo! Application Platform to quickly build a public version of the game. (Note that anyone can develop a YAP app that’s visible to millions — there’s good sample code, it supports YUI and OpenSocial, and it’s easy to get started.) After many fits and starts, late nights, and eventually all nights, we’re proud and excited to go live with Predictalot version 1.0. I can’t rave enough about the talent and dedication of the research engineers who gave the game a professional look and feel and production speed, turning a pie-in-the-sky idea into reality. We have many features and upgrades in mind for future versions, but the core functionality is in place and we hope you enjoy the game.

In the tournament, after the play-in game, the 64 top college basketball teams play 63 games in a single elimination tournament. So there are 2 to the power 63 or 9.2 quintillion total possible outcomes, or ways the entire tournament can unfold. Predictalot implicitly keeps track of the odds for them all. To put this in perspective, it’s estimated that there are about 10 quintillion individual insects on Earth. Of course, for all practical purposes, we can’t store 9.2 quintillion numbers, even with today’s computers. Instead, we compute the odds for any outcome on the fly by scanning through the predictions placed so far.

A prediction is a statement, like Duke will win in the first round, that will be either true or false in the final outcome. In this case, the prediction is true in exactly half, or 2 to the power 62 outcomes. (Note this does not mean the odds are 50% — remember the outcomes themselves are not all equally likely.) In theory, Predictalot can support predictions on any set of outcomes. That’s 2 to the power 2 to the power 63, or more than a googol predictions. For now, we restrict you to “only” hundreds of millions of predictions categorized into thirteen types. Computing the odds of a prediction precisely is too slow. Technically, the problem is #P-hard: as hard as counting SAT and harder than the travelling salesman problem. So we must resort to approximating the odds by randomly sampling the outcome space. Sampling is a tricky business — equal parts art and science — and we’re still actively exploring ways to increase the speed, stability, and accuracy of our sampling.

Because we track all possible outcomes, the predictions are automatically interconnected in ways you would expect. A large play on Duke to win the tournament instantly and automatically increases the odds of Duke winning in the first round; after all, Duke can’t win the whole thing without getting past the first round.

With 9.2 quintillion outcomes, Predictalot is to our knowledge the largest prediction market built, testing the limits of what the wisdom of crowds can produce. Predictalot is a game, and we hope it’s fun to play. We’d also like to pave the way for serious use of combinatorial prediction market technology.

Why did Yahoo! build this? Predictalot is a smarter market, letting humans and computers each do what they do best. People enter predictions in simple terms they understand like how one team fares against another. The computer handles the massive yet methodical number crunching needed to combine all the pieces together into a coherent overall prediction of a complex event. Markets like Predictalot, WeatherBill, CombineNet, and Internet advertising systems, to name a few, represent the evolution of markets in the digital age, empowering users with extreme customization. More and more, matching buyers with sellers — the core function of markets — requires sophisticated algorithms, including machine learning and optimization. Predictalot attempts to illustrate this trend in an entertaining way.

David Pennock
Mani Abrol, Janet George, Tom Gulik, Mridul Muralidharan, Sudar Muthu, Navneet Nair, Abe Othman, Daniel Reeves, Pras Sarkar

Wanted: Bluetooth sethead

In a typical pairing of a cell phone and a bluetooth device, the “smart” phone drives the “dumb” bluetooth. The computational brains and user interface controls live inside the cell phone together with the antenna. The bluetooth device simply follows orders. For example, a bluetooth headset acts as an alternate microphone and speaker for the phone. The bluetooth truly is an accessory to the phone.

I’d like a reverse sort of bluetooth device. A bluetooth “sethead”, if you will. The cellular antenna lives inside the earpiece, or maybe stays inside your pocket or bag — technically this is the “phone” but it is a dumb device with no screen or interface. The “bluetooth” part is the thing you hold in your hand with all the smarts: the processor, the address book, the screen, the controls, the camera, the gps, another microphone and speaker — everything you normally expect in a phone except the antenna.

Why do I want this? If it existed, I could choose any carrier with any phone. I select a dumb phone from the best carrier and a smart sethead from the best hardware company. A version of an iPod touch with a camera, microphone, and gps would make an ideal sethead.

A MiFi device comes close: it’s a dumb cellular antenna that creates as a mobile wifi hotspot that can connect you to Skype, etc. (I have one from Verizon Wireless and love it.) But it’s not “always on”. MiFi + iPod is great for making calls but not for receiving calls, so is not sufficient for replacing a cell phone.

Sure, the advent of setheads would speed the carriers’ transformation into “dumb pipes”, something they are resisting, but that is inevitable anyway.