I told the audience that they could learn more by searching for “cftc yahoo google” in their favorite search engine, showing the Yahoo! Search results with MidasOracle’s coverage at the top.[2]
It turns out that was poor advice. 63.7% of the audience probably won’t find what they’re looking for using that search.[3]
If some search engines don’t surface the MidasOracle post, I’m hoping they’ll find this.
And back to the effort to guide the CFTC: I hope other people and companies will join. The CFTC’s request for help itself displays a clear understanding of the science and practice of prediction markets and a real willingness to listen. The more organizations that speak out in support, the greater chance we have of convincing the CFTC to take action and open the door to innovation and experimentation.
[1] Which I hesitated to attend and host a reception for, and now regret endorsing in any way.
[2] In September 2008, journalist Chris Masse uncovered the letter on the CFTC website before Google or Yahoo! had indexed it. We should have known: Masse is extraordinarily skilled at finding anything relevant anywhere, and has been a tireless, invaluable (and unpaid) chronicler of all-things-prediction-markets for years now.
[3] Even Microsoft Live has the “right” result in position 3. Interestingly, Daniel Reeves got slightly different, presumably personalized, results in Google, giving it even less excuse for not knowing what two MidasOracle junkies were looking for with that query.
“The No-Stats All-Star” is an entertaining, fascinating, and — warning — extremely long article by Michael Lewis in the New York Times Magazine on Shane Battier, a National Basketball Association player and Duke alumnus whose intellectual and data-driven play fits perfectly into the Houston Rockets’ new emphasis on statistical modeling.
For Battier, every action is a numbers game, an attempt to maximize the probability of a good outcome. Any single outcome, good or bad, cannot be judged in isolation, as much as human nature desires it. Actions and outcomes have to be evaluated in aggregate.
Michael Lewis is a fantastic writer. Battier is an impressive player and an impressive person. Houston is not the first and certainly not the last sports team to turn to data as the arbiter of truth. This approach is destined to spread throughout industry and life, mostly because it’s right. (Yes, even for choosing shades of blue.)
This article (membership required) is remarkable mostly for the fact that it was published in 1968. (Hat tip to Jonathan Smith.) It describes an experiment in creating an artificial economy to buy and sell computer time in the cloud, an idea that has been kicked around a number of times in the intervening decades but never quite took hold, until recently, if you count the literal pricing in dollars on Amazon EC2. The concept of buying time on your company’s compute cluster in a pseudo-currency may come back into vogue as such installations become commonplace and oversubscribed.
Also check out the hand-drawn figure and the advertisement at the end:
It’s a great resource for students to learn about the area and find meaty research problems. There’s also a chance for graduate students to earn $5000 in seed funding, work with Yahoo! Research scientists and data, and attend a summit of like-minded students and scientists.
The challenges cover search, machine learning, data management, information extraction, economics, social science, statistics, multimedia, and computational advertising.
As I alluded to previously, I seem to be getting “intelligent spam” on my blog: comments that pass the re-captcha test and seem on-topic, yet upon further inspection clearly constitute link spam: either the author URI or a link in the comment body is spam.
Date: Fri, 9 Jan 2009 01:28:01 -0800
From: Matt.Herdy
New comment on your post #71 “A historic MayDay: The US government’s call for help on regulating prediction markets”
Author : Matt.Herdy
Comment:
Thanks for that post. I’ll put a note in the post.
1. It’s nothing new. The CFTC will just formalize the current status quo.
2. We are prisoner of the CFTC regulations and the US Congress’ distaste of sports “gambling”. As for the profitability of prediction exchanges in that strict environment, I don’t see how you can deny that HedgeStreet went bankrupt even though it was well funded. Isn’t that a hard fact?
3. You’re right, but all “pragmatists” should follow a business plan and make profits. See point #2. Pragmatists won’t make miracles.
At first blush, the comments seem to come from a knowledgeable person: they refer to HedgeStreet, an extremely relevant yet mostly unknown company that’s not mentioned anywhere else in the post or other comments.
It turns out the comments seem intelligent because they are. In fact, they’re copied word for word from Chris Masse’s comments on his own blog.
Chris Masse’s page has a link to my page, so it could have been discovered with a “link:” query to a search engine.
Though I now understand what this spammer did, I remain puzzled as to exactly how they did it and especially why.
Are these comments being inserted by people, perhaps hired on Mechanical Turk or other underground equivalent? Or are they coming from robots who have either broken re-captcha or the security of my blog? (John suspects a security breach.)
Is it really worth it economically? All links in blog comments are NOFOLLOW links anyway and are disregarded by search engines for ranking purposes, so what is the point? Are they looking for actual humans to click these links?
In any case, it seems an intriguing development in the spam arms race. Are other bloggers getting “intelligent spam”? Does anyone know how it’s done and why?
Update 2010/07: Oh, the irony. I got a number of intelligent-seeming comments on this post about SEO, nofollow, the economics of spam, etc. that were… promoting spammy links. I left them for humor value, though I disabled the links.
In October, Philip Polgreen, Yiling Chen, myself, and Forrest Nelson (representing University of Iowa, Harvard, and Yahoo!) published an article in the journal Clinical Infectious Diseases titled “Using Internet Searches for Influenza Surveillance”.
The paper describes how web search engines may be used to monitor and predict flu outbreaks. We studied four years of data from Yahoo! Search together with data on flu outbreaks and flu-related deaths in the United States. All three measures rise and fall as flu season progresses and dissipates, as you might expect. The surprising and promising finding is that web searches rise first, one to three weeks before confirmed flu cases, and five weeks before flu-related deaths. Thus web searches may serve as a valuable advance indicator for health officials to spot the onset of diseases like the flu, complementary to other indicators and forecasts.
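For the curious, here is a minimal sketch (in Python, with toy data) of the kind of lead-lag analysis involved; every number and variable name below is illustrative, and the paper’s actual methodology is more careful:

```python
import numpy as np

def lagged_correlation(search_volume, flu_cases, max_lag_weeks=8):
    """Correlate search volume against flu cases at each lead time.

    A peak at a positive lag means searches rise that many weeks
    before confirmed cases do. Sketch only; not the paper's method.
    """
    corr = {}
    for lag in range(max_lag_weeks + 1):
        x = search_volume[:len(search_volume) - lag]
        y = flu_cases[lag:]
        corr[lag] = np.corrcoef(x, y)[0, 1]
    return corr

# Toy weekly series in which searches lead cases by about 2 weeks.
rng = np.random.default_rng(0)
season = np.sin(np.linspace(0, 3 * np.pi, 52)) + 1
searches = season + rng.normal(0, 0.1, 52)
cases = np.roll(season, 2) + rng.normal(0, 0.1, 52)
lag, r = max(lagged_correlation(searches, cases).items(),
             key=lambda kv: kv[1])
print(f"best lead time: {lag} weeks (r = {r:.2f})")
```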
I haven’t read the Google paper, but the news article hints at nearly identical results:
Google … dug into its database, extracted five years of data on those queries and mapped it onto the C.D.C.’s reports of influenzalike illness. Google found a strong correlation between its data and the reports from the agency…
Tests of the new Web tool … suggest that it may be able to detect regional outbreaks of the flu a week to 10 days before they are reported by the Centers for Disease Control and Prevention.
To the reporter’s credit, he interviewed Philip and the article does mention our work in passing, though I can’t say I’m thrilled with the way it was framed:
The premise behind Google Flu Trends … has been validated by an unrelated study indicating that the data collected by Yahoo … can also help with early detection of the flu.
giving (grudging) credit to Yahoo! data rather than Yahoo! people.
I’m thrilled to see the attention given to the topic, and the Google team deserves a huge amount of credit, especially for launching a live web site as a companion to their publication, a fantastic service of great social value. That’s an idea we had but did not pursue.
In the business world, being first often means little. However, in the world of science, being first means a great deal and can be the determining factor in whether a study gets published. The truth is, although the efforts were independent, ours was published first — and Clinical Infectious Diseases scooped Nature — a decent consolation prize amid the go-google din.
Update 2008/11/24: We spoke with the Google authors and the Nature editors and our paper is cited in the Google paper, which is now published, and given fair treatment in the associated Nature News item. One nice aspect of the Google study is that they identified relevant search terms automatically by regressing all of the 50 million most frequent search queries against the CDC flu data. Congratulations and many thanks to the Google/CDC authors and the Nature editors, and thanks everyone for your comments and encouragement.
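As a rough illustration of that term-selection idea, here is a toy sketch in the same spirit; the query names and scoring below are made up, and the published system regressed 50 million queries against held-out CDC data far more carefully:

```python
import numpy as np

def select_flu_queries(query_series, cdc_series, top_k=3):
    """Rank candidate queries by correlation with CDC flu counts
    and keep the top_k. A crude stand-in for the published method.
    """
    scored = {q: np.corrcoef(v, cdc_series)[0, 1]
              for q, v in query_series.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

# Toy data: three fake queries, one tracking the fake CDC series.
cdc = np.array([1.0, 2, 4, 7, 5, 2, 1])
queries = {
    "flu symptoms": cdc + np.array([0.1, -0.2, 0.3, 0, 0.1, -0.1, 0.2]),
    "basketball":   np.array([3.0, 3, 2, 3, 3, 2, 3]),
    "cold remedy":  np.array([1.0, 1, 3, 5, 6, 3, 1]),
}
print(select_flu_queries(queries, cdc))  # "flu symptoms" ranks first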
WeatherBill lets you construct an enormous variety of insurance contracts related to weather. For example, the screenshot embedded below shows how I might have insured my vacation at the New Jersey shore:
For $42.62 I could have arranged to be paid $100 per day of rain during my vacation.
(I didn’t actually purchase this mainly because the US government insists that I am a menace to myself and should not be allowed to enter into such a dangerous gamble — more on this later. And as Dan Reeves pointed out to me, it’s probably not rational to do for small sums.)
WeatherBill is an example of the evolution of financial exchanges as they embrace technology.
WeatherBill can be thought of as expressive insurance, a financial category no doubt poised for growth and a wonderful example of how computer science algorithms are finally supplanting the centuries-old exchange logic designed for humans (CombineNet is another great example).
WeatherBill can also be thought of as a combinatorial prediction market with an automated market maker, a viewpoint I’ll expand on now.
On WeatherBill, you piece together contracts by specifying a series of attributes: date range, place, type of weather, threshold temperature or precipitation level, minimum and maximum number of bad-weather days, etc. The user interface is extremely well done: a straightforward series of adaptive menu choices and text entry fields guides the customer through the selection process.
This flexibility quickly leads to a combinatorial explosion: given the choices on the site I’m sure the number of possible contracts you can construct runs into the millions.
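A back-of-the-envelope count makes the point; all of the menu sizes below are pure guesses, not WeatherBill’s actual options:

```python
# Hypothetical menu sizes, purely to illustrate the multiplication;
# WeatherBill's real menus surely differ.
start_dates = 365      # day-of-year the coverage begins
locations = 50         # weather stations to choose from
weather_types = 5      # rain, snow, temperature, wind, ...
thresholds = 10        # e.g., inches of rain or degrees

print(start_dates * locations * weather_types * thresholds)  # 912500
# ...and that's before date-range lengths and min/max bad-weather
# day limits, which multiply the total well into the millions.
```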
Once you’ve defined when you want to be paid — according to whatever definition of bad weather makes sense for you or your business — you choose how much you want to be paid.
Finally, given all this information, WeatherBill quotes a price for your custom insurance contract, in effect the maximum amount you will lose if bad weather doesn’t materialize. Quotes are instantaneous — essentially WeatherBill is an automated market maker always willing to trade at some price on any of millions of contracts.
Side note: On WeatherBill, you control the magnitude of your bet by choosing how much you want to be paid. In a typical prediction market, you control magnitude by choosing how many shares to trade. In our own prediction market Yoopick, you control magnitude by choosing the maximum amount you are willing to lose. All three approaches are equivalent, and what’s best depends on context. I would argue that the WeatherBill and Yoopick approaches are simpler to understand, requiring less indirection. The WeatherBill approach seems most natural in an insurance context and the Yoopick approach in a gambling context.
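To make the equivalence concrete, here is a toy conversion among the three parameterizations for a contract trading at price p per dollar of payout; the numbers are illustrative only:

```python
def from_payout(p, payout):
    """WeatherBill-style: choose the payout; cost (= max loss) follows."""
    return {"cost": p * payout, "payout": payout}

def from_shares(p, shares, share_payout=1.0):
    """Exchange-style: choose a number of $1 shares."""
    return {"cost": p * shares * share_payout,
            "payout": shares * share_payout}

def from_max_loss(p, max_loss):
    """Yoopick-style: choose the most you're willing to lose."""
    return {"cost": max_loss, "payout": max_loss / p}

p = 0.40  # market price per $1 of payout
print(from_payout(p, 100))     # {'cost': 40.0, 'payout': 100}
print(from_shares(p, 100))     # the same position
print(from_max_loss(p, 40.0))  # the same position again
```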
How does the WeatherBill market maker determine prices? I don’t know the details, but their FAQ says that prices change “due to a number of factors, including WeatherBill forecast data, weather simulation, and recent Contract sales”. Certainly historical data plays an important role — in fact, with every price quote WeatherBill tells you what you would have been paid in years past. They allow contracts starting as few as four days into the future, so I imagine they incorporate current weather forecasts. And the FAQ implies that some form of market feedback occurs, raising prices on contract terms that are in high demand.
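Their pricing model is proprietary, but for intuition, here is a deliberately naive sketch that quotes a rain contract at historical expected payout plus a margin. It assumes independent days and a made-up margin; a real market maker would simulate correlated weather and fold in demand, as the FAQ hints:

```python
def quote_rain_contract(daily_rain_prob, payout_per_day, margin=0.15):
    """Price = expected payout plus a margin.

    daily_rain_prob: historical P(rain) for each covered day.
    Assumes days are independent, which real weather is not,
    so this understates tail risk.
    """
    expected_payout = sum(daily_rain_prob) * payout_per_day
    return expected_payout * (1 + margin)

# A week at the shore, a made-up 5% historical chance of rain per
# day, $100 paid per rainy day:
print(quote_rain_contract([0.05] * 7, 100))  # roughly $40.25
```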
Interface is important. WeatherBill shows that a very complicated combinatorial market can be presented in a natural and intuitive way. Though greater expressiveness can mean greater complexity and confusion, Tuomas Sandholm is fond of pointing out that, when done right, expressiveness actually simplifies things by allowing users to speak in terms they are familiar with. WeatherBill — and to an extent Yoopick IMHO — are examples of this somewhat counterintuitive principle at work.
There is another quote from WeatherBill’s FAQ that alludes to an even higher degree of combinatorics coming soon:
Currently you can only price contracts based on one weather measurement. We’re working on making it possible to use more than one measurement, and hope to make it available soon.
If so, I can imagine the number of possible insurance contracts quickly growing into the billions or more with prices hinging on interdependencies among weather events.
Finally, back to the US government treating me like a child. It turns out that only a very limited set of people can buy contracts on WeatherBill, mainly businesses and multi-millionaires who aren’t speculators. In fact, the rules of who can play are a convoluted jumble that I believe is based on regulations from the US Commodity Futures Trading Commission.
Luckily, WeatherBill provides a nice “choose your own adventure” style navigation flow to determine whether you are allowed to participate. Most people will quickly find they are not eligible. (I don’t officially endorse the CYOA standard of re-starting over and over again until you pass.)
Even if red tape locks the average consumer out of direct access, clever companies are stepping in to mediate. In a nice intro piece on WeatherBill, Newsweek mentions that Priceline used WeatherBill to back a “Sunshine Guaranteed” promotion offering refunds to customers whose trips were rained out.
Can you think of other end-arounds to bring WeatherBill functionality to the masses? What other forms of expressive insurance would you like to see?
I mostly side with Lukas and Panos on the fantastic potential of Amazon’s Mechanical Turk, a crowdsourcing service specializing in tiny payments for simple tasks that require human brainpower, like labeling images. Within the field of computer science alone, this type of service will revolutionize how empirical research is done in communities from CHI to SIGIR, powering unprecedented speed and scale at low cost (here are two examples). My guess is that the impact will be even larger in the social sciences; already, a number of folks in Yahoo!’s Social Dynamics research group have started running studies on mturk. (A side question is how university review boards will react.)
However there is a seedier side to mturk, and I’m of two minds about it. Some people use the service to hire sockpuppets to enter bogus ratings and reviews about their products and engage in other forms of spam. (Actually this appears to violate mturk’s stated policies.)
For example, Samuel Deskin is offering up to ten cents to turkers willing to promote his new personalized start page samfind.
EARN TEN CENTS WITH THE BONUS – EASY MONEY – JUST VOTE FOR US AND COMMENT ABOUT US
EARN FOUR CENTS IF YOU:
1. Set up an anoymous email account likke gmail or yahoo so you can register on #2 anonymously
2. Visit http://thesearchrace.com/signup.php and sign up for an account – using your anonymous email account.
3. Visit http://www.thesearchrace.com/recent.php and vote for:
samfind
By clcking “Pick”
SIX CENTS BONUS:
4. Visit the COMMENTS Page on The Search Race, it is the Button Right Next to “Picks” on this page: http://www.thesearchrace.com/recent.php and
5. Say something awesome about samfind (http://samfind.com) on The Search Race’s Comments page.
Make sure to:
1. Tell us that you Picked us.
2. Copy and Paste the Comment you typed on The Search Race’s Comment page here so we know you wrote it and we will give you the bonus!
Another type of task on mturk involves taking a piece of text and paraphrasing it so that the words are different but the meaning remains the same. Here is an example:
Paraphrase This Paragraph
Here’s the original paragraph:
You’re probably wondering how to apply a wrinkle filler to your skin. The good news is that it’s easy! There are a number of different products on the market for anti aging skin care. Each one comes with its own special application instructions, which you should always make sure to read and carefully follow. In general, however, most anti aging skin care products are simply applied to the skin and left to soak in.
Requirements:
1. Use the same writing style as much as possible.
2. Vary at least 50% of the words and phrases – but keep the same concepts. Use obviously different sentences! Your paragraph should not be just a copy of the first with a few word replacements.
3. Any keywords listed in bold in the above paragraph must be included in your paraphrase.
4. The above paragraph contains 75 words… yours must contain at least 64 words and not more than 101 words.
5. Write using American English.
6. No obvious spelling or grammar mistakes. Please use a spell-checker before submitting. A free online spell checker can be found at www.spellcheck.net.
If you find it easier to paraphrase sentence-by-sentence, then do that. Please do not enter anything in the textbox other than your written paragraph. Thanks!
I have no direct evidence, but I imagine such a task is used to create splogs (I once found what seems like such a “paraphrasing splog”), ad traps, email spam, or other plagiarized content.
It’s possible that paid spam is hitting my blog (either that or I’m overly paranoid). I’m beginning to receive comments that are almost surely coming from humans, both because they clearly reference the content of the post and because they pass the re-captcha test. However, the author’s URL seems to point to an ad trap. I wonder if these commenters (who are particularly hard to catch — you have to bother to click on the author URL) are paid workers of some crowdsourcing service?
Can and should Amazon try to filter away these kinds of dubious uses of Mechanical Turk? Or is it better to have this inevitable form of economic activity out in the open? One could argue that at least systems like mturk impose a tax on pollution and spam, something long argued to be an economic force to reduce spam.
My main objection to these activities is the lack of disclosure. Advertisements and press releases are paid for, but everyone knows it, and usually the funding source is known. However, the ratings, reviews, and paraphrased text coming out of mturk masquerade as authentic opinions and original content. I absolutely want mturk to succeed — it’s an innovative service of tremendous value, one of many to come out of Amazon recently — but I believe Amazon is risking a minor PR backlash by allowing these activities to flow through its servers and by profiting from them.
The reporter phrased prices in terms of the candidates’ percent chance of winning:
Traders … gave Democratic front-runner Barack Obama an 86 percent chance of being the Democratic presidential nominee, versus a 12.8 percent for Clinton…
…traders were betting the Democratic nominee would ultimately become president. They gave the Democrat a 59.1 percent chance of winning, versus a 48.8 percent chance for the Republican.
The latter numbers imply an embarrassingly incoherent market, giving the Democrats and Republicans together a 107.9% chance of winning. This is almost certainly the result of a typo: the Republican candidate on intrade has not traded much above 40 since mid-2007.
Still, typos aside, we know that the last-trade prices of candidates on intrade and IEM often don’t sum to exactly 100. So how should journalists report prediction market prices?
Byrne Hobart suggests they should stick to something strictly factual like “For $4.00, an investor could purchase a contract which would yield $10.00 if the Republican wins.”
I disagree. I believe that phrasing prices as probabilities is desirable. The general public understands “percent chance” without further explanation, and interpreting prices in this way directly aligns with the prediction market industry’s message.
When converting prices to probabilities, is a journalist obligated to normalize them so they sum to 100? Should journalists report last-trade prices or bid-ask spreads or something else?
My inclination is that bid-ask spreads are better. Something like “traders gave the Democrats between a 22 and 30 percent chance of winning the state of Arkansas”. These will rarely be inconsistent (otherwise arbitrage is sitting on the table) and the phrasing is still relatively easy to understand.
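For concreteness, here is what both conventions look like in code (a sketch, using the figures from the article above): normalizing makes last-trade prices coherent but silently hides errors like the likely typo, while the spread phrasing reports what traders actually quoted:

```python
def normalize(last_trade):
    """Scale last-trade prices so implied probabilities sum to 100."""
    total = sum(last_trade.values())
    return {k: round(100 * v / total, 1) for k, v in last_trade.items()}

print(normalize({"Dem": 59.1, "Rep": 48.8}))
# {'Dem': 54.8, 'Rep': 45.2} -- coherent, but the typo is now invisible.

def spread_phrase(party, bid, ask):
    """Phrase a bid-ask range the way suggested above."""
    return (f"traders gave the {party} between a {bid:.0f} "
            f"and {ask:.0f} percent chance")

print(spread_phrase("Democrats", 22, 30))
```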
In measuring precipitation accuracy, the study assumed that if a forecaster predicted a 50 percent or higher chance of precipitation, they were saying it was more likely to rain than not. Less than 50 percent meant it was more likely to not rain.
That prediction was then compared to whether or not it actually did rain…
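The rule is simple enough to state in a few lines of code; here is a sketch with made-up forecasts:

```python
def precipitation_accuracy(forecasts, outcomes):
    """Fraction of days where the thresholded forecast matched reality.

    forecasts: stated chance of rain per day, 0-100.
    outcomes: True if it actually rained that day.
    Implements the study's rule: >= 50 means "rain more likely than not".
    """
    hits = sum((f >= 50) == rained
               for f, rained in zip(forecasts, outcomes))
    return hits / len(forecasts)

print(precipitation_accuracy([70, 20, 50, 10], [True, False, False, False]))
# 0.75: the 50-percent day counts as a "rain" call, and it missed
```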