My blog has been hacked yet again. For those keeping track, that’s infection number three. This latest exploit is very similar to the previous one. To humans arriving via browser (e.g., me), the site appears perfectly normal and healthy. Even upon clicking ‘view source’, nothing untoward is revealed. The <title> of my blog is, as always, Oddhead Blog.
However, when Google’s or Bing’s crawlers arrive to index my corner of the web, they see a different <title> altogether — Buy Cheap Cialis Online — and immediately roll their eyes. (Actually even if you run 'curl http://blog.oddhead.com', you’ll see the spam keywords.) The effect of the attack is a kind of reverse cloaking. Cloaking is the black-hat SEO practice of serving legitimate content to crawlers and spam content to people. Here, the spam content is shown to the crawlers and the legitimate content to the people.
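For anyone who wants to check their own site for the same symptom, here is a minimal Python sketch of the curl comparison above. It rests on an assumption I have not confirmed: that the injected code decides what to serve based on the User-Agent header (it could just as well key off the referrer or the IP address). The User-Agent strings and all function names below are illustrative, not anything taken from the exploit itself.

```python
import re
import urllib.request

# Illustrative User-Agent strings (assumptions, not taken from the exploit).
USER_AGENTS = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "crawler": "Mozilla/5.0 (compatible; Googlebot/2.1; "
               "+http://www.google.com/bot.html)",
    "curl": "curl/8.0",
}

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if absent."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse runs of whitespace so formatting differences do not matter.
    return " ".join(match.group(1).split())


def fetch_title(url, user_agent, timeout=10):
    """Fetch url while presenting the given User-Agent; return its title."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_title(html)


def titles_differ(titles):
    """True if the mapping of label -> title contains more than one title."""
    return len(set(titles.values())) > 1


def check_site(url):
    """Fetch the page once per User-Agent and report any title mismatch."""
    titles = {label: fetch_title(url, ua) for label, ua in USER_AGENTS.items()}
    for label, title in titles.items():
        print(f"{label:8s} {title!r}")
    return titles_differ(titles)
```

Calling `check_site("http://blog.oddhead.com")` prints one title per client and returns `True` if they disagree. A mismatch is only a hint that something is being served selectively; matching titles do not prove the site is clean, since the real trigger may be something other than the User-Agent.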
Once the crawlers report this appalling information back to their respective mother ships, the search engines have no choice but to delist and demote my blog in their pagerankings. Right now, if you search for or within Oddhead Blog on Google, you’ll see how poorly the bots in Mountain View think of me:

You can hardly find any deep links into my blog by searching Google. For example, try searching for Bem+Wom, my invented term for “BEtter Mousetrap, Word of Mouth”. Even try “Bem+Wom oddhead blog”. You’ll find aggregators republishing my content, but no links to the original source, my blog, anywhere in sight. (Note to self: the Bing results for Bem+Wom are awful.)
Once again I am at a loss to understand my attacker’s motivation. Clearly it’s not to sell Cialis to my users, as they remain blissfully ignorant of any changes. The only benefit to anyone is to remove one relatively obscure blog from the search engine rankings and thus to move the attacker one slot up. Having a blog tangentially about gambling probably puts me into a shady neighborhood of the web, yet reverse-cloaking your competition (even if it can be somewhat automated and strike more than one competitor) seems like an awfully indirect way to improve one’s standing in Google. It’s also possible this is an act of pure vandalism.
So what should I do? Although I partly blame WordPress for writing insecure software, I may end up paying WordPress protection money to make this problem go away. I am seriously considering giving up on self-hosting and moving my whole operation to WordPress.com’s hosted service, where presumably security is tighter, or at least it’s not my responsibility any more. My web hosting service, DreamHost, may also be partly to blame, yet I like the company and have been quite happy with them in many respects. Any advice, dear reader? WordPress.com? Blogger? Try again and hope the fourth time is the charm? Should I be looking to ditch DreamHost as well?




The seedy side of Amazon's Mechanical Turk
I mostly side with Lukas and Panos on the fantastic potential of Amazon’s Mechanical Turk, a crowdsourcing service specializing in tiny payments for simple tasks that require human brainpower, like labeling images. Within the field of computer science alone, this type of service will revolutionize how empirical research is done in communities from CHI to SIGIR, powering unprecedented speed and scale at low cost (here are two examples). My guess is that the impact will be even larger in the social sciences; already, a number of folks in Yahoo’s Social Dynamics research group have started running studies on mturk. (A side question is how university review boards will react.)
However, there is a seedier side to mturk, and I’m of two minds about it. Some people use the service to hire sockpuppets to enter bogus ratings and reviews about their products and engage in other forms of spam. (Actually this appears to violate mturk’s stated policies.)
For example, Samuel Deskin is offering up to ten cents to turkers willing to promote his new personalized start page samfind.
In fact, Deskin is currently offering bounties on mturk for a number of different spammy activities to promote his site. On the other hand, what Deskin is doing is not illegal and is arguably not all that different from paying PRWEB to publish his rah-rah press release (Start-up, samfind, Launches Customizable Startpage to Compete with Google, Yahoo & MSN, Los Angeles, California (PRWEB) August 4, 2008). And I have to at least give him credit for offering the money under his own name.
Another type of task on mturk involves taking a piece of text and paraphrasing it so that the words are different but the meaning remains the same. Here is an example:
I have no direct evidence, but I imagine such a task is used to create splogs (I once found what seems like such a “paraphrasing splog”), ad traps, email spam, or other plagiarized content.
It’s possible that paid spam is hitting my blog (either that or I’m overly paranoid). I’m beginning to receive comments that are almost surely coming from humans, both because they clearly reference the content of the post and because they pass the re-captcha test. However, the author’s URL seems to point to an ad trap. I wonder if these commenters (who are particularly hard to catch — you have to bother to click on the author URL) are paid workers of some crowdsourcing service?
Can and should Amazon try to filter away these kinds of dubious uses of Mechanical Turk? Or is it better to have this inevitable form of economic activity out in the open? One could argue that at least systems like mturk impose a tax on pollution and spam, something long argued as an economic force to reduce spam.
My main objection to these activities is the lack of disclosure. Advertisements and press releases are paid for, but everyone knows it, and usually the funding source is known. However, the ratings, reviews, and paraphrased text coming out of mturk masquerade as authentic opinions and original content. I absolutely want mturk to succeed — it’s an innovative service of tremendous value, one of many to come out of Amazon recently — but I believe Amazon is risking a minor PR backlash by allowing these activities to flow through its servers and by profiting from them.