Intelligent blog spam

As I alluded to previously, I seem to be getting “intelligent spam” on my blog: comments that pass the re-captcha test and seem on-topic, yet upon further inspection clearly constitute link spam: either the author URI or a link in the comment body is spam.

Here is one of the clearest cases, received on January 9 as a comment to my post on the CFTC’s call for proposals to regulate prediction markets:

Date: Fri, 9 Jan 2009 01:28:01 -0800
From: Matt.Herdy
New comment on your post #71 “A historic MayDay: The US
government’s call for help on regulating prediction markets”
Author : Matt.Herdy
Comment:
Thanks for that post. I’ll put a note in the post.

1. It’s nothing new. The CFTC will just formalize the current
status quo.
2. We are prisoner of the CFTC regulations and the US Congress’
distaste of sports “gambling”. As for the profitability of prediction
exchanges in that strict environment, I don’t see how you can deny that
HedgeStreet went bankrupt even though it was well funded. Isn’t that a
hard fact?
3. You’re right, but all “pragmatists” should follow a business
plan and make profits. See point #2. Pragmatists won’t make miracles.

<a href=”http://www.stretch-marks-help.com/”>Removing stretch marks</a>

At first blush, the comments seem to come from a knowledgeable person: they refer to HedgeStreet, an extremely relevant yet mostly unknown company that’s not mentioned anywhere else in the post or other comments.

It turns out the comments seem intelligent because they are. In fact, they’re copied word for word from Chris Masse’s comments on his own blog.

Chris Masse’s page has a link to my page, so it could have been discovered with a “link:” query to a search engine.

Though now I understand what this spammer did, I remain puzzled about exactly how they did it and especially why.

  1. Are these comments being inserted by people, perhaps hired on Mechanical Turk or some underground equivalent? Or are they coming from robots that have broken either re-captcha or the security of my blog? (John suspects a security breach.)
  2. Is it really worth it economically? All links in blog comments are NOFOLLOW links anyway, and disregarded by search engines for ranking purposes, so what is the point? Are they looking for actual humans to click these links?
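
For what it's worth, checking whether a comment link actually carries NOFOLLOW is easy to automate. Here is a minimal Python sketch, not a description of how my blog software does it: it uses only the standard library's html.parser, and the names LinkAuditor and audit_links are hypothetical helpers I made up for illustration.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list, e.g. rel="external nofollow";
        # a missing or valueless rel attribute is treated as empty.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((attr_map.get("href"), "nofollow" in rel_tokens))


def audit_links(html_fragment):
    """Return a list of (href, is_nofollow) tuples found in html_fragment."""
    auditor = LinkAuditor()
    auditor.feed(html_fragment)
    return auditor.links
```

Feeding it the rendered HTML of a comment shows which links a search engine would be told to ignore. It says nothing about whether a spammer bothers to check before posting.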

In any case, it seems an intriguing development in the spam arms race. Are other bloggers getting “intelligent spam”? Does anyone know how it’s done and why?

Update 2010/07: Oh, the irony. I got a number of intelligent-seeming comments on this post about SEO, nofollow, economics of spam, etc. that were… promoting spammy links. I left them up for humor value, though I disabled the links.

23 thoughts on “Intelligent blog spam”

  1. I have seen this before, and wrote a short entry about it in September 2007.

    At least in the example I mentioned, (presumably low-wage) humans were being paid to write seemingly intelligent blog comments that included a spam link. And in another setting (or maybe this same one, I can’t find a reference at the moment), I’d seen a service that specifically hired “writers” to search for good material written in other blogs to plagiarize and use to get seemingly intelligent material into blogs.

    As to why, I’m not sure.

  2. Dave: I appreciate the deep dive on this issue, but let’s not throw out the simplest explanation: Chris Masse is a spammer. Sorry, this just looks like his work (and it’s even his own words).

  3. wow, this is sophisticated. My guess would be that it is an automatic attack (and not a manual one), since the benefit of one more link is slim.

    but when the stakes are higher, manual spam definitely makes economic sense too. Just to see how valuable such spams can be, check out this craigslist post:

    http://sfbay.craigslist.org/sfc/wrg/981023593.html

    (since the CL link will expire at some point, let me repeat the contents: they’re basically offering $10 per post for bloggers who give a positive review of their company’s product).

  4. In response to the remark posted by the charming Bo Cowgill, I am not the author of the comment referred to by David Pennock. Being a smart and investigative researcher, David Pennock understood that already, and aimed at analyzing this strange process. (Others prefer wasting their time analyzing the “flow of information”.)

    PS: If the charming Bo Cowgill refers to my very occasional sending of mass e-mails to the registered members of my group blog, Midas Oracle ORG, I don’t think it should be qualified as spamming, but I am ready to listen to opinions from veteran Internet users.

  5. Jeff: Fantastic, thanks: this goes a long way toward explaining it. However the buyblogcomments.com website says it specifically targets “dofollow” blogs that do not insert the NOFOLLOW directive. My blog on the other hand does insert NOFOLLOW, so I’m not sure why I am targeted. I guess other similar services are less discerning (or buyblogcomments and the like aren’t very good at identifying dofollow blogs). In any case, buyblogcomments represents a fascinating gray market economy. I just can’t imagine that link (times the probability of me not deleting it) is worth anywhere close to $0.25.

    Bo: Wow. That did not even cross my mind. Do you have independent evidence of that?

    Mohammad: Does that mean you conjecture that re-captcha is broken?

  6. Another interesting aspect of the gray market for links, and a topic for another day, is how search engines consider the crime punishable by death.

    To me, the key distinction is disclosure. If the payment is disclosed, the link is an advertisement. If the payment is not disclosed, it is spam. (Yet apparently payperpost bloggers do disclose their relationship and were still nuked.)

  7. I believe that much of this is outsourced to India to bypass Captcha filters but I would have no doubt that AI bots will indeed be eventually utilized.

    It’s annoying, of course, and it’s been going on for quite some time and is most definitely getting worse. It is done simply because it is so very profitable. It’s similar to the spam emails: virtually no cost is involved, and annoying millions of people is without penalty if even a few profitable clicks are created.

  8. FoolsGold: thanks. However if you believe that re-captcha is not broken, then there is a cost of at least a few pennies: some person has to be paid to solve the re-captcha, if not compose the comment. I can’t imagine these NOFOLLOW links being worth even a few pennies.

  9. I think the people paying for the comment spam realize that the cost is minimal and that taking the time and effort to train people about FollowLinks and NoFollowLinks is simply not worth the effort. Some of these meaningless generic blog comments about ‘nice post’ by a spam marketer are clearly not going to lead to any benefit to the spam marketer, but he doesn’t care. He is engaging in what used to be termed “four walling”. It’s not often anymore that we see those “Post No Bills” signs, but back in the days when Billing Crews would come into town and glue their flyers all over the place, one new trainee on the crew is supposed to have asked what sort of buildings he should glue these flyers to, and supposedly the reply was: the kind of buildings with four walls. The response obviously meant that these advertisements were to be posted everywhere, no matter what sort of building it was or what the desires of the owners were.

    It’s the same thing with comment-spam: you hire someone to engage in four walling. Post the spam in every blog you can get access to. And not much time or effort is wasted on considerations of quality or effectiveness or the types of links involved.

  10. My understanding is that semi-intelligent commentspam placed by a human is done at a rate of 1,000 such spams for $300.00.

    It seems that ‘no follow’ links are still of great value in that Search Engines will still consider that site as a gateway to further sites which might otherwise not be discoverable by the Search Engine Spider. This still makes the remote sites more “findable” to search engines.

    It seems that vote pooling of OCR recognition algorithms is still more expensive and time consuming than simply paying someone in a remote Indian village to type the re-Captcha words manually.

  11. Underground economy? Undocumented economy perhaps. When American ports were open to pirate vessels that enriched merchants and pleased local consumers and when American merchants openly made fortunes buying goods in Boston and selling them in Madagascar pirate lairs it was clear that piracy was a major industry that was in no way an underground economy.
    I think spam will probably prove to be equally spanning the great, but largely fanciful, void that supposedly separates the underground economy.
    Search Optimization ‘experts’ probably make a great deal of money and undoubtedly exert great pressure in shaping the comment-spam traffic.
    Just as we don’t really blame the poverty-stricken villagers for some “mama-san” board but instead hold some hardware manufacturer responsible, so too will we find out that ignorant villagers in India are not really as responsible for comment-spam as the various Search Optimization firms in California are.

  12. >”…ignorant villagers in India are not really as responsible for comment-spam as the various Search Optimization firms in California are.”

    Just a further note about California versus India:

    I used the analogy of comment-spam being likened to four-walling. It might be interesting to note that Four Walling is a term most often associated with the Hollywood film industry. It might also be interesting to note that most of the firms doing this comment-spam seem to be located in Encino, Studio City, and Newport Beach, which are of course hotbeds of film-industry financing as well as Internet Search Engine Optimization.

    Never before has it been so true that: We have met the enemy and he is us!

  13. Editor’s note: this appears to be spam. Orig URL: dazzardo.com/bingo.html

    well the thing is that not all blogs are no follow and it’s quite likely that an automated service wouldn’t pick up the difference, for example a lot of SBM sites are no follow these days but I know that the popular autmoated software SEnuke doesn’t dstinguish, also many SEOs believe that nofollow links do contribute to the serbs but as to what level they count is debated, so it is possible people are doing it on purpose hoping that nofollow still has some value or there’s the possibility of automation, and there are definatly software’s which would be capable of pulling relevent comments because I’ve seen softwares much much more impressive than this, I am currently working on a rather interesting project for blog scraping prevention, the idea is that it will give them a random bollox feed like a nursery rhyme with a link to one of my sites inserted in it, and it’s testing well, next I’m going to try and screw the comment spammers over, if somebody can think up a way to do this let me know.

  14. Editor’s note: this appears to be spam. Orig URL: dazzardo.com/bingo.html

    Oh, and also you rank quite highly in google for do follow blog….. that might have something to do with people commenting your blog, that might mean that people are naturally commenting assuming that you are a do follow blog….

  15. Editor’s note: this appears to be spam. Orig URL: shubhinetwork.com

    i used to work for a fashion company where 1/2 of my day was spent writing on other blogs in the form of fake comments. my boss would come in and say “google emoda and see what comes up on blogs, make sure you combat all the negative ones, just sound like a happy customer”. needless to say it was awful and i eventually quit. the truth is, if you produce a quality product and perform with quality service, you wont need to worry about anyone else. 🙂

    first of all,there have so many people come from different country,for example,me .i come from China.Not the every body can use English to leave the message ,they even can not read any words,but they can understand something from the pictures ,the original links or somewhere.if they want to know what’s you want to tell them know ,they need spend time to check the dictionary .so ,i think it’s too hard to leave a English comment here.

    and secondly,the frequency you update is too high .we are so excited of that .maybe cause that we forget to comments..:)..

    and finally, i feel that everybody will think that NO body will reply the comment that i leave here..and maybe nobody care the comment i leaved .this is not a personal blog ,and it’s a public place.it’s too serious to discussion some thing like “how do you feel this project?”

    and here is not a kind of SNS website (Social Networking Services).
    so nobody think that’s necessary to comment here..

    but .from now on ,i know that ,there still have some one care of the comment which have wealth of information.

    am i right?

    [a href=”www.spyshelter.com” rel=”nofollow”]Antikeylogger[/a]

    [a href=”www.spyshelter.com” rel=”nofollow”>anti keylogger[/a]

  16. Editor’s note: This appears to be spam. Orig URL: wilhelm-schuster.de

    from my point of view do follow blog may be good idea to attract traffic towards your blog. because its like bait which attract the commenter towards your site

  17. Editor’s note: This appears to be spam. Orig URL: clavierarabe-virtuel.blogspot.com

    h, and also you rank quite highly in google for do follow blog….. that might have something to do with people commenting your blog, that might mean that people are naturally commenting assuming that you are a do follow blog..

  18. I deleted the following comment but now restored it, with links disabled, for humor value:

    Name: Ellenburstyn
    URL: fuzal.com
    Submitted on: Mar 22, 2010 @ 21:36

    It seems that ‘no follow’ links are still of great value in that Search Engines will still consider that site as a gateway to further sites which might otherwise not be discoverable by the Search Engine Spider. This still makes the remote sites more “findable” to search engines.

    [a href=”fuzal.com” rel=”nofollow”] Online electronics [/a]
