Friday, April 27, 2007

Google counting

So as you may or may not know, I've been dabbling lately in the despamming of online forum posts, with more or less success. And I've been helping people pro bono in despamming their own fora, if they're interested.

This is in response to the wave of forum spam that has arisen since XRumer became available in November 2006. XRumer has a number of tricks in its toolbag for getting around whatever spam-blocking measures are in place, because after all, if you don't actually track people down and shoot them when they spam, you must actively want "advertising" on your forum for cheap handbag knockoffs, Viagra, and lesbian porn, right? Right.

So forum despamming is getting interesting. One of XRumer's little tricks is to post through HTTP proxies to make it difficult or impossible to do IP-based banning of posters. And that was really pretty effective -- until the very popularity of XRumer and similar spambots using HTTP proxies made traffic through HTTP proxies really, really prominent.

The epiphany I had this week was that if a proxy is well-known by forum spammers, it's going to get indexed by Google a whole lot. So the natural next step is to check Google for the IP of an untrusted poster, right?

Thus the Google counter was born (code presented here). Very simple. And it seems to block about 30% to 60% of forum spam so far.
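For readers who want the shape of the idea without following the link, here is a minimal sketch in Python. It is not the code linked above. The class name, the threshold of 10, and the `lookup` function are all assumptions made for illustration; `lookup` stands in for whatever actually asks the search engine how many pages mention a string, and answers are kept in a plain dict so the same IP is only looked up once.

```python
from typing import Callable, Dict


class HitCounter:
    """Flags posters whose IP address shows up often in search results."""

    def __init__(self, lookup: Callable[[str], int], threshold: int = 10) -> None:
        # lookup is a hypothetical function: it takes a query string and
        # returns a hit count. How it talks to the search engine is left open.
        self.lookup = lookup
        # Assumed cutoff, not a value from the post; tune it per forum.
        self.threshold = threshold
        # Remember earlier answers so each IP costs at most one query.
        self.cache: Dict[str, int] = {}

    def hits(self, ip: str) -> int:
        # Quote the IP so the search matches the exact string.
        if ip not in self.cache:
            self.cache[ip] = self.lookup(f'"{ip}"')
        return self.cache[ip]

    def looks_like_proxy(self, ip: str) -> bool:
        # A heavily indexed IP is treated as a probable open proxy.
        return self.hits(ip) >= self.threshold


if __name__ == "__main__":
    # Stand-in for a real search: a fixed table of made-up hit counts.
    fake_index = {'"203.0.113.7"': 250}
    counter = HitCounter(lambda query: fake_index.get(query, 0))
    for address in ("203.0.113.7", "198.51.100.20"):
        print(address, counter.looks_like_proxy(address))
```

Passing the lookup in as a function keeps the decision logic testable without touching the network, and makes it easy to swap in a different source of hit counts later.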

In other news, since the Big Move back down to the Caribbean, the translation work has been incredibly voluminous. I keep thinking, "Today I'm going to do some GUI work" but I still end up falling asleep translating. (And that's really weird: yesterday I tried to type "replacement and wearing parts" and typed "replacement and wearing turbans" instead. Typing while asleep is very Zen.)

4 Comments:

At 3:21 PM, Blogger Vincent said...

Are you sure it is legal to query Google like this?

 
At 1:39 PM, Blogger Alesandro said...

Hey Michael. I'm an OP at #TheSoftwareJedi over at Freenode and was part of the original AnAppADay project. I'd like to get in touch with you about the project, as I'm interested in reviving the IRC channel, as well as the whole project.

My nick there is The_PHP_Jedi. And also, I noticed you mentioned you were relocated to Ponce, PR. I live in San Juan, PR. :D

Hope to hear from you,
The_PHP_Jedi

 
At 6:58 AM, Blogger Michael said...

Vincent, yes, it's "legal" and, more saliently, not against Google policy. Besides, I cache results -- not that my one server has more capacity than Google's 1.25 gigaservers, but still it gives me a little more control over the traffic.

 
At 3:38 PM, Blogger Cipher said...

First of all, great work - I'm a big fan.
On topic:
http://xkcd.com/269/
Try this :D

 
