Geocities spam solved

by Volker Weber

While I was waiting for the Bayesian filter in SpamAssassin to learn the Geocities spam messages, I was getting a bit worn out by the lack of success in training it. Yesterday I added a new rules file to /etc/mail/spamassassin.

admin@benhur:/etc/mail/spamassassin> cat vowe01_geocities.cf
uri GEOCITIES_NUM /[a-z_0-9]{1,3}\.geocities\.com\/[a-z_0-9]{1,30}/i
describe GEOCITIES_NUM Possible Geocities spam site
score GEOCITIES_NUM 5.0

uri GEOCITIES_BR /geocities\.yahoo\.com\.br\/[a-z_0-9]{1,30}/i
describe GEOCITIES_BR Possible Geocities spam site
score GEOCITIES_BR 5.0
admin@benhur:/etc/mail/spamassassin>
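
A quick way to see what these two patterns catch is to run them against a few URLs. The following is only a sketch: Python's re module stands in for SpamAssassin's Perl regex engine, the sample URLs are made up, and the helper name matching_rules is hypothetical. SpamAssassin applies uri rules to the URIs it extracts from a message, which this sketch does not emulate.

```python
import re

# The two patterns from vowe01_geocities.cf. The /i flag becomes re.IGNORECASE,
# and the escaped slash (\/) is written as a plain / because Python patterns
# have no delimiters.
RULES = {
    "GEOCITIES_NUM": re.compile(
        r"[a-z_0-9]{1,3}\.geocities\.com/[a-z_0-9]{1,30}", re.IGNORECASE
    ),
    "GEOCITIES_BR": re.compile(
        r"geocities\.yahoo\.com\.br/[a-z_0-9]{1,30}", re.IGNORECASE
    ),
}


def matching_rules(url):
    """Return the names of the rules whose pattern occurs anywhere in url."""
    return [name for name, pattern in RULES.items() if pattern.search(url)]


# Hypothetical sample URLs, not taken from real messages.
SAMPLES = [
    "http://uk.geocities.com/some_user42",
    "http://www.geocities.com/some_user42",
    "http://geocities.yahoo.com.br/some_user42",
    "http://www.example.com/geocities",
]

if __name__ == "__main__":
    for url in SAMPLES:
        print(url, "->", matching_rules(url) or "no match")
```

In this sketch the first pattern also hits plain www.geocities.com pages, because it is unanchored and allows one to three characters in front of .geocities.com.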

Case closed. No more geocities spam:

X-Spam-Status: Yes, score=5.2 required=1.0
    tests=ALL_TRUSTED,BAYES_99,GEOCITIES_BR autolearn=no version=3.0.4
X-Spam-Report:
    * -3.3 ALL_TRUSTED Did not pass through any untrusted hosts
    *  5.0 GEOCITIES_BR URI: Possible Geocities spam site
    *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%

It turns out that the Bayesian filter already knew: BAYES_99 fired on this message. Its score just did not carry enough weight.
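
The arithmetic behind that can be spelled out from the report above. This is a minimal sketch that uses only the three scores and the required=1.0 threshold shown in the headers. The helper names total and is_spam are hypothetical, and the comparison assumes a message is tagged once its score reaches the threshold.

```python
# Per-rule scores as listed in the X-Spam-Report above.
HITS = {"ALL_TRUSTED": -3.3, "GEOCITIES_BR": 5.0, "BAYES_99": 3.5}
REQUIRED = 1.0  # "required=1.0" from X-Spam-Status


def total(hits, skip=()):
    """Sum the rule scores, leaving out any rule named in skip."""
    return round(sum(s for name, s in hits.items() if name not in skip), 1)


def is_spam(score, required=REQUIRED):
    """Assumption: a message is tagged once its score reaches the threshold."""
    return score >= required


if __name__ == "__main__":
    with_rule = total(HITS)
    without_rule = total(HITS, skip=("GEOCITIES_BR",))
    print("with GEOCITIES_BR:   ", with_rule, is_spam(with_rule))
    print("without GEOCITIES_BR:", without_rule, is_spam(without_rule))
```

Without the new rule, the 3.5 points from BAYES_99 are almost cancelled by the -3.3 from ALL_TRUSTED, which leaves 0.2 and falls short of even a threshold of 1.0.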

Comments

Sounds very familiar to me. If you want a semi-official ruleset, you can use sare_specific.cf from Rules-Emporium.

I am currently seeing zero spam. Would you still advise using the sare_specific ruleset?

Did you ever see a non-spam message with a link to a Geocities web site?

No? Me neither.

sare_specific.cf is a very good ruleset. I have never had any false positives.

ALL_TRUSTED (one of the rules hit in your example spam) should never hit against external mail. This is a huge factor in why you initially had problems catching the Geocities spam. You're also still missing out on other tests against external mail because of this.

You need to configure trusted_networks manually so that external mail does not trigger ALL_TRUSTED. There are lots of knowledgeable folks over on the SpamAssassin Users' list if you need help doing so.
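
As a rough illustration of the idea behind trusted_networks, the sketch below models ALL_TRUSTED as "every relay the message passed through belongs to a trusted network". This is a deliberately simplified model and not SpamAssassin's actual implementation. The function name all_trusted, the network ranges, and the relay addresses are all hypothetical (the addresses come from the ranges reserved for documentation).

```python
import ipaddress


def all_trusted(relay_ips, trusted_networks):
    """True if every relay address falls inside one of the trusted networks."""
    networks = [ipaddress.ip_network(net) for net in trusted_networks]
    return all(
        any(ipaddress.ip_address(ip) in net for net in networks)
        for ip in relay_ips
    )


# Hypothetical setup: only the local mail server range is trusted.
TRUSTED = ["192.0.2.0/24"]
# Hypothetical misconfiguration: everything is treated as trusted.
TRUSTED_TOO_BROAD = ["0.0.0.0/0"]

# Hypothetical relay chains, as they might be read from Received headers.
INTERNAL_MAIL = ["192.0.2.10", "192.0.2.25"]
EXTERNAL_MAIL = ["203.0.113.7", "192.0.2.25"]

if __name__ == "__main__":
    print("internal mail:", all_trusted(INTERNAL_MAIL, TRUSTED))
    print("external mail:", all_trusted(EXTERNAL_MAIL, TRUSTED))
    print("external mail, trust too broad:",
          all_trusted(EXTERNAL_MAIL, TRUSTED_TOO_BROAD))
```

In this model, external mail only looks "all trusted" when the trusted range is wider than it should be. That is the situation the comment above describes.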

I get about 5-10 spam messages a day with a Geocities URL. They all look and feel similar, but the text is different, so the only way to detect them is the Geocities URL. It is sad that I need to block a whole operator, but it is the only way.

A sample of this kind of spam:

[Removed. We all know what it looks like. vowe]

Kalle Jaakkola, 2005-12-13 23:41

Been blocking geocities for months already with the following rules:

uri URI_ISOCC_GEOCITIES_COM /[a-z]{2}[.]geocities[.]com/i
score URI_ISOCC_GEOCITIES_COM 3.5
describe URI_ISOCC_GEOCITIES_COM ISO Country Code geocities.com URL

uri URI_ANY_GEOCITIES_COM /[a-z]{1,}[.]geocities[.]com.*[?]/i
score URI_ANY_GEOCITIES_COM 3.5
describe URI_ANY_GEOCITIES_COM Mail with geocities.com URL with parameters

uri URI_ANY_GEOCITIES_YAHOO_COM /geocities[.]yahoo[.]/i
score URI_ANY_GEOCITIES_YAHOO_COM 3.5
describe URI_ANY_GEOCITIES_YAHOO_COM Mail with geocities.yahoo URL
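
To get a feel for how these three rules combine, here is a small sketch that runs the patterns over a few made-up URLs and adds 3.5 points per hit. It uses Python's re module in place of SpamAssassin's Perl regex engine. The helper name score_url and all sample URLs are hypothetical.

```python
import re

# The three patterns from the rules above, with /i mapped to re.IGNORECASE.
RULES = {
    "URI_ISOCC_GEOCITIES_COM": re.compile(
        r"[a-z]{2}[.]geocities[.]com", re.IGNORECASE
    ),
    "URI_ANY_GEOCITIES_COM": re.compile(
        r"[a-z]{1,}[.]geocities[.]com.*[?]", re.IGNORECASE
    ),
    "URI_ANY_GEOCITIES_YAHOO_COM": re.compile(
        r"geocities[.]yahoo[.]", re.IGNORECASE
    ),
}
SCORE_PER_RULE = 3.5  # every rule above carries the same score


def score_url(url):
    """Return (names of matching rules, summed score) for a single URL."""
    hits = [name for name, pattern in RULES.items() if pattern.search(url)]
    return hits, SCORE_PER_RULE * len(hits)


# Hypothetical sample URLs, not taken from real messages.
SAMPLES = [
    "http://uk.geocities.com/some_user42",
    "http://uk.geocities.com/some_user42/?id=1",
    "http://www.geocities.com/some_user42",
    "http://geocities.yahoo.com.br/some_user42",
]

if __name__ == "__main__":
    for url in SAMPLES:
        hits, score = score_url(url)
        print(f"{score:4.1f}  {url}  {hits}")
```

Two things stand out in this sketch. A URL with parameters collects two hits. And the country-code pattern is unanchored, so it also matches the "ww" in www.geocities.com.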

Quite useful given that 10-15% of the spam contains links to geocities:

root@mail:/home/username/mail/spam # grep 'X-Spam-Level:' 2005 | wc -l
50396

root@mail:/home/username/mail/spam # grep GEOCITIES 2005 | wc -l
5863

This means that in 2005 I received 50396 spam mails, of which about 2931 [5863 matching lines, 2 hits per spam] contain Geocities URLs. Given that Geocities spam first appeared in August this year, we can do the maths: out of roughly 21000 mails since then, about 3000 referred to a Geocities URL. This means that at this moment Geocities hosts about 10-15% of all spam sites, or acts as a front/forwarder for them. Yahoo! Way to go, Geocities!!!!
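
The numbers can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the comment. How the commenter arrived at 21000 is not stated. One plausible reading, labeled as an assumption in the code, is five twelfths of the yearly total for August through December.

```python
TOTAL_SPAM_2005 = 50396   # output of: grep 'X-Spam-Level:' 2005 | wc -l
GEOCITIES_LINES = 5863    # output of: grep GEOCITIES 2005 | wc -l
HITS_PER_SPAM = 2         # "[2 hits per spam]" in the comment

# Matching lines divided by lines per message gives the message count.
geocities_spam = GEOCITIES_LINES // HITS_PER_SPAM
share_full_year = geocities_spam / TOTAL_SPAM_2005

# Rounded figures quoted in the comment for the period since August.
SINCE_AUGUST_TOTAL = 21000
SINCE_AUGUST_GEOCITIES = 3000
share_since_august = SINCE_AUGUST_GEOCITIES / SINCE_AUGUST_TOTAL

# Assumption, not stated in the comment: spam arrives evenly over the year,
# so August through December is 5/12 of the yearly total.
estimated_since_august = round(TOTAL_SPAM_2005 * 5 / 12)

if __name__ == "__main__":
    print("Geocities spam in 2005:", geocities_spam)
    print(f"share of the full year: {share_full_year:.1%}")
    print(f"share since August:     {share_since_august:.1%}")
    print("estimated mails since August:", estimated_since_august)
```

Measured against the whole year the share is just under 6%. The 10-15% figure only holds for the months since the Geocities spam started.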

Pieter-Bas IJdens, 2005-12-20 12:45

vowe.net is a personal website published by Volker Weber a.k.a. vowe. I am an author, consultant and systems architect based in Darmstadt, Germany.

© 1992-2008 Volker Weber.
All Rights Reserved.