Geocities spam solved

by Volker Weber

While I was waiting for the Bayesian filter in SpamAssassin to learn the Geocities spam messages, I was getting a bit worn out by the lack of success in training it. Yesterday I added a new rule file to /etc/mail/spamassassin.

admin@benhur:/etc/mail/spamassassin> cat vowe01_geocities.cf
uri GEOCITIES_NUM /[a-z_0-9]{1,3}\.geocities\.com\/[a-z_0-9]{1,30}/i
describe GEOCITIES_NUM Possible Geocities spam site
score GEOCITIES_NUM 5.0

uri GEOCITIES_BR /geocities\.yahoo\.com\.br\/[a-z_0-9]{1,30}/i
describe GEOCITIES_BR Possible Geocities spam site
score GEOCITIES_BR 5.0
admin@benhur:/etc/mail/spamassassin>
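A rough way to see what these two patterns catch is to run them against a few URLs. The sketch below does that in Python rather than in SpamAssassin itself, so it only exercises the regular expressions, not SpamAssassin's own URI extraction. The sample URLs and the helper name rules_hit are invented for illustration.

```python
import re

# Patterns copied from vowe01_geocities.cf; the trailing /i becomes re.IGNORECASE.
RULES = {
    "GEOCITIES_NUM": re.compile(
        r"[a-z_0-9]{1,3}\.geocities\.com\/[a-z_0-9]{1,30}", re.IGNORECASE
    ),
    "GEOCITIES_BR": re.compile(
        r"geocities\.yahoo\.com\.br\/[a-z_0-9]{1,30}", re.IGNORECASE
    ),
}


def rules_hit(url):
    """Return the names of the rules whose pattern occurs anywhere in url."""
    return [name for name, pattern in RULES.items() if pattern.search(url)]


# Hypothetical URLs, made up for this example only.
SAMPLE_URLS = [
    "http://uk.geocities.com/some_user123/",
    "http://br.geocities.yahoo.com.br/outro_usuario/",
    "http://www.geocities.com/SomeArea/1234/",
    "http://geocities.com/some_user123/",
    "http://www.example.com/geocities",
]

for url in SAMPLE_URLS:
    print(url, rules_hit(url))
```

Note that the one-to-three character prefix also matches "www", so GEOCITIES_NUM fires on any www.geocities.com page as well as on the country-code hosts, while a bare geocities.com URL without a host prefix is not matched.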

Case closed. No more geocities spam:

X-Spam-Status: Yes, score=5.2 required=1.0
    tests=ALL_TRUSTED,BAYES_99,GEOCITIES_BR autolearn=no version=3.0.4
X-Spam-Report:
    * -3.3 ALL_TRUSTED Did not pass through any untrusted hosts
    *  5.0 GEOCITIES_BR URI: Possible Geocities spam site
    *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%

It turns out that the Bayesian filter already knew: BAYES_99 fired. Its 3.5 points just did not carry enough weight against the -3.3 from ALL_TRUSTED.
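The arithmetic behind that header can be spelled out. This is a small sketch using only the three scores and the threshold shown in the report; the helper names total and is_spam are made up, and it assumes a message is tagged once its score reaches the required value.

```python
# Scores and threshold as they appear in the X-Spam-Status / X-Spam-Report headers.
SCORES = {"ALL_TRUSTED": -3.3, "GEOCITIES_BR": 5.0, "BAYES_99": 3.5}
REQUIRED = 1.0


def total(tests):
    """Sum the scores of the tests that fired, rounded to one decimal like the header."""
    return round(sum(SCORES[name] for name in tests), 1)


def is_spam(score):
    # Assumption: a message counts as spam when its score reaches the threshold.
    return score >= REQUIRED


with_rule = total(["ALL_TRUSTED", "BAYES_99", "GEOCITIES_BR"])
without_rule = total(["ALL_TRUSTED", "BAYES_99"])

print("with GEOCITIES_BR:   ", with_rule, is_spam(with_rule))
print("without GEOCITIES_BR:", without_rule, is_spam(without_rule))
```

With the new rule the message scores 5.2 and is tagged. Without it, the same message would have scored 0.2 and slipped through, even though the Bayesian test was 99 to 100% sure.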

Comments

Sounds very familiar to me. If you want a semi-official ruleset, you can use sare_specific.cf from Rules-Emporium.

Joern Seemann, 2005-12-04

I am currently seeing zero spam. Would you still advise using the sare_specific ruleset?

Volker Weber, 2005-12-04

Did you ever see a non-spam message with a link to a Geocities web site?

No? Me neither.

Chris Linfoot, 2005-12-04

sare_specific.cf is a very good ruleset. I never had any false positives.

Joern Seemann, 2005-12-05

ALL_TRUSTED (one of the rules hit in your example spam) should never hit against external mail. This is a huge factor in why you initially had problems catching the Geocities spam. You're also still missing out on other tests against external mail because of this.

You need to manually configure your trusted_networks configuration so that external mail does not trigger ALL_TRUSTED. There are lots of knowledgeable folks over on the SpamAssassin Users' list if you need help doing so.

Daryl C. W. O'Shea, 2005-12-11
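The effect Daryl describes can be illustrated with the scores from the header quoted in the post. This is only a sketch: it assumes BAYES_99 keeps its 3.5 points, and it ignores the additional network tests that would start to apply once the mail is no longer treated as trusted. The function name score_for is hypothetical.

```python
# Scores and threshold taken from the X-Spam-Report quoted in the post.
SCORES = {"ALL_TRUSTED": -3.3, "BAYES_99": 3.5}
REQUIRED = 1.0


def score_for(tests):
    """Sum the scores of the given tests, rounded to one decimal place."""
    return round(sum(SCORES[name] for name in tests), 1)


# External mail wrongly treated as trusted: ALL_TRUSTED cancels most of BAYES_99.
wrongly_trusted = score_for(["ALL_TRUSTED", "BAYES_99"])

# Same mail once trusted_networks is set up so that ALL_TRUSTED no longer fires.
not_trusted = score_for(["BAYES_99"])

print(wrongly_trusted, wrongly_trusted >= REQUIRED)
print(not_trusted, not_trusted >= REQUIRED)
```

Under these assumptions the Bayesian score alone would already have cleared the threshold of 1.0.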

I get about 5-10 spam messages a day with a Geocities URL. They all look and feel similar, but the text is different, so the only way to detect them is the Geocities URL. It is sad that I need to ignore the whole provider, but it's the only way.

Sample of this kind of spam:

[Removed. We all know what it looks like. vowe]

Kalle Jaakkola, 2005-12-13

Been blocking geocities for months already with the following rules:

uri URI_ISOCC_GEOCITIES_COM /[a-z]{2}[.]geocities[.]com/i
score URI_ISOCC_GEOCITIES_COM 3.5
describe URI_ISOCC_GEOCITIES_COM ISO Country Code geocities.com URL

uri URI_ANY_GEOCITIES_COM /[a-z]{1,}[.]geocities[.]com.*[?]/i
score URI_ANY_GEOCITIES_COM 3.5
describe URI_ANY_GEOCITIES_COM Mail with geocities.com URL with parameters

uri URI_ANY_GEOCITIES_YAHOO_COM /geocities[.]yahoo[.]/i
score URI_ANY_GEOCITIES_YAHOO_COM 3.5
describe URI_ANY_GEOCITIES_YAHOO_COM Mail with geocities.yahoo URL
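As a rough check of what these three patterns match, they can be run against a few URLs in Python. This only exercises the regular expressions, not SpamAssassin's URI handling. The sample URLs and the helper names matching_rules and combined_score are invented for illustration.

```python
import re

# Patterns and scores copied from the three rules above; /i becomes re.IGNORECASE.
URI_RULES = {
    "URI_ISOCC_GEOCITIES_COM": (re.compile(r"[a-z]{2}[.]geocities[.]com", re.I), 3.5),
    "URI_ANY_GEOCITIES_COM": (re.compile(r"[a-z]{1,}[.]geocities[.]com.*[?]", re.I), 3.5),
    "URI_ANY_GEOCITIES_YAHOO_COM": (re.compile(r"geocities[.]yahoo[.]", re.I), 3.5),
}


def matching_rules(url):
    """Return the names of all rules whose pattern occurs anywhere in url."""
    return [name for name, (pattern, _) in URI_RULES.items() if pattern.search(url)]


def combined_score(url):
    """Add up the scores of every rule that matches url."""
    return sum(URI_RULES[name][1] for name in matching_rules(url))


# Hypothetical URLs, made up for this example only.
for url in [
    "http://uk.geocities.com/some_user123/",
    "http://www.geocities.com/some_user123/?ref=abc",
    "http://br.geocities.yahoo.com.br/outro_usuario/",
    "http://www.example.com/",
]:
    print(url, matching_rules(url), combined_score(url))
```

Because the first pattern is not anchored, the last two letters of "www" also satisfy the two-letter prefix, so a www.geocities.com URL with a query string triggers two rules at once.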

Quite useful given that 10-15% of the spam contains links to geocities:

root@mail:/home/username/mail/spam # grep 'X-Spam-Level:' 2005 | wc -l
50396

root@mail:/home/username/mail/spam # grep GEOCITIES 2005 | wc -l
5863

This means that in 2005 I received 50396 spam mails, of which about 2931 contain Geocities URLs (5863 matching lines at 2 hits per spam). Geocities spam first appeared in August this year, so we can do the maths for that period: out of roughly 21000 spam mails, about 3000 referred to a Geocities URL. This means that at this moment Geocities hosts about 10-15% of all spam sites, or acts as a front/forwarder for them. Yahoo! Way to go, Geocities!!!!

Pieter-Bas IJdens, 2005-12-20
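The figures in the comment above can be reproduced with a few lines of arithmetic. The sketch uses only the numbers quoted there; the variable names are made up, and the 21000 and 3000 are the commenter's own rounded estimates for the period since August.

```python
# Counts reported by the two grep commands.
spam_total_2005 = 50396      # lines matching 'X-Spam-Level:'
geocities_lines = 5863       # lines matching 'GEOCITIES'
hits_per_spam = 2            # each Geocities spam produces two matching lines

# Number of distinct spam mails with a Geocities URL (integer division).
geocities_spam = geocities_lines // hits_per_spam

# Share over the whole year versus the commenter's rounded figures since August.
share_full_year = geocities_spam / spam_total_2005
share_since_august = 3000 / 21000

print(geocities_spam)
print(f"{share_full_year:.1%} of all 2005 spam")
print(f"{share_since_august:.1%} of spam since August")
```

Measured over the whole year the share is under 6%. The 10-15% range only holds for the months since the Geocities spam started.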
