Annoyed by our Taiwanese and Nigerian "friends", I decided to check out what this cool new toy on the university mail server does. I had indeed heard of it before - and now I'm not going to part with it...

...

SpamAssassin (http://www.spamassassin.org/) is an assassin that kills spam; in other words, it's a spam filter.

SpamAssassin is basically a spam filter built on rule-based heuristics, although these days it also incorporates Bayesian filtering and can cooperate with other spam filtering methods (such as Vipul's Razor).

The usual assassination method relies on a score calculated from a large set of rules. A message that doesn't activate any filters gets a score of 0; some particularly "well-behaved" things lower the score a bit, while non-adherence to standards and known spammer tactics increase it (for example, the phrase "this is not spam" increases the score by 0.405... =). By default, all e-mails with a score greater than +5.0 are considered spam. (For what it's worth, one pr0n spam that I got today easily flew into that category with a score of +20.7...) The user can change the score for each test, and also change the threshold (some say 4.0 is better than 5.0, for example). There are also whitelists for excluding your friends and frequent mailers.
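To make the scoring idea concrete, here is a toy sketch in Python. It is not SpamAssassin's actual code or rule set: the rule names, the patterns and every score except the 0.405 for "this is not spam" are made up for illustration, and the whitelisted address is a placeholder.

```python
import re

# Hypothetical rule table: (name, pattern, score).
# Only the 0.405 score comes from the text above; everything else is invented.
RULES = [
    ("CLAIMS_NOT_SPAM", re.compile(r"this is not spam", re.I), 0.405),
    ("SHOUTED_SUBJECT", re.compile(r"^Subject: [A-Z !]{10,}$", re.M), 1.5),
    # "Well-behaved" traits can carry a negative score.
    ("HAS_IN_REPLY_TO", re.compile(r"^In-Reply-To:", re.M | re.I), -0.5),
]

THRESHOLD = 5.0                      # default threshold mentioned in the text
WHITELIST = {"friend@example.org"}   # placeholder address


def score_message(raw):
    """Return (total score, list of (rule name, score) for each rule that fired)."""
    hits = [(name, score) for name, pattern, score in RULES if pattern.search(raw)]
    return sum(score for _, score in hits), hits


def is_spam(raw, sender, threshold=THRESHOLD):
    """Whitelisted senders are never flagged; otherwise compare score to threshold."""
    if sender.lower() in WHITELIST:
        return False
    total, _ = score_message(raw)
    return total > threshold
```

A message that trips no rule scores 0, and lowering `threshold` (say, to 4.0) simply makes the same total count as spam sooner.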

SpamAssassin is typically a *NIX utility, usually run from procmail. If only a couple of users are using it, it can be run as a stand-alone filter program. Larger sites can run the daemon (spamd) to do the actual analysis and call the client (spamc) from each user's .procmailrc or from the MTA itself. There's also a filtering SMTP proxy (spamproxy).

If using procmail, all you need to do is feed the message to SpamAssassin in your .procmailrc and, if the message comes back with an "X-Spam-Flag: YES" header, junk it into an appropriate folder. All messages get identifying headers that tell what score the message got and so on; messages that have been flagged as spam also contain a detailed listing of which filters were triggered.
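The sorting step itself is tiny. In real life it would be a procmail recipe; the Python sketch below only illustrates the same decision on a message that has already been through SpamAssassin. The function name and the folder names are hypothetical; the only thing taken from the text is the "X-Spam-Flag: YES" header.

```python
from email import message_from_string


def pick_folder(raw, spam_folder="spam", inbox="inbox"):
    """Choose a destination folder from the X-Spam-Flag header.

    `raw` is the full message text as it comes back from the filter.
    Folder names are placeholders, not anything SpamAssassin defines.
    """
    msg = message_from_string(raw)
    # A missing header yields None, so fall back to an empty string.
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return spam_folder if flag == "YES" else inbox
```

Anything without the flag, including mail that never went through the filter at all, lands in the inbox, which is the safe default.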

The program comes with a lot of filters, and it's possible to write custom ones. The filtering is based on the message itself (header and body) and optionally on external sources, such as the RBLs and Vipul's Razor.

SpamAssassin is written in Perl (spamc appears to be written in C, though), and is also available from CPAN. It is distributed as open source.

For the non-*NIX folks, there's also a commercial product called SpamAssassin Pro (from Deersoft Inc.) that works with Microsoft Outlook and Microsoft Exchange. There's also Bloomba's SAproxy.

(The only problem for me was that the first damn Nigerian scam mail I got that passed through it got a measly +3.4...)

(Thanks to Zerotime for the reminder about Bayesian filtering.)