15.08.2009 04:53

Notes on Alpine and CRM114

When I wrote about my personal e-mail solution I mentioned that CRM114 Discriminator can be used as a (solid) SPAM filter. I've been using it for about 3 years now, and it has served me well. I must admit I never achieved the unbelievable 99.9% accuracy its authors claim, but it is a good solution nonetheless. It is fast, lightweight, scalable and flexible.

I use procmail, which pipes all e-mail through crm114. CRM114 itself is installed system-wide, but I keep all "*.crm" and "*.css" files in the "~/.crm114" directory. If you decide to try it, you should know that it's very well documented, and the HOWTO document in particular will get you started in no time. There is no need for me to describe the installation and setup process here. What I will talk about is using crm114 together with Alpine.

With the recent "BlameThorstenAndJenny" release, the format of .css files changed for the first time since I started using CRM114, so I decided to rebuild them from scratch. The old ones were not giving perfect results, as I said, and hopefully I can do a better job this time around. Since I get huge amounts of SPAM daily, the first day was a little scary, but I managed to get on top of it quickly. I decided not to use mailtrainer.crm, which can train on large amounts of e-mail at once. Instead I rewrote some of my old scripts and did it "manually". I trained the first few batches of SPAM and HAM that I received with my crmtrain script: I used the Export function in Alpine to export each wrongly classified (or unsure) e-mail to the "~/spam" directory, and the script took it from there.
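To illustrate the batch approach, here is a rough sketch in Python of what such a script could look like. It is not my actual crmtrain script. It assumes each exported file holds exactly one message, and the training command is a parameter because the exact crm114 invocation depends on your setup; the name train_directory is made up for this example.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def train_directory(directory: Path, command: List[str]) -> List[Path]:
    """Pipe every regular file in `directory` to `command` on stdin.

    `command` is whatever training invocation your CRM114 setup uses
    (an assumption here, it is not spelled out in this post).
    Returns the files for which the command exited with status 0.
    """
    trained = []
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-files
        with path.open("rb") as message:
            result = subprocess.run(command, stdin=message)
        if result.returncode == 0:
            trained.append(path)
    return trained


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: script.py DIRECTORY TRAINING-COMMAND [ARGS...]
    done = train_directory(Path(sys.argv[1]).expanduser(), sys.argv[2:])
    print(f"trained {len(done)} message(s)")
```

The sketch leaves the exported files in place; whether to delete or archive them after a successful run is a matter of taste.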

Now that things have slowed down I use two scripts to train the filter directly from Alpine. I use the Pipe function in Alpine to pipe each wrongly classified (or unsure) e-mail to the respective script. The crmspam script trains SPAM, while crmham does the same for HAM e-mail. I saw that mutt users have a similar setup where they send each e-mail back to themselves, this time telling the filter how to flag it correctly. Simply piping the message, with the old CRM114 headers stripped, seems a bit faster and simpler.
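The pipe-and-strip idea can be sketched in Python roughly as follows. This is an illustration, not the real crmspam or crmham script. It assumes that the headers added by the filter all start with "X-CRM114-", and it takes the training command as a parameter, since the exact crm114 invocation depends on your setup. The function names are made up for the example.

```python
import subprocess
import sys
from typing import List

# Assumption: every header added by the filter starts with this prefix.
PREFIX = b"x-crm114-"


def strip_crm114_headers(raw: bytes) -> bytes:
    """Remove X-CRM114-* headers (and their folded continuation lines)."""
    out = []
    in_headers = True
    dropping = False
    for line in raw.splitlines(keepends=True):
        if in_headers:
            if line.strip(b"\r\n") == b"":
                in_headers = False  # blank line ends the header block
            elif line[:1] in (b" ", b"\t"):
                if dropping:
                    continue  # continuation of a header being dropped
            else:
                dropping = line.lower().startswith(PREFIX)
                if dropping:
                    continue
        out.append(line)
    return b"".join(out)


def train(raw: bytes, command: List[str]) -> int:
    """Pipe the cleaned message to `command`; return its exit status."""
    return subprocess.run(command, input=strip_crm114_headers(raw)).returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage from Alpine's Pipe prompt:
    #   script.py TRAINING-COMMAND [ARGS...]
    sys.exit(train(sys.stdin.buffer.read(), sys.argv[1:]))
```

One copy of such a script with a SPAM training command and one with a HAM training command would give the two-script setup described above. Only the header block is touched, so a quoted "X-CRM114-" line in the message body survives.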


Written by anrxc | Permalink | Filed under code