When I wrote about my personal e-mail solution I mentioned
that CRM114 Discriminator
can be used as a (solid) SPAM filter. I've been using it for about 3
years now, and it has served me well. I must admit I never achieved the
unbelievable 99.9% accuracy they claim, but it is a good
solution nonetheless. It is fast, lightweight, scalable and
flexible.
I use procmail, which pipes all e-mail through crm114. It is
installed system-wide, but I keep all "*.crm" and
"*.css" files in the "~/.crm114" directory. If you
decide to try it you should know that it's very well documented, and
the HOWTO document in particular will get you started in no
time. There is no need for me to describe the installation and setup
process here. What I will talk about is using crm114 together with
Alpine.
With the recent "BlameThorstenAndJenny" release, the format of
.css files changed for the first time since I started
using it. I decided to rebuild them from scratch. The old ones were
not giving perfect results, as I said, and hopefully I can do a better
job this time around. Since I get huge amounts of SPAM daily, the
first day was a little scary but I managed to get on top of it
quickly. I decided not to use mailtrainer.crm, which can be
used to train huge amounts of e-mail at once. Instead I rewrote some
of my old scripts and did it "manually". I trained the first few batches
of SPAM and HAM that I received with my crmtrain script. I used the
Export function in Alpine to export each
wrongly classified (or unsure) e-mail to the "~/spam"
directory and the script took it from there.
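
To give an idea of what a batch trainer of this kind does, here is a minimal sketch in Python. It is not my crmtrain script, only an illustration under a few assumptions: every file in the directory holds exactly one exported message, and the training command is passed in by the caller. The name train_directory is made up for this example, and the command shown in the comment is a placeholder that you should check against the crm114 HOWTO before using it.

```python
import subprocess
from pathlib import Path


def train_directory(directory, command):
    """Feed every file in `directory` to `command` on stdin, one at a time.

    Returns the paths for which the command exited with status 0.
    Assumption: each file contains exactly one exported message.
    """
    trained = []
    for path in sorted(Path(directory).expanduser().iterdir()):
        if not path.is_file():
            continue  # skip sub-directories and anything else that is not a file
        with path.open("rb") as message:
            result = subprocess.run(command, stdin=message)
        if result.returncode == 0:
            trained.append(path)
    return trained


# Hypothetical usage; the command below is an assumption, not taken
# from my setup, so replace it with the one your crm114 version documents:
# train_directory("~/spam", ["crm", "-u", "/home/user/.crm114",
#                            "mailreaver.crm", "--spam"])
```

Feeding one message per invocation is slow for big archives, but for a few dozen misclassified e-mails a day it is more than enough.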
Now that things have slowed down, I use two scripts to train the filter
directly from Alpine. I use the Pipe function in Alpine to pipe
each wrongly classified (or unsure) e-mail to the respective
script. The crmspam script trains SPAM, while crmham does so for HAM
e-mail. I saw that mutt users have a similar setup where they send each
e-mail back to themselves, this time telling the filter how to flag it
correctly. Only piping it, while stripping the old CRM114 headers, seems
a bit faster and simpler.
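
The "strip the old headers, then pipe" idea can be sketched in a few lines of Python. Again, this is an illustration and not the actual crmspam or crmham script. It assumes that the headers added by the filter all start with "X-CRM114-" (X-CRM114-Status, for example) and that the training command is supplied by the caller. The function names are made up for this example.

```python
import subprocess

# Assumption: every header added by the filter starts with this prefix.
CRM_PREFIX = b"x-crm114-"


def strip_crm114_headers(raw: bytes) -> bytes:
    """Return the message without X-CRM114-* headers; the body is untouched."""
    lines = raw.splitlines(keepends=True)
    kept = []
    skipping = False
    for i, line in enumerate(lines):
        if line in (b"\n", b"\r\n"):
            # The first blank line ends the headers; keep the rest verbatim.
            kept.extend(lines[i:])
            break
        if line[:1] in (b" ", b"\t"):
            # A folded continuation line shares the fate of its header.
            if not skipping:
                kept.append(line)
            continue
        skipping = line.lower().startswith(CRM_PREFIX)
        if not skipping:
            kept.append(line)
    return b"".join(kept)


def train(raw: bytes, command) -> int:
    """Pipe the cleaned message to `command` and return its exit status."""
    return subprocess.run(command, input=strip_crm114_headers(raw)).returncode
```

A script wired to the Pipe function in Alpine would read the message from standard input (sys.stdin.buffer.read()) and hand it to train() together with whatever learning command your crm114 setup uses for SPAM or HAM.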