27.08.2014 21:42

E-mail infrastructure you can blog about

The "e" in eCryptfs stands for "enterprise". Interestingly in the enterprise I'm in its uses were few and far apart. I built a lot of e-mail infrastructure this year. In fact it's almost all I've been doing, and "boring old e-mail" is nothing interesting to tell your friends about. With inclusion of eCryptfs and some other bits and pieces I think it may be something worth looking at, but first to do an infrastructure design overview.

I'm not an e-mail infrastructure architect (even if we make up that term for a moment), or in other words I'm not an expert in MS Exchange, IBM Domino and some other "collaborative software", and most importantly I'm not an expert in all the laws and legal issues related to E-mail in major countries. I consult with legal departments, and so should you. Your infrastructure designs are always going to be driven by corporate e-mail policies and local law - which can, for example, require from you to archive mail for a period of 7-10 years, and do so while conforming with data protection legislation... and that makes a big difference on your infrastructure. I recommend this overview of the "Climategate" case as a good cautionary tale. With that said I now feel comfortable describing infrastructure ideas someone may end up borrowing from one day.

E-mail is critical for most business today. Wait, that sounds like a stupid generalization. As a fact I can say this for types of businesses I've been working with; managed service providers and media production companies. They all operate with teams around the world and losing their e-mail system severely degrades their ability to get work done. That is why:

The system must be highly-available and fault-tolerant


Before I go on to the pretty pictures I have to note that good network design and engineering I am taking as a given here. The network has to be redundant well in advance of services. Network engineers I worked with were very good at their jobs and I had it easy, inheriting good infrastructure.

The first layer deployed on the network is the MX frontend. If you already have, or rent, an HA frontend that can sustain abuse traffic it's an easy choice to pull mail through it too. But your mileage may vary, as it's not trivial to proxy SMTP for a SPAM filter. If the filter sees connections only from the LB cluster it would be impossible for it to perform well; no rate limiting, no reputation scoring... I prefer HAProxy. People making it are great software engineers and their software and services are superior to anything else I've used (it's true I consulted for them once as a sysadmin but that has nothing to do with my endorsements). The HAProxy PROXY protocol, or TPROXY mode can be used in some cases. Or if you are a Barracuda Networks customer instead you might have their load balancers which are supposed to integrate with their SPAM firewalls, but I've been unable to find a single implementation detail to verify their claim. Without load balancers using the SPAM filtering cluster as the MX, and load balancing across it with round-robin DNS is a common deployment:

Network diagram

I wouldn't say much about the SPAM filter, obviously it's supposed to do a very good job at rating and scanning incoming mail, and everyone has their favorites. My own favorite classifier component for many years has been the crm114 discriminator, but you can't expect from (many) people to train their own filters and that it takes 3-6 months to achieve >99% accuracy, Gmail has spoiled the world. The important thing in the context of the diagram above is that the SPAM filter needs to be redundant, and that it must have the capability to spool incoming mail if all the Mailstore backends fail.

The system must have backups and DR fail-over strategy


For building the backend, the "Mailstores", some of my favorites are Postfix, sometimes Qmail, and Dovecot. It's not relevant, but I guess someone would want to hear that too.

eCryptfs (stacked) file-system runs on top of the storage file-system, and all the mailboxes and spools are stored on it. The reasons for using it are not just related to data protection legislation. There are other solutions and faster too, block-level or hardware-based solutions for doing full disk encryption. But, being a file-system eCryptfs allows us to manipulate mail on the individual (mail) file or (mailbox) directory level. Encrypted mail can be transferred over the network to the remote backup backend very efficiently because of it. If you require, or are allowed to do, snapshots they don't necessarily have to be done at the (fancy) file-system or volume level. Common ext4/xfs and a little rsync hard-links magic work just as well (up to about 1TB on cheap slow drives).

When doing backup restores or a backend fail-over eCryptfs keys can be inserted into the kernel keyring, and data mounted on the remote file-system to take over.

The system must be secure


Everyone has their IPS and IDS favorites, and implementations. But those, together with firewalls, application firewalls, virtual private networks, access controls, two-factor authentication and file-system encryption... still do not make your private and confidential data safe. E-mail is not confidential as SMTP is a plain-text protocol. I personally think of it as being in the public domain. The solution to authenticating correspondents and to protecting your data and intellectual property of your company, both in transit and stored on the Mailstore, is PGP/GPG encryption. It is essential.

Even then, confidential data and attachments from mailboxes of employees will find their way onto your project management suite, bug tracker, wiki... But that is another topic entirely. Thanks for reading.


Written by anrxc | Permalink | Filed under crypto, work