After one year of managing a network of 10 servers with Cfengine, I'm
currently building two clusters of 50
servers with Puppet (which I'm using for the first time), and have
various notes to share. Given my experience I had a feeling Cfengine
just wasn't right for this project, and didn't consider it
seriously. These servers all run Debian GNU/Linux, and Puppet felt
natural because of the good Debian integration and the large number of
users, who have also produced a lot of resources. Chef was out of the
picture early because of its scary architecture: CouchDB, Solr and
RabbitMQ... Coming from Cfengine this seemed like a bad joke. You
probably need to hire a Ruby developer when it breaks. Puppet is
somewhat better in this regard.
The Puppet master needs Ruby, and has a built-in file server using
WEBrick. My first disappointment with Puppet was WEBrick. Though
PuppetLabs claim you can scale it up to 20 servers, that proved way
off. The built-in server has problems serving as few as 5
agents/servers, and you get to see many dropped connections and
failed catalog transfers. I was forced to switch to Mongrel with Nginx
as a frontend very early in the project, on both clusters. This method
works much better (even though Apache+Passenger is now the method
recommended by PuppetLabs), and it's not a huge complication compared
to WEBrick (and
Cfengine, which doesn't make you jump through any hoops). Part of the
reason for this failure is my pull interval, which is 5 minutes with a
random sleep time of up to 3 minutes to avoid harmonics (which still
occur often with these intervals, and then WEBrick fails
miserably). In production a customer cannot wait on 30/45 minute pull
intervals to get his IP address whitelisted for a service, or for some
other mundane task. It must happen within 10 minutes... but I'll come
to these kinds of unrealistic ideas a little later.
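
To make the point about harmonics concrete, here is a rough Python sketch of what the random sleep buys you. It is not how Puppet implements anything. It assumes the worst case where all 50 agents wake on the same 5 minute boundary, and the function names and the 10 second bucket size are made up for illustration.

```python
import random
from collections import Counter


def checkin_times(agents, runs, interval=300, max_splay=180, seed=0):
    """Return sorted check-in times (in seconds) for every agent and run.

    Assumption: every agent wakes on the same interval boundary, then
    sleeps a random amount up to max_splay before contacting the master.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(agents):
        for run in range(runs):
            times.append(run * interval + rng.uniform(0, max_splay))
    return sorted(times)


def peak_load(times, bucket=10):
    """Largest number of check-ins landing in a single bucket-second slot."""
    slots = Counter(int(t // bucket) for t in times)
    return max(slots.values())


# One hour of runs for a 50 server cluster, without and with the random sleep.
no_splay = peak_load(checkin_times(50, 12, max_splay=0))
with_splay = peak_load(checkin_times(50, 12, max_splay=180))
print("peak check-ins per 10s without sleep:", no_splay)
print("peak check-ins per 10s with sleep:", with_splay)
```

Without the sleep all 50 agents land in the same slot. With it the peak drops, but several agents can still collide in one slot, which is the kind of burst a single-threaded server copes with badly.
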
Unlike the Cfengine article I have no bootstrapping notes, and no
code/modules to share. By default a freshly started puppet agent will
look for a host called "puppet" and pull in whatever you defined to
bootstrap servers in your manifests. As for modules, I wrote
a ton of code and though I'd like to share it, my employer owns
it. But unlike Cfengine v3 there are a lot of resources out there for
Puppet which can teach you everything you need to know, so I don't
feel obligated to even ask.
Interestingly enough, published modules would not help you get your job
done. You will have to write your own, and your team members will have
to learn how to use your modules, which also means writing a lot of
documentation. Maybe my biggest disappointment is being disillusioned
by a lot of Puppet advocates and DevOps prophets. I found that the
articles and modules most of them write, and the experiences they
share, are simplistic and have nothing to do with the real world. It's
like they host servers in (magic?) environments where everything is
done in one way and all servers are identical. Hosting big websites
and their apps is a much, much different affair.
Every customer does things differently, and I had to write custom
modules for each of them. Just between these two clusters a module
managing Apache is different, and you can abstract your code a lot but
you reach a point where you simply can't push it any more. Or if you
can, you create a mess that is unusable by your team members, and I'm
trying to make their jobs better, not make them miserable. One customer
uses an Isilon NAS, the other has a content distribution
network, one uses Nginx as a frontend, the other has chrooted web
servers, one writes logs to NFS, the other to a Syslog cluster... Now
imagine this on a scale with 2,000 customers and 3 times the servers
and most of the published infrastructure design guidelines become
laughable. Instead you find yourself implementing custom solutions,
and inventing your own rules, as best you can...
I'm ultimately here to tell you that the projects are in a better
state than they would be with the usual cluster management policy. My
best moment was an e-mail from a team member saying "I read the
code, I now understand it [Puppet]. This is fucking
awesome!". I knew at that moment I managed to build something
good (or good enough), despite the shortcomings I found, and with
nothing more than PuppetLabs resources. Actually, that is not
completely honest, because I did buy and read the book Pro Puppet,
which contains an excellent chapter on using Git
for collaboration on modules between sysadmins and developers, with
proper implementation of development, testing and production
(Puppet) environments.