22.10.2014 22:51

SysV init on Arch Linux, and Debian

Arch Linux distributes systemd as its init daemon and deprecated SysV init in June 2013. Debian is now doing the same, and we see panic and terror sweep through that community, especially since this time thousands of my sysadmin colleagues are affected. But just as with Arch Linux we are witnessing irrational behavior, loud protests, threats of defecting all the way to the BSD camp, and public threats of forking Debian. Yet all that is needed, and let's face it much simpler to achieve, is organizing a specialized user group interested in keeping SysV (or your alternative) usable in your favorite GNU/Linux distribution, with members that support one another, exactly as I wrote back then about Arch Linux.

Unfortunately I'm not aware of any such group forming in the Arch Linux community around sysvinit, and I've been running SysV init as my PID 1 alone since then. It was not a big deal, but I don't always have the time or the willpower to break my personal systems after a 60 hour work week, and the real problems are yet to come anyway - if (when), for example, udev stops working without systemd as PID 1. If you had a support group, especially one with a few coding gurus among its members, chances are they would solve a difficult problem first most of the time, and everyone benefits. On other occasions an enthusiastic user would solve it first, saving the gurus from a lousy weekend.

For anyone else left standing at the cheapest part of the stadium, like me, maybe uselessd as a drop-in replacement is the way to go after major subsystems stop working in our favorite GNU/Linux distributions. I personally like what they reduced systemd to (inspired by the suckless.org philosophy?), but chances are that without support the project ends inside 2 years, and we would be back here duct taping in isolation.


Written by anrxc | Permalink | Filed under main, work, code

27.08.2014 21:42

E-mail infrastructure you can blog about

The "e" in eCryptfs stands for "enterprise". Interestingly in the enterprise I'm in its uses were few and far apart. I built a lot of e-mail infrastructure this year. In fact it's almost all I've been doing, and "boring old e-mail" is nothing interesting to tell your friends about. With inclusion of eCryptfs and some other bits and pieces I think it may be something worth looking at, but first to do an infrastructure design overview.

I'm not an e-mail infrastructure architect (even if we make up that term for a moment), or in other words I'm not an expert in MS Exchange, IBM Domino and some other "collaborative software", and most importantly I'm not an expert in all the laws and legal issues related to e-mail in major countries. I consult with legal departments, and so should you. Your infrastructure designs are always going to be driven by corporate e-mail policies and local law - which can, for example, require you to archive mail for a period of 7-10 years, and to do so while conforming with data protection legislation... and that makes a big difference to your infrastructure. I recommend this overview of the "Climategate" case as a good cautionary tale. With that said, I now feel comfortable describing infrastructure ideas someone may end up borrowing from one day.

E-mail is critical for most businesses today. Wait, that sounds like a stupid generalization. What I can state as fact is that it holds for the types of businesses I've been working with: managed service providers and media production companies. They all operate with teams around the world, and losing their e-mail system severely degrades their ability to get work done. That is why:

The system must be highly-available and fault-tolerant


Before I go on to the pretty pictures I have to note that I am taking good network design and engineering as a given here. The network has to be made redundant well in advance of the services. The network engineers I worked with were very good at their jobs and I had it easy, inheriting good infrastructure.

The first layer deployed on the network is the MX frontend. If you already have, or rent, an HA frontend that can sustain abuse traffic it's an easy choice to pull mail through it too. But your mileage may vary, as it's not trivial to proxy SMTP for a SPAM filter. If the filter sees connections only from the LB cluster it will be impossible for it to perform well; no rate limiting, no reputation scoring... I prefer HAProxy. The people making it are great software engineers and their software and services are superior to anything else I've used (it's true I once consulted for them as a sysadmin, but that has nothing to do with my endorsement). The HAProxy PROXY protocol, or TPROXY mode, can be used in some cases. If you are a Barracuda Networks customer instead, you might have their load balancers, which are supposed to integrate with their SPAM firewalls, but I've been unable to find a single implementation detail to verify that claim. Without load balancers, using the SPAM filtering cluster itself as the MX and load balancing across it with round-robin DNS is a common deployment:

Network diagram
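
To illustrate the HAProxy option, here is a minimal sketch of a TCP frontend for SMTP using the PROXY protocol so the filters still see real client addresses - all names and addresses are made up, and the filters must be configured to accept the PROXY header:
frontend mx-in
    mode tcp
    option tcplog
    bind 203.0.113.25:25
    default_backend spamfilters

backend spamfilters
    mode tcp
    balance roundrobin
    server filter01 192.168.20.11:25 check inter 2s send-proxy
    server filter02 192.168.20.12:25 check inter 2s send-proxy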

I won't say much about the SPAM filter; obviously it's supposed to do a very good job at rating and scanning incoming mail, and everyone has their favorites. My own favorite classifier component for many years has been the crm114 discriminator, but you can't expect (many) people to train their own filters, or to accept that it takes 3-6 months to reach >99% accuracy; Gmail has spoiled the world. The important thing in the context of the diagram above is that the SPAM filter needs to be redundant, and that it must be able to spool incoming mail if all the Mailstore backends fail.

The system must have backups and DR fail-over strategy


For building the backend, the "Mailstores", some of my favorites are Postfix, sometimes Qmail, and Dovecot. It's not relevant, but I guess someone would want to hear that too.

The eCryptfs (stacked) file-system runs on top of the storage file-system, and all the mailboxes and spools are stored on it. The reasons for using it are not just related to data protection legislation. There are other, faster solutions for full disk encryption, block-level or hardware-based. But being a file-system, eCryptfs allows us to manipulate mail at the individual (mail) file or (mailbox) directory level. Because of that, encrypted mail can be transferred over the network to the remote backup backend very efficiently. If you require, or are allowed to do, snapshots, they don't necessarily have to be done at the (fancy) file-system or volume level. Common ext4/xfs and a little rsync hard-link magic work just as well (up to about 1TB on cheap slow drives).
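
As a minimal sketch of that hard-link approach, with hypothetical paths, each daily run links unchanged (encrypted) files against the previous snapshot so only changed files consume space:
# previous snapshot serves as the hard-link reference for today's copy
rsync -a --delete --link-dest=/backup/mail.1/ /srv/mail/ /backup/mail.0/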

When doing backup restores or a backend fail-over, the eCryptfs keys can be inserted into the kernel keyring on the remote machine and the replicated data mounted there to take over.
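
Roughly along these lines, assuming the lower (encrypted) directory was replicated to the standby backend; the paths are examples and the mount options must match the ones the volume was created with:
# load the mount passphrase into the kernel keyring, note the printed signature
ecryptfs-add-passphrase

# mount the replicated lower directory and take over serving mail
mount -t ecryptfs /srv/mail.enc /srv/mail \
    -o ecryptfs_sig=<sig>,ecryptfs_fnek_sig=<sig>,ecryptfs_cipher=aes,ecryptfs_key_bytes=32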

The system must be secure


Everyone has their IPS and IDS favorites, and implementations. But those, together with firewalls, application firewalls, virtual private networks, access controls, two-factor authentication and file-system encryption... still do not make your private and confidential data safe. E-mail is not confidential, as SMTP is a plain-text protocol. I personally think of it as being in the public domain. The solution to authenticating correspondents and to protecting your data and your company's intellectual property, both in transit and stored on the Mailstore, is PGP/GPG encryption. It is essential.
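
For completeness, the day-to-day usage amounts to something like this (the recipient address and file name are just examples):
# sign with your key and encrypt to the correspondent's public key
gpg --sign --encrypt --recipient cfo@example.com q3-report.pdf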

Even then, confidential data and attachments from mailboxes of employees will find their way onto your project management suite, bug tracker, wiki... But that is another topic entirely. Thanks for reading.


Written by anrxc | Permalink | Filed under crypto, work

11.01.2014 23:00

Load balancing Redis and MySQL with HAproxy

It's a common occurrence to have two or more load balancers as HA frontends to databases at high traffic sites. I've used the open-source HAproxy like this, and have seen others use it the same way. Building this infrastructure and getting the traffic distributed evenly is not really the topic I'd like to write about, but rather what happens after you do.

Using HAproxy like this in front of replicated database backends is tricky: a flap on one part of the network can make one or more frontends activate the backup backends. Then you have a form of split-brain scenario on your hands, with updates occurring simultaneously on all masters in a replicated set. Redis doesn't do multi-master replication, and even with just one HA frontend it's easy to get into trouble if the old primaries are reactivated before you have synced them with the new ones.

One way to avoid this problem is building smarter infrastructure: offloading health checks and role directing to an independent arbiter. But having one makes it a single point of failure, and having more makes it another replicated nightmare to solve. I was never keen on this approach because solving it reliably is an engineering challenge each time, and I have the good sense to know when it can be done better by smarter people.

Last year I pestered the HAproxy developers to implement some cheap features as a start. Say, a new special directive to keep the old primary permanently offline after a fail-over to backup happens, which would be more reliable than gaming health check counters. The request was of course denied; they are not in it to write hacks. They always felt that agents are the best approach, and that the Loadbalancer.org associates might even come up with a common 'protocol' for health and director agents.

But the developers heard my case, and I presume those of others who discussed the same infrastructure. HAproxy 1.5, which is about to be released as the new stable branch (source: mailing list), implements peering. Peering works with the help of stick-tables, whose other improvements will bring many advancements to handling bad and unwanted traffic, but that's another topic (see the HAproxy blog).

Peering synchronizes server entries in stick-tables between multiple HAproxy instances over TCP connections, so a backend failing health checks on one HA frontend will be dropped from all of them. Using the documentation linked above, here's an example:

peers HAPEERS
    peer fedb01 192.168.15.10:1307
    peer fedb02 192.168.15.20:1307

backend users
    mode tcp
    option tcplog
    option mysql-check user haproxy
    stick-table type ip size 20k peers HAPEERS
    stick on dst
    balance roundrobin
    server mysql10 192.168.15.33:3306 maxconn 500 check port 3306 inter 2s
    server mysql12 192.168.15.34:3306 maxconn 500 check port 3306 inter 2s backup

When talking about Redis in particular I'd like to emphasize the improvements in HAproxy 1.5 health checks, which allow us to query Redis nodes about their role directly, and fail over only when a backend has become the new master. If Redis Sentinel is enabled and the cluster elects a new master, HAproxy will fail traffic over to it transparently. Using the documentation linked above, here's an example:
backend messages
    mode tcp
    option tcplog
    option tcp-check
    #tcp-check send AUTH\ foobar\r\n
    #tcp-check expect string +OK
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis15 192.168.15.40:6379 maxconn 1024 check inter 1s
    server redis17 192.168.15.41:6379 maxconn 1024 check inter 1s
    server redis19 192.168.15.42:6379 maxconn 1024 check inter 1s


Written by anrxc | Permalink | Filed under work

29.10.2013 22:11

Nginx PID file handling and HA

Nginx by some accounts serves most of the world's top sites, and is now an enterprise product, so I was very surprised when I couldn't find a single mention of a problem in PID file handling that I've been observing for a while.

On a restart the old Nginx 'master' process can remain active for some time, until all active connections on it close and it terminates. When it finally exits it deletes the PID file on the file-system, a file which no longer belongs to it - it belongs to the new Nginx 'master' process, which was spawned and has already written its own PID into that file (unless you prevented it from starting in the first place while a PID file exists on the file-system).

This leads to many issues down the road; here are some of the most severe I experienced. The monitoring system alarming at 3 in the morning about full RAID arrays when in reality Nginx kept open file-descriptors on huge logs deleted long ago - the log rotation jobs simply failed to send USR1 to it, since there was no PID file on the file-system. Then failures by sysadmins and configuration management agents alike to activate new configuration by reloading (or again restarting) the Nginx service, signals being sent into the aether because there's no PID file on the file-system. That's where most of my surprise came from: how in the world is everyone else successfully automating their web farms when 10% of your configuration updates fail to apply on average? What, that's only 100 servers once you are past 1000 nodes...

Nginx developers proclaimed this a feature and invalidated the bug report. Officially this is the intended behavior: "you start new nginx instance before old nginx instance was terminated. It's not something allowed - you have to wait for an old master process to exit before starting a new one." That's acceptable to me, but then I wonder how in the world everyone else is successfully doing high-availability with their web farms. If you have a CDN origin and edge nodes are pulling 2GB videos from it, those connections are going to take a while to close; meanwhile your origin is now failing all the health checks from your HA frontends and gets failed out...

The final official solution is that Nginx should never, ever be restarted. Every configuration update can be applied by a reload (send HUP to the master process). Unfortunately that doesn't work in practice (how in the world is...); Nginx fails to apply many configuration changes on a reload in my experience. If that is the true bug I sometimes hit (i.e. a new FastCGI caching policy failing to activate, new SSL certificates failing to activate, etc.) I understand it and I accept it. However I remain of the opinion that smarter PID file handling is a simple fix, and a useful thing to have.

What to do in this situation to avoid 3AM wake-up calls for a false positive, while not giving up HA? The init script can maintain its own PID file, a clone of the one the Nginx 'master' created at the time it started, and rely on it for all future actions - and so can your log rotation jobs. This hack will certainly never be distributed by an OS distribution - but many operations already package their own Nginx because of all the extra modules modern web servers require (media streaming, Lua scripting, Real IP...).
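
A rough sketch of the hack, with assumed paths; the init script clones the PID file right after a successful start and every later action signals the saved copy:
# in the init script's start) case
/usr/sbin/nginx -c /etc/nginx/nginx.conf
cp /var/run/nginx.pid /var/run/nginx.init.pid

# reload and log rotation then signal the saved copy, which survives
# the old master deleting the real PID file on its way out
kill -HUP  "$(cat /var/run/nginx.init.pid)"    # apply new configuration
kill -USR1 "$(cat /var/run/nginx.init.pid)"    # reopen log files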


Written by anrxc | Permalink | Filed under work

29.06.2013 23:01

SysV init on Arch Linux

Arch Linux distributes systemd as its init daemon, and has finally deprecated SysV this June. I could always appreciate the elegance in Arch's simple design, and its packaging framework. Both make it trivial for any enthusiast to run his own init daemon, be it openrc, upstart or SysV. To my surprise this didn't seem to be the prevailing view, and many users converted their workstations to other distributions. This irrational behavior also led to censorship on the users mailing lists, which made it impossible to reach out to other UNIX enthusiasts interested in keeping SysV usable as a specialized (and unofficial) user group.

When rc.d scripts started disappearing from official packages I rescued those I could and packaged them as rcdscripts-aic. There was no user group, just me, and in expectation of other rc.d providers I added my initials as the suffix to the package name, and made a decision not to monopolize /etc/rc.d/ or /usr/share/rcdscripts/ to avoid conflicts. No other provider ever showed up, but I still use /usr/share/rcdscripts-aic/, without strict guidelines on how to make use of the scripts in that directory (copy or symlink to /etc/rc.d/ and /etc/conf.d/?).
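
For illustration only, either approach amounts to something like this (sshd is a hypothetical example, the real script names depend on the package contents):
# copy...
cp /usr/share/rcdscripts-aic/sshd /etc/rc.d/sshd
# ...or symlink
ln -s /usr/share/rcdscripts-aic/sshd /etc/rc.d/sshd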

Later this month Arch Linux also deprecated the directories /bin, /sbin and /usr/sbin in favor of /usr/bin. Since initscripts was at this point an obsolete, unsupported and unmaintained piece of code, SysV became unusable. Again with no other provider available to me I forked it, and packaged initscripts-aic. At least sysvinit found a maintainer, so I didn't have to take that over as well.

The goal is providing a framework around SysV init for hobbyists and UNIX enthusiasts to boot their SysV systems. Stable basic boot is the priority for me, new features are not. There is no systemd revolution here, and I do not wish to associate myself with any systemd trolling. I do not want my packages censored and deleted from the Arch User Repository.


Written by anrxc | Permalink | Filed under code

24.06.2013 19:29

Hosting with Puppet - Design

Two years ago I was a small-time Cfengine user moving to Puppet on a large installation, and more specifically introducing it to a managed hosting provider (which is an important factor driving my whole design and decision making process later). I knew how important it was going to be to get the base design right, and I did a lot of research on Puppet infrastructure design guidelines, but with superficial results. I was disappointed; the DevOps crowd was producing tons of material on configuration management, couldn't at least a small part be applicable to large installations? I didn't see it that way then, but maybe that knowledge was being reserved for consulting gigs. After criticizing, it is only fair that I write something of my own on the subject.

First of all, a lot has happened since. Wikimedia decided to release all their Puppet code to the public. I learned a lot, even if most of it was what not to do - but that was the true knowledge to be gained. One of the most prominent Puppet Forge contributors, example42 labs, released the next generation of their Puppet modules, and the quality has increased immensely. The level of abstraction is high, and for the first time I felt the Forge could actually become a provider for me. Then 8 months ago the annual PuppetConf conference hosted engineers from Mozilla and Nokia talking about the design and scaling challenges they faced running Puppet in a big enterprise. Someone with >2,000 servers sharing their experiences with you - soak it up.

* Puppet design principles


Running Puppet in a hosting operation is a very specific use case. Most resources available to you will concern running one or two web applications, on a hopefully standardized software stack across a dozen servers all managed by Puppet. But here you are a level above that, running thousands of such apps and sites, across hundreds of development teams that have nothing in common. If they are developing web-apps in Lisp you are there to facilitate it, not to tell stories about Python.

Some teams are heavily involved with their infrastructure, others depend entirely on you. Finally, there are "non-managed" teams which only need you to manage hardware for them, but you still want to provide them with a hosted Puppet service. All this influences my design heavily, but must not define it. If it works for 100 apps it must work for 1 just the same, so the design principles below are universal.

- Object oriented


Do not treat manifests like recipes. Stop writing node manifests. Write modules.

Huge manifests with endless instructions, if conditionals, and node (server) logic are a trap. They introduce an endless cycle of "squeezing in just one more hack" until the day you throw it all away and re-factor from scratch. This is one of the lessons I learned from Wikimedia.

Write modules (see Modular services and Module levels) that are abstracted. Within modules write small abstracted classes with inheritance in mind (see Inheritance), and write defined types (defines) for resources that have to be instantiated many times. Write and distribute templates where possible, not static files, to reduce the chances of human error, the number of files to be maintained by your team, and finally the number of files compiled into catalogs (which matters for scaling).

Here's a stripped down module sample to clarify this topic, and those discussed below:
# - modules/nfs/manifests/init.pp
class nfs (
    $args = 'UNSET'
    ){

    # Abstract package and service names, Arch, Debian, RedHat...
    package { 'portmap': ensure => 'installed', }
    service { 'portmap': ensure => 'running', }
}
# - modules/nfs/manifests/disable.pp
class nfs::disable inherits nfs {
    Service['portmap'] { ensure => 'stopped', }
}
# - modules/nfs/manifests/server.pp
class nfs::server (
    $args = 'UNSET'
    ){

    package  { 'nfs-kernel-server': ensure => 'installed', }
    @service { 'nfs-kernel-server': ensure => 'running', }
}
# - modules/nfs/manifests/mount.pp
define nfs::mount (
    $arg  = 'UNSET',
    $args = 'UNSET'
    ){

    mount { $arg: device => $args['foo'], }
}
# - modules/nfs/manifests/config.pp
define nfs::config (
    $args = 'UNSET'
    ){

    # configure idmapd, configure exports...
}

- Modular services


Maintain clear roles and responsibilities between modules. Do not allow overlap.

Maybe it's true that a server will never run PHP without an accompanying web server, but that's not a good reason to bundle PHP management into the apache2 module. The same principle prevents combining mod_php and PHP-FPM management into a single module. Write php5, phpcgi and phpfpm modules, and use them with the Apache2, Lighttpd and Nginx web servers interchangeably.

- Module levels


Exploit modulepath support. Multiple module paths are supported, and they can greatly improve your design.

Reserve the default /etc/puppet/modules path for modules exposing the top level API (for lack of a better acronym). These modules should define your policy for all the software you standardize on, how a software distribution is installed and how it's managed: iptables, sudo, logrotate, dcron, syslog-ng, sysklogd, rsyslog, nginx, apache2, lighttpd, php5, phpcgi, phpfpm, varnish, haproxy, tomcat, fms, darwin, mysql, postgres, redis, memcached, mongodb, cassandra, supervisor, postfix, qmail, puppet itself, puppetmaster, pepuppet (enterprise edition), pepuppetmaster...
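
A sketch of the idea in puppet.conf; the second path holds the team modules from the layout shown below:
[main]
    modulepath = /etc/puppet/modules:/etc/puppet/teams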

Use the lower level modules for defining actual policy and configuration for development teams in organizations (or customers in the enterprise), and their servers. Here's an example:
- /etc/puppet/teams/t1000/
  |_ /etc/puppet/teams/t1000/files/
     |_ php5/
        |_ apc.ini
  |_ /etc/puppet/teams/t1000/manifests/
     |_ init.pp
     |_ services.pp
     |_ services/
        |_ encoder.pp
     |_ webserver.pp
     |_ webserver/
        |_ production.pp
     |_ users/
        |_ virtual.pp
  |_ /etc/puppet/teams/t1000/templates/
     |_ apache2/
        |_ virtualhost.conf.erb
For heavily involved teams the "services" classes are there to enable them to manage their own software, code deployments and similar tasks.

- Inheritance


Understand class inheritance, and use it to abstract your code to allow for black-sheep servers.

These servers are always present - that one server in 20 which does things "just a little differently".
# - teams/t1000/manifests/init.pp
class t1000 {
    include ::iptables

    class { '::localtime': timezone => 'Etc/UTC', }

    include t1000::users::virtual
}
# - teams/t1000/manifests/webserver.pp
class t1000::webserver inherits t1000 {
    include ::apache2

    ::apache2::config { 't1000-webcluster':
        keep_alive_timeout  => 10,
        keep_alive_requests => 300,
        name_virtual_hosts  => [ "${ipaddress_eth1}:80", ],
    }
}
# - teams/t1000/manifests/webserver/production.pp
class t1000::webserver::production inherits t1000::webserver {
    include t1000::services::encoder

    ::apache2::vhost { 'foobar.com':
        content => 't1000/apache2/virtualhost.conf.erb',
        options => {
            'listen'  => "${ipaddress_eth1}:80",
            'aliases' => [ 'prod.foobar.com', ],
        },
    }
}
Understand how resources are inherited across classes. This will not work:
# - teams/t1000/manifests/webserver/legacy.pp
class t1000::webserver::legacy inherits t1000::webserver {
    include ::nginx

    # No, you won't get away with it
    Service['apache2'] { ensure => 'stopped', }
}
Only a sub-class inheriting its parent class can override resources of that parent class. But this is not a deal breaker, once you understand it. Remember our "nfs::disable" class from an earlier example, which inherited its parent class "nfs" and proceeded to override a service resource?
# - teams/t1000/manifests/webserver/legacy.pp
class t1000::webserver::legacy inherits t1000::webserver {
    include ::nginx

    include ::apache2::disable
}
This was the simplest scenario. Consider these as well: a legacy server needs to run MySQL 5.1 in a cluster of 5.5 nodes, a server needs h264 streaming support compiled into the nginx binary so its package provider is a special package, a server needs PHP 5.2 to run a legacy e-commerce system...

- Function-based classifiers


Export only bottom level classes of bottom level modules to the business, as node classifiers:
# - manifests/site.pp (or External Node Classifier)
node 'man0001' { include t1000::webserver::production }
This leaves system engineers to define system policy with 100% flexibility, and allows them to handle complex infrastructure. They in turn must ensure the business is never lacking: a server either functions as a production webserver or it does not, and it must never include top level API classes directly.

- Dynamic arguments


Do not limit your templates to a fixed number of features.

Use hashes to add support for optional arbitrary settings that can be passed onto resources in defines. When a developer asks for a new feature there is nothing to modify and nothing to re-factor; the options hash (in the earlier "apache2::vhost" example) is extended, and the template is expanded as needed with new conditionals.
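
On the template side each optional key becomes a conditional, roughly like this fragment (it assumes the options hash from the "apache2::vhost" example above):
# - teams/t1000/templates/apache2/virtualhost.conf.erb (fragment)
<% if @options.has_key?('aliases') -%>
    ServerAlias <%= Array(@options['aliases']).join(' ') %>
<% end -%>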

- Convergence


Embrace continuous repair. Design for it.

Is it to my benefit to go wild on class relationships to squeeze everything into a single Puppet run? If just one thing changes, the whole policy breaks apart. Instead, micro-manage class dependencies and resource requirements. If a webserver refused to start because a Syslog-ng FIFO was missing, we know it will succeed on the next run. Within a few runs we can deploy whole clusters across continents.

There is however a specific here which is not universal: a hosting operation needs to keep agent run intervals frequent, to keep up with an endless stream of support requests. Different types of operations can get away with 45-60 minute intervals, and sometimes use them for one reason or another (i.e. scaling issues). I followed the work of Mark Burgess (author of Cfengine) for years and agree with Cfengine's 5 minute intervals for just about any purpose.
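
For reference, a sketch of the agent side of that choice; runinterval is given in seconds in puppet.conf:
[agent]
    runinterval = 300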

- Configuration abstraction


Know how much to abstract, and where to draw the line.

Services like Memcache and MongoDB have a small set of run-time parameters. Their respective "*::config" defines can easily abstract the whole configuration file into a dozen arguments expanded into variables of a single template. Others like Redis support hundreds of run-time parameters, but if you consider that >80% of Redis servers run in production with default parameters, even 100 arguments accepted by "redis::config" is not too much. For any given server you will provide 3-4 arguments and the rest will be filled from default values, yet when you truly need to deploy an odd-ball Redis server the flexibility to do so is there, without the need to maintain a hundred redis.conf copies.
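
A hedged sketch of what such a define can look like - the argument names are invented for illustration, not taken from a real module:
# - modules/redis/manifests/config.pp
define redis::config (
    $port       = '6379',
    $maxmemory  = '1gb',
    $appendonly = 'no'
    ){

    # a handful of common parameters become template variables,
    # everything else keeps the compiled-in redis defaults
    file { "/etc/redis/${name}.conf":
        content => template('redis/redis.conf.erb'),
    }
}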

Services like MySQL and Apache2 can exist in an endless number of states, which cannot be abstracted. Or to be honest they can, but you will make your team miserable while setting out to make their jobs better. This is where you draw the line. For the most complex software distributions abstract only the fundamentals and commonalities needed to deploy the service. Handle everything else through "*::dotconf", "*::vhost", "*::mods" etc. defines.

- Includes


Make use of includes in services which support them, and those that don't.

Includes allow us to maintain small fundamental configuration files, which include site specific configuration from small configuration snippets dropped into their conf.d directories. This is a useful feature when trying to abstract and bring together complex infrastructures.

Services which do not support includes by default can fake them. Have the "*::dotconf" define install configuration snippets and then call an exec resource to assemble the primary configuration file from the individual snippets in the improvised conf.d directory (an alternative approach is provided by puppet-concat). This functionality also allows you to manage shared services across shared servers, where every team provides a custom snippet in their own repository. They all end up on the shared server (after review) without the need to manage a single file across many teams (which opens all kinds of access-control questions).
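
A hedged illustration of the mechanism, using memcached as an arbitrary example of a service without native includes (paths and names are assumptions):
# - modules/memcached/manifests/confd.pp
class memcached::confd {
    # re-assemble the primary configuration file from all snippets
    exec { 'memcached-assemble-conf':
        command     => '/bin/sh -c "/bin/cat /etc/memcached/conf.d/* > /etc/memcached.conf"',
        refreshonly => true,
    }
}
# - modules/memcached/manifests/dotconf.pp
define memcached::dotconf ($content = '') {
    include memcached::confd

    # drop a snippet into the improvised conf.d directory and trigger re-assembly
    file { "/etc/memcached/conf.d/${name}":
        content => $content,
        notify  => Exec['memcached-assemble-conf'],
    }
}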

- Service controls


Do not allow Puppet to become the enemy of the junior sysadmin.

Every defined type managing a service resource should include 3 mandatory arguments; let's call them onboot, autorestart and autoreload. On clustered setups it is not considered useful to bring broken or outdated members back into the pool on boot, it is also not considered useful to automatically restart such a service if it is detected as "crashed" while it's actually down for maintenance, and oftentimes it is not useful to restart such a service when a configuration change is detected (and in the process flush 120GB of data from memory).
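
A hedged sketch of how the three arguments can map onto a service resource (Redis chosen arbitrarily, names invented):
# - modules/redis/manifests/service.pp
define redis::service (
    $onboot      = false,
    $autorestart = false,
    $autoreload  = false
    ){

    # only enforce the running state when automatic restarts are wanted,
    # a stopped instance under maintenance is otherwise left alone
    $ensure_run = $autorestart ? {
        true    => 'running',
        default => undef,
    }

    # only restart on configuration changes when reloads are wanted
    $watch_conf = $autoreload ? {
        true    => File["/etc/redis/${name}.conf"],
        default => undef,
    }

    service { "redis-${name}":
        ensure    => $ensure_run,
        enable    => $onboot,
        subscribe => $watch_conf,
    }
}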

Balance these arguments and provide sane defaults for every single service on its own merits. If you do not, downtime will occur. You will also have sysadmins stopping Puppet agents the moment they log in, naturally forgetting to start them again, and 2 weeks later you realize half of your cluster is not in compliance (Puppet monitoring is important, but that is an implementation detail).

- API documentation


Document everything. Use RDoc markup and auto-generate HTML with puppet doc.

At the top of every manifest document every single class, every single define, every single one of their arguments, and every single variable they search for or declare, and provide multiple usage examples for each class and define. Finally, include contact information, a bug tracker link and any copyright notices.

Puppet includes a tool to auto-generate documentation from these headers and comments in your code. Have it run periodically, refreshing your API documentation, and export it to your operations and development teams. It's not just a nice thing to do for them; it is going to save you from re-inventing the wheel on your Wiki system. Your Wiki now only needs the theory documented: what is Puppet, what is revision control, how to commit a change... and these bring me to the topics of implementation and change management, which are beyond the scope of design.
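
A sketch of the header convention; the markup is standard RDoc, the contact address is an example:
# - modules/nfs/manifests/init.pp
# == Class: nfs
#
# Manages portmap and the common NFS client pieces.
#
# === Parameters
#
# [*args*]
#   Optional settings hash, defaults to 'UNSET'.
#
# === Examples
#
#   include nfs
#
# Contact: hostmaster@example.com
The HTML is then produced with something like "puppet doc --mode rdoc --modulepath /etc/puppet/modules --outputdir /srv/http/puppetdoc" and published from a web server directory of your choice.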


Written by anrxc | Permalink | Filed under work, code

31.05.2013 23:11

Jacks and Masters

I haven't written a single word this year. It's a busy one for me, building, scaling and supporting more big sites. Interesting problems were solved, bugs were found... but still, I didn't feel like I stumbled onto anything worth publishing that hasn't been rehashed a 1000 times already through blogs and eventually change-logs. But thinking about my lack of material while catching up on my podcast backlog gave me the idea to write something about the sysadmin role in all this.

Many times in the last year I returned to two books as best practices guides for building and scaling web-apps. Those are Scalability Rules and Web Operations. I recommend these books to any sysadmin interested in web operations, as they share experiences from engineers working on the top sites and there's just no other way to gain these insights unless you join them.

That brings me back to my podcast backlog. Episode 38 of the DevOps Cafe had a very interesting guest (Dave Zwieback) who talked a lot about hiring sysadmins, and about generalist vs. specialist roles in systems administration today. I am a generalist, which in itself is fine, but there's a big difference between "jack of all trades, master of some" and being a "master of none". I've been thinking about it since the podcast, as I wasn't thrilled with what I was doing lately, that is, jumping through a lot of new technologies to facilitate all kinds of new frameworks web developers use. Often that means skipping any "logical" learning course of R&D; instead you learn enough to deploy and manage it, while the real knowledge comes within a certain period of time spent debugging problems when it breaks apart in production.

Now to tie both parts of the text together. If you want to join the site reliability engineers at one of the top sites, how do you justify drawing a blank when asked to explain how Varnish malloc storage works internally, while claiming you built huge caches with it? The Jack issue is amplified if you consider there is now a first, or even a second, generation of sysadmins who have never stepped into a data-center and are missing out on hardware and networking experience. An appropriate name that comes to mind is "the cloud generation", and I'm a part of it.


Written by anrxc | Permalink | Filed under work, books

09.12.2012 05:24

Hybrid IRCD for Arch Linux

Hybrid IRCD has been a favorite of mine for many years. I tried it once because a Croatian IRC network ran it, and it stuck with me. I'm very happy to announce that Hybrid packages for Arch Linux are available in AUR from today. I worked on it as a side project for a while and finished today, thanks to the blizzard that kept me inside this weekend. The Hybrid server is available as ircd-hybrid, and Hybserv2 services are available as ircd-hybrid-serv. They adhere to the standards set by all other ircd providers, the default configuration for both is usable out of the box, and examples for connecting the services to the server are included. They were built and tested on both arches; the only components not tested by me are the systemd service files.


Written by anrxc | Permalink | Filed under main, code

09.12.2012 04:58

GNU/Linux and ThinkPad T420

I got a new workstation last month, a 14" laptop from the ThinkPad T series. The complete guide for TuxMobil about installing Arch Linux on it is here.

It replaced a (thicker and heavier) 13" HP ProBook 4320s which I used for a little over a year before giving up on it. In some ways the ProBook was excellent; certified for SUSE Linux, it had complete Linux support down to the most insignificant hardware components. In other ways it was the worst laptop I ever used. That ProBook series has chiclet-style keyboards, and I had no idea just how horrible they can be. Completely flat keys, widely spread and with bad feedback, caused me a lot of wrist pain. Even after a year I never got used to the keyboard, and I was making a lot of typos; on average I would mistype even my login every second boot. At the most basic level my job can be described as a "typist", so all this is just plain unacceptable.

The touchpad however is worse than the keyboard. It's a "clickpad", with one big surface serving as both the touchpad area and the button area. To get it into a usable state a number of patches are needed, coupled with extensive user-space configuration. But even after a year of tweaking it was never just right. The most basic of operations, like selecting text, dragging windows or pressing the middle button, is an exercise in patience. Sadly clickpads are present in a huge number of laptops today.

Compared to the excellent UltraNav device in the ThinkPad they are worlds apart. The same is true of the keyboard in the T420, which is simply the best laptop keyboard I've ever used. I stand behind these words, as I just ordered another T420 for personal use. One could say these laptops are in different categories, but that's not entirely true. I had to avoid the latest ThinkPad models because of the chiclet-style keyboards they now have. Lenovo is claiming that's "keyboard evolution"; to me they just seem cheaper to produce, and this machine could be the last ThinkPad I'll ever own. If this trend continues I don't know where to turn next for decent professional grade hardware.


Written by anrxc | Permalink | Filed under main, desktop, work

01.10.2012 01:53

Net-installing Arch Linux

Recently I had to figure out the most efficient way of net-installing Arch Linux on remote servers, one that fits into an existing deployment process for many other operating systems, built around DHCP and TFTP daemons serving various operating system images.

The Arch Linux PXE wiki page put me on the right track and I downloaded the archboot-x86_64 ISO, which I temporarily mounted so I could copy the key parts of the image:

# wget http://mirrors.kernel.org/archlinux/iso/archboot/2012.06/archlinux-2012.06-1-archboot-x86_64.iso 
# mkdir /mnt/archiso
# mount -o loop,ro archlinux-2012.06-1-archboot-x86_64.iso /mnt/archiso
Let's say the TFTP daemon serves images using pxelinux, chrooted in /srv/tftpboot. The images are stored in the images/ sub-directory and the top level pxelinux.cfg configuration gets copied from the appropriate images/operating-system/ directory automatically based on the operating system selection in the provisioning tool:
# cd /srv/tftpboot
# mkdir -p images/arch/arch-installer/amd64/
# cp -ar /mnt/archiso/boot/* images/arch/arch-installer/amd64/
The boot directory of the archboot ISO contains the kernel and initrd images, and a syslinux installation. I proceeded to create the pxelinux configuration to boot them, ignoring syslinux:
# cd images/arch/
# mkdir arch-installer/amd64/pxelinux.cfg/
# emacs arch-installer/amd64/pxelinux.cfg/default

  prompt 1
  timeout 1
  label linux
    kernel images/arch/arch-installer/amd64/vmlinuz_x86_64
    append initrd=images/arch/arch-installer/amd64/initramfs_x86_64.img gpt panic=60 vga=normal loglevel=3

# ln -s arch-installer/amd64/pxelinux.cfg ./pxelinux.cfg
To better visualize the end result, here's the final directory layout:
arch-installer/
arch-installer/amd64/
arch-installer/amd64/grub/*
arch-installer/amd64/pxelinux.cfg/
arch-installer/amd64/pxelinux.cfg/default
arch-installer/amd64/syslinux/*
arch-installer/amd64/initramfs_x86_64.img
arch-installer/amd64/vmlinuz_x86_64
arch-installer/amd64/vmlinuz_x86_64_lts
pxelinux.cfg/
pxelinux.cfg/default
I left open the possibility of including i686 images in the future, but that is not likely to ever happen due to the almost non-existent demand for that architecture on our servers. Because of that I didn't spend any time on further automation, like automated RAID assembly or package pre-selection. On the servers I deployed, assembling big RAID arrays manually was tedious, but really nothing novel compared to the dozens you have to rebuild or create every day.

From a fast mirror the base operating system installs from the Arch [core] repository in a few minutes, and support is included for a variety of boot loaders, my favorite being syslinux, which in Arch Linux has an excellent installer script "syslinux-install_update" with RAID auto-detection. I also like the fact that the 2012.06-1 archboot ISO still includes the curses menu-based installer, which was great for package selection and for the step where the base configuration files are listed for editing. Supposedly the latest desktop images now only have helper scripts for performing installations - but I wouldn't know for sure, as I haven't booted an ISO in a long time; Arch is an operating system you install only once, the day you buy the workstation.

Another good thing, purely from the deployment standpoint, is the rolling release nature, as the image can be used to install the latest version of the operating system at any time. Or at least until the systemd migration, which might obsolete the image, but I dread that day for other reasons - I just don't see its place on servers, or in our managed service with dozens of proprietary software distributions. But right now we can deploy Arch Linux half way around the globe in 10 minutes, and life is great.


Written by anrxc | Permalink | Filed under work