11.01.2014 23:00

Load balancing Redis and MySQL with HAproxy

It's a common occurrence to have two and more load balancers as HA frontends to databases at high traffic sites. I've used the open-source HAproxy like this, and have seen others use it. Building this infrastructure and getting the traffic distributed evenly is not really the topic I'd like to write about, but what happens after you do.

Using HAproxy like this in front of replicated database backends is tricky, a flap on one part of the network can make one or more frontends activate the backup backends. Then you have a form of split-brain scenario on your hands with updates occurring simultaneously to all masters in a replicated set. Redis doesn't do multi-master replication and it's easier to get in trouble, with just one HA frontend, if it happens the old primaries are reactivated before you synced them with new ones.

One way to avoid this problem is building smarter infrastructure. Offloading health checks and role directing to an independent arbiter. But having one makes it a single point of failure, having more makes it another replicated nightmare to solve. I was never keen on this approach because solving it reliably is an engineering challenge each time, and I have the good sense of knowing when it can be done better by smarter people.

Last year I've been pestering HAproxy developers to implement cheap features as a start. Let's say if a fail-over to backup happens to keep the old primary permanently offline with a new special directive, which would be more reliable than gaming health check counters. Request was of course denied, they are not in it to write hacks. They always felt the agents are the best approach, and that the Loadbalancer.org associates might even come up with a common 'protocol' for health and director agents.

But developers heard my case, and I presume others who discussed the same infrastructure. HAproxy 1.5 which is about to be released as the new stable branch (source: mailing list) implements peering. Peering with the help of stick-tables, whose other improvements will bring many advancements to handling bad and unwanted traffic, but that's another topic (see HAproxy blog).

Peering synchronizes server entries in stick-tables between many HAproxy instances over TCP connections, and a backend failing health checks on one HA frontend will be removed from all. Using documentation linked above here's an example:

    peer fedb01
    peer fedb02

backend users
    mode tcp
    option tcplog
    option mysql-check user haproxy
    stick-table type ip size 20k peers HAPEERS
    stick on dst
    balance roundrobin
    server mysql10 maxconn 500 check port 3306 inter 2s
    server mysql12 maxconn 500 check port 3306 inter 2s backup

#backend uploads
When talking about Redis in particular I'd like to emphasize improvements in HAproxy 1.5 health checks, which will allow us to query Redis nodes about their role directly, and fail-over only if a backend became the new master. If Redis Sentinel is enabled and the cluster elects a new master HAproxy will fail-over traffic to it transparently. Using documentation linked above here's an example:
backend messages
    mode tcp
    option tcplog
    option tcp-check
    #tcp-check send AUTH\ foobar\r\n
    #tcp-check expect +OK
    tcp-check send PING\r\n
    tcp-check expect +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis15 maxconn 1024 check inter 1s
    server redis17 maxconn 1024 check inter 1s
    server redis19 maxconn 1024 check inter 1s

Written by anrxc | Permalink | Filed under work