It's common to run two or more load balancers as HA
frontends to databases at high-traffic sites. I've used the
open-source HAproxy like this, and
have seen others use it. Building this infrastructure and getting the
traffic distributed evenly is not really what I'd like to write
about, but rather what happens after you do.
Using HAproxy like this in front of replicated database
backends is tricky: a flap on one part of the network can make one or
more frontends activate the backup backends. Then you have a
form of split-brain scenario on your hands, with updates occurring
simultaneously on all masters in a replicated set. Redis doesn't do
multi-master replication, and there it's easy to get into trouble even
with just one HA frontend, if the old primaries happen to be reactivated
before you've synced them with the new ones.
One way to avoid this problem is to build smarter
infrastructure, offloading health checks and role direction to an
independent arbiter. But having one makes it a single point of
failure, and having more makes it another replication nightmare to solve. I
was never keen on this approach because solving it reliably is an
engineering challenge every time, and I have the good sense to know
when it can be done better by smarter people.
Last year I pestered the HAproxy developers to implement some cheap
features as a start: say, a new special directive to keep the old
primary permanently offline after a fail-over to backup,
which would be more reliable than gaming health-check
counters. The request was of course denied; they are not in it to write
hacks. They always felt that agents are the best approach, and that
the Loadbalancer.org associates might even come up with a
common 'protocol' for health and director agents.
But the developers heard my case, and I presume those of others who discussed the
same infrastructure. HAproxy 1.5, which is about to be released as the
new stable branch (source: mailing list), implements
peering. Peering
works with the help of stick-tables, whose other improvements will
bring many advancements in handling bad and unwanted traffic, but
that's another topic
(see the HAproxy
blog).
Peering synchronizes server entries in stick-tables between multiple
HAproxy instances over TCP connections, so a backend failing health
checks on one HA frontend will be removed from all of them. Using the
documentation linked above, here's an example:
peers HAPEERS
    peer fedb01 192.168.15.10:1307
    peer fedb02 192.168.15.20:1307

backend users
    mode tcp
    option tcplog
    option mysql-check user haproxy
    stick-table type ip size 20k peers HAPEERS
    stick on dst
    balance roundrobin
    server mysql10 192.168.15.33:3306 maxconn 500 check port 3306 inter 2s
    server mysql12 192.168.15.34:3306 maxconn 500 check port 3306 inter 2s backup
    #backend uploads

When talking about Redis in particular, I'd like to emphasize the improvements to health checks in HAproxy 1.5, which allow us to query Redis nodes directly about their role and fail over only when a backend has become the new master. If Redis Sentinel is enabled and the cluster elects a new master, HAproxy will transparently fail traffic over to it. Using the documentation linked above, here's an example:
backend messages
    mode tcp
    option tcplog
    option tcp-check
    #tcp-check send AUTH\ foobar\r\n
    #tcp-check expect +OK
    tcp-check send PING\r\n
    tcp-check expect +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis15 192.168.15.40:6379 maxconn 1024 check inter 1s
    server redis17 192.168.15.41:6379 maxconn 1024 check inter 1s
    server redis19 192.168.15.42:6379 maxconn 1024 check inter 1s
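The tcp-check sequence above is really just a scripted Redis conversation: send PING, expect +PONG, send INFO replication, and accept the node only if the reply contains role:master. As a rough illustration of what that health check does on the wire, here's a small Python sketch; it's an approximation of HAproxy's behavior, not its actual implementation, and the function names are mine:

```python
import socket


def parse_role(info_reply: str) -> str:
    """Extract the role (master/slave) from a Redis 'INFO replication' reply."""
    for line in info_reply.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def redis_is_master(host: str, port: int, timeout: float = 1.0) -> bool:
    """Mimic the tcp-check sequence: PING, then INFO replication, then check the role."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        f = sock.makefile("rwb")
        # Inline command, as tcp-check sends it.
        f.write(b"PING\r\n")
        f.flush()
        if not f.readline().startswith(b"+PONG"):
            return False
        f.write(b"INFO replication\r\n")
        f.flush()
        # INFO returns a bulk reply: a $<length> header, then the payload.
        header = f.readline()
        if not header.startswith(b"$"):
            return False
        length = int(header[1:].strip())
        payload = f.read(length + 2)  # payload plus trailing \r\n
        return parse_role(payload.decode()) == "master"
```

A monitoring script could call redis_is_master("192.168.15.40", 6379) against each node from the example config; like the tcp-check, it treats anything that isn't a confirmed master as a failed check.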