It's common to have two or more load balancers as HA frontends to
databases at high-traffic sites. I've used the open-source HAproxy
like this, and have seen others do the same. Building this
infrastructure and getting the traffic distributed evenly is not
really the topic I'd like to write about; what happens after you do
is.
Using HAproxy like this in front of replicated database backends is tricky: a flap on one part of the network can make one or more frontends activate the backup backends. Then you have a form of split-brain scenario on your hands, with updates occurring simultaneously on all masters in a replicated set. Redis doesn't do multi-master replication, so it's even easier to get in trouble there, even with just one HA frontend, if the old primaries are reactivated before you have synced them with the new ones.
One way to avoid this problem is to build smarter infrastructure, offloading health checks and role directing to an independent arbiter. But a single arbiter is a single point of failure, and having several makes it another replicated nightmare to solve. I was never keen on this approach because solving it reliably is an engineering challenge each time, and I have the good sense to know when it can be done better by smarter people.
Last year I was pestering the HAproxy developers to implement some cheap features as a start. For example, a new special directive that keeps the old primary permanently offline once a fail-over to backup happens, which would be more reliable than gaming health check counters. The request was of course denied; they are not in it to write hacks. They always felt that agents are the best approach, and that the Loadbalancer.org associates might even come up with a common 'protocol' for health and director agents.
But the developers heard my case, and I presume that of others who discussed the same infrastructure. HAproxy 1.5, which is about to be released as the new stable branch (source: mailing list), implements peering. Peering works with the help of stick-tables, whose other improvements will bring many advancements to handling bad and unwanted traffic, but that's another topic (see HAproxy blog).
Peering synchronizes server entries in stick-tables between HAproxy instances over TCP connections, so once a backend fails health checks on one HA frontend and traffic sticks to the backup, the other frontends learn the same entry and follow. Based on the documentation linked above, here's an example:
    peers HAPEERS
        peer fedb01 192.168.15.10:1307
        peer fedb02 192.168.15.20:1307

    backend users
        mode tcp
        option tcplog
        option mysql-check user haproxy
        stick-table type ip size 20k peers HAPEERS
        stick on dst
        balance roundrobin
        server mysql10 192.168.15.33:3306 maxconn 500 check port 3306 inter 2s
        server mysql12 192.168.15.34:3306 maxconn 500 check port 3306 inter 2s backup

    #backend uploads

When talking about Redis in particular, I'd like to emphasize the improvements to health checks in HAproxy 1.5, which will allow us to query Redis nodes about their role directly and fail over only to a backend that has become the new master. If Redis Sentinel is enabled and the cluster elects a new master, HAproxy will fail traffic over to it transparently. Based on the documentation linked above, here's an example:
    backend messages
        mode tcp
        option tcplog
        option tcp-check
        #tcp-check send AUTH\ foobar\r\n
        #tcp-check expect string +OK
        tcp-check send PING\r\n
        tcp-check expect string +PONG
        tcp-check send info\ replication\r\n
        tcp-check expect string role:master
        tcp-check send QUIT\r\n
        tcp-check expect string +OK
        server redis15 192.168.15.40:6379 maxconn 1024 check inter 1s
        server redis17 192.168.15.41:6379 maxconn 1024 check inter 1s
        server redis19 192.168.15.42:6379 maxconn 1024 check inter 1s
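
If you want to see what that check sequence amounts to on the wire, or test a node by hand before trusting the load balancer with it, the same exchange can be sketched in a few lines of Python. This is only an illustration, not how HAproxy implements it: the function names `parse_role` and `check_master` are my own, the addresses are the ones from the example above, and it assumes an unauthenticated Redis that answers inline commands.

```python
import socket


def parse_role(info_reply: str) -> str:
    """Return the value of the 'role:' line from an INFO replication reply."""
    for line in info_reply.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def check_master(host: str, port: int, timeout: float = 1.0) -> bool:
    """Mimic the tcp-check sequence: PING, INFO replication, QUIT.

    Hypothetical helper: returns True only if the node answers +PONG
    and reports role:master. Any socket error counts as a failed check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"PING\r\n")
            if not sock.recv(64).startswith(b"+PONG"):
                return False
            sock.sendall(b"info replication\r\n")
            # A single recv is a simplification; the role line sits near
            # the start of the reply, so one read is normally enough.
            reply = sock.recv(4096).decode("utf-8", "replace")
            sock.sendall(b"QUIT\r\n")
            return parse_role(reply) == "master"
    except OSError:
        return False


if __name__ == "__main__":
    for host in ("192.168.15.40", "192.168.15.41", "192.168.15.42"):
        print(host, "master" if check_master(host, 6379) else "not master")
```

With a healthy replicated set, only one of the three nodes should come back as master, which is the one HAproxy would keep in rotation.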