Load balancing

Load balancing is the ability to distribute incoming requests across the members of a server farm, each member being served in turn.

The simplest way to do this is DNS round-robin: you configure more than one A record for the same name in the DNS. For each query that resolves this name, the DNS server then responds with a different IP address, cycling through the set of A records defined for that name.
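As a minimal sketch of the rotation described above, the following Python snippet models one name with several A records and hands out the addresses in turn. The name `www.example.com`, the addresses, and the `RoundRobinZone` class are hypothetical and only illustrate the cycling; a real DNS server does this internally.

```python
from itertools import cycle


class RoundRobinZone:
    """Toy model of DNS round-robin: one name, several A records."""

    def __init__(self, records):
        # records maps a name to the list of IP addresses (A records) behind it
        self._cycles = {name: cycle(ips) for name, ips in records.items()}

    def resolve(self, name):
        # Each query returns the next address in the set, wrapping around
        return next(self._cycles[name])


# Hypothetical name and addresses (documentation range 192.0.2.0/24)
zone = RoundRobinZone(
    {"www.example.com": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]}
)
answers = [zone.resolve("www.example.com") for _ in range(4)]
print(answers)
```

The fourth query wraps around to the first address again, which is all that "round-robin" means here.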
This approach, however, has several potential problems:
  • what about DNS caching? Until the DNS answer expires from the cache, the client keeps reusing the same IP address. If we try to avoid caching by setting a Time To Live of 0 seconds in the DNS, we put more load on our DNS server than it may be able to handle.
  • what about session persistence? If each request coming from the same client is forwarded to a different host, then after a login, for instance, the client has to log in again on each server, possibly losing what was done on the previous server.
  • what about dead peer detection? If one of the servers pointed to by the DNS A records is down, the DNS server will never know it and will keep handing out the IP address of the dead host, causing connections to fail.
  • what about the response time of the peers? Our server farm may contain machines that are more powerful than others, and we may want to give them a larger share of the requests. You cannot assign a weight to A records in the DNS.
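To make the last point concrete, here is a small sketch of what a weighted rotation looks like, which is something plain A records cannot express. The server names and weights are hypothetical, and this naive expansion is only an illustration, not the scheduler that LVS actually implements.

```python
from itertools import cycle


def weighted_rotation(weights):
    """Return an endless iterator that yields each server `weight` times per round."""
    # Expand every server into as many slots as its weight, then cycle over them
    slots = [server for server, weight in weights.items() for _ in range(weight)]
    return cycle(slots)


# Hypothetical farm: "big" is assumed to be three times as powerful as "small"
picker = weighted_rotation({"big": 3, "small": 1})
picks = [next(picker) for _ in range(8)]
counts = {name: picks.count(name) for name in ("big", "small")}
print(counts)
```

Over two full rounds the more powerful machine receives three requests for every one sent to the weaker machine.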
For a long time, the Linux kernel has included a tool for load balancing: LVS (Linux Virtual Server). It supports many features for distributing load across target servers: several routing modes to accommodate different infrastructure topologies, weights, session persistence, and so on. To use and configure it, you just have to install the ipvsadm utility, which is packaged by the major Linux distributions (in RedHat, Fedora, CentOS, Ubuntu and Debian, the package is called ipvsadm, delivered as an rpm or a deb depending on the packaging system).
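As a rough illustration of what an LVS configuration involves, the sketch below builds the ipvsadm command lines for one TCP virtual service with weighted real servers and optional persistence. It only constructs the commands and does not run them. The helper name `ipvsadm_commands`, the addresses, and the choice of the `wrr` scheduler and direct routing (`-g`) are assumptions made for the example; check the ipvsadm manual page before applying anything similar to a real system.

```python
def ipvsadm_commands(vip, port, real_servers, scheduler="wrr", persistence=None):
    """Build (but do not run) ipvsadm command lines for one TCP virtual service."""
    service = f"{vip}:{port}"
    # -A adds a virtual service, -t marks it as TCP, -s selects the scheduler
    add_service = ["ipvsadm", "-A", "-t", service, "-s", scheduler]
    if persistence is not None:
        # -p enables persistence with a timeout given in seconds
        add_service += ["-p", str(persistence)]
    commands = [add_service]
    for address, weight in real_servers:
        # -a adds a real server (-r) to the service, -g selects direct routing,
        # -w sets the weight of this server
        commands.append(
            ["ipvsadm", "-a", "-t", service,
             "-r", f"{address}:{port}", "-g", "-w", str(weight)]
        )
    return commands


# Hypothetical virtual IP and real servers (documentation range 192.0.2.0/24)
commands = ipvsadm_commands(
    "192.0.2.100", 80, [("192.0.2.10", 3), ("192.0.2.11", 1)], persistence=300
)
for command in commands:
    print(" ".join(command))
```

Keeping the commands as lists makes it easy to review them first and, if they look right, pass them to `subprocess.run` as root later.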
Since the load balancer must not become a single point of failure either, you will need the Keepalived software if your LVS setup has to be highly available too. It maintains an active-passive load-balancer configuration across at least two systems. Keepalived uses VRRP packets to announce its own state and to learn the state of the other participating hosts. If the master disappears, a backup takes over its role, bringing up the IP addresses and the LVS rules.
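The failover idea can be sketched in a few lines. This is a deliberately simplified model, assuming that each node has a numeric priority and that the live node with the highest priority holds the master role; real VRRP also involves advertisement timers and tie-breaking rules that are left out here. The `Node` class and the node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int        # higher priority wins the election
    alive: bool = True   # whether the node is still sending advertisements


def elect_master(nodes):
    """Return the live node with the highest priority, or None if all are down."""
    candidates = [node for node in nodes if node.alive]
    return max(candidates, key=lambda node: node.priority, default=None)


lb1 = Node("lb1", priority=150)
lb2 = Node("lb2", priority=100)

before = elect_master([lb1, lb2]).name
lb1.alive = False  # the master disappears
after = elect_master([lb1, lb2]).name
print(before, after)
```

Once the first node stops answering, the remaining node becomes the master and would be the one to bring up the shared IP address and the LVS rules.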
Keepalived includes many features besides managing the IP addresses and the rules:
  • checking whether each target server is still responsive and, if it is not, removing it dynamically from the list of targets
  • synchronization of the LVS rules between the nodes
  • synchronization of connection states between the nodes
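The first feature in the list above, health checking, can be approximated with a plain TCP connection test. The sketch below is not how Keepalived is implemented; it only illustrates the principle of dropping unresponsive targets. The function names `is_responsive` and `prune_targets`, the timeout, and the addresses are assumptions for the example.

```python
import socket


def is_responsive(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the target as dead
        return False


def prune_targets(targets, check=is_responsive):
    """Keep only the (host, port) targets that pass the health check."""
    return [(host, port) for host, port in targets if check(host, port)]


# Hypothetical usage: run this periodically and reload the balancer's target list
# live = prune_targets([("192.0.2.10", 80), ("192.0.2.11", 80)])
```

Passing the check as a parameter keeps the pruning logic separate from the probe, so an HTTP or application-level check could be swapped in later.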