Failover of database

When it comes to high-availability of databases, numerous strategies can be applied.

Do we need a complete active-active solution doing load-balancing ? Or do we have enough with a standby database ?
The technical implications behind these choices are totally different, also the reaction to outages will be different of course.

Here are the three main scenarios of high-availability and failover for a database that we see : active-active (or multi-master), active-passive (or master-slave) and stand-by database. Please note that in these three scenario, we can have more than two nodes.

Active-Active or multi-master replication / synchronisation

Active-active will be of application when the data from one node is immediately replicated to the other node but without leaving the "copy" in a read-only state. This means that we implement a two-way replication, where data modified on one side will be also changed on the other one. You understand that with this model, if the same data is modified differently on both nodes at the same time, we will end with a replication conflict. How to solve it, must be left to the system to decide and to inform the application that the data she wrote is in fact not written. In case of failure of one node, the applications needs to switch to the other one, using a different target.
 
So, in this case, when a failure occurs, the switch is done automatically at the application level, or at least by DB connection layer. For instance, one can write an application that use a driver (ODBC, JDBC, ...) that accepts more than one database connection parameter to allow the driver itself to connect to the first DB which is able to answer. Or it can be left the application server and its DB connection pool to select a database which is up. To summarize, all nodes of such a DB cluster are accessible independantly of the other ones but the data will be replicated between each of them.
 
This kind of replication can be synchrone or asynchrone, depending if the replication happens while the data is written or some time after.

A solution for this is Bucardo for PostgreSQL
 

Active-passive or master-slave replication / synchronisation

Active-passive will be a situation where the data from one node is directly replicated to the other side, but the other side stay in read-only mode. So data can only change on the active node and there is no replication conflicts. But when time comes to activate the passive node, some delay time may be required before the read-only data are made available read-write.

In this case of configuration, we will combine the software doing the replication with a clustering software (UCARP, Heartbeat, ...) managing a virtual IP for the active node and a set of activation scripts to promote a passive node into an active node when a failure occurs.

This kind of replication can be synchrone or asynchrone, depending if the replication happens while the data is written or some time after.
 
A possible solution for this is Slony for PostgreSQL combined with UCARP.
 

Stand-by database or log-shipping replication

Stand-by database is an approach of the high-availability where an external database receives the archive logs of the active database. By replaying these logs, the stand-by database will have the same data as in the active database. In this case, both databases (the active and the stand-by) must be absolutely identical, in term of schema and of data. What is not the case in  the previous two scenarios where the initial replication will copy the data that are not present in the target. 
We call warm stand-by a stand-by database that is continuously performing log recovery and cannot be used even by read-only queries.
We call hot stand-by a stand-by database that can be used to perform read-only queries. 

This kind of replication will be asynchrone because logs are updated only after a transaction is comitted. Also, if the logs needs to go from one server to another one, the file transfer will take some time (even if it is a few milliseconds only).

Some database products, like PostgreSQL, offer a feature called streaming replication, in this mode of log replication, the stand-by servers open itself a connection to the master that will then start flowing log record on this network connection. So there is no need to set up a file exchange, external to the database. By default, in Postgres, this is asynchrone but can be configured to be synchrone. Take special care to the latency in the network connection between the master and the stand-by because a transaction can only be comitted when the master is aware that the log for this transaction has been successfully received and handled by the stand-by. Therefore this will introduce some delay in the application response time.

Depending on how often the logs are flowed from one node to the other one, there will be a more or less big delta between the data on the active database and the stand-by one.

PostgreSQL supports Warm Standby in version 8.x and 9.x. Hot Standby is supported since version 9.0 only. Streaming replication is available in PostgreSQL since version 9.0.