High availability databases use an architecture that is designed to continue to function normally even when there are hardware or network failures within the system. They have emerged as an alternative to traditional relational databases, which are generally built to be deployed on a single server and rely on a master/replica architecture to provide availability. In the master/replica model, only the master is available for data updates unless it fails, at which time a replica is elected as a new master to take over. In theory, this model addresses the issue of availability, but in practice, there are still potential single points of failure. Architectural complexity can create issues in the transfer of the master role during failover and result in downtime.
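The failover gap described above can be illustrated with a toy model. This is a minimal sketch, not any real database's behavior: the class `MasterReplicaCluster` and its methods are hypothetical names, and the single shared log stands in for real replication.

```python
class MasterReplicaCluster:
    """Toy model (hypothetical): only the current master accepts writes."""

    def __init__(self, names):
        self.names = list(names)
        self.master = self.names[0]   # first node starts as master
        self.down = set()             # names of failed nodes
        self.log = []                 # simplification: one shared write log

    def fail(self, name):
        self.down.add(name)

    def write(self, record):
        # Writes are rejected while the master is down and no
        # replacement has been elected yet: this is the downtime window.
        if self.master in self.down:
            raise ConnectionError("master unavailable; failover required")
        self.log.append(record)

    def elect_new_master(self):
        candidates = [n for n in self.names if n not in self.down]
        if not candidates:
            raise RuntimeError("no replica available to promote")
        self.master = candidates[0]
        return self.master


cluster = MasterReplicaCluster(["db1", "db2", "db3"])
cluster.write("order-1")
cluster.fail("db1")
try:
    cluster.write("order-2")          # rejected: master is down
except ConnectionError as exc:
    print("write failed:", exc)
cluster.elect_new_master()            # a replica takes over
cluster.write("order-2")              # writes succeed again
```

Every write attempted between `fail` and `elect_new_master` is lost or must be retried by the client, which is the window the master/replica model cannot fully close.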
High availability databases are built to eliminate single points of failure and are optimized to ensure that the end user does not experience an interruption in service or a degradation in user experience when hardware or networks fail. They are generally designed to make it easier and less complex to ensure high availability in a multiple node and cluster environment than is possible with a relational database. This is often accomplished through a masterless architecture that uses clustering, in which multiple servers are grouped together. Because there is no master, any server within a cluster can respond to read or write requests. Data is then replicated across all servers in the cluster, providing system redundancy and minimizing the possibility of downtime.
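As a rough illustration of the masterless idea, the sketch below lets any reachable node serve a request and copies each write to every reachable node. The names `Node` and `MasterlessCluster` are assumptions made for this example, and real systems add repair mechanisms for nodes that missed writes while offline.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class MasterlessCluster:
    """Toy model (hypothetical): no node has a special role."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def _live(self):
        return [n for n in self.nodes if n.up]

    def write(self, key, value):
        live = self._live()
        if not live:
            raise RuntimeError("no reachable nodes")
        # Any live node can coordinate; the write is copied to all live nodes.
        for node in live:
            node.data[key] = value
        return live[0].name           # name of the node that coordinated

    def read(self, key):
        # Any live node holding the key can answer the read.
        for node in self._live():
            if key in node.data:
                return node.data[key]
        raise KeyError(key)


cluster = MasterlessCluster(["n1", "n2", "n3"])
cluster.nodes[0].up = False           # simulate a hardware failure
cluster.write("cart:42", ["book"])    # still accepted, no failover step
print(cluster.read("cart:42"))
```

Unlike the master/replica model, there is no election step here: the loss of one node only reduces the set of nodes that can answer.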
No discussion of high availability databases and NoSQL can occur without at least briefly touching on Dr. Eric Brewer’s CAP Theorem. The CAP Theorem states that it is impossible for a distributed computer system to provide all three of the following guarantees simultaneously: Consistency (every read returns the most recent write or an error), Availability (every request receives a non-error response), and Partition tolerance (the system keeps operating when messages between nodes are lost or delayed).
The theorem argues that a system can be both available and consistent while the network is healthy, but once a network partition occurs it must give up either Consistency or Availability. In selecting a NoSQL database, at a high level, you have to select two of the three properties: AP, CP or CA. For a more detailed review of different NoSQL databases, more details on trade-offs and a top-down approach to selecting a NoSQL database for your use case, we suggest the article NoSQL Databases: a Survey and Decision Guidance by Felix Gessert. Riak KV is classified as a highly available and partition tolerant (AP) database, whereas MongoDB is classified as partition tolerant and consistent (CP) but not highly available. Now for more on availability.
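To make the CP versus AP choice concrete, here is a minimal sketch of how a node on one side of a partition might decide whether to accept a write. The function `accepts_write` and the majority rule used for the CP case are illustrative assumptions, not the documented behavior of any particular database.

```python
def accepts_write(mode, reachable, cluster_size):
    """Decide whether a write is accepted during a partition.

    mode         -- "CP" or "AP" (illustrative labels)
    reachable    -- nodes this side of the partition can reach, itself included
    cluster_size -- total number of nodes in the cluster
    """
    if mode == "CP":
        # Favor consistency: require a majority so two sides of a
        # partition can never both accept conflicting writes.
        return reachable >= cluster_size // 2 + 1
    if mode == "AP":
        # Favor availability: accept as long as any node is reachable,
        # and reconcile divergent copies after the partition heals.
        return reachable >= 1
    raise ValueError(f"unknown mode: {mode!r}")


# A 5-node cluster split 2 / 3: the minority side differs by mode.
print(accepts_write("CP", 2, 5))
print(accepts_write("AP", 2, 5))
```

In this model the minority side of a CP system refuses writes until the partition heals, while an AP system keeps accepting them at the cost of temporary divergence.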
NoSQL (for “not only SQL”) is a type of database that is increasingly being adopted to provide high availability. However, as stated above, not all NoSQL databases are highly available. In choosing a database for high availability, there are some important factors to consider, including:
MAXIMUM RESILIENCY: There are different mechanisms for ensuring redundancy in a highly available NoSQL cluster. To achieve maximum resiliency, a NoSQL database with a masterless, distributed architecture is an effective choice because data is automatically distributed evenly across a cluster with no single master. Because there are multiple redundancies built into the system by default, a masterless, distributed database is very effective at eliminating single points of failure and preventing downtime.
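One common way to spread data across a cluster with no master is consistent hashing with virtual nodes. The sketch below is an assumption-laden illustration of that general technique, not a description of any specific product's internals; `HashRing`, `replicas_for`, and the default of 64 virtual nodes are hypothetical choices.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable hash so every client maps a key to the same ring position.
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns many points on the ring, which
        # smooths out how keys are shared between nodes.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def replicas_for(self, key, n=3):
        """Return up to n distinct nodes that should store the key."""
        start = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        owners = []
        # Walk clockwise from the key's position, skipping repeats
        # of physical nodes already chosen.
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas_for("cart:42"))   # three distinct nodes hold this key
```

Because each key is stored on several distinct nodes chosen by hash position rather than by a coordinator, no single node is required for any given key to remain reachable.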
AVAILABILITY VS. CONSISTENCY: One trade-off with a highly available database is that it prioritizes availability over strict data consistency. Ensuring that read and write requests can be accepted even when multiple servers are offline or otherwise unreachable means that data may be inconsistent across the environment for a short period, usually a few milliseconds. This model is known as “eventual consistency”. In cases where strict consistency is critical, such as certain transactions, this may not be acceptable. However, for many use cases where data unavailability can cause a loss of revenue, damage user trust, or result in a poor user experience (such as gaming, retail, and advertising), high availability must be balanced with the level of consistency.
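A small sketch can show what "eventually consistent" means in practice: two replicas briefly disagree, then converge once they exchange state. The example assumes a simple last-write-wins rule based on timestamps, which is only one of several reconciliation strategies; the function `merge` and the data layout are hypothetical.

```python
def merge(a, b):
    """Combine two replica states, keeping the newest version of each key.

    Each state maps key -> (timestamp, value). Last-write-wins is an
    assumption of this sketch; equal timestamps with different values
    would need a further tie-break rule.
    """
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Replica B accepted a newer write while replica A was unreachable.
replica_a = {"cart:42": (1, ["book"])}
replica_b = {"cart:42": (2, ["book", "pen"])}

# For a short time, reads from A and B return different answers.
print(replica_a["cart:42"][1], replica_b["cart:42"][1])

# After the replicas exchange state, both hold the same data.
replica_a = merge(replica_a, replica_b)
replica_b = merge(replica_b, replica_a)
print(replica_a == replica_b)
```

The window between the write landing on one replica and the merge completing is the period of inconsistency the paragraph above refers to.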
READ AND WRITE AVAILABILITY: In the event of a server or network failure, some databases—even some “high availability” NoSQL databases—will accept write requests but not allow the data to be retrieved until after the cluster is repaired. This can lead to duplicate requests being resubmitted. For example, a user who does not see items show up in a shopping cart might re-add them. For applications with this type of consideration, it may be important to select a database that is designed to ensure that read and write data is always available without exception.
HIGH AVAILABILITY IS ALL ABOUT THE ARCHITECTURE.
How a database’s underlying architecture deals with failure modes is the key to minimizing downtime.
YOUR DATA IS ALWAYS AVAILABLE WITH RIAK KV.
Riak KV is a masterless, distributed database that is specifically designed to provide maximum resiliency.
For a broader discussion on high-availability clustering, visit this Wikipedia entry.