Apache Cassandra is the always-on NoSQL database that is highly scalable and available. That sounds magical and is in fact true – IF you understand how to configure it correctly ! This article describes an issue we ran into when setting up a multi-DC configuration for Cassandra failover and how it was resolved.
Single Region, Dual AZ
The diagram below shows the initial system configuration for a cluster deployed across two availability zones (AZ) in the US-East region of AWS.
Cassandra was configured in a multi-DC mode with each AZ being in one DC. The replication factor was set to 3. Cassandra ensures that at least one replica of each partition will reside across the two data centers.
Thus the dual AZs provide some protection against failures – since even if one AZ went down, the database would still be up.
But what happens if the entire US-East Region became unavailable? An always-on database that is accessed from anywhere in the country should be able to survive a full region failure (or most likely network failure to the region).
Cassandra Failover using Multi-Region, Multi-AZ
We modified the above configuration for protection against region failure but without adding any additional nodes. In future, it can be enhanced with additional nodes. The beauty of Cassandra is that you do not have to over-provision infrastructure – nodes can be added dynamically without stopping operations (albeit carefully!)
Shown below is the multi-region configuration. The definition of the DC was changed. We still use 2 DCs but now all 4 nodes in US-East are in a single DC while the 2 nodes in US-West are in the second DC.
In a Multi-DC configuration, it is important to ensure that clients are connecting to the “local” DC by default. You do this by using a load balancing policy that is both token-aware and DC-aware. For example, if a web application is running in US-East, you want it to access the Cassandra nodes in the US-East DC. This will ensure that it doesn’t suffer from latency by going across the country in its default mode. You can specify the “contact points” for the client to connect to as well – it is a good idea to specify at least one node in each DC. An example to do this programmatically is shown below:
Cluster cluster = Cluster.builder()
.addContactPoint(new InetSocketAddress("126.96.36.199", 9042))
.addContactPoint(new InetSocketAddress("188.8.131.52", 9042))
These settings can also be configured in the application.conf file. See the Java Driver documentation for details.
The default Consistency Level in Cassandra is ONE i.e. when a query reads or writes a partition, it is sufficient for one of the replicas to acknowledge the query. This level is insufficient for most non-trivial applications; especially not for applications deploying across multiple data centers that require high availability.
In general, a consistency level of QUORUM applied to both reads and writes ensures that a Cassandra database is consistent. In our example, this implies that at least 2 replicas needs to acknowledge completion of the transaction.
When a cluster is deployed across multiple regions, QUORUM could be problematic as it requires the coordinator to constantly send requests across regions that can cause significant latencies. To avoid performance issues, it is recommended to use LOCAL_QUORUM i.e. a quorum of nodes within the “local” DC is used to maintain consistency. Therefore, we configured the consistency level to be LOCAL_QUORUM.
Cassandra Failover Test
Now that the cluster and clients were configured, the next step was to do a Cassandra failover test. That is, we want to ensure that if all the US-East nodes failed, Cassandra would automatically failover to the US-West DC and clients would be blissfully unaware !
All the nodes in US-East were brought down; running a query after this failed with the following error:
Not enough replicas available for query at consistency LOCAL_QUORUM.
The LOCAL_QUORUM consistency level while performant, does not allow for the query to switch DCs. If the requirement is to have a transparent failover while maintaining a consistency across 2 nodes, then a better strategy is to use a consistency level of 2. With a level of 2, if the client cannot contact any of the local hosts, it will automatically switch to the remote DC.
Cassandra is an always-on database that can provide tremendous scalability. However, ensuring that it remains always-on requires that it be configured correctly while also maintaining its high performance.