Cassandra Consistency and Failover – Part 2

In Part 1 of this article, we saw how to configure Apache Cassandra across multiple data centers (DCs), and how to configure the clients, for Cassandra solution availability. This ensures that if enough nodes in one DC become unavailable (or the entire DC goes down), clients seamlessly fail over to the other DC. When the failed DC comes back up, Cassandra streams back the missed writes and everything returns to normal.

In Part 2, we widen our view to the whole solution, not just the database, to understand how failures affect Cassandra solution availability.

Cassandra-backed application

Cassandra is a fast, highly scalable database typically used to store high-volume data such as that generated by IoT sensors, mobile games and apps, or web applications with a large user base. The DataStax Enterprise (DSE) version, which bundles Spark and Solr, enables many other interesting use cases as well. For the purposes of this article, we will focus on a simple web application, whose architecture might look like the diagram below:

 

Web app Cassandra architecture

Users from across the country access a set of web servers through load balancers (not shown). Requests are processed by an application server tier, which reads from and writes to the backing Cassandra database. Scalability is built into this architecture: as our app gains users, we can seamlessly scale every tier to handle the load.

Failover and Availability

As with most applications, let us assume our entire solution is deployed on cloud infrastructure like AWS. How do we ensure that our application is highly available even if an entire Availability Zone (AZ) goes down? What if an entire region (say US-West) is not accessible due to a network glitch? 

In Part 1 of this article, I showed you how to set up and configure Cassandra so that it can survive a DC failure. Our architecture, using multiple DCs for Cassandra, could then look like the image below.

Cassandra Solution Availability

The Cassandra database is replicated across DCs so that if one goes down, the application can fail over to the other DC automatically. This assumes that the application itself remains available. That assumption is flawed: if a DC goes down, all the tiers deployed in that DC will most likely go down with it.

But what good is it if Cassandra can fail over to a different DC but the web and app tiers cannot?

Cassandra Solution Availability 

We need to ensure Cassandra solution availability, i.e. the entire application remains highly available, while users from a particular region access only the resources within that region for maximum performance. The correct solution, then, is to spread the web and app server tiers across the DCs, as shown in the diagram below. Note that this still assumes the load on US-West is lower, so the US-West Cassandra DC has fewer nodes.

 

 

Cassandra Solution Availability

Each DC’s stack accesses only the Cassandra nodes within its own DC, using LOCAL_QUORUM. If a DC fails, DNS switches its users to the web servers in the other DC. These users will see longer latencies, but the application stays available along with all their data. Note that Cassandra still replicates data across the DCs to keep them consistent, but each app server only ever talks to its local DC.
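As an illustration, here is a minimal sketch of how each DC's application tier might pin itself to its local Cassandra DC with the DataStax Java driver, in the same 3.x style as the client code from Part 1. The contact point address and the DC name "us-east" are placeholders; the US-West stack would name its own DC instead.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

// App servers in US-East talk only to the "us-east" Cassandra DC and use
// LOCAL_QUORUM, so no request has to cross the country in normal operation.
Cluster cluster = Cluster.builder()
    .addContactPoint("10.0.1.10")                 // placeholder: a node in the local DC
    .withLoadBalancingPolicy(new TokenAwarePolicy(
        DCAwareRoundRobinPolicy.builder()
            .withLocalDc("us-east")               // this stack's own DC
            .build()))
    .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
    .build();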

Conclusion

For true high availability, it is not enough to look at one piece of the solution in isolation; the architecture must address the availability of the entire Cassandra solution.


Cassandra Failover and Consistency

Apache Cassandra is an always-on NoSQL database that is highly scalable and available. That sounds magical, and it is in fact true, if you understand how to configure it correctly! This article describes an issue we ran into when setting up a multi-DC configuration for Cassandra failover, and how we resolved it.

Cassandra Configuration

Single Region, Dual AZ

The diagram below shows the initial system configuration for a cluster deployed across two availability zones (AZ) in the US-East region of AWS.

Multi-AZ Cassandra

We configured Cassandra to use multiple data centers, with each AZ defined as its own DC. The replication factor was set to 3, so that replicas of each partition are spread across the two data centers.

Thus the dual AZs provide some protection against failure: if one AZ goes down, the database stays up.
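The article does not spell out the keyspace definition, but a plausible sketch with NetworkTopologyStrategy might look like the following; the keyspace name, the DC names and the 2 + 1 split of the three replicas are assumptions for illustration only.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Spread the keyspace's 3 replicas across the two AZ-backed DCs.
// The DC names must match those reported by "nodetool status".
Cluster cluster = Cluster.builder().addContactPoint("10.0.1.10").build();
Session session = cluster.connect();
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS app_data WITH replication = "
  + "{'class': 'NetworkTopologyStrategy', 'us-east-az1': 2, 'us-east-az2': 1}");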

But what happens if the entire US-East region becomes unavailable? An always-on database that is accessed from anywhere in the country should be able to survive a full region failure (or, more likely, a network failure that cuts off the region).

Cassandra Failover using Multi-Region, Multi-AZ

We modified the above configuration to protect against region failure without adding any new nodes. In the future, we can add nodes as required. The beauty of Cassandra is that you do not have to over-provision infrastructure: nodes can be added dynamically, without stopping operations (albeit carefully!).

Shown below is the multi-region configuration. The DC definitions were changed: we still use 2 DCs, but now all 4 nodes in US-East form a single DC, while the 2 nodes in US-West form the second DC.
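A hedged sketch of the corresponding keyspace change is shown below. Which DC a node belongs to comes from the snitch configuration (for example cassandra-rackdc.properties when GossipingPropertyFileSnitch is used); the keyspace then says how many replicas to keep in each DC. The keyspace name, DC names and replica counts are assumptions for illustration.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Keep replicas in both the US-East and US-West DCs. After a change like
// this, the nodes in a newly added DC need to be populated, e.g. with
// "nodetool rebuild".
Cluster cluster = Cluster.builder().addContactPoint("10.0.1.10").build();
Session session = cluster.connect();
session.execute(
    "ALTER KEYSPACE app_data WITH replication = "
  + "{'class': 'NetworkTopologyStrategy', 'us-east': 3, 'us-west': 2}");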

Client Configuration

Contact Points

In a multi-DC configuration, it is important to ensure that clients connect to the “local” DC by default. You do this with a load balancing policy that is both token-aware and DC-aware. For example, if a web application is running in US-East, you want it to access the Cassandra nodes in the US-East DC, so that in its default mode it does not pay the latency of crossing the country. You can also specify the “contact points” the client connects to; it is a good idea to specify at least one node in each DC. An example of doing this programmatically is shown below:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

Cluster cluster = Cluster.builder()
    .addContactPoint("1.2.3.4")                        // at least one contact point in each DC
    .addContactPoint("5.6.7.8")
    .withPort(9042)
    .withLoadBalancingPolicy(new TokenAwarePolicy(
        DCAwareRoundRobinPolicy.builder()
            .withLocalDc(localDC)                      // e.g. "us-east"
            .withUsedHostsPerRemoteDc(usedHostsPerRemoteDC)
            .allowRemoteDCsForLocalConsistencyLevel()  // optional: use remote hosts even at LOCAL_* levels
            .build()))
    .build();

These settings can also be configured in the application.conf file used by the newer 4.x versions of the Java driver. See the Java Driver documentation for details.

Consistency Level

The default consistency level in Cassandra is ONE, i.e. when a query reads or writes a partition, it is sufficient for a single replica to acknowledge it. This level is inadequate for most non-trivial applications, and especially for applications deployed across multiple data centers that require high availability.

In general, a consistency level of QUORUM applied to both reads and writes keeps a Cassandra database consistent: a quorum is floor(RF/2) + 1 replicas, so read and write quorums always overlap. In our example (RF = 3), this means at least 2 replicas need to acknowledge each operation.

When a cluster is deployed across multiple regions, QUORUM can be problematic because it requires the coordinator to constantly send requests across regions, which adds significant latency. To avoid this, the usual recommendation is LOCAL_QUORUM, i.e. a quorum of the replicas within the “local” DC. We therefore configured the consistency level to be LOCAL_QUORUM.
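As a sketch, the cluster-wide default can be set through the driver's QueryOptions; the contact point below is a placeholder, and the level can still be overridden on individual statements.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;

// Make LOCAL_QUORUM the default consistency level for every read and write
// issued through this Cluster.
Cluster cluster = Cluster.builder()
    .addContactPoint("1.2.3.4")
    .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
    .build();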

Cassandra Failover Test

Now that the cluster and clients were configured, the next step was a Cassandra failover test. That is, we wanted to ensure that if all the US-East nodes failed, Cassandra would automatically fail over to the US-West DC and clients would be blissfully unaware!

All the nodes in US-East were brought down; running a query after this failed with the following error:

Not enough replicas available for query at consistency LOCAL_QUORUM.

The LOCAL_QUORUM consistency level, while performant, does not allow a query to switch DCs. If the requirement is transparent failover while still requiring acknowledgement from 2 replicas, a better strategy is a consistency level of TWO. TWO is not tied to any DC, so if the client cannot contact the local hosts (and its load balancing policy allows remote hosts), queries automatically switch to the remote DC.
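A sketch of what the adjusted client configuration might look like is shown below. The addresses, the DC name and the value passed to withUsedHostsPerRemoteDc are placeholders; the important detail is that the DC-aware policy must be allowed to use remote hosts at all, otherwise they never appear in the query plan.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

// Consistency TWO is not tied to a DC, so when no US-East host is reachable
// the query can still be satisfied by two replicas in US-West, provided the
// policy keeps some remote hosts in its query plan.
Cluster cluster = Cluster.builder()
    .addContactPoint("1.2.3.4")                    // US-East node
    .addContactPoint("5.6.7.8")                    // US-West node
    .withLoadBalancingPolicy(new TokenAwarePolicy(
        DCAwareRoundRobinPolicy.builder()
            .withLocalDc("us-east")
            .withUsedHostsPerRemoteDc(2)           // allow up to 2 remote hosts as fallback
            .build()))
    .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(ConsistencyLevel.TWO))
    .build();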

Conclusion

Cassandra is an always-on database that can provide tremendous scalability. However, keeping it always-on requires configuring it correctly without sacrificing its high performance.

 

Data Aces has experience developing and managing Apache Cassandra and DataStax Enterprise deployments. Please contact us for more information.