The Apache Cassandra database has long included support for multiple datacenters. That is, Cassandra allows an organization to configure a cluster to actively store data across disparate datacenters. There are multiple reasons to do this, including:
- Improved geographic responsiveness
- Disaster recovery
- Separation of transactional and analytic workloads
In general, when trying to wrap my head around some technical details, it’s usually helpful to simplify the scenario, removing as many variables as possible. In this case, I really want to learn more about multi-datacenters with Cassandra. I really don’t want to have to spend a lot of time trying to set up a multi-datacenter scenario using a cloud provider, wrangling Kubernetes, reading up on Terraform, and so forth. I really want to try to make things as simple as reasonably possible. Fortunately, most of what I want to try can be achieved with Docker. (Yes, I get that this is another layer, but it’s a little bit more narrow in scope.)
As we’ll see, by using Docker containers on a local machine, we can simulate a multi-datacenter setup.
Docker Support for Cassandra
Launching multiple Docker containers on a single box is usually very simple. However, getting all the networking bits to work so that the containers can talk to each other often trips me up. That’s why I was so happy to read an embarrassingly simple approach in the third edition of Cassandra: The Definitive Guide (on pages 226 and 227). It turns out that I had most of the key bits in place, but it was the intersection of introducing the concept of the Docker network and the official Docker Cassandra image that really got me past futzing with Docker to focusing on Cassandra itself.
The Docker community has been maintaining and hosting an official Cassandra Docker image on the Docker Hub website since 2015. They support multiple versions of Cassandra, from the 2.1 series up to the most current stable release. The overview page provides some basic instructions on how to run the container and even how to create a little local cluster. But I’ll provide the tl;dr summary below.
Creating Your First Multiple Local Cassandra Datacenters with Docker
One interesting byproduct of being able to create a Cassandra cluster via Docker on our local machines is that we can create the illusion of multiple, regionally separate datacenters, one on the west coast and the other on the east coast.
We’ll be “automating” this a bit with a pinch of bash scripting. Here’s the complete script:
Lines 29 and 30 are the key pieces from the above script:
CASSANDRA_DC. Sets the datacenter ID of this container/node. In our example, we use east and west as the names of our datacenters. As you can see, it’s possible to dynamically assign the name; it doesn’t have to be configured or set up beforehand.
CASSANDRA_ENDPOINT_SNITCH. Setting this to GossipingPropertyFileSnitch is required in order for the nodes to discover each other.
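To make those two settings concrete, here is a minimal sketch of how they might be passed to the official image. It is not the script discussed above: the network name `cassandra-net`, the container names `node-west` and `node-east`, and the `cassandra:latest` tag are all assumptions for illustration, as is using a container name as the `CASSANDRA_SEEDS` value. By default the sketch only prints the docker commands; set `DRY_RUN=0` to actually run them.

```shell
# Sketch only. The network name, container names, and image tag are
# assumptions for illustration, not values from the original script.
network="cassandra-net"
image="cassandra:latest"

# Print each command by default; set DRY_RUN=0 to execute it instead.
execute() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# start_node NAME DATACENTER [SEED]
start_node() {
  name="$1"; dc="$2"; seed="${3:-}"
  set -- docker run -d --name "$name" --network "$network" \
      -e "CASSANDRA_DC=$dc" \
      -e "CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch"
  # Later nodes join the cluster through the seed container's hostname.
  if [ -n "$seed" ]; then
    set -- "$@" -e "CASSANDRA_SEEDS=$seed"
  fi
  execute "$@" "$image"
}

execute docker network create "$network"
start_node node-west west
start_node node-east east node-west
```

In a real run you would likely need to wait until the first node is up before starting the second, since a node that cannot reach its seed may fail to start.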
Once that script has finished running, we can test that our two datacenters are active within the same cluster:
docker exec -it $seed_name nodetool status
Did it work?
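One rough way to answer that without eyeballing the output is to count the `Datacenter:` headers and the Up/Normal (`UN`) node lines that `nodetool status` prints. The helper names below are hypothetical, and this is only a sketch that assumes the usual layout of that output.

```shell
# Both helpers read `nodetool status` output from stdin.

# Count the distinct datacenters, based on the "Datacenter: <name>" headers.
count_datacenters() {
  grep '^Datacenter:' | sort -u | wc -l | tr -d ' '
}

# Count nodes whose status/state column is "UN" (Up and Normal).
count_up_normal() {
  grep -c '^UN '
}

# Example usage (drop -it when piping, since no TTY is needed):
#   docker exec "$seed_name" nodetool status | count_datacenters
#   docker exec "$seed_name" nodetool status | count_up_normal
```

If the first count matches the number of datacenters you started and every node shows up in the second, the cluster has most likely formed as intended.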
There are many different directions to go from here to test out Cassandra’s behavior using multiple datacenters. Here are some I plan to explore:
- Create Cassandra keyspaces that evenly distribute replicas across the different datacenters
- Create Cassandra keyspaces that unevenly distribute replicas across the different datacenters
- Create Cassandra keyspaces that distribute replicas across a subset of the different datacenters
- Try using different consistency levels
- Experiment with tracing and see how much more node-to-node communication the cross-datacenter actions create
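As a starting point for the keyspace experiments in that list, here is a small sketch that builds CREATE KEYSPACE statements using NetworkTopologyStrategy, which takes a replication factor per datacenter. The helper name `keyspace_cql`, the keyspace names, the datacenter names `west` and `east`, and the replication factors are all assumptions for illustration.

```shell
# keyspace_cql NAME DC:RF [DC:RF ...]
# Prints a CREATE KEYSPACE statement with one replication factor per datacenter.
keyspace_cql() {
  ks="$1"; shift
  opts="'class': 'NetworkTopologyStrategy'"
  for pair in "$@"; do
    dc="${pair%%:*}"   # text before the colon: datacenter name
    rf="${pair##*:}"   # text after the colon: replication factor
    opts="$opts, '$dc': $rf"
  done
  echo "CREATE KEYSPACE IF NOT EXISTS $ks WITH replication = {$opts};"
}

keyspace_cql even_ks west:1 east:1     # replicas spread evenly
keyspace_cql uneven_ks west:2 east:1   # replicas spread unevenly
keyspace_cql west_only_ks west:1       # replicas in a subset of datacenters
```

A statement could then be run against a node with something like `docker exec "$seed_name" cqlsh -e "$(keyspace_cql even_ks west:1 east:1)"`. Inside an interactive cqlsh session, the CONSISTENCY and TRACING ON commands are the natural next things to try against these keyspaces.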
Well… there you have it. Hopefully that will help you to get a little more familiar with the behavior of Cassandra’s out-of-the-box multi-datacenter capabilities without having to spend too much time provisioning, configuring, and installing Cassandra on actual boxes/VMs/instances/etc. Of course, this is nowhere near a production system for a lot of reasons. It does, however, give you a nice playground in which to start exploring.
Now go back to baking sourdough bread.