In this tutorial, we will learn how to configure a multi-node Cassandra cluster (single datacenter) on Ubuntu 18.04. The previous tutorial covered configuring Cassandra on a single node; you can read it at how to run a single-node Cassandra cluster.
Why a Cassandra Cluster
Cassandra is very fault-tolerant. It can be scaled to hundreds or thousands of nodes, with data automatically replicated across them. Replication across data centers is also supported, so with a multi-data-center setup your data stays safe even if you lose an entire data center.
Best of all, Cassandra is decentralized, meaning that there is no single point of failure. Failed nodes can be replaced without any downtime.
Before We Begin
Because you’re about to build a multi-node Cassandra cluster, you must determine how many servers you’d like to have in your cluster and configure each of them.
Remember that it is standard to have at least 3 nodes, each on its own server. For this tutorial, however, we will use 2 nodes, that is, two Cassandra servers running on two separate Ubuntu 18.04 servers. Before starting the configuration we must have:
- Two Ubuntu 18.04 servers with sudo privileges.
- Cassandra installed on each server by following this Cassandra installation guide.
Prepare the Cassandra Nodes for Clustering
It is assumed that the following is already in place:
- Cassandra 3.11.7 is installed on 2 nodes.
- Each node can communicate with the other nodes over the network.
- The IP addresses of each node are known.
- No data is stored on the 2 Cassandra instances.
Step 1: Clear existing Cassandra data
Servers in a Cassandra cluster are known as nodes. What you have on each server right now is a single-node Cassandra cluster. In this step, we’ll set up the nodes to function as a multi-node Cassandra cluster.
All the commands in this and subsequent steps must be repeated on each node in the cluster, so be sure to have as many terminals open as you have nodes in the cluster.
If you’ve already started your Cassandra instance, you’ll need to stop it and remove the data it contains. The main reason is that the cluster_name must be the same on all nodes, and it’s best to choose one yourself rather than use the default, Test Cluster.
Note: My first Cassandra node (node 1) has the IP 10.0.3.246 and node 2 has the IP 10.0.3.4. The following steps must be performed on both servers.
sudo service cassandra stop
When that’s completed, delete the default dataset.
sudo rm -rf /var/lib/cassandra/data/system/*
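The stop-and-clear step can be wrapped in a small helper so that it behaves the same way on every node. The sketch below is only an illustration: the function name clear_system_data is hypothetical, and the usage comment assumes the data directory shown above, /var/lib/cassandra/data.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Cassandra): empty the "system"
# directory below a given Cassandra data directory.
clear_system_data() {
    data_dir=$1
    # Refuse to run when the expected directory is missing, so a typo
    # in the path cannot turn into an unintended delete.
    if [ ! -d "$data_dir/system" ]; then
        echo "no system directory under $data_dir" >&2
        return 1
    fi
    # ${data_dir:?} makes the shell abort if the variable is empty.
    rm -rf "${data_dir:?}"/system/*
}

# Assumed usage on each node, from a root shell:
#   service cassandra stop
#   clear_system_data /var/lib/cassandra/data
```

The directory check is a design choice rather than a requirement: it simply keeps the delete from running against a path that does not look like a Cassandra data directory.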
Step 2: Configuring the Cluster
There is no central master in a Cassandra cluster. Instead, you make each node aware of the others and they work together.
Cassandra is configured using various files in the /etc/cassandra directory. The cassandra.yaml file contains most of the Cassandra configuration, such as the ports used, file locations and seed node IP addresses.
Only the following directives need to be modified to set up a multi-node Cassandra cluster:
cluster_name: This is the name of your cluster. It can be anything you choose. Spaces are allowed, but make sure you wrap the whole name in quotes. All members of the cluster must use the same name.
- seeds: This is a comma-delimited list of the IP addresses of the seed nodes in the cluster. Seed nodes are known places where cluster information (such as a list of nodes in the cluster) can be obtained. It’s recommended to have 3 seed nodes per data centre.
listen_address: This is the IP address that other nodes in the cluster will use to connect to this one. It defaults to localhost and needs to be changed to the IP address of the node.
rpc_address: This is the IP address for remote procedure calls. It defaults to localhost. If the server’s hostname is properly configured, leave this as is. Otherwise, change it to the server’s IP address or the loopback address.
endpoint_snitch: This is the name of the snitch, which tells Cassandra what its network looks like. It defaults to SimpleSnitch, which is used for networks in one datacenter. In our case, we’ll change it to GossipingPropertyFileSnitch, which is preferred for production setups.
Open the configuration file for editing using nano or your favorite text editor.
sudo nano /etc/cassandra/cassandra.yaml
Search the file for the following directives and modify them as below to match your cluster. Replace your_server_ip with the IP address of the server you’re currently working on.
- seeds: The list should be the same on every server. Set it to the IP address of the machine you choose as the seed; it is not necessary for all machines to be seeds. Seeds are the nodes that other Cassandra nodes contact at startup to find the rest of the cluster.
By default it looks like seeds: "127.0.0.1". Replace the value with your node 1 IP (I am using the private IP of my first EC2 instance here).
Example for node 1:
cluster_name: 'Tecnotes Cluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.3.246"
listen_address: 10.0.3.246
rpc_address: 10.0.3.246
endpoint_snitch: GossipingPropertyFileSnitch
Example for node 2:
cluster_name: 'Tecnotes Cluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.3.246"
listen_address: 10.0.3.4
rpc_address: 10.0.3.4
endpoint_snitch: GossipingPropertyFileSnitch
Note: If you have 3 or more nodes, you can list the IPs of the first and second nodes in the seeds directive, separated by commas, like this: - seeds: "first node ip,second node ip"
To learn more, please read Internode communications.
When you’re finished modifying the file, save and close it.
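If you would rather script these changes than edit the file by hand, the same directives can be rewritten with sed. This is a minimal sketch under a few assumptions: set_directive and set_seeds are hypothetical helper names, each directive sits on its own uncommented line as in the stock file, GNU sed (as on Ubuntu) is available, and the values contain no | or & characters.

```shell
#!/bin/sh
# Hypothetical helpers for editing cassandra.yaml; not part of Cassandra.

# set_directive FILE KEY VALUE
# Rewrites an uncommented top-level "KEY: ..." line in place.
set_directive() {
    sed -i -E "s|^$2:.*|$2: $3|" "$1"
}

# set_seeds FILE "ip1,ip2"
# Rewrites the indented "- seeds:" entry, keeping its indentation.
set_seeds() {
    sed -i -E "s|^([[:space:]]*- seeds:).*|\1 \"$2\"|" "$1"
}

# Assumed usage for node 1 of this tutorial, from a root shell:
#   conf=/etc/cassandra/cassandra.yaml
#   set_directive "$conf" cluster_name "'Tecnotes Cluster'"
#   set_seeds "$conf" 10.0.3.246
#   set_directive "$conf" listen_address 10.0.3.246
#   set_directive "$conf" rpc_address 10.0.3.246
#   set_directive "$conf" endpoint_snitch GossipingPropertyFileSnitch
```

On node 2 only the listen_address and rpc_address values would change to 10.0.3.4; the seeds value stays the same on every node. Keeping a backup copy of cassandra.yaml before running anything like this is a sensible precaution.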
Cassandra is built to be fault tolerant and will distribute data to minimize the risk of a failure causing data loss or downtime. Cassandra therefore understands the concepts of a node, a rack and a data centre. Where possible, Cassandra will ensure that the data and its replicas are stored on a different rack and in a different data centre, so that a failure, even at the data centre level, isn’t catastrophic.
You can edit the cassandra-rackdc.properties file on each node and set the dc and rack attributes. As an example, we’ll assume everything is in the same dc, dc1, but two nodes will be on rack1 and one node will be on rack2. The names themselves are arbitrary; just come up with a naming standard that helps you understand where each Cassandra instance actually is. Everything here is case sensitive, so be sure you’re consistent.
Example for node 1:
dc=dc1
rack=rack1
By default, the dc value is dc1 (dc=dc1).
Example for node 2:
dc=dc1
rack=rack1
Example for node 3:
dc=dc1
rack=rack2
Note: The GossipingPropertyFileSnitch always loads cassandra-topology.properties when that file is present. Remove the file from each node on any new cluster or any migrated cluster:
sudo rm /etc/cassandra/cassandra-topology.properties
To learn more about Cassandra configuration for a multi-node cluster, visit the official documentation.
Starting your Cassandra cluster
The final steps are to start your cluster and connect to it.
First, start the seed instance (or instances) specified in the cassandra.yaml config file. Once they are up and running, you can start the remaining nodes.
sudo service cassandra start
Then wait a few seconds for discovery to work, and run the following on both machines:
nodetool status
It should show both nodes:
If you can see all the nodes you configured, you’ve just successfully set up a multi-node Cassandra cluster.
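To make this check scriptable, the Up/Normal nodes can be counted from the output of nodetool status. This is a rough sketch: count_up_nodes is a hypothetical helper, and it relies on nodetool status prefixing each node line with a two-letter status/state code, where UN stands for Up and Normal.

```shell
#!/bin/sh
# Hypothetical helper: read `nodetool status` output on stdin and print
# how many node lines start with UN (Up/Normal).
count_up_nodes() {
    grep -c '^UN[[:space:]]'
}

# Assumed usage on either node of this two-node cluster:
#   up=$(nodetool status | count_up_nodes)
#   [ "$up" -eq 2 ] && echo "both nodes are up"
```

A count lower than the number of nodes you configured suggests that a node has not joined yet or is down, in which case waiting a little longer and re-running the check is a reasonable first step.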
Once all of your servers have started your cluster is ready to use! Each node will have the cqlsh utility installed that you can use to interact with your Cassandra cluster. You’ll need to use one of the IP addresses Cassandra is listening on.
cqlsh your_server_ip 9042
In my case, it will be as follows:
cqlsh 10.0.3.4 9042
You will see it connect:
Connected to Tecnotes Cluster at 10.0.3.4:9042.
[cqlsh 5.0.1 | Cassandra 3.11.8 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
Check from the node 2 server as well.
You now have a multi-node Cassandra cluster running with a single datacenter on Ubuntu 18.04. If you need to troubleshoot the cluster, the first place to look for clues is the log files, which are located in the