How to Run a Multi-Node Cassandra Cluster on Ubuntu 18.04

Introduction

In this tutorial, we will learn how to configure a multi-node Cassandra cluster (single data center) on Ubuntu 18.04. The previous tutorial, how to run a single-node Cassandra cluster, covered configuring Cassandra on a single node.

Why a Cassandra Cluster

Cassandra is highly fault-tolerant. It can be scaled to hundreds or thousands of nodes, with data automatically replicated among them.

Cassandra also supports replication across data centers, so with a suitable replication setup your data can survive the loss of an entire data center. Best of all, Cassandra is decentralized, meaning that there is no single point of failure. Failed nodes can be replaced without any downtime.

Before We Begin

Because you’re about to build a multi-node Cassandra cluster, you must determine how many servers you’d like to have in your cluster and configure each of them.

It's standard to have at least 3 nodes, each on a separate server. For this tutorial, however, we will use 2 nodes, that is, two Cassandra servers running on two separate Ubuntu 18.04 servers. Before starting the configuration, you need:

Two Ubuntu 18.04 servers with sudo privileges
Cassandra installed on each server by following this Cassandra installation guide.

Prepare the Cassandra Nodes for Clustering

It is assumed that the following is already in place:

Cassandra 3.11.x is installed on both nodes.
Each node can communicate with the other nodes over the network.
The IP addresses of each node are known.
No data is stored on the 2 Cassandra instances.

Step 1: Clear Existing Cassandra Data

Servers in a Cassandra cluster are known as nodes. What you have on each server right now is a single-node Cassandra cluster. In this step, we’ll set up the nodes to function as a multi-node Cassandra cluster.

All the commands in this and subsequent steps must be repeated on each node in the cluster, so be sure to have as many terminals open as you have nodes in the cluster.

If you've already started your Cassandra instance, you'll need to stop it and remove the data it contains. The main reason is that the cluster_name must be the same on all nodes, and it's best to choose one yourself rather than use the default, Test Cluster. Because the name is saved in the system data on first start, changing it requires clearing that data.

Note: In this tutorial, node 1 has the IP address 10.0.3.246 and node 2 has the IP address 10.0.3.4. The following steps must be performed on both servers.

sudo service cassandra stop

When that’s completed, delete the default dataset.

sudo rm -rf /var/lib/cassandra/data/system/*
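
The delete step above could be wrapped in a small guarded helper so it behaves the same on every node. This is only a sketch: the function name clear_system_data and its optional base-directory argument are hypothetical, and it assumes the default layout under /var/lib/cassandra used in this tutorial.

```shell
# Hypothetical helper: remove the contents of <base>/data/system.
# <base> defaults to /var/lib/cassandra, the path used in this tutorial.
clear_system_data() {
    local base="${1:-/var/lib/cassandra}"
    local target="${base}/data/system"
    # Refuse to run if the directory is missing, so a typo cannot delete the wrong path.
    if [ ! -d "$target" ]; then
        echo "no such directory: $target" >&2
        return 1
    fi
    # Delete everything inside the directory but keep the directory itself.
    find "$target" -mindepth 1 -delete
}
```

On a real node you would stop Cassandra first (sudo service cassandra stop) and then call clear_system_data from a root shell, since the data directory is not writable by ordinary users.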

Step 2: Configuring the Cluster

There is no central master in a Cassandra cluster. Instead, you make each node aware of the others and they work together.

Cassandra is configured using various files in the /etc/cassandra directory. The cassandra.yaml file contains most of the Cassandra configuration, such as the ports used, file locations, and seed node IP addresses.

Only the following directives need to be modified to set up a multi-node Cassandra cluster:

cluster_name: The name of your cluster. It can be anything you choose. Spaces are allowed, but make sure you wrap the whole name in quotes. All members of the cluster must use the same name.
- seeds: A comma-delimited list of the IP addresses of the cluster's seed nodes. Seed nodes are known places where cluster information (such as the list of nodes in the cluster) can be obtained. It's recommended to have 3 seed nodes per data center.
listen_address: The IP address that other nodes in the cluster will use to connect to this one. It defaults to localhost and needs to be changed to the IP address of the node.
rpc_address: The IP address for remote procedure calls. It defaults to localhost. If the server's hostname is properly configured, leave this as is. Otherwise, change it to the server's IP address or the loopback address (127.0.0.1).
endpoint_snitch: The name of the snitch, which tells Cassandra what its network looks like. This defaults to SimpleSnitch, which is used for networks in one data center. In our case, we'll change it to GossipingPropertyFileSnitch, which is preferred for production setups.

Note: The GossipingPropertyFileSnitch and NetworkTopologyStrategy are recommended for production environments.

Open the configuration file for editing using nano or your favorite text editor.

sudo nano /etc/cassandra/cassandra.yaml

Search the file for the following directives and modify them as shown below to match your cluster. For listen_address and rpc_address, use the IP address of the server you're currently working on.

The - seeds: list should be the same on every server. In this tutorial, one machine (node 1) acts as the seed; it is not necessary for all machines to be seeds. Seeds are the nodes that a Cassandra node contacts at startup to find the other nodes in the cluster.

By default, the line reads - seeds: "127.0.0.1". Replace the value with the IP address of node 1 (here, the private IP address of my first EC2 instance):

- seeds: "10.0.3.246"

Example for node 1:

cluster_name: 'Tecnotes Cluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.3.246"
listen_address: 10.0.3.246
rpc_address: 10.0.3.246
endpoint_snitch: GossipingPropertyFileSnitch

Example for node 2:

cluster_name: 'Tecnotes Cluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.3.246"
listen_address: 10.0.3.4
rpc_address: 10.0.3.4
endpoint_snitch: GossipingPropertyFileSnitch
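
If you prefer not to edit the file by hand on every node, the same changes could be scripted. The following is a minimal sketch, not an official tool: the function names set_yaml_key, set_seeds, and configure_node are hypothetical, and it assumes GNU sed and a cassandra.yaml in which these directives are present and uncommented, as in the packaged default. Back up the file before trying it.

```shell
# Hypothetical helper: set one top-level "key: value" line in a YAML file.
# Only handles simple, unindented keys such as cluster_name or listen_address,
# and values that contain neither "|" nor "&".
set_yaml_key() {
    local file="$1" key="$2" value="$3"
    # Replace the whole line that starts with "key:" (GNU sed, in-place edit).
    sed -i "s|^${key}:.*|${key}: ${value}|" "$file"
}

# Hypothetical helper: rewrite the value of the indented "- seeds:" line.
set_seeds() {
    local file="$1" seeds="$2"
    # Keep the original indentation (captured as \1) and replace only the value.
    sed -i -E "s|^([[:space:]]*- seeds:).*|\1 \"${seeds}\"|" "$file"
}

# Apply the settings from this tutorial to one node's configuration file.
configure_node() {
    local file="$1" node_ip="$2" seed_ip="$3"
    set_yaml_key "$file" cluster_name "'Tecnotes Cluster'"
    set_yaml_key "$file" listen_address "$node_ip"
    set_yaml_key "$file" rpc_address "$node_ip"
    set_yaml_key "$file" endpoint_snitch GossipingPropertyFileSnitch
    set_seeds "$file" "$seed_ip"
}
```

For node 2 of this tutorial, the call from a root shell would be configure_node /etc/cassandra/cassandra.yaml 10.0.3.4 10.0.3.246. Only the second argument changes from node to node; the seed address stays the same everywhere.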

Note: If you have 3 or more nodes, you can list the IP addresses of the first and second nodes in the seeds directive, separated by commas:
- seeds: "first_node_ip,second_node_ip"

To learn more, read about internode communications.

When you’re finished modifying the file, save and close it.

Cassandra is built to be fault tolerant and distributes data to minimize the risk that a failure causes data loss or downtime. To do this, Cassandra understands the concepts of a node, a rack, and a data center. Where possible, it stores the data and its replicas on different racks and in different data centers, so that a failure, even at the data center level, isn't catastrophic.

Optional step:

You can edit the cassandra-rackdc.properties file on each node and set the dc and rack attributes. As an example, assume all nodes are in the same data center (uk_dc below), with two nodes on rack1 and a third node on rack2. The names themselves are arbitrary; just pick a naming standard that helps you understand where each Cassandra instance actually is. Everything here is case sensitive, so be consistent.

sudo nano /etc/cassandra/cassandra-rackdc.properties

Example for node 1:

dc=uk_dc
rack=rack1

By default, the dc value is dc1 (dc=dc1).

Example for node 2:

dc=uk_dc
rack=rack1

Example for node 3 (if your cluster has one):

dc=uk_dc
rack=rack2
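
The dc and rack values could also be set from a script instead of an editor. This is a rough sketch under stated assumptions: the function name set_rackdc is hypothetical, it relies on GNU sed, and it assumes the file already contains dc= and rack= lines, as the packaged default does.

```shell
# Hypothetical helper: update the dc and rack lines of a rackdc properties file.
# Other lines (comments, optional settings) are left untouched.
set_rackdc() {
    local file="$1" dc="$2" rack="$3"
    sed -i -e "s|^dc=.*|dc=${dc}|" -e "s|^rack=.*|rack=${rack}|" "$file"
}
```

For node 1 in the example above, the call from a root shell would be set_rackdc /etc/cassandra/cassandra-rackdc.properties uk_dc rack1.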

Note: The GossipingPropertyFileSnitch always loads cassandra-topology.properties when that file is present. Remove the file from each node on any new cluster or any cluster migrated from the PropertyFileSnitch.
sudo rm /etc/cassandra/cassandra-topology.properties

To learn more about Cassandra configuration for a multi-node cluster, see the official documentation.

Starting your Cassandra cluster

The final steps are to start your cluster and connect to it.

First, start the seed node (or nodes) specified in the cassandra.yaml config file. Once the seeds are up and running, you can start the remaining nodes.

sudo service cassandra start

Wait a few seconds for discovery to complete, then run the following on both machines:

nodetool status

The output should list both nodes, each with the status UN (Up and Normal).

If you can see all the nodes you configured, you’ve just successfully set up a multi-node Cassandra cluster.
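
The check described above can be automated by counting the lines of nodetool status output that begin with UN. The sketch below is illustrative only: the function names count_up_nodes and cluster_ready are hypothetical, and both read the nodetool output from standard input.

```shell
# Count nodes reported as UN (Up/Normal) in `nodetool status` output read from stdin.
count_up_nodes() {
    # Node lines begin with a two-letter status/state code; UN means Up and Normal.
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '^UN' || true
}

# Hypothetical check: succeed only when exactly the expected number of nodes is UN.
cluster_ready() {
    local expected="$1"
    local up
    up="$(count_up_nodes)"
    [ "$up" -eq "$expected" ]
}
```

For the two-node cluster in this tutorial, you could run: nodetool status | cluster_ready 2 && echo "both nodes are up"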

Once all of your servers have started, your cluster is ready to use! Each node has the cqlsh utility installed, which you can use to interact with your Cassandra cluster. You'll need to use one of the IP addresses Cassandra is listening on.

cqlsh your_server_ip 9042

In my case, the command is:

cqlsh 10.0.3.4 9042

You will see it connect:

Connected to Tecnotes Cluster at 10.0.3.4:9042.
[cqlsh 5.0.1 | Cassandra 3.11.8 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

Repeat the check from the node 2 server as well.
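
Instead of opening an interactive session on each node, you could run a one-off query with the -e option of cqlsh. The sketch below asks each node for the cluster name, data center, and rack it has registered in its system.local table. The function names and the CQLSH environment variable (an override for the cqlsh binary, useful for dry runs) are hypothetical additions for illustration.

```shell
# Hypothetical helper: ask one node which cluster, data center and rack it belongs to.
# CQLSH may be set to override the cqlsh binary; the port defaults to 9042.
show_node_identity() {
    local host="$1" port="${2:-9042}"
    "${CQLSH:-cqlsh}" "$host" "$port" -e "SELECT cluster_name, data_center, rack FROM system.local;"
}

# Query every host given on the command line in turn.
check_all_nodes() {
    local host
    for host in "$@"; do
        echo "== $host =="
        show_node_identity "$host"
    done
}
```

For this tutorial's cluster, the call would be check_all_nodes 10.0.3.246 10.0.3.4. Both nodes should report the same cluster name.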

Conclusion

You now have a multi-node Cassandra cluster running in a single data center on Ubuntu 18.04. If you need to troubleshoot the cluster, the first place to look for clues is the log files, which are located in the /var/log/cassandra directory.
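
As a starting point for troubleshooting, the following sketch prints the most recent warnings and errors from the main log file. The function name recent_log_errors is hypothetical, and it assumes the default log format, in which each line starts with its level (for example ERROR or WARN), and the default log file system.log.

```shell
# Hypothetical helper: show the last N WARN/ERROR lines of a Cassandra log.
# Defaults: /var/log/cassandra/system.log and 20 lines.
recent_log_errors() {
    local log="${1:-/var/log/cassandra/system.log}" count="${2:-20}"
    # Keep only lines whose level is WARN or ERROR, then show the newest ones.
    grep -E '^(ERROR|WARN)' "$log" | tail -n "$count"
}
```

Calling recent_log_errors with no arguments reads the default log file; you may need sudo or membership in the appropriate group to read it.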