Tuesday, September 18, 2012

Apache Cassandra - Load Balancing the cluster


Load balancing


When new nodes are added to a cluster, existing data is not automatically redistributed evenly across them, so the new nodes do not take on a proportionate share of the load. This can leave the cluster badly unbalanced.
To even out the data, shift the token ranges with the nodetool move command.
Tokens must be chosen so that each node owns an almost equal share of the data.
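To make "almost equal" concrete, here is a minimal sketch that computes the fraction of the ring each node owns for a given set of tokens. It assumes a token space of 2 ** 127 (as the token formula in this post implies, i.e. RandomPartitioner) and that each node owns the range from the previous token (exclusive) up to its own token (inclusive). The names RING_SIZE and ownership are made up for illustration.

```python
from fractions import Fraction

# Assumption: token space of 2 ** 127, as implied by the token formula in this post.
RING_SIZE = 2 ** 127

def ownership(tokens):
    """Map each token to the fraction of the ring its node owns."""
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps around to the last token
        # A single node owns the whole ring, hence the "or RING_SIZE" fallback.
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = Fraction(width, RING_SIZE)
    return shares

# Four evenly spaced nodes, plus a fifth node placed in the middle of one range.
balanced = [RING_SIZE // 4 * i for i in range(4)]
unbalanced = balanced + [RING_SIZE // 8]

for token, share in sorted(ownership(unbalanced).items()):
    print(token, share)
```

In this example the new node and the node whose range it split each own 1/8 of the ring, while the other three nodes still own 1/4 each. That is the kind of imbalance that recalculated tokens are meant to remove.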
Here's a Python program that calculates new tokens for the nodes:

    def tokens(nodes):
        # Print one evenly spaced token per node across the 2 ** 127 token space.
        for x in range(nodes):
            print(2 ** 127 // nodes * x)
Run this program once with the number of nodes in the cluster, then assign each resulting token to a node with nodetool move. Afterwards, run the nodetool ring command, which shows the load on each node in the cluster. If the load is still uneven, recalculate the tokens and move nodes again until nodetool ring shows the load shared equally between the nodes.
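As a rough sketch of that workflow, the snippet below pairs each calculated token with a node and prints the matching nodetool move command. The host addresses are hypothetical placeholders, and the helper names balanced_tokens and move_commands are invented for this example. It assumes the hosts are listed in the order they appear in nodetool ring output, and it only prints the commands rather than running them.

```python
def balanced_tokens(node_count):
    """Return evenly spaced tokens across the 2 ** 127 token space."""
    return [2 ** 127 // node_count * i for i in range(node_count)]

def move_commands(hosts):
    """Build one 'nodetool move' command line per host (strings only, nothing is executed)."""
    tokens = balanced_tokens(len(hosts))
    return [
        "nodetool -h {} move {}".format(host, token)
        for host, token in zip(hosts, tokens)
    ]

# Hypothetical node addresses; replace with the hosts of your own cluster.
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

for command in move_commands(hosts):
    print(command)
```

Printing the commands first makes it easy to review the token assignment before moving any node, and nodetool ring can then be checked after each move.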

In Cassandra 0.7.* and lower, there's also nodetool loadbalance: essentially a convenience over decommission + bootstrap, except that instead of telling the target node where to move on the ring, it chooses the location using the same heuristic as token selection on bootstrap. Avoid it, because it does not rebalance the entire ring.
The status of move and balancing operations can be monitored using nodetool with the netstats argument. (Cassandra 0.6.* and lower use the streams argument.)
