Tuesday, August 7, 2012

Cassandra Replication Factor - What is it ?


Replication


A Cassandra cluster always divides up the key space into ranges delimited by Tokens as described above, but additional replica placement is customizable via IReplicaPlacementStrategy in the configuration file. The standard strategies are
  • RackUnawareStrategy: replicas are always placed on the next (in increasing Token order) N-1 nodes along the ring
  • RackAwareStrategy: replica 2 is placed in the first node along the ring the belongs in another data center than the first; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the same rack as the first
Note that with RackAwareStrategy, succeeding nodes along the ring should alternate data centers to avoid hot spots. For instance, if you have nodes A, B, C, and D in increasing Token order, and instead of alternating you place A and B in DC1, and C and D in DC2, then nodes C and A will have disproportionately more data on them because they will be the replica destination for every Token range in the other data center.
  • The corollary to this is, if you want to start with a single DC and add another later, when you add the second DC you should add as many nodes as you have in the first rather than adding a node or two at a time gradually.
Replication factor is not really intended to be changed in a live cluster either, but increasing it is conceptually simple: update the replication_factor from the CLI (see below), then run repair against each node in your cluster so that all the new replicas that are supposed to have the data, actually do.
Until repair is finished, you have 3 options:
  • read at ConsistencyLevel.QUORUM or ALL (depending on your existing replication factor) to make sure that a replica that actually has the data is consulted
  • continue reading at lower CL, accepting that some requests will fail (usually only the first for a given query, if ReadRepair is enabled)
  • take downtime while repair runs
The same options apply to changing replication strategy.
Reducing replication factor is easily done and only requires running cleanup afterwards to remove extra replicas.
To update the replication factor on a live cluster, forget about cassandra.yaml. Rather you want to use cassandra-cli:
  • update keyspace Keyspace1 with strategy_options = {replication_factor:3};

No comments:

Post a Comment