What’s New in Cassandra 0.8, Part 2: Counters
One of the features making its debut in Cassandra 0.8.0 is distributed counters. They allow you to … count things. (Or sum things; the counter increment need not be 1, or even positive). But a lot of stuff, very quickly, which makes them invaluable for real-time analytical tasks.
“counting,” we mean here to provide an atomic increment operation in a single column value, as opposed to counting the number of columns in a row, or rows in a column family, both of which were already supported.
If you had to count or sum things, available solutions previously included:
facility without any of the above problems.
integer, though this is more complex internally. When a new value is written
to a given counter column, this new value is added to whatever was the
previous value of the counter.
To create a column family holding counters, you simply indicate to Cassandra
that the default_validation_class on that column family is
CounterColumnType. For instance, using the CLI, you can create such
a column family using:
Super column families holding counters are also supported the usual way,
by specifying column_type=Super.
Using counters is then straightforward:
Note that the CLI provides a decr (decrement) operation, but this
is simply syntactic sugar for incrementing by a negative number. The
usual consistency level trade-offs apply to counter operations.
0.8.0 (the official release at the time of this writing) but has been added
for the 0.8.1 release.
Considering the counters column family created above:
does involve a read on one of the replica as part of replication. As a consequence,
counter increments are expected to be slightly slower than regular writes. Note
however that:
of disk failure, for instance), a standard way to deal with it is to remove
the problematic file and run repair to have the missing informations pulled from
the other replicas.
This is unfortunately not as simple with counters. Currently, the only
safe way to handle the loss of an sstable for a counter column family
is to remove all data for that column family, restart the node with
-Dcassandra.renew_counter_id=true (or remove the NodeIdInfo
system sstables on versions earlier than 0.8.2) and run repair once
the node is up.
(The reason you must remove all the counter sstables, even undamaged
ones, is that each node maintains a sub-count of the counter to which
it adds new increments and for which other nodes trust it to have the
most up-to-date value. Wiping the data on A ensures the replicas have
recognized that A is missing its sub-count and will re-replicate to it
on repair.)
deletions. This does mean that you will need to keep the Cassandra servers in
sync. Of course, using ntpd on an server deployment is good practice anyway, so this should not be an
important constraint.
limitations in their current form that you should be aware of:
Why Counters?
Prior to 0.8, Cassandra had no simple and efficient way to count. By“counting,” we mean here to provide an atomic increment operation in a single column value, as opposed to counting the number of columns in a row, or rows in a column family, both of which were already supported.
If you had to count or sum things, available solutions previously included:
- inserting a different column for each increment with a batch process to merge those
- use an external synchronization like Zookeeper (preferably through the
use of the Cages library for simplicity) - use another database such as redis to handle those counts
- unfriendly to develop against
- poor performance
- not scalable (in particular, none scales to multiple datacenter usage)
- requires additional software
facility without any of the above problems.
Using Counters
A counter is a specific kind of column whose user-visible value is a 64-bit signedinteger, though this is more complex internally. When a new value is written
to a given counter column, this new value is added to whatever was the
previous value of the counter.
To create a column family holding counters, you simply indicate to Cassandra
that the default_validation_class on that column family is
CounterColumnType. For instance, using the CLI, you can create such
a column family using:
[default@unknown] create keyspace test;
54900c80-9378-11e0-0000-242d50cf1f9d
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use test;
Authenticated to keyspace: test
[default@test] create column family counters with default_validation_class=CounterColumnType and key_validation_class=UTF8Type and comparator=UTF8Type;
6c7db090-9378-11e0-0000-242d50cf1f9d
Waiting for schema agreement...
... schemas agree across the cluster
Super column families holding counters are also supported the usual way,
by specifying column_type=Super.
Using counters is then straightforward:
[default@test] incr counters[row][c1];
Value incremented.
[default@test] incr counters[row][c2] by 3;
Value incremented.
[default@test] get counters[row];
=> (counter=c1, value=1)
=> (counter=c2, value=3)
Returned 2 results.
[default@test] decr counters[row][c2] by 4;
Value decremented.
[default@test] incr counters[row][c1] by -2;
Value incremented.
[default@test] get counters[row];
=> (counter=c1, value=-1)
=> (counter=c2, value=-1)
Returned 2 results.
[default@test] del counters[row][c1];
column removed.
[default@test] get counters[row];
=> (counter=c2, value=-1)
Returned 1 results.
Note that the CLI provides a decr (decrement) operation, but this
is simply syntactic sugar for incrementing by a negative number. The
usual consistency level trade-offs apply to counter operations.
Using CQL
Let us start by noting that the support for counters in CQL is not part of0.8.0 (the official release at the time of this writing) but has been added
for the 0.8.1 release.
Considering the counters column family created above:
cqlsh> UPDATE counters SET c1 = c1 + 3, c2 = c2 - 4 WHERE key = row2;
cqlsh> select * from counters where key=row2;
KEY | c1 | c2 |
row2 | 3 | -4 |
Operational Considerations
Performance
Counters have been designed to allow for very fast writes. However, incrementdoes involve a read on one of the replica as part of replication. As a consequence,
counter increments are expected to be slightly slower than regular writes. Note
however that:
- For each write, only one of the replica has to perform a read, even with many replicas.
- At ConsistencyLevel.ONE, this read is not part of the latency the client will
observe, but is still part of the write itself. It follows that the
latency of increments at CL.ONE is very good, but care should be taken to
not overload the cluster by writing faster than it can handle.
(In JMX, you can monitor the pending tasks on the REPLICATE_ON_WRITE stage.)
Dealing with data loss
With regular column families, if an SSTable on disk is lost or corrupted (becauseof disk failure, for instance), a standard way to deal with it is to remove
the problematic file and run repair to have the missing informations pulled from
the other replicas.
This is unfortunately not as simple with counters. Currently, the only
safe way to handle the loss of an sstable for a counter column family
is to remove all data for that column family, restart the node with
-Dcassandra.renew_counter_id=true (or remove the NodeIdInfo
system sstables on versions earlier than 0.8.2) and run repair once
the node is up.
(The reason you must remove all the counter sstables, even undamaged
ones, is that each node maintains a sub-count of the counter to which
it adds new increments and for which other nodes trust it to have the
most up-to-date value. Wiping the data on A ensures the replicas have
recognized that A is missing its sub-count and will re-replicate to it
on repair.)
Other considerations
Internally, counters use server side timestamps order to deal withdeletions. This does mean that you will need to keep the Cassandra servers in
sync. Of course, using ntpd on an server deployment is good practice anyway, so this should not be an
important constraint.
Current limitations, known problems and the future
Besides the operational considerations above, Counters have a number oflimitations in their current form that you should be aware of:
- If a write times out in Cassandra,
the client cannot know if the write was persisted or not. This is not a
problem for regular columns, where the recommended way to cope with such
exception is to replay the write, since writes are idempotent. For counters however, replaying the write
in those situations may result in an over-count. On the other hand, not
replaying it may mean the write never gets recorded.
CASSANDRA-2783 is open to add an optional replay ID to counter writes. - Support for counter removal is exposed by the API, but is limited. If
you perform in a short sequence a counter increment, followed by a delete and then by
another increment, there is no guarantee that the end value will only be
the value of the second increment (the deletion could be fully ignored). The only safe use of deletion is for permanent removal,
where no new increment follows the deletion. - There is no support for time to live (TTL) on counter columns as there is
for regular columns (see CASSANDRA-1952
for more information on why). - There is no support for secondary indexes on counter columns.
- At the time of this writing, you cannot have a counter column inside a column
family of regular columns (and vice versa). The only way to use
counters is to create a column family with
default_validation_class=CounterColumnType, in which case all
columns are counters
(CASSANDRA-2614
is open to lift this limitation).