Monday, July 11, 2011

Whats new in cassandra 0.8

What’s New in Cassandra 0.8, Part 2: Counters


One of the features making its debut in Cassandra 0.8.0 is distributed counters. They allow you to … count things. (Or sum things; the counter increment need not be 1, or even positive). But a lot of stuff, very quickly, which makes them invaluable for real-time analytical tasks.

Why Counters?

Prior to 0.8, Cassandra had no simple and efficient way to count. By
“counting,” we mean here to provide an atomic increment operation in a single column value, as opposed to counting the number of columns in a row, or rows in a column family, both of which were already supported.
If you had to count or sum things, available solutions previously included:
  • inserting a different column for each increment with a batch process to merge those
  • use an external synchronization like Zookeeper (preferably through the
    use of the Cages library for simplicity)
  • use another database such as redis to handle those counts
Those solutions all had one or more of the following problems:
  • unfriendly to develop against
  • poor performance
  • not scalable (in particular, none scales to multiple datacenter usage)
  • requires additional software
The new counters feature solves this lack of simple and efficient counting
facility without any of the above problems.

Using Counters

A counter is a specific kind of column whose user-visible value is a 64-bit signed
integer, though this is more complex internally. When a new value is written
to a given counter column, this new value is added to whatever was the
previous value of the counter.
To create a column family holding counters, you simply indicate to Cassandra
that the default_validation_class on that column family is
CounterColumnType. For instance, using the CLI, you can create such
a column family using:

[default@unknown] create keyspace test;
54900c80-9378-11e0-0000-242d50cf1f9d
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use test;
Authenticated to keyspace: test

[default@test] create column family counters with default_validation_class=CounterColumnType and key_validation_class=UTF8Type and comparator=UTF8Type;
6c7db090-9378-11e0-0000-242d50cf1f9d
Waiting for schema agreement...
... schemas agree across the cluster

Super column families holding counters are also supported the usual way,
by specifying column_type=Super.
Using counters is then straightforward:

[default@test] incr counters[row][c1];
Value incremented.
[default@test] incr counters[row][c2] by 3;
Value incremented.
[default@test] get counters[row];
=> (counter=c1, value=1)
=> (counter=c2, value=3)

Returned 2 results.
[default@test] decr counters[row][c2] by 4;
Value decremented.
[default@test] incr counters[row][c1] by -2;
Value incremented.
[default@test] get counters[row];
=> (counter=c1, value=-1)
=> (counter=c2, value=-1)
Returned 2 results.
[default@test] del counters[row][c1];
column removed.
[default@test] get counters[row];
=> (counter=c2, value=-1)
Returned 1 results.

Note that the CLI provides a decr (decrement) operation, but this
is simply syntactic sugar for incrementing by a negative number. The
usual consistency level trade-offs apply to counter operations.

Using CQL

Let us start by noting that the support for counters in CQL is not part of
0.8.0 (the official release at the time of this writing) but has been added
for the 0.8.1 release.
Considering the counters column family created above:

cqlsh> UPDATE counters SET c1 = c1 + 3, c2 = c2 - 4 WHERE key = row2;
cqlsh> select * from counters where key=row2;
     KEY | c1 | c2 |
    row2 |  3 | -4 |

Operational Considerations

Performance

Counters have been designed to allow for very fast writes. However, increment
does involve a read on one of the replica as part of replication. As a consequence,
counter increments are expected to be slightly slower than regular writes. Note
however that:
  • For each write, only one of the replica has to perform a read, even with many replicas.
  • At ConsistencyLevel.ONE, this read is not part of the latency the client will
    observe, but is still part of the write itself. It follows that the
    latency of increments at CL.ONE is very good, but care should be taken to
    not overload the cluster by writing faster than it can handle.
    (In JMX, you can monitor the pending tasks on the REPLICATE_ON_WRITE stage.)
Counter reads use the same code path than regular reads and thus offer comparable performance.

Dealing with data loss

With regular column families, if an SSTable on disk is lost or corrupted (because
of disk failure, for instance), a standard way to deal with it is to remove
the problematic file and run repair to have the missing informations pulled from
the other replicas.
This is unfortunately not as simple with counters. Currently, the only
safe way to handle the loss of an sstable for a counter column family
is to remove all data for that column family, restart the node with
-Dcassandra.renew_counter_id=true (or remove the NodeIdInfo
system sstables on versions earlier than 0.8.2) and run repair once
the node is up.
(The reason you must remove all the counter sstables, even undamaged
ones, is that each node maintains a sub-count of the counter to which
it adds new increments and for which other nodes trust it to have the
most up-to-date value. Wiping the data on A ensures the replicas have
recognized that A is missing its sub-count and will re-replicate to it
on repair.)

Other considerations

Internally, counters use server side timestamps order to deal with
deletions. This does mean that you will need to keep the Cassandra servers in
sync. Of course, using ntpd on an server deployment is good practice anyway, so this should not be an
important constraint.

Current limitations, known problems and the future

Besides the operational considerations above, Counters have a number of
limitations in their current form that you should be aware of:
  • If a write times out in Cassandra,
    the client cannot know if the write was persisted or not. This is not a
    problem for regular columns, where the recommended way to cope with such
    exception is to replay the write, since writes are idempotent. For counters however, replaying the write
    in those situations may result in an over-count. On the other hand, not
    replaying it may mean the write never gets recorded.
    CASSANDRA-2783 is open to add an optional replay ID to counter writes.
  • Support for counter removal is exposed by the API, but is limited. If
    you perform in a short sequence a counter increment, followed by a delete and then by
    another increment, there is no guarantee that the end value will only be
    the value of the second increment (the deletion could be fully ignored). The only safe use of deletion is for permanent removal,
    where no new increment follows the deletion.
  • There is no support for time to live (TTL) on counter columns as there is
    for regular columns (see CASSANDRA-1952
    for more information on why).
  • There is no support for secondary indexes on counter columns.
  • At the time of this writing, you cannot have a counter column inside a column
    family of regular columns (and vice versa). The only way to use
    counters is to create a column family with
    default_validation_class=CounterColumnType, in which case all
    columns are counters
    (CASSANDRA-2614
    is open to lift this limitation).

No comments:

Post a Comment