Monday, July 11, 2011

Whats new in cassandra 0.8

What’s New in Cassandra 0.8, Part 2: Counters


One of the features making its debut in Cassandra 0.8.0 is distributed counters. They allow you to … count things. (Or sum things; the counter increment need not be 1, or even positive). But a lot of stuff, very quickly, which makes them invaluable for real-time analytical tasks.

Why Counters?

Prior to 0.8, Cassandra had no simple and efficient way to count. By
“counting,” we mean here to provide an atomic increment operation in a single column value, as opposed to counting the number of columns in a row, or rows in a column family, both of which were already supported.
If you had to count or sum things, available solutions previously included:
  • inserting a different column for each increment with a batch process to merge those
  • use an external synchronization like Zookeeper (preferably through the
    use of the Cages library for simplicity)
  • use another database such as redis to handle those counts
Those solutions all had one or more of the following problems:
  • unfriendly to develop against
  • poor performance
  • not scalable (in particular, none scales to multiple datacenter usage)
  • requires additional software
The new counters feature solves this lack of simple and efficient counting
facility without any of the above problems.

Using Counters

A counter is a specific kind of column whose user-visible value is a 64-bit signed
integer, though this is more complex internally. When a new value is written
to a given counter column, this new value is added to whatever was the
previous value of the counter.
To create a column family holding counters, you simply indicate to Cassandra
that the default_validation_class on that column family is
CounterColumnType. For instance, using the CLI, you can create such
a column family using:

[default@unknown] create keyspace test;
54900c80-9378-11e0-0000-242d50cf1f9d
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use test;
Authenticated to keyspace: test

[default@test] create column family counters with default_validation_class=CounterColumnType and key_validation_class=UTF8Type and comparator=UTF8Type;
6c7db090-9378-11e0-0000-242d50cf1f9d
Waiting for schema agreement...
... schemas agree across the cluster

Super column families holding counters are also supported the usual way,
by specifying column_type=Super.
Using counters is then straightforward:

[default@test] incr counters[row][c1];
Value incremented.
[default@test] incr counters[row][c2] by 3;
Value incremented.
[default@test] get counters[row];
=> (counter=c1, value=1)
=> (counter=c2, value=3)

Returned 2 results.
[default@test] decr counters[row][c2] by 4;
Value decremented.
[default@test] incr counters[row][c1] by -2;
Value incremented.
[default@test] get counters[row];
=> (counter=c1, value=-1)
=> (counter=c2, value=-1)
Returned 2 results.
[default@test] del counters[row][c1];
column removed.
[default@test] get counters[row];
=> (counter=c2, value=-1)
Returned 1 results.

Note that the CLI provides a decr (decrement) operation, but this
is simply syntactic sugar for incrementing by a negative number. The
usual consistency level trade-offs apply to counter operations.

Using CQL

Let us start by noting that the support for counters in CQL is not part of
0.8.0 (the official release at the time of this writing) but has been added
for the 0.8.1 release.
Considering the counters column family created above:

cqlsh> UPDATE counters SET c1 = c1 + 3, c2 = c2 - 4 WHERE key = row2;
cqlsh> select * from counters where key=row2;
     KEY | c1 | c2 |
    row2 |  3 | -4 |

Operational Considerations

Performance

Counters have been designed to allow for very fast writes. However, increment
does involve a read on one of the replica as part of replication. As a consequence,
counter increments are expected to be slightly slower than regular writes. Note
however that:
  • For each write, only one of the replica has to perform a read, even with many replicas.
  • At ConsistencyLevel.ONE, this read is not part of the latency the client will
    observe, but is still part of the write itself. It follows that the
    latency of increments at CL.ONE is very good, but care should be taken to
    not overload the cluster by writing faster than it can handle.
    (In JMX, you can monitor the pending tasks on the REPLICATE_ON_WRITE stage.)
Counter reads use the same code path than regular reads and thus offer comparable performance.

Dealing with data loss

With regular column families, if an SSTable on disk is lost or corrupted (because
of disk failure, for instance), a standard way to deal with it is to remove
the problematic file and run repair to have the missing informations pulled from
the other replicas.
This is unfortunately not as simple with counters. Currently, the only
safe way to handle the loss of an sstable for a counter column family
is to remove all data for that column family, restart the node with
-Dcassandra.renew_counter_id=true (or remove the NodeIdInfo
system sstables on versions earlier than 0.8.2) and run repair once
the node is up.
(The reason you must remove all the counter sstables, even undamaged
ones, is that each node maintains a sub-count of the counter to which
it adds new increments and for which other nodes trust it to have the
most up-to-date value. Wiping the data on A ensures the replicas have
recognized that A is missing its sub-count and will re-replicate to it
on repair.)

Other considerations

Internally, counters use server side timestamps order to deal with
deletions. This does mean that you will need to keep the Cassandra servers in
sync. Of course, using ntpd on an server deployment is good practice anyway, so this should not be an
important constraint.

Current limitations, known problems and the future

Besides the operational considerations above, Counters have a number of
limitations in their current form that you should be aware of:
  • If a write times out in Cassandra,
    the client cannot know if the write was persisted or not. This is not a
    problem for regular columns, where the recommended way to cope with such
    exception is to replay the write, since writes are idempotent. For counters however, replaying the write
    in those situations may result in an over-count. On the other hand, not
    replaying it may mean the write never gets recorded.
    CASSANDRA-2783 is open to add an optional replay ID to counter writes.
  • Support for counter removal is exposed by the API, but is limited. If
    you perform in a short sequence a counter increment, followed by a delete and then by
    another increment, there is no guarantee that the end value will only be
    the value of the second increment (the deletion could be fully ignored). The only safe use of deletion is for permanent removal,
    where no new increment follows the deletion.
  • There is no support for time to live (TTL) on counter columns as there is
    for regular columns (see CASSANDRA-1952
    for more information on why).
  • There is no support for secondary indexes on counter columns.
  • At the time of this writing, you cannot have a counter column inside a column
    family of regular columns (and vice versa). The only way to use
    counters is to create a column family with
    default_validation_class=CounterColumnType, in which case all
    columns are counters
    (CASSANDRA-2614
    is open to lift this limitation).

Cassandra CQL

What’s New in Cassandra 0.8, Part 1: CQL, the Cassandra Query Language



Why CQL?

Pedantic readings of the NoSQL label aside, Cassandra has never been against SQL per se. SQL is the original data DSL, and quite good at what it does.
Cassandra originally went with a Thrift RPC-based API as a way to provide a common denominator that more idiomatic clients could build upon independently. However, this worked poorly in practice: raw Thrift is too low-level to use productively, and keeping pace with new API methods to support (for example) indexes in 0.7 or distributed counters in 0.8 is too much for many maintainers to keep pace with.
CQL, the Cassandra Query Language, addresses this by pushing all implementation details to the server; all the client has to know for any operation is how to interpret “resultset” objects. So adding a feature like counters just requires teaching the CQL parser to understand “column + N” notation; no client-side changes are necessary.
CQL drivers are also hosted in-tree, to avoid the problems caused in the past by client proliferation.
At the same time, CQL is heavily based on SQL–close to a subset of SQL, in fact, which is a big win for newcomers: everyone knows what “SELECT * FROM users” means. CQL is the first step to making the learning curve on the client side as gentle as it is operationally.
(The place where CQL isn’t a strict subset of SQL–besides TABLE vs COLUMNFAMILY tokens, which may change soon–is in support for Cassandra’s wide rows and heirarchical data like supercolumns.)

A taste of CQL

Here we’ll use the cqlsh tool to create a “users” columnfamily, add some users and an index, and do some simple queries. (See our documentation for details on installing cqlsh.) Compare to the same example using the old cli interface if you’re so inclined.

cqlsh> CREATE KEYSPACE test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;
cqlsh> USE test;

cqlsh> CREATE COLUMNFAMILY users (
   ...     key varchar PRIMARY KEY,
   ...     full_name varchar,
   ...     birth_date int,
   ...     state varchar
   ... );
cqlsh> CREATE INDEX ON users (birth_date);
cqlsh> CREATE INDEX ON users (state);
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('bsanderson', 'Brandon Sanderson', 1975, 'UT');
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('prothfuss', 'Patrick Rothfuss', 1973, 'WI');
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('htayler', 'Howard Tayler', 1968, 'UT');
cqlsh> SELECT key, state FROM users;
        key | state |
 bsanderson |    UT |
  prothfuss |    WI |
    htayler |    UT |
cqlsh> SELECT * FROM users WHERE state='UT' AND birth_date > 1970;
        KEY | birth_date |         full_name | state |
 bsanderson |       1975 | Brandon Sanderson |    UT |

Current status and the road ahead

cqlsh (included with the Python driver) is already useful for testing queries against your data and managing your schema. The above example works as written in 0.8.0; ALTER COLUMNFAMILY, TTL support, and counter support are already done for 0.8.1.
Supercolumn support is the main missing feature and will probably come in 0.8.2 (minor releases are done roughly monthly). Removing Thrift as a requirement is a longer-term goal.
CQL does not change the underlying Cassandra data model; in particular, there is no support for JOINs. (For doing analytical queries with SQL against Cassandra, see Brisk.)
The Python and Node.js CQL drivers are ready for wider use; JDBC and Twisted are almost done, and PHP and Ruby are being worked on.
In short, CQL is ready for client authors to get involved. Most application developers should stick with the old Thrift RPC-based clients until 0.8.1*. If you want to give CQL a try, see the installation instructions, and for developers, the drivers tree was recently moved here.
*One such client, Hector, already has CQL query support.