Tuesday, April 26, 2011

Range Slices In Cassandra With Hector

I thought I’d write up a quick tutorial on how to do a simple range slice in Cassandra with Hector, for people coming from the relational world.

I’m going to outline the Cassandra version of SELECT col_name_1 FROM table_name LIMIT 100

Step 1) Set up the connection pool and the client

CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
CassandraClient client = pool.borrowClient("localhost", 9160);

Step 2) Define which Keyspace you want to use. This is equivalent to choosing a DB or schema.

Keyspace keyspace = client.getKeyspace("Keyspace1");

Step 3) Set up the slice predicate by defining which columns within the row you want returned. This is equivalent to SELECT col_name_1

// just look for one column for now
List<byte[]> colNames = new ArrayList<byte[]>();
colNames.add("column-name".getBytes());

// setup the slice predicate
SlicePredicate sp = new SlicePredicate();
sp.setColumn_names(colNames);

Step 4) Specify which Column Family to use. This is equivalent to a table or FROM table_name in SQL

// setup column parent (CF)
ColumnParent cp = new ColumnParent("Standard3");

Step 5) Execute the request. This is equivalent to executing a SQL query.

// get all the keys with a limit of 100 values returned
Map<String, List<Column>> map = keyspace.getRangeSlice(cp, sp, "", "", 100);

Note that the 3rd and 4th method args are empty strings. When you combine two empty strings for the start and end of the range, it is like saying SELECT col_name_1 FROM table_name without a WHERE clause.
Also, if you have rows that were deleted, the List of Column objects will be empty but the key will still be returned until it is GC’d by Cassandra.

This example was used against a single Cassandra node using the Random Partitioner. Results will be returned to you in an unordered fashion.
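Putting the five steps together, here is a minimal end-to-end sketch. It keeps to the old-style Hector API used above; the result loop, the empty-row check and the releaseClient call are my additions, and the import paths may differ between Hector versions.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import me.prettyprint.cassandra.service.CassandraClient;
import me.prettyprint.cassandra.service.CassandraClientPool;
import me.prettyprint.cassandra.service.CassandraClientPoolFactory;
import me.prettyprint.cassandra.service.Keyspace;

import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.SlicePredicate;

public class RangeSliceExample {
    public static void main(String[] args) throws Exception {
        CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
        CassandraClient client = pool.borrowClient("localhost", 9160);
        try {
            Keyspace keyspace = client.getKeyspace("Keyspace1");

            // SELECT col_name_1
            List<byte[]> colNames = new ArrayList<byte[]>();
            colNames.add("column-name".getBytes());
            SlicePredicate sp = new SlicePredicate();
            sp.setColumn_names(colNames);

            // FROM table_name
            ColumnParent cp = new ColumnParent("Standard3");

            // LIMIT 100, no WHERE clause
            Map<String, List<Column>> map = keyspace.getRangeSlice(cp, sp, "", "", 100);

            for (Map.Entry<String, List<Column>> row : map.entrySet()) {
                if (row.getValue().isEmpty()) {
                    continue; // deleted row not yet GC'd by Cassandra
                }
                for (Column c : row.getValue()) {
                    System.out.println(row.getKey() + " -> "
                            + new String(c.getName()) + " = " + new String(c.getValue()));
                }
            }
        } finally {
            pool.releaseClient(client);
        }
    }
}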

Key trouble with KeyspaceService.multiGetSlice

I've added some additional debugging code and am confused by the
result. It appears that the result is being changed after the public
Map<ByteBuffer, List<Column>> execute(Cassandra.Client cassandra) of
multigetSlice is called.

diff --git a/core/src/main/java/me/prettyprint/cassandra/service/KeyspaceServiceImpl.java b/core/src/main/java/me/prettyprint/cassandra/service/KeyspaceServiceImpl.java
index 601d487..0ee8301 100644
--- a/core/src/main/java/me/prettyprint/cassandra/service/KeyspaceServiceImpl.java
+++ b/core/src/main/java/me/prettyprint/cassandra/service/KeyspaceServiceImpl.java
@@ -390,8 +390,11 @@ public class KeyspaceServiceImpl implements KeyspaceService {

         Map<ByteBuffer, List<Column>> result = new HashMap<ByteBuffer, List<Column>>();
         for (Map.Entry<ByteBuffer, List<ColumnOrSuperColumn>> entry : cfmap.entrySet()) {
-          result.put(entry.getKey(), getColumnList(entry.getValue()));
+          System.out.println("hector raw: " + entry.getKey());
+          System.out.println("hector str: " + StringSerializer.get().fromByteBuffer(entry.getKey()));
+          result.put(entry.getKey(), getColumnList(entry.getValue()));
         }
+
         return result;
       } catch (Exception e) {
         throw xtrans.translate(e);
@@ -399,7 +402,12 @@ public class KeyspaceServiceImpl implements KeyspaceService {
     }
   };
   operateWithFailover(getCount);
-  return getCount.getResult();
+  Map<ByteBuffer, List<Column>> res = getCount.getResult();
+  for (Map.Entry<ByteBuffer, List<Column>> entry : res.entrySet()) {
+    System.out.println("final hector raw: " + entry.getKey());
+    System.out.println("final hector str: " + StringSerializer.get().fromByteBuffer(entry.getKey()));
+  }
+  return res;

 }

2011-04-13 14:51:42,456 | INFO | STDOUT | [example1.com/, example2.com/]
2011-04-13 14:51:42,456 | INFO | STDOUT | query keys
2011-04-13 14:51:42,456 | INFO | STDOUT | [example1.com/, example2.com/]
2011-04-13 14:51:42,457 | DEBUG | me.prettyprint.cassandra.connection.HThriftClient | Transport open status true for client CassandraClient
2011-04-13 14:51:42,457 | DEBUG | me.prettyprint.cassandra.connection.HThriftClient | keyspace reset from unicron to unicron
2011-04-13 14:51:42,463 | INFO | STDOUT | hector raw: java.nio.HeapByteBuffer[pos=61 lim=74 cap=80]
2011-04-13 14:51:42,463 | INFO | STDOUT | hector str: example2.com/
2011-04-13 14:51:42,463 | INFO | STDOUT | hector raw: java.nio.HeapByteBuffer[pos=39 lim=52 cap=80]
2011-04-13 14:51:42,463 | INFO | STDOUT | hector str: example1.com/
2011-04-13 14:51:42,464 | DEBUG | me.prettyprint.cassandra.connection.HThriftClient | Transport open status true for client CassandraClient
2011-04-13 14:51:42,464 | DEBUG | me.prettyprint.cassandra.connection.ConcurrentHClientPool | Status of releaseClient CassandraClient to queue: true
2011-04-13 14:51:42,464 | INFO | STDOUT | final hector raw: java.nio.HeapByteBuffer[pos=74 lim=74 cap=80]
2011-04-13 14:51:42,464 | INFO | STDOUT | final hector str:
2011-04-13 14:51:42,464 | INFO | STDOUT | slice keyset:
2011-04-13 14:51:42,464 | INFO | STDOUT | [?multiget_slice
example1.com/
example2.com/
]
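As an aside, here is a minimal standalone sketch (mine, not from this thread) of the ByteBuffer behavior those pos/lim values hint at: decoding a buffer without duplicating it advances its position, so a later decode of the same buffer yields an empty string, which is exactly what the "final hector raw: pos=74 lim=74" line shows.

import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class ByteBufferPositionDemo {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Reads from the buffer's current position to its limit,
    // advancing the position as a side effect.
    static String consumingDecode(ByteBuffer bb) {
        byte[] bytes = new byte[bb.remaining()];
        bb.get(bytes);
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        ByteBuffer key = UTF8.encode("example1.com/");

        // Safe: duplicate() shares the bytes but keeps its own position/limit.
        System.out.println(consumingDecode(key.duplicate())); // example1.com/
        System.out.println(consumingDecode(key.duplicate())); // example1.com/

        // Unsafe: decoding the buffer itself consumes it.
        System.out.println(consumingDecode(key)); // example1.com/
        System.out.println(consumingDecode(key)); // "" -- pos == lim now, like the log
    }
}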

On Apr 13, 1:59 pm, cburroughs wrote:
> Thanks Nate. The code was written with a bit too much enthusiasm for
> the guava library, but I think it is similar. The major difference I
> saw was keySerializer.fromBytesMap, but for ByteBufferSerializer
> that's a no-op. The keys of the "slice" variable below have the "?
> multiget_sliceUNPRINTABLEexample1.com/UNPRINTABLE"example2.com/"
> form. So I don't think it could be a problem with later code. If I
> follow the code from m.p.c.model.thrift.ThriftMultigetSliceQuery I
> see entry.getKey() used on thriftRet without anything special done to
> it.
>
> Example code. Keys are all urls.
>
> List<ByteBuffer> keys =
>     Lists.newArrayList(Iterables.transform(urls,
>         EncodingUtils.STRING_BYTE_BUFFERER));
> Map<ByteBuffer, List<Column>> slice =
>     ks.multigetSlice(keys, COLUMN_FAMILY, COLS_PREDICATE);
> return Iterables.transform(slice.entrySet(), RECORD_MAKER);
>
> -----
> static private final Function<Entry<ByteBuffer, List<Column>>,
>     IUrlRecord> RECORD_MAKER
>         = new Function<Entry<ByteBuffer, List<Column>>, IUrlRecord>() {
>
>     @Override
>     public IUrlRecord apply(Entry<ByteBuffer, List<Column>> row) {
>         return makeRecord(row);
>     }
>
> };
>
> -----
> static UrlRecord makeRecord(Entry<ByteBuffer, List<Column>> row) {
>     return makeRecord(EncodingUtils.bbString(row.getKey()),
>         row.getValue());
> }
>
> On Apr 13, 1:26 pm, Nate McCall wrote:
>
> > Take a look at:
> > m.p.c.model.thrift.ThriftMultigetSliceQuery, particularly lines 61
> > through 70, and see if this approach is similar to how you are dealing
> > with the results from KeyspaceService.
>
> > On Wed, Apr 13, 2011 at 12:16 PM, cburroughs wrote:
> > > I'm having trouble with Hector's KeyspaceService.multiGetSlice (yes I
> > > would rather be using the v2 api, but old code needs maintenance
> > > without large changes)
>
> > > Map<ByteBuffer, List<Column>> multigetSlice(List<ByteBuffer> keys,
> > >     ColumnParent columnParent,
> > >     SlicePredicate predicate) throws HectorException;
>
> > > The List<Column> value seems correct as far as I can tell and works
> > > for keys that are both present and missing. However, the key of the
> > > Map is not the key requested. The ByteBuffer key I get back looks like
> > > a serialization of the multiget_slice call itself (see below). It looks
> > > like RowImpl doesn't do anything special with the result it gets back
> > > and I don't see a "special case multiget_slice de-serializer"
> > > anywhere.
>
> > > For example, a request for the key "example.com" (a row key that
> > > exists) returns this as a key :
>
> > > \u0001\u0000\u0002\u0000\u0000\u0000\u000Emultiget_slice
> > > \u0000\u0000\u0000\u0004\r\u0000\u0000\u000B\u000F
> > > \u0000\u0000\u0000\u0001\u0000\u0000\u0000\fexample.com/\f
> > > \u0000\u0000\u0000\u0003\f\u0000\u0001\u000B
> > > \u0000\u0001\u0000\u0000\u0000\u0006shares\u000B
> > > \u0000\u0002\u0000\u0000\u0000\b
> > > \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0015\n
> > > \u0000\u0003\u0000\u0000\u0001. Ê\u0000\u0000\f\u0000\u0001\u000B
> > > \u0000\u0001\u0000\u0000\u0000\u0005title\u000B
> > > \u0000\u0002\u0000\u0000\u0000\u001CIANA — Example domains\n
> > > \u0000\u0003\u0000\u0000\u0001. Ê\u0000\u0000\f\u0000\u0001\u000B
> > > \u0000\u0001\u0000\u0000\u0000\u0003url\u000B
> > > \u0000\u0002\u0000\u0000\u0000\u0013http://example.com/\n
> > > \u0000\u0003\u0000\u0000\u0001. Ê\u0000\u0000\u0000
>
> > > And an example for two row keys that don't:
>
> > > [{"key":" \u0001\u0000\u0002\u0000\u0000\u0000\u000Emultiget_slice
> > > \u0000\u0000\u0000\u0002\r\u0000\u0000\u000B\u000F
> > > \u0000\u0000\u0000\u0002\u0000\u0000\u0000\rexample1.com/\f
> > > \u0000\u0000\u0000\u0000\u0000\u0000\u0000\rexample2.com/\f
> > > \u0000\u0000\u0000\u0000\u0000"},{"key":"
> > > \u0001\u0000\u0002\u0000\u0000\u0000\u000Emultiget_slice
> > > \u0000\u0000\u0000\u0002\r\u0000\u0000\u000B\u000F
> > > \u0000\u0000\u0000\u0002\u0000\u0000\u0000\rexample1.com/\f
> > > \u0000\u0000\u0000\u0000\u0000\u0000\u0000\rexample2.com/\f
> > > \u0000\u0000\u0000\u0000\u0000"}]
>
> > > This is with 0.7.0-28.
>
>

New Cassandra database can pack two billion columns into a row

The cadre of volunteer developers behind the Cassandra distributed database have released the latest version of their open source software, able to hold up to 2 billion columns per row.

An open source database capable of holding such lengthy rows could be most useful to big data cloud computing projects and large-scale Web applications, the developers behind the Apache Software Foundation project assert.

"Apache Cassandra is a key component in cloud computing and other applications that deal with massive amounts of data and high query volumes," said Jonathan Ellis, vice president of Apache Cassandra and a co-founder of Cassandra professional support company Riptano, in a statement. "It is particularly successful in powering large web sites with sharp growth rates."

A number of large Web services have used this database, including Digg, Twitter and Facebook, which first developed the technology. The largest Cassandra cluster runs on over 400 servers, according to the project.

The new Large Row Support feature of Cassandra version 0.7 allows the database to hold up to 2 billion columns per row. Previous versions had no set limit on the number of columns in a row, but an entire row had to fit in memory, which capped it at approximately 2GB. That cap has now been eliminated.

The ability to create so many columns is valuable because it allows systems to create a nearly unlimited number of columns on the fly, Ellis explained in a follow-up e-mail.

Because Cassandra cannot execute SQL query commands, the additional columns would be needed to analyze the data within a specific row, elaborated computer scientist Maxim Grinev in a recent blog entry.

Other new features of 0.7 include the addition of secondary indexes -- which provide an easy way to query data on local machines -- and the ability to make changes to the schema without restarting the entire cluster.

Cassandra is one of a number of non-relational, or NoSQL, databases that offer the ability to quickly and easily store vast amounts of data, often in a clustered environment.

Social-networking giant Facebook developed Cassandra to power the service's inbox search. Because of the large amount of data it had to organize, Facebook wanted to use the Google Big Table database design, which could provide a column-and-row-oriented database structure that could be spread across many nodes.

The limit of Big Table was that it was a master-node-oriented design, Ellis said in an interview with the IDG News Service during the ApacheCon conference last November in Atlanta. The whole operation depended on a single node to coordinate read and write activities across all the other nodes. In other words, if the head node went down, the whole system would be useless, Ellis said. So Cassandra was built using an Amazon architecture called Dynamo, in combination with Big Table. Dynamo eliminated single points of failure while allowing for easy scalability. The Dynamo design is not dependent on any one master node. Any node can accept data for the whole system, as well as answer queries. Data is replicated across multiple hosts.

Cassandra is not the only clustered database built from the ideas of Big Table and Dynamo. Database startup company Cloudant developed a clustered version of the open source database CouchDB using this combination, called BigCouch. Cloudant just announced that it has amassed 2,500 users of its hosted BigCouch offering.

OrderedPartitioner or RandomPartitioner

For me, the main thing is a decision whether to use the OrderedPartitioner or RandomPartitioner.

If you use the RandomPartitioner, range scans are not possible. This means that you must know the exact key for any activity, INCLUDING CLEANING UP OLD DATA.

So if you've got a lot of churn, unless you have some magic way of knowing exactly which keys you've inserted data for, using the random partitioner you can easily "lose" rows, which causes a disk-space leak and will eventually consume all storage.

On the other hand, you can ask the ordered partitioner "what keys do I have in Column Family X between A and B?" and it will tell you. You can then clean them up.

However, there is a downside as well. As Cassandra doesn't do automatic load balancing, if you use the ordered partitioner, in all likelihood all your data will end up in just one or two nodes and none in the others, which means you'll waste resources.

I don't have any easy answer for this, except that you can get the "best of both worlds" in some cases by putting a short hash value (of something you can enumerate easily from other data sources) at the beginning of your keys - for example a 16-bit hash of the user ID, which gives you 4 hex digits, followed by whatever key you really wanted to use.

Then if you have a list of recently deleted users, you can just hash their IDs and range-scan to clean up anything related to them.
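A minimal sketch of that key-prefixing trick (my own illustration; CRC32 and the ':' separator are arbitrary choices, any stable hash and delimiter will do):

import java.io.UnsupportedEncodingException;
import java.util.zip.CRC32;

public class PrefixedKeys {
    // 4 hex digits derived from the user ID, i.e. a 16-bit hash prefix.
    static String prefix(String userId) throws UnsupportedEncodingException {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes("UTF-8"));
        return String.format("%04x", crc.getValue() & 0xFFFF);
    }

    static String rowKey(String userId, String realKey) throws UnsupportedEncodingException {
        return prefix(userId) + ":" + userId + ":" + realKey;
    }

    public static void main(String[] args) throws Exception {
        // To clean up everything for user "alice", range-scan keys
        // starting at prefix("alice") + ":alice:".
        System.out.println(rowKey("alice", "inbox"));
    }
}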

The next tricky bit is secondary indexes - Cassandra doesn't have any - so if you need to look up X by Y, you need to insert the data under both keys, or have a pointer. Likewise, these pointers may need to be cleaned up when the thing they point to no longer exists, but there's no easy way of querying things on that basis, so your app needs to Just Remember.

And application bugs may leave orphaned keys that you've forgotten about, and you'll have no way of easily detecting them, unless you write some garbage collector which periodically scans every single key in the db (this is going to take a while, but you can do it in chunks) to check for ones which aren't needed any more, as sketched below.
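A rough sketch of such a chunked garbage-collection scan, reusing the getRangeSlice(cp, sp, start, finish, count) call from the Hector post above. It assumes an ordered partitioner (so key ranges mean something) and that the returned Map preserves the server's key order; isOrphan() is a hypothetical predicate you would supply.

import java.util.List;
import java.util.Map;

import me.prettyprint.cassandra.service.Keyspace;

import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.SlicePredicate;

public class KeyScanner {
    static final int PAGE_SIZE = 100;

    static void scanAllKeys(Keyspace keyspace, ColumnParent cp, SlicePredicate sp)
            throws Exception {
        String start = "";
        while (true) {
            Map<String, List<Column>> page =
                    keyspace.getRangeSlice(cp, sp, start, "", PAGE_SIZE);
            String last = start;
            for (Map.Entry<String, List<Column>> row : page.entrySet()) {
                if (isOrphan(row.getKey(), row.getValue())) {
                    // delete the orphaned row here
                }
                last = row.getKey();
            }
            if (page.size() < PAGE_SIZE || last.equals(start)) {
                break; // final page
            }
            start = last; // resume from the last key seen (it gets re-read once)
        }
    }

    // Placeholder policy: rows with no live columns count as orphans.
    static boolean isOrphan(String key, List<Column> columns) {
        return columns.isEmpty();
    }
}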

None of this is based on real usage, just what I've figured out during research. We don't use Cassandra in production.

EDIT: Cassandra now does have secondary indexes in trunk.

Getting Started With Cassandra tutorial

Based on Ronald Mathies’ intro articles to Cassandra and a few other resources I’ve been gathering, I thought I should put together a detailed guide to getting started with Cassandra. As one would expect, the ☞ first post briefly introduces Cassandra and covers the distribution details and installation steps. It should be noted that Windows may not be the best environment to install Cassandra on. Also, if after the brief intro you’d like to see more details about it, you should check Gary Dusbabek’s presentation on Cassandra or watch Eric Evans’ Cassandra presentation at FOSDEM.

The ☞ second article focuses on the Cassandra data model. If you are not familiar with it, this is the part you’ll want to focus on.

Column:
A column is also referred to as a tuple (triplet) that contains a name, a value and a timestamp. This is the smallest data container there is.
SuperColumn:
A SuperColumn is a tuple with a name and a value; it doesn’t have a timestamp like the Column tuple. Notice that the value in this case is not a binary value but more of a Map-style container. The map contains key/column combinations. What is important here is that the key has the same value as the name of the Column it refers to. So to put it simply, a SuperColumn is a container for one or more Columns. You will see that this also makes a big difference later on when we discuss the ColumnFamily and SuperColumnFamily.
ColumnFamily:
A ColumnFamily is a structure that can keep an infinite number of rows; for most people with an RDBMS background, this is the structure that resembles a Table the most. When you look at the diagram you can see that a ColumnFamily has a name (comparable to the name of a Table), a map with a key (comparable to a row identifier) and a value (which is a Map containing Columns). The map with the columns has the same rules as the SuperColumn: the key has the same value as the name of the Column it refers to.
SuperColumnFamily:
Finally we have the largest container, the SuperColumnFamily. If you understand the ColumnFamily then this construction isn’t much harder: instead of having Columns in the innermost Map we have SuperColumns, so it just adds an extra dimension. As displayed in the image, the key of the Map which contains the SuperColumns must be the same as the name of the SuperColumn (just like with the ColumnFamily).
Keyspace:
Keyspaces are quite simple again; from an RDBMS point of view you can compare this to your schema, and normally you have one per application. A keyspace contains the ColumnFamilies. Note however there is no relationship between the ColumnFamilies; they are just separate containers.
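A rough mental model of these containers as nested maps may help (my own sketch; this is not an actual Cassandra or Hector API, just the shape of the data):

import java.util.HashMap;
import java.util.Map;

public class DataModelSketch {
    public static void main(String[] args) {
        // A row in a standard ColumnFamily: column name -> value.
        Map<String, String> row = new HashMap<String, String>();
        row.put("email", "foobar@example.com");
        row.put("age", "24");

        // The ColumnFamily: row key -> row.
        Map<String, Map<String, String>> columnFamily =
                new HashMap<String, Map<String, String>>();
        columnFamily.put("user-1", row);

        // The Keyspace: ColumnFamily name -> ColumnFamily.
        Map<String, Map<String, Map<String, String>>> keyspace =
                new HashMap<String, Map<String, Map<String, String>>>();
        keyspace.put("Standard1", columnFamily);

        // A SuperColumnFamily would add one more level:
        // row key -> SuperColumn name -> column name -> value.
        System.out.println(keyspace.get("Standard1").get("user-1").get("email"));
    }
}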
Probably the best explanation of the Cassandra data model can be found in Arin Sarkissian’s ☞ WTF is a SuperColumn?. There are other recommended resources about Cassandra and Jonathan Ellis, Cassandra project chair, has a suggested Cassandra reading list.

The ☞ third article in the series focuses on Cassandra’s sorting capabilities:

By default Cassandra sorts the data as soon as you store it in the database, and it remains sorted. This gives you an enormous performance boost; however, you need to think about ordering before you start storing data.

Sorting is specified via the ColumnFamily’s CompareWith attribute; these are the options you can choose from (it is possible to create custom sorting behavior, but we will cover that later):

BytesType
UTF8Type
LexicalUUIDType
TimeUUIDType
AsciiType
LongType
And there is also a way to define your own custom Cassandra sorting types, described in this ☞ post; a small illustration of why the choice matters follows.
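Here is that illustration (mine, not from the series): the same column names order differently under a bytes/UTF8-style lexical comparison versus a LongType-style numeric comparison.

import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

public class SortOrderDemo {
    public static void main(String[] args) {
        // Lexical ordering, roughly what BytesType/UTF8Type/AsciiType give you.
        TreeSet<String> lexical = new TreeSet<String>(Arrays.asList("2", "9", "10"));
        System.out.println(lexical); // [10, 2, 9]

        // Numeric ordering, roughly what LongType gives you for 8-byte values.
        TreeSet<String> numeric = new TreeSet<String>(new Comparator<String>() {
            public int compare(String a, String b) {
                return Long.valueOf(a).compareTo(Long.valueOf(b));
            }
        });
        numeric.addAll(Arrays.asList("2", "9", "10"));
        System.out.println(numeric); // [2, 9, 10]
    }
}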

By now you should be ready to start using Cassandra, and this is exactly the subject of ☞ part 4 and ☞ part 5 of the series, which cover the Thrift Cassandra client. Understanding how writes and reads are performed might be useful too, so you should check Cassandra write operation and Cassandra read operation, which also talk about the performance of these operations.

While initially you might not have enough data to need to decide how to partition a Cassandra cluster, once you get to that point I’m pretty sure you’ll appreciate some more details on Cassandra partitioning strategies.

Last, but not least, here is a list of known Cassandra usecases that might give you a good idea of where Cassandra will fit in your next app and then you should be absolutely ready to experiment with Cassandra.

References
A Cassandra Glossary
Cassandra: Tuning for Performance
Cassandra Reads Performance Explained
Cassandra: Modeling A Facebook-Style Messenger
Presentation: Introduction to Cassandra
Cassandra Installation Guide for Ubuntu and Debian
Presentation: Cassandra Basics - Indexing
RESTful Cassandra
Tutorial: MapReduce with Riak
Why Redis? And Memcached, Cassandra, Lucene, ElasticSearch
Cassandra Gets (Better) Documentation
Cassandra reading list
Your Chance to Review the FOSDEM NoSQL Event
Cassandra Usecases: Survey Results
Cassandra Partitioning Strategies
Cassandra Write Operation Performance Explained
Getting Started with Cassandra on Windows
Presentation: Gary Dusbabek (Rackspace) on Cassandra

How to use Cassandra with PHP

PHP application reading/writing to Cassandra:

OK, so let's say we have finished the Cassandra installation and generated the PHP interface files with Thrift.

Now we can write a small PHP application which will connect to Cassandra and write and read some data.
I found multiple high-level libraries on the Internet which allow you to manipulate Cassandra data. In this example I will not go through all of them; I will just try a simple connection with the low-level API generated by Thrift, to prove that my connection from PHP to Cassandra really works.


Because I was still missing some libraries from Thrift, I downloaded the complete Thrift package from here: http://incubator.apache.org/thrift/download/
and extracted it to d:\cassandra\trift

Here http://wiki.apache.org/cassandra/ThriftExamples#PHP you can find an example of connecting from PHP to Cassandra, so let's make some corrections to fit it to my test and the paths on my machine
(I had to copy the Cassandra files generated by Thrift to d:\cassandra\trift\lib\php\src\packages\cassandra):
<?php
// Point Thrift at the extracted package and pull in the generated
// Cassandra interface plus the transport/protocol classes.
$GLOBALS['THRIFT_ROOT'] = 'd:/cassandra/trift/lib/php/src';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/Cassandra.php';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/cassandra_types.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';

try {
// Make a connection to the Thrift interface to Cassandra
$socket = new TSocket('127.0.0.1', 9160);
$transport = new TBufferedTransport($socket, 1024, 1024);
$protocol = new TBinaryProtocol($transport);
$client = new CassandraClient($protocol);
$transport->open();


/* Insert some data into the Standard1 column family from the default config */

// Keyspace specified in storage-conf.xml
$keyspace = 'Keyspace1';

// reference to specific User id
$keyUserId = "1";

// Constructing the column path that we are adding information into.
$columnPath = new cassandra_ColumnPath();
$columnPath->column_family = 'Standard1';
$columnPath->super_column = null;
$columnPath->column = 'email';

// Timestamp for update
$timestamp = time();

// We want the consistency level to be ZERO which means async operations on 1 node
$consistency_level = cassandra_ConsistencyLevel::ZERO;

// Add the value to be written to the table, User Key, and path.
$value = "foobar@example.com";
$client->insert($keyspace, $keyUserId, $columnPath, $value, $timestamp, $consistency_level);

// Add a new column path to be altered.
$columnPath->column = 'age';
//Get a current timestamp
$timestamp = time();
// Update the value to be inserted for the updated column Path
$value = "24";
$client->insert($keyspace, $keyUserId, $columnPath, $value, $timestamp, $consistency_level);

/*
* use batch_insert to insert a supercolumn and its children using the standard config
* builds the structure
*
* Super1 : {
* KeyName : {
* SuperColumnName : {
* foo : fooey value
* bar : bar like thing
* }
* }
* }
*/

// build columns to insert
$column1 = new cassandra_Column();
$column1->name = 'foo';
$column1->value = 'fooey value';
$column1->timestamp = time();

$column2 = new cassandra_Column();
$column2->name = 'bar';
$column2->value = 'bar like thing';
$column2->timestamp = time();

// build super column containing the columns
$super_column = new cassandra_SuperColumn();
$super_column->name = 'SuperColumnName';
$super_column->columns = array($column1, $column2);

// create columnorsupercolumn holder class that batch_insert uses
$c_or_sc = new cassandra_ColumnOrSuperColumn();
$c_or_sc->super_column = $super_column;

// create the mutation (a map of ColumnFamily names to lists of ColumnOrSuperColumn objects)
$mutation['Super1'] = array($c_or_sc);

$client->batch_insert($keyspace, 'KeyName', $mutation, $consistency_level);

/* Query for data */

// Specify what Column Family to query against.
$columnParent = new cassandra_ColumnParent();
$columnParent->column_family = "Standard1";
$columnParent->super_column = NULL;


$sliceRange = new cassandra_SliceRange();
$sliceRange->start = "";
$sliceRange->finish = "";
$predicate = new cassandra_SlicePredicate();
$predicate->column_names = null; // we use a slice range rather than explicit column names
$predicate->slice_range = $sliceRange;

// We want the consistency level to be ONE which means to only wait for 1 node
$consistency_level = cassandra_ConsistencyLevel::ONE;

// Issue the Query
$keyUserId = 1;
$result = $client->get_slice($keyspace, $keyUserId, $columnParent, $predicate, $consistency_level);


print_r($result);
$transport->close();



} catch (TException $tx) {
print 'TException: '.$tx->why. ' Error: '.$tx->getMessage() . "\n";
}
?>

Cassandra installation on Windows 7

1. Cassandra is a Java-based application, so first of all you need to install Java on your machine.
You can download the latest JRE from here: http://www.oracle.com/technetwork/java/javase/downloads/index.html

2. Download Cassandra binary files from here: http://cassandra.apache.org/download/

3. Extract the Cassandra files, e.g. to D:\cassandra

4. Set environment variables. (Go to System Properties -> tab Advanced -> button Environment Variables... and add system variables here)
JAVA_HOME=c:\Program Files\Java\jre6\ (it should be the path to the JRE directory, not the bin directory; the value I used may differ on your machine)
CASSANDRA_HOME=d:\cassandra

5. Modify the config file d:\cassandra\conf\storage-conf.xml
I changed the following:
<CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
to
<CommitLogDirectory>d:/cassandra/commitlog</CommitLogDirectory>
and I also created the directory d:/cassandra/commitlog

The next change was:
<DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
which I changed to
<DataFileDirectory>d:/cassandra/data</DataFileDirectory>
and again I created the directory d:/cassandra/data

6. You are ready to start Cassandra.
Go to the directory d:\cassandra\bin\ and run cassandra.bat

You should see output like this:
D:\cassandra\bin>cassandra.bat
Starting Cassandra Server
Listening for transport dt_socket at address: 8888
INFO 13:47:17,274 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 13:47:17,808 Saved Token not found. Using 157384470883646000501029173271571041806
INFO 13:47:17,809 Saved ClusterName not found. Using Test Cluster
INFO 13:47:17,815 Creating new commitlog segment d:/cassandra/commitlog\CommitLog-1280922437815.log
INFO 13:47:17,886 LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='d:/cassandra/commitlog\CommitLog-1280922437815.log', position=419)
INFO 13:47:17,903 Enqueuing flush of Memtable-LocationInfo@1370440457(169 bytes, 4 operations)
INFO 13:47:17,905 Writing Memtable-LocationInfo@1370440457(169 bytes, 4 operations)
INFO 13:47:18,082 Completed flushing d:\cassandra\data\system\LocationInfo-1-Data.db
INFO 13:47:18,122 Starting up server gossip
INFO 13:47:18,196 Binding thrift service to localhost/127.0.0.1:9160
INFO 13:47:18,204 Cassandra starting up...