Friday, June 6, 2014

Cassandra CQL3 - Range Queries

Range Queries using CQL3 :

Once of the major advantage of using Cassandra via thrift was using the hashmap like structure . But with cql3 , it doesn’t appear to be available in a very straight forward way ..

But do not worry , it is very simple . No Maps are not the perfect replacement , because it is not possible to do range queries on a map right now

There is a better way to do it ..


Let us create a column family called testrange

CREATE TABLE testrange (
  key TEXT,
  detail_key int,
  detal_value text,
  PRIMARY KEY (key, detail_key)
);

We are creating the PK as key coupled with detail_key , this will force all the entries with same key to be a single row and each different detail_key will be a column name and we can do a =,<,> query on detail_key


Lets try this now


cqlsh:testkeyspace> describe columnfamily testrange;

CREATE TABLE testrange (
  key text,
  detail_key int,
  detal_value text,
  PRIMARY KEY (key, detail_key)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};



Inserting a few records

cqlsh:testkeyspace> insert into testrange (key,detail_key,detal_value) VALUES ('1',1,'number one');
cqlsh:testkeyspace> insert into testrange (key,detail_key,detal_value) VALUES ('1',2,'number two');
cqlsh:testkeyspace> insert into testrange (key,detail_key,detal_value) VALUES ('1',3,'number three');
cqlsh:testkeyspace> insert into testrange (key,detail_key,detal_value) VALUES ('1',4,'number four');
cqlsh:testkeyspace> insert into testrange (key,detail_key,detal_value) VALUES ('1',5,'number five');
cqlsh:testkeyspace> select * from testrange;

 key | detail_key | detal_value
-----+------------+--------------
   1 |          1 |   number one
   1 |          2 |   number two
   1 |          3 | number three
   1 |          4 |  number four
   1 |          5 |  number five

(5 rows)


So far , so good … now lets execute the range queries ..

cqlsh:testkeyspace> select * from testrange where key = '1' and detail_key > 2 and detail_key <=4;

 key | detail_key | detal_value
-----+------------+--------------
   1 |          3 | number three
   1 |          4 |  number four

(2 rows)




Cool , but how do we confirm if we are using the same range query type query that was offered by Cassandra thrift .

Lets jump back to the cli

[default@testkeyspace] list testrange;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: 1
=> (name=1:, value=, timestamp=1402037639440000)
=> (name=1:detal_value, value=6e756d626572206f6e65, timestamp=1402037639440000)
=> (name=2:, value=, timestamp=1402037649568000)
=> (name=2:detal_value, value=6e756d6265722074776f, timestamp=1402037649568000)
=> (name=3:, value=, timestamp=1402037659783000)
=> (name=3:detal_value, value=6e756d626572207468726565, timestamp=1402037659783000)
=> (name=4:, value=, timestamp=1402037668866000)
=> (name=4:detal_value, value=6e756d62657220666f7572, timestamp=1402037668866000)
=> (name=5:, value=, timestamp=1402037677374000)
=> (name=5:detal_value, value=6e756d6265722066697665, timestamp=1402037677374000)

1 Row Returned.
Elapsed time: 294 msec(s).
[default@testkeyspace]


So , we have only one row and all out detail_key are columns .. which is almost similar to what we used to do earlier with thrift.



Apache Cassandra - Getting started with CQL 3

Getting started with cql3

   Lets step through some basics of cassandra cql3 . I am going to assume you have already install cassandra and ready to get into cqlsh



<cassandra_home># bin/cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.8 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh>


Now we are ready to run some queries .


Lets create a keyspace to start with . Keyspace is the same as database if you are coming from sql land

cqlsh> CREATE KEYSPACE testkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

now the keyspace has been created..

lets use the keyspace

cqlsh> USE testkeyspace;
cqlsh:testkeyspace>

now we have a keyspace with replication factor 1 .

if you want to change your keyspace rf , you can alter at any point of time

ALTER KEYSPACE testkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 2};

this will require you to have a two node cluster setup


Dropping keyspace


DROP KEYSPACE testkeyspace;

yeah , simple so far


Now lets create a simple table

CREATE TABLE testtable(id text PRIMARY KEY, name text, avg varint, age int) WITH comment='Test table' AND read_repair_chance = 1.0;


Running a basic select query is very simple

cqlsh:testkeyspace> select * from testtable;

(0 rows)



Lets describe the table quickly

cqlsh:testkeyspace> describe columnfamily testtable;

CREATE TABLE testtable (
  id text,
  age int,
  avg varint,
  name text,
  PRIMARY KEY (id)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='Test table' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};


column family is similar to a table in sql


Now , let us insert some data into the column family

cqlsh:testkeyspace> insert into testtable (id,age,avg,name) values('1',23,10,'Arunn');

and select

cqlsh:testkeyspace> select * from testtable;

 id | age | avg | name
----+-----+-----+-------
  1 |  23 |  10 | Arunn

(1 rows)




Altering a table and adding fields is simple

ALTER TABLE testtable ADD gender varchar;


cqlsh:testkeyspace> ALTER TABLE testtable ADD gender varchar;
cqlsh:testkeyspace> select * from testtable;

 id | age | avg | gender | name
----+-----+-----+--------+-------
  1 |  23 |  10 |   null | Arunn

(1 rows)


Updates are very simple too

cqlsh:testkeyspace> update testtable set gender = 'MALE' where id = '1';
cqlsh:testkeyspace> select * from testtable;

 id | age | avg | gender | name
----+-----+-----+--------+-------
  1 |  23 |  10 |   MALE | Arunn

(1 rows)


Now lets try a simple delete and we are done with basic operations


cqlsh:testkeyspace> delete from testtable where id = '1';
cqlsh:testkeyspace> select * from testtable;

(0 rows)



Now we know how to create
·         Keyspace
·         Column family / CF
·         Insert data
·         Alter CF
·         Update row
·         Delete row



Immediate interesting thing is to try batch operations , this saves network round-trips

BEGIN BATCH
insert into testtable (id,age,avg,gender,name) values('2',30,19, 'MALE','Abc');
insert into testtable (id,age,avg,gender,name) values('3',40,16, 'MALE','Def');
update testtable set name = 'Ghi' where id = '3';
APPLY BATCH;



Counters are a very important column type in Cassandra

Counters
The counter type is used to define counter columns. A counter column is a column whose value is a 64-bit signed integer and on which 2 operations are supported: incrementation and decrementation Note the value of a counter cannot be set. A counter doesn’t exist until first incremented/decremented, and the first incrementation/decrementation is made as if the previous value was 0. Deletion of counter columns is supported but have some limitations (see the Cassandra Wiki for more information).
The use of the counter type is limited in the following way:
  • It cannot be used for column that is part of the PRIMARY KEY of a table.
  • A table that contains a counter can only contain counters. In other words, either all the columns of a table outside the PRIMARY KEY have the counter type, or none of them have it.


Map column types are interesting to store a collection of data . Cassandra also supports set and list types

cqlsh:testkeyspace> create table testmaptable (id text PRIMARY KEY , clus_columns map<int,text>);
cqlsh:testkeyspace> select * from testmaptable;

(0 rows)

cqlsh:testkeyspace> insert into testmaptable (id,clus_columns) values ('1',{1:'number one'});
cqlsh:testkeyspace> select * from testmaptable;

 id | clus_columns
----+-------------------
  1 | {1: 'number one'}

(1 rows)

cqlsh:testkeyspace> update testmaptable set clus_columns = {2:'number two'} where id = '1';
cqlsh:testkeyspace> select * from testmaptable;

 id | clus_columns
----+-------------------
  1 | {2: 'number two'}

(1 rows)


cqlsh:testkeyspace> update testmaptable set clus_columns = clus_columns + {1:'number one'} where id = '1';
cqlsh:testkeyspace> select * from testmaptable;

 id | clus_columns
----+------------------------------------
  1 | {1: 'number one', 2: 'number two'}

(1 rows)

Lets take a look at how it looks in the cassandra cli
<Cassandra_home> bin/Cassandra-cli –host localhost
Use testkeyspace;
[default@testkeyspace] list testmaptable;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: 1
=> (name=, value=, timestamp=1402035929908000)
=> (name=clus_columns:00000001, value=6e756d626572206f6e65, timestamp=1402036016296000)
=> (name=clus_columns:00000002, value=6e756d6265722074776f, timestamp=1402035971589000)

1 Row Returned.
Elapsed time: 429 msec(s).