Tuesday, October 5, 2010

Setting up multiple nodes in Cassandra :

To set up two different machines as nodes, we need to alter these three properties in
/opt/cassandra/conf/storage-conf.xml

<Seed>IP of the machine that should be the seed</Seed>

Let's assume you want to set up two machines with Cassandra:

192.168.1.3 & 192.168.1.4

We want 192.168.1.3 to be the seed.

So set
i. <Seed>192.168.1.3</Seed>
on both machines

ii. <ListenAddress>{respective IP}</ListenAddress>, that is
<ListenAddress>192.168.1.3</ListenAddress> on the first machine and
<ListenAddress>192.168.1.4</ListenAddress> on the second

iii. <ThriftAddress>{respective IP}</ThriftAddress>, set the same way on each machine
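
The three settings above can be written out per machine with a small script. This is only a sketch: node_settings is a hypothetical helper, not part of Cassandra, and the element casing (Seed, ListenAddress, ThriftAddress) is assumed to match the stock storage-conf.xml. It prints the fragments so they can be pasted into each node's file by hand.

```python
# Hypothetical helper: renders the three per-node settings described above.
# It only builds text; it does not read or modify storage-conf.xml.
SEED_IP = "192.168.1.3"
NODE_IPS = ["192.168.1.3", "192.168.1.4"]


def node_settings(node_ip, seed_ip=SEED_IP):
    """Return the three storage-conf.xml elements for one node as text."""
    return "\n".join([
        f"<Seed>{seed_ip}</Seed>",                      # same seed on every node
        f"<ListenAddress>{node_ip}</ListenAddress>",    # this node's own IP
        f"<ThriftAddress>{node_ip}</ThriftAddress>",    # this node's own IP
    ])


for ip in NODE_IPS:
    print(f"--- settings for {ip} ---")
    print(node_settings(ip))
```

Only the seed value is shared; the two address elements differ on each machine.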

The configuration is done. Now create the keyspace on both machines by adding it to storage-conf.xml, after the closing </Keyspace> tag of any existing keyspace:

<Keyspace Name="Prototype">
<ColumnFamily Name="master" CompareWith="UTF8Type" />
<ColumnFamily ColumnType="Super" CompareWith="UTF8Type" Name="Supermaster" />
<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>

<ReplicationFactor>1</ReplicationFactor>

<EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>


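You're done. Data inserted through one machine can now be read from the other. Note that with ReplicationFactor set to 1 each row is stored on only one node; set it to 2 if both machines should hold a copy.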

Sorting in Cassandra :

Moving on to sorting of the data in Cassandra ...


Columns are always sorted within their row by the column's name. This is important, so I'll say it again:

Columns inside a row key are always sorted by their name. How the names are compared depends on the ColumnFamily's "CompareWith" option.


Types available are :

* BytesType,
* UTF8Type,
* LexicalUUIDType,
* TimeUUIDType,
* AsciiType,
* LongType.
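
To see why the choice of comparator matters, here is a small Python sketch that imitates two orderings outside of Cassandra. It assumes plain ASCII names and that LongType names are 8-byte big-endian longs compared numerically; sort_as_utf8 and sort_as_long are illustrative names, not Cassandra APIs.

```python
import struct


def sort_as_utf8(names):
    """Order names the way a text comparator would: by their encoded bytes."""
    return sorted(names, key=lambda n: n.encode("utf-8"))


def sort_as_long(numbers):
    """Order names stored as 8-byte big-endian longs by numeric value."""
    packed = [struct.pack(">q", n) for n in numbers]
    packed.sort(key=lambda b: struct.unpack(">q", b)[0])
    return [struct.unpack(">q", b)[0] for b in packed]


print(sort_as_utf8(["1", "2", "10"]))  # ['1', '10', '2']
print(sort_as_long([1, 2, 10]))        # [1, 2, 10]
```

The same three names come back in a different order, so numeric column names usually want LongType rather than a text comparator.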

Creating a Super Column in Cassandra :

In order to create a super column we'll need a ColumnFamily with ColumnType "Super".
So let us add a ColumnFamily Supermaster to the keyspace "Prototype"

<Keyspace Name="Prototype">
<ColumnFamily Name="master" CompareWith="UTF8Type" />

<ColumnFamily ColumnType="Super" CompareWith="UTF8Type" Name="Supermaster" />

<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
<ReplicationFactor>1</ReplicationFactor>
<EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>


Now let's try inserting some data into it.

cassandra> set Prototype.Supermaster['1']['address']['apt']='B-5C'
Value inserted.

Here 'address' is the super column and 'apt' is a column inside that super column.

Now let's fetch it :

cassandra> get Prototype.Supermaster['1']
=> (super_column=address,
(column=617074, value=B-5C, timestamp=1286273186596000))
Returned 1 results.
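
The column name 617074 in the output above looks odd, but it is simply the hexadecimal form of the bytes of 'apt'. A quick check in Python (that the client prints the raw name bytes as hex here is an assumption; decode_hex_name is just an illustrative helper):

```python
def decode_hex_name(hex_name):
    """Turn a hex-printed column name back into readable text."""
    return bytes.fromhex(hex_name).decode("utf-8")


print(decode_hex_name("617074"))  # prints: apt
```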

Adding a few more entries to the super column for the row key :

cassandra> set Prototype.Supermaster['1']['address']['flat']='CDS, RPG'
Value inserted.
cassandra> set Prototype.Supermaster['1']['address']['area']='velachery'
Value inserted.
cassandra> set Prototype.Supermaster['1']['address']['city']='chennai'
Value inserted.


Fetching all 4 columns in the super column 'address' :

cassandra> get Prototype.Supermaster['1']['address']
=> (column=flat, value=CDS, RPG, timestamp=1286273243882000)
=> (column=city, value=chennai, timestamp=1286273287246000)
=> (column=area, value=velachery, timestamp=1286273267318000)
=> (column=apt, value=B-5C, timestamp=1286273186596000)
Returned 4 results.
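
One way to picture what was just stored is as nested maps: row key, then super column, then column, then value. The sketch below is a toy in-memory model in Python, not a Cassandra client; set_subcolumn is a hypothetical helper that mirrors the set commands above.

```python
# Toy in-memory model of a super column family (not a Cassandra client).
def set_subcolumn(cf, row_key, super_column, column, value):
    """Store value under cf[row_key][super_column][column]."""
    cf.setdefault(row_key, {}).setdefault(super_column, {})[column] = value


supermaster = {}
set_subcolumn(supermaster, "1", "address", "apt", "B-5C")
set_subcolumn(supermaster, "1", "address", "flat", "CDS, RPG")
set_subcolumn(supermaster, "1", "address", "area", "velachery")
set_subcolumn(supermaster, "1", "address", "city", "chennai")

# Equivalent of: get Prototype.Supermaster['1']['address']
address = supermaster["1"]["address"]
for column, value in address.items():
    print(column, "=", value)
print(len(address), "columns")
```

The model leaves out timestamps and sort order; it only shows the three levels of nesting.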

How to run Cassandra and basics with Cassandra

Once you have set up Cassandra on your machine, this is how to proceed to get started and see how Cassandra works.



Assuming you have installed Cassandra to /opt

Starting Cassandra : /opt/cassandra/bin/cassandra

Now to get into the command-line client:

cd /opt/cassandra

bin/cassandra-cli --host localhost --port 9160


Creating a keyspace :

<Keyspace Name="Prototype">
<ColumnFamily Name="master" CompareWith="UTF8Type" />
<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
<ReplicationFactor>1</ReplicationFactor>
<EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>

Keyspace - compare it to a database (Prototype)
ColumnFamily - compare it to a table (master)


Setting a value :

set Prototype.master['1']['name'] = 'Arun'
set Prototype.master['1']['age'] = '22'

Setting a value -> keyspace.columnfamily['rowKey']['column']='value'

The row key is like a primary key. For example, 1 is the primary key here, and its row holds all the properties of Arun
as different columns: name, age, etc.


Retrieval of data :

get Prototype.master['1']['name']
=> (column=name, value=Arun, timestamp=1286265373093000)

This will get only the value in the name column of row key '1'


get Prototype.master['1']

This will get you all the columns for the row key '1'

count Prototype.master['1']

This will get the count of columns available for row key '1'


FYI :------------------ Row keys are case sensitive ------------------

count Prototype.master['2']

This reports that 0 columns exist. Don't worry, it will not throw a "no column exists" error.


But this is not the case for columns :

get Prototype.master['1']['friendlist']
Exception null

The column friendlist does not exist, so the get throws an exception (------------Something to be careful about--------------------)
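
The difference between the two cases can be mimicked with a toy model in Python. This is only a sketch of the behaviour seen in the session above, not Cassandra code; ColumnFamily and ColumnNotFound are hypothetical names introduced for the illustration.

```python
class ColumnNotFound(Exception):
    """Raised by the toy model when a requested column is missing."""


class ColumnFamily:
    """Toy in-memory stand-in for a column family such as 'master'."""

    def __init__(self):
        self.rows = {}

    def set(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        # A missing column is an error, as in the CLI session above.
        try:
            return self.rows[row_key][column]
        except KeyError:
            raise ColumnNotFound(f"{row_key!r}/{column!r}") from None

    def count(self, row_key):
        # A missing row simply has zero columns; no error is raised.
        return len(self.rows.get(row_key, {}))


master = ColumnFamily()
master.set("1", "name", "Arun")
master.set("1", "age", "22")

print(master.count("1"))  # 2
print(master.count("2"))  # 0

try:
    master.get("1", "friendlist")
except ColumnNotFound as exc:
    print("missing column:", exc)
```

Code that reads optional columns should therefore be ready to catch the error, while a count of an unknown row key can be used without a guard.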