Monday, October 1, 2012

Installing Cassandra


Introduction


This document provides a few easy-to-follow steps that take the first-time user from installation to running a single-node Cassandra instance, followed by an overview of configuring a multinode cluster. Cassandra is meant to run on a cluster of nodes, but it also runs well on a single machine. This is a handy way of getting familiar with the software while avoiding the complexities of a larger system.

Step 0: Prerequisites and connection to the community


Cassandra requires the most stable version of Java 1.6 you can deploy. For Sun's JVM, this means at least u19; u21 is better. Cassandra also runs on the IBM JVM, and should run on JRockit as well.
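As a rough sketch of the version check above, the following helper tests whether a Sun-style version string is Java 1.6 update 19 or later. The function name java6_update_ok is made up for illustration, and the sketch assumes the "1.6.0_NN" string format that Sun's java -version prints; other JVMs may report versions differently.

```shell
# Return success if a version string like "1.6.0_21" is Java 1.6 update 19 or later.
# java6_update_ok is a hypothetical helper name, not part of Cassandra or Java.
java6_update_ok() {
    version=$1
    case "$version" in
        1.6.0_*) update=${version#1.6.0_} ;;   # strip the prefix, keep the update number
        *) return 1 ;;                         # not a 1.6.0_NN string
    esac
    # A non-numeric update makes the test fail, which is treated as "not ok".
    [ "$update" -ge 19 ] 2>/dev/null
}

java6_update_ok "1.6.0_21" && echo "1.6.0_21 is recent enough"

# Possible use against the installed JVM (assumes the Sun output format):
#   java6_update_ok "$(java -version 2>&1 | sed -n 's/.*version "\(.*\)".*/\1/p')"
```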
The best way to ensure you always have up to date information on the project, releases, stability, bugs, and features is to subscribe to the users mailing list (subscription required) and participate in the #cassandra channel on IRC.

Step 1: Download Cassandra Kit


  • Download links for the latest stable release can always be found on the website.
  • Users of Debian or Debian-based derivatives can install the latest stable release in package form; see DebianPackaging for details.
  • Users of RPM-based distributions can get packages from Datastax.
  • If you are interested in building Cassandra from source, please refer to the How to Build page.
For more details about miscellaneous builds, please refer to the Cassandra versions and builds page.
  • If you plan to run the "snapshot" command on Cassandra, it is better to install jna.jar as well. Please refer to the Backup Data section.

Step 2: Edit configuration files


Cassandra configuration files can be found in the conf directory under the top-level directory of the binary and source distributions. If you installed Cassandra from RPM packages, the configuration files are placed in /etc/cassandra/conf.

Step 2.1: Edit cassandra.yaml


The distribution's sample configuration conf/cassandra.yaml contains reasonable defaults for single node operation, but you will need to make sure that the paths exist for data_file_directories, commitlog_directory, and saved_caches_directory.
Verify that storage_port and rpc_port do not conflict with other services on your computer. By default, Cassandra uses 7000 for storage_port and 9160 for rpc_port. The storage_port must be identical on all Cassandra nodes in a cluster. Cassandra client applications use rpc_port to connect to Cassandra.
It is a good idea to change cluster_name to avoid unnecessary conflicts with existing clusters.
You can leave initial_token blank, but I recommend setting it to 0 if you are configuring your first node.
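One way to make sure the directories named in cassandra.yaml exist is a small helper like the one below. This is only a sketch: ensure_dirs is a made-up name, and the paths in the commented example are assumptions, so substitute whatever your own cassandra.yaml lists.

```shell
# Create each directory given as an argument if it is missing,
# then report whether the current user can write to it.
# ensure_dirs is a hypothetical helper name.
ensure_dirs() {
    for dir in "$@"; do
        mkdir -p "$dir" || return 1
        if [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "not writable: $dir" >&2
            return 1
        fi
    done
}

# Example call; the paths are assumptions, use the values from your cassandra.yaml:
#   ensure_dirs /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches
```

Run it as the user that will start Cassandra, since that user needs write access to all three locations.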

Step 2.2: Edit log4j-server.properties


conf/log4j-server.properties contains the path for the log file. Edit the line if needed.
# Edit the next line to point to your logs directory
log4j.appender.R.File=/var/log/cassandra/system.log

Step 2.3: Edit cassandra-env.sh


Cassandra has a JMX (Java Management Extensions) interface, and JMX_PORT is defined in conf/cassandra-env.sh. Edit the following line if needed.
# Specifies the default port over which Cassandra will be available for
# JMX connections.
JMX_PORT="7199"

By default, Cassandra allocates memory based on the physical memory your system has. For example, it will allocate a 1GB heap on a 2GB system, and a 2GB heap on an 8GB system. If you want to specify the Cassandra heap size yourself, remove the leading pound sign (#) from the following lines and set the memory sizes.
#MAX_HEAP_SIZE="4G"
#HEAP_NEWSIZE="800M"

If you are not familiar with Java GC, 1/4 of MAX_HEAP_SIZE may be a good starting point for HEAP_NEWSIZE.
Cassandra will need more than a few GB of heap for production use, but you can run it with a smaller footprint for a test drive. If you want to assign 128MB as the maximum, edit the lines as follows.
MAX_HEAP_SIZE="128M"
HEAP_NEWSIZE="32M"

If you face OutOfMemory exceptions or massive GCs with this configuration, increase these values. Don't start your production service with such a tiny heap configuration!
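The 1/4 rule of thumb above can be worked out with a couple of lines of shell arithmetic. The helper names below are made up for illustration, and the result is only a starting point to tune from, not a recommendation.

```shell
# Print a HEAP_NEWSIZE suggestion as one quarter of the given max heap (in MB).
# Both function names are hypothetical helpers.
suggest_heap_newsize() {
    max_heap_mb=$1
    echo "$((max_heap_mb / 4))M"
}

# Emit the two settings in the form used in cassandra-env.sh.
print_heap_settings() {
    max_heap_mb=$1
    echo "MAX_HEAP_SIZE=\"${max_heap_mb}M\""
    echo "HEAP_NEWSIZE=\"$(suggest_heap_newsize "$max_heap_mb")\""
}

print_heap_settings 128
# prints:
#   MAX_HEAP_SIZE="128M"
#   HEAP_NEWSIZE="32M"
```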
  • Note for Mac Users:
    Some people running OS X have trouble getting Java 6 to work. If you've kept up with Apple's updates, Java 6 should already be installed (it comes in Mac OS X 10.5 Update 1). Unfortunately, Apple does not default to using it. What you have to do is change your JAVA_HOME environment setting to /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home and add /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin to the beginning of your PATH.
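For the Mac note above, the two settings could be made in a shell startup file roughly as follows. Which file to put them in (for example ~/.profile) is an assumption and depends on your shell.

```shell
# Point JAVA_HOME at the Java 6 install named in the note above
# and put its bin directory at the front of PATH.
JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
PATH="$JAVA_HOME/bin:$PATH"
export JAVA_HOME PATH
```

After opening a new terminal, java -version should report a 1.6 release if the change took effect.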

Step 3: Start up Cassandra


And now for the moment of truth: start up Cassandra by invoking bin/cassandra -f from the command line. The service should start in the foreground and log gratuitously to standard out. Assuming you don't see messages with scary words like "error" or "fatal", or anything that looks like a Java stack trace, chances are you've succeeded.
Press "Control-C" to stop Cassandra.
If you start Cassandra without the "-f" option, it will run in the background, so you need to kill the process to stop it.
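Stopping a background instance is easier if its process ID was written to a file at startup. The sketch below assumes your bin/cassandra script accepts a -p option naming a pid file (check the script in your release before relying on it); stop_by_pidfile is a made-up helper name and the /tmp path is only illustrative.

```shell
# Stop the process whose PID is stored in the given file, then remove the file.
# stop_by_pidfile is a hypothetical helper name.
stop_by_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || { echo "no pid file: $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    kill "$pid" && rm -f "$pidfile"
}

# Assumed usage (the -p flag and the path are assumptions):
#   bin/cassandra -p /tmp/cassandra.pid
#   stop_by_pidfile /tmp/cassandra.pid
```

Without a pid file, you would have to find the Java process running Cassandra yourself (for example with ps) and pass its PID to kill.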

Step 4: Using cassandra-cli


bin/cassandra-cli is an interactive command line interface for Cassandra. You can define a schema and store and fetch data with the tool. Run the following command to connect to your Cassandra instance.
bin/cassandra-cli -h host -p rpc_port

example:
% bin/cassandra-cli -h 127.0.0.1 -p 9160

Then you will see the following cassandra-cli prompt.
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.0.7

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown]