Thursday, June 23, 2011

Installing Cassandra

For those among us who like instant gratification, we’ll start by installing Cassandra.
Because Cassandra introduces a lot of new vocabulary, there might be some unfamiliar
terms as we walk through this. That’s OK; the idea here is to get set up quickly in a
simple configuration to make sure everything is running properly. This will serve as an
orientation. Then, we’ll take a step back and understand Cassandra in its larger context.
Installing the Binary
Cassandra is available for download from the Web at http://cassandra.apache.org. Just
click the link on the home page to download the latest release version as a gzipped
tarball. The prebuilt binary is named apache-cassandra-x.x.x-bin.tar.gz, where x.x.x
represents the version number. The download is around 10MB.
Extracting the Download
The simplest way to get started is to download the prebuilt binary. You can unpack the
compressed file using any regular ZIP utility. On Linux, GZip extraction utilities should
be preinstalled; on Windows, you’ll need to get a program such as WinZip, which is
commercial, or something like 7-Zip, which is freeware. You can download the freeware
program 7-Zip from http://www.7-zip.org.
Open your extracting program. You might have to extract the GZip file and the TAR file
in separate steps. Once you have a folder on your filesystem called
apache-cassandra-x.x.x, you’re ready to run Cassandra.
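On Linux, the two extraction steps can usually be combined into a single tar command. Here is a minimal sketch, assuming a tar program that supports the -z option is on your path. The function name extract_cassandra and the destination directory are illustrative choices, not part of the Cassandra distribution.

```shell
# Hypothetical helper: unpack a Cassandra binary tarball into a target directory.
# Usage: extract_cassandra apache-cassandra-x.x.x-bin.tar.gz /some/target/dir
extract_cassandra() {
    tarball="$1"
    dest="${2:-.}"   # default to the current directory
    if [ ! -f "$tarball" ]; then
        echo "No such file: $tarball" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -x extract, -z decompress gzip, -f read the named file, -C unpack inside dest
    tar -xzf "$tarball" -C "$dest"
}
```

Calling extract_cassandra with the downloaded file and a directory of your choice should leave the apache-cassandra-x.x.x folder inside that directory.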
What’s In There?
Once you decompress the tarball, you’ll see that the Cassandra binary distribution
includes several directories. Let’s take a moment to look around and see what we have.
bin
This directory contains the executables to run Cassandra and the command-line
interface (CLI) client. It also has scripts to run the nodetool, which is a utility for
inspecting a cluster to determine whether it is properly configured, and to perform
a variety of maintenance operations. We look at nodetool in depth later. It also has
scripts for converting SSTables (the datafiles) to JSON and back.
conf
This directory, which sits at the same location under the package root in the
source version, contains the files for configuring your Cassandra instance. There are
three basic functions: the storage-conf.xml file allows you to create your data store
by configuring your keyspace and column families; there are files related to setting
up authentication; and finally, the log4j properties let you change the logging levels
to suit your needs. We see how to use all of these when we discuss configuration
in Chapter 6.
interface
For versions 0.6 and earlier, this directory contains a single file, called
cassandra.thrift. This file represents the Remote Procedure Call (RPC) client API
that Cassandra makes available. The interface is defined using the Thrift syntax
and provides an easy means to generate clients. For a quick way to see all of the
operations that Cassandra supports, open this file in a regular text editor. You can
see that Cassandra supports clients for Java, C++, PHP, Ruby, Python, Perl, and
C# through this interface.
javadoc
This directory contains a documentation website generated using Java’s JavaDoc
tool. Note that JavaDoc reflects only the comments that are stored directly in the
Java code, and as such does not represent comprehensive documentation. It’s
helpful if you want to see how the code is laid out. Moreover, Cassandra is a
wonderful project, but the code contains precious few comments, so you might
find the JavaDoc’s usefulness limited. It may be more fruitful to simply read the
class files directly if you’re familiar with Java. Nonetheless, to read the JavaDoc,
open the javadoc/index.html file in a browser.
lib
This directory contains all of the external libraries that Cassandra needs to run.
For example, it uses two different JSON serialization libraries, the Google collections
project, and several Apache Commons libraries. This directory includes the
Thrift and Avro RPC libraries for interacting with Cassandra.
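To confirm that the download unpacked completely, you can check for the directories just described. The following is only a sketch: the function name check_layout is made up for illustration, and the list of directories is taken from the description above, so it may not match other Cassandra versions.

```shell
# Hypothetical helper: report which of the expected top-level directories exist.
# Usage: check_layout /path/to/apache-cassandra-x.x.x
check_layout() {
    root="$1"
    missing=0
    for dir in bin conf interface javadoc lib; do
        if [ -d "$root/$dir" ]; then
            echo "found:   $dir"
        else
            echo "missing: $dir"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected directory is present
    [ "$missing" -eq 0 ]
}
```

The function prints one line per directory and returns a nonzero status if any are missing, so it can be used in a script before trying to start the server.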
Building from Source
Cassandra uses Apache Ant for its build scripting language and the Ivy plug-in for
dependency management.
Ivy requires Ant, and building from source requires the complete JDK, version 1.6.0_20
or better, not just the JRE. If you see a message about how Ant is missing tools.jar,
either you don’t have the full JDK or you’re pointing to the wrong path in your environment
variables.
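One quick way to tell a full JDK from a JRE is to look for tools.jar yourself before running Ant. This sketch assumes a JDK 6-style layout, in which tools.jar lives in the lib directory under the JDK root. Some platforms package Java differently, so treat a failure here as a hint rather than proof. The function name has_full_jdk is hypothetical.

```shell
# Hypothetical helper: check whether a directory looks like a full JDK rather than a JRE.
# Assumes a JDK 6-style layout with tools.jar under lib/.
# Usage: has_full_jdk "$JAVA_HOME"
has_full_jdk() {
    home="$1"
    if [ -z "$home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ -f "$home/lib/tools.jar" ]; then
        echo "Full JDK found at $home"
    else
        echo "No lib/tools.jar under $home; this looks like a JRE or a wrong path" >&2
        return 1
    fi
}
```

If the check fails, point JAVA_HOME at the JDK installation directory rather than at the JRE directory nested inside it.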
If you want to download the most cutting-edge builds, you can get the
source from Hudson, which the Cassandra project uses as its Continuous
Integration tool. See http://hudson.zones.apache.org/hudson/job/Cassandra/
for the latest builds and test coverage information.
If you are a Git fan, you can get a read-only trunk version of the Cassandra source using
this command:
>git clone git://git.apache.org/cassandra.git
Git is a source code management system created by Linus Torvalds to
manage development of the Linux kernel. It’s increasingly popular and
is used by projects such as Android, Fedora, Ruby on Rails, Perl, and
many Cassandra clients (as we’ll see in Chapter 8). If you’re on a Linux
distribution such as Ubuntu, it couldn’t be easier to get Git. At a
console, just type >apt-get install git and it will be installed and ready
for commands. For more information, visit http://git-scm.com/.
Because Ivy takes care of all the dependencies, it’s easy to build Cassandra once you
have the source. Just make sure you’re in the root directory of your source download
and execute the ant program, which will look for a file called build.xml in the current
directory and execute the default build target. Ant and Ivy take care of the rest. To
execute the Ant program and start compiling the source, just type:
>ant
That’s it. Ivy will retrieve all of the necessary dependencies, and Ant will build the nearly
350 source files and execute the tests. If all went well, you should see a BUILD SUCCESSFUL
message. If all did not go well, make sure that your path settings are all correct,
that you have the most recent versions of the required programs, and that you downloaded
a stable Cassandra build. You can check the Hudson report to make sure that
the source you downloaded actually can compile.
If you want to see detailed information on what is happening during the
build, you can pass Ant the -v option to cause it to output verbose details
regarding each operation it performs.
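Verbose output scrolls by quickly, so it can help to keep it in a file and check the result afterward. The sketch below is one way to do that. The function name run_build, the default log name build.log, and the ANT_CMD override variable are all assumptions made for illustration. Only the -v option and the BUILD SUCCESSFUL message come from Ant itself.

```shell
# Hypothetical helper: run a verbose build, keep the output in a log, and report the result.
# ANT_CMD is an assumed override hook; it defaults to plain "ant".
# Usage: run_build [logfile]
run_build() {
    log="${1:-build.log}"
    # Capture stdout and stderr together so that failures are kept for later reading
    if ${ANT_CMD:-ant} -v > "$log" 2>&1 && grep -q "BUILD SUCCESSFUL" "$log"; then
        echo "Build succeeded; details are in $log"
    else
        echo "Build failed; the last lines of $log were:" >&2
        tail -n 20 "$log" >&2
        return 1
    fi
}
```

Run it from the root directory of the source download, the same place where you would type ant by hand.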
Additional Build Targets
To compile the server, you can simply execute ant as shown previously. But there are
a couple of other targets in the build file that you might be interested in:
test
Users will probably find this the most helpful, as it executes the battery of unit
tests. You can also check out the unit test sources themselves for some useful
examples of how to interact with Cassandra.
gen-thrift-java
This target generates the Apache Thrift client interface for interacting with the
database in Java.
gen-thrift-py
This target generates the Thrift client interface for Python users.
jar
To create a Java Archive (JAR) file for distribution, execute the command >ant
jar. This will perform a complete build and output a file into the build directory
called apache-cassandra-x.x.x.jar.
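If you find yourself running several of these targets one after another, a small wrapper can stop at the first failure and tell you which target broke. This is only a sketch: run_targets and the ANT_CMD override variable are hypothetical names, and the target names you pass should be ones that actually exist in your build.xml.

```shell
# Hypothetical wrapper: run several Ant targets in order, stopping at the first failure.
# ANT_CMD is an assumed override hook; it defaults to plain "ant".
# Usage: run_targets test gen-thrift-java gen-thrift-py
run_targets() {
    for target in "$@"; do
        echo "==> ant $target"
        ${ANT_CMD:-ant} "$target" || {
            echo "Target '$target' failed" >&2
            return 1
        }
    done
}
```

Ant also accepts several target names on one command line. The wrapper simply makes it more obvious which step failed.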
Building with Maven
The original authors of Cassandra apparently didn’t care much for Maven, so the early
releases did not include any Maven POM file. But because so many Java developers
have begun to favor Maven over Ant, and the tooling support in IDEs for Maven has
become so strong, there’s a pom.xml contribution to the project so you can build from
Maven if you prefer.
To build the source from Maven, navigate to /contrib/maven and
execute this command:
$ mvn clean install
If you have any difficulties building with Maven, you may have to get some of the
required JARs manually. As of version 0.6.3, the Maven POM doesn’t work out of
the box because some dependencies, such as the libthrift.jar file, are unavailable in a
repository.
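One common way to supply a missing JAR by hand is the install:install-file goal of the Maven Install Plugin, which copies a local file into your local repository under coordinates you choose. The sketch below wraps that goal. The function name install_local_jar and the MVN_CMD override variable are hypothetical, and the group ID, artifact ID, and version you pass must match whatever the dependency entry in the POM asks for, which this post does not specify.

```shell
# Hypothetical helper: install a local JAR into the local Maven repository so that
# a POM dependency with the same coordinates can be resolved.
# MVN_CMD is an assumed override hook; it defaults to plain "mvn".
# Usage: install_local_jar <jar> <groupId> <artifactId> <version>
install_local_jar() {
    if [ "$#" -ne 4 ]; then
        echo "usage: install_local_jar <jar> <groupId> <artifactId> <version>" >&2
        return 2
    fi
    # install:install-file is a goal of the standard Maven Install Plugin
    ${MVN_CMD:-mvn} install:install-file \
        -Dfile="$1" \
        -DgroupId="$2" \
        -DartifactId="$3" \
        -Dversion="$4" \
        -Dpackaging=jar
}
```

After installing each missing JAR this way, rerun the mvn clean install command to see whether the build gets further.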
Few developers are using Maven with Cassandra, so Maven lacks strong
support. Which is to say, use caution, because the Maven POM is often
broken.

5 comments:

  2. I have downloaded Ant and Cassandra, and I’ve set the environment variable for JAVA_HOME.

    >> "Because Ivy takes care of all the dependencies, it’s easy to build Cassandra once you
    have the source. Just make sure you’re in the root directory of your source download
    and execute the ant program, which will look for a file called build.xml in the current
    directory and execute the default build target. Ant and Ivy take care of the rest. To
    execute the Ant program and start compiling the source, just type:
    >ant"

    I’m getting buildfile: fetch.xml does not exist

    My question is, what is the source download that you mentioned?
    Can you please give me step-by-step instructions? It would be really great if you could.

    Replies
    1. Hi Muthu, this post is probably very old.

      http://gettingstartedwithcassandra.blogspot.in/2012/10/installing-cassandra.html

      Check this out, and let me know if you have issues.

      For a Java client, you can try the Hector Cassandra API. It has been good to the extent I have used it.

  4. Arun, I was able to build the JAR, but when I run it using java -jar build/apache-cassandra-3.8-SNAPSHOT.jar
    I get this error: no main manifest attribute, in build/apache-cassandra-3.8-SNAPSHOT.jar
