Binary Memtable is the name of Cassandra's bulk-load interface. It avoids several kinds of overhead associated with the normal Thrift API:
- Converting to Thrift from the internal structures and back
- Routing (copying) from a coordinator node to the replica nodes
- Writing to the commitlog
- Serializing the internal structures to on-disk format
The tradeoff is that it is considerably less convenient to use than Thrift:
- You must use the StorageProxy API, only available as Java code
- You must pre-serialize the rows yourself
- The rows you send are not live for querying until a flush occurs (either normally because the Binary Memtable fills up, or because you request one with nodetool)
- You must write an entire row at once
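The "pre-serialize" and "entire row at once" requirements can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `RowPreSerializer`, its methods, and the byte layout (a column count followed by length-prefixed names and values) are hypothetical and are not Cassandra's real on-disk format or any part of the StorageProxy API. The point is the shape of the work: buffer every column for a row key first, then produce one complete byte payload per row.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only: NOT Cassandra's API or row format.
class RowPreSerializer {

    // Groups (rowKey, columnName, value) triples by row key so that each row
    // can later be emitted exactly once, with all of its columns together.
    static Map<String, TreeMap<String, String>> groupByRow(String[][] triples) {
        Map<String, TreeMap<String, String>> rows = new LinkedHashMap<>();
        for (String[] t : triples) {
            // TreeMap keeps the columns of a row sorted by name.
            rows.computeIfAbsent(t[0], k -> new TreeMap<>()).put(t[1], t[2]);
        }
        return rows;
    }

    // Serializes one complete row using an assumed layout:
    // int column count, then for each column a length-prefixed name and value.
    static byte[] serializeRow(TreeMap<String, String> columns) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(columns.size());
            for (Map.Entry<String, String> c : columns.entrySet()) {
                writeLengthPrefixed(out, c.getKey());
                writeLengthPrefixed(out, c.getValue());
            }
        } catch (IOException e) {
            // Writing to an in-memory buffer should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    private static void writeLengthPrefixed(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    public static void main(String[] args) {
        // Columns for row "k1" arrive interleaved with another row's columns.
        String[][] input = {{"k1", "a", "1"}, {"k2", "x", "y"}, {"k1", "b", "22"}};
        for (Map.Entry<String, TreeMap<String, String>> row : groupByRow(input).entrySet()) {
            byte[] payload = serializeRow(row.getValue());
            System.out.println(row.getKey() + ": " + payload.length + " bytes");
        }
    }
}
```

In a real loader the payload would have to be produced with Cassandra's own serializers and handed to the Java-side API. The grouping step is the part that carries over: because a row must be written in one piece, all of its columns have to be collected before anything is sent.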
There is an example of using Hadoop to load data through the Binary Memtable interface at https://svn.apache.org/repos/asf/cassandra/trunk/contrib/bmt_example/.