Projects/Running Cassandra

From RAD Lab

Getting Cassandra

We have been told by the Cassandra devs to use the development branch going forward. To check out the code with svn, run:

 svn checkout http://the-cassandra-project.googlecode.com/svn/branches/development/ cassandra-dev

This checks out the code into a directory called cassandra-dev.

Building Cassandra

How to build Cassandra on Linux

  1. Building the Cassandra Sources
    1. Get the Java JDK 1.7 from http://download.java.net/jdk7/binaries/
    2. Install the JDK and add its bin dir to your path
    3. Run 'ant jar' (if you don't have ant you should be able to just 'apt-get install ant')
  2. Building the thrift interface
    1. First, you'll need to install thrift
      1. Download thrift from Apache incubator
      2. The following is a superset of the libraries you'll need. (ruby, python, et al. are needed before compiling thrift to get the appropriate language library)
        1. sudo apt-get install libboost*
        2. sudo apt-get install libevent
        3. sudo apt-get install autoconf
        4. sudo apt-get install automake
        5. sudo apt-get install libtool
        6. sudo apt-get install gcc
        7. sudo apt-get install g++
        8. sudo apt-get install flex
        9. sudo apt-get install byacc
        10. sudo apt-get install ruby1.8-dev
        11. sudo apt-get install python-dev
        12. sudo apt-get install bison
        13. sudo apt-get install pkg-config
        14. sudo apt-get install ruby
        15. sudo apt-get install python
      3. cd into the thrift directory
      4. ./bootstrap
      5. ./configure
      6. make
      7. sudo make install
    2. The development branch ships with only the Java interface defined, and it is missing some of the required support .thrift files. To build the C++ code, use the file below. (Other languages will need to modify it; this should only require adding --gen [lang] to the thrift call and a namespace declaration for that language.)
    3. You will also need fb303.thrift and reflection_limited.thrift. You can get them at http://the-cassandra-project.googlecode.com/svn/trunk/interface/fb303.thrift and http://the-cassandra-project.googlecode.com/svn/trunk/interface/reflection_limited.thrift. Put them in the interface directory as well.
    4. cd into the interface directory and run 'thrift --gen [lang] cassandra.thrift && thrift --gen [lang] fb303.thrift'.
    5. For example, to build the C++ interface, run 'thrift --gen cpp cassandra.thrift && thrift --gen cpp fb303.thrift'.


Replace interface/cassandra.thrift with the following:

#!/usr/local/bin/thrift --gen java --gen cpp

#
# Interface definition for peer storage
# 

include "fb303.thrift"

namespace java  com.facebook.infrastructure.service
namespace cpp facebook.infrastructure.service


struct column_t {
   1: string                        columnName,
   2: string                        value,
   3: i64                           timestamp,
}

typedef map< string, list<column_t>  > column_family_map

struct batch_mutation_t {
   1: string                        table,
   2: string                        key,
   3: column_family_map             cfmap,
}

struct superColumn_t {
   1: string                        name,
   2: list<column_t>				columns,
}

typedef map< string, list<superColumn_t>  > superColumn_family_map

struct batch_mutation_super_t {
   1: string                        table,
   2: string                        key,
   3: superColumn_family_map        cfmap,
}


service Cassandra extends fb303.FacebookService {

  list<column_t> get_slice(string tablename,string key,string columnFamily_column, i32 start = -1 , i32 count = -1),
  column_t       get_column(string tablename,string key,string columnFamily_column),
  i32            get_column_count(string tablename,string key,string columnFamily_column),
  async void     insert(string tablename,string key,string columnFamily_column, string cellData,i64 timestamp),
  async void     batch_insert(batch_mutation_t batchMutation),
  bool           batch_insert_blocking(batch_mutation_t batchMutation),
  async void     remove(string tablename,string key,string columnFamily_column),
  list<column_t> get_columns_since(string tablename, string key, string columnFamily_column, i64 timeStamp),

  list<superColumn_t> get_slice_super(string tablename, string key, string columnFamily_superColumnName, i32 start = -1 , i32 count = -1),
  superColumn_t       get_superColumn(string tablename,string key,string columnFamily),
  async void          batch_insert_superColumn(batch_mutation_super_t batchMutationSuper),
  bool                batch_insert_superColumn_blocking(batch_mutation_super_t batchMutationSuper),
}

It looks like the only thing that's changed in the development version is in:

struct column_t {
   1: string                        columnName,
   2: string                        value,
   3: i32                           timestamp,
}

where timestamp went from i64 (in the version on the website) to i32. (fb303.thrift and reflection_limited.thrift are the same.)
We could probably just copy the entire interface directory from the non-dev version back in... I'm working on seeing if this is true.
We might want to ask them about it.
-jesse 9/18/08
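
The i64 versus i32 difference noted above matters once clients send wall-clock timestamps instead of small counters. The following is a minimal sketch, not part of the Cassandra sources, and the helper names fitsInI32 and toI32Checked are made up for illustration. It checks whether a 64-bit timestamp would survive being stored in a thrift i32 field:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a 64-bit timestamp can be stored in a
// 32-bit signed field (thrift i32) without changing its value.
bool fitsInI32(int64_t timestamp) {
  return timestamp >= std::numeric_limits<int32_t>::min() &&
         timestamp <= std::numeric_limits<int32_t>::max();
}

// Hypothetical helper: narrows a timestamp to 32 bits, asserting first
// that nothing is lost in the conversion.
int32_t toI32Checked(int64_t timestamp) {
  assert(fitsInI32(timestamp));
  return static_cast<int32_t>(timestamp);
}
```

For instance, 1221696000 (seconds since the Unix epoch at midnight UTC on 9/18/08) still fits in an i32, but the same instant in milliseconds, 1221696000000, does not. A counter that starts at 0, like the one in the sample client further down, fits either way.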

Setting up Cassandra

  • Create the logging directory that is set in conf/logging.props.
    The default directory is /var/cassandra/logs.

Running Cassandra

You should now be able to run the bin/start-server script (you might have to chmod 750 bin/start-server first). This starts your Cassandra node, which then begins listening on the thrift interface.

Talking to Cassandra

You should now be able to talk to Cassandra using the thrift interface. You can find a sample C++ program and Makefile below. Just put these two files in their own sub-directory within the cassandra directory.

Save the following to nclient.cpp. You will need to set THRIFT_HOST and THRIFT_PORT to the host and port where your Cassandra instance is running.

// simple client

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

#include "Cassandra.h"

#include <protocol/TBinaryProtocol.h>
#include <transport/TSocket.h>
#include <transport/TTransportUtils.h>

using namespace std;
using namespace facebook::thrift;
using namespace facebook::thrift::protocol;
using namespace facebook::thrift::transport;

// CassandraClient and column_t live in the cpp namespace declared in cassandra.thrift
using namespace facebook::infrastructure::service;
using namespace facebook::fb303;

using namespace boost;

static int timestamp = 0;

// Set these to the location and port of your thrift interface
#define THRIFT_HOST "localhost"
#define THRIFT_PORT 9160

static vector<string> strsplit(string str) {
  vector<string> ret;
  string::size_type loc,loc2;

  loc = 0;
  loc2 = str.find(" ");
  while(loc2 != string::npos) {
    ret.push_back(str.substr(loc,loc2-loc));
    loc = loc2+1;
    loc2 = str.find(" ",loc);
  }
  ret.push_back(str.substr(loc,loc2==string::npos?loc2:(loc2-loc)));
  return ret;
}

static void insertVal(CassandraClient& client,
		      string tablename,
		      string rowkey,
		      string colFam,
		      string data) {
  client.insert(tablename,rowkey,colFam,data,timestamp++);
  cout << "called insert"<<endl;
}

static int countCols(CassandraClient& client,
		   string tablename,
		   string rowkey,
		   string colFam) {
  return client.get_column_count(tablename,rowkey,colFam);
}

void getCol(CassandraClient& client,
	   column_t* col,
	   string tablename,
	   string rowkey,
	   string colfam) {
  client.get_column(*col,tablename,rowkey,colfam);
}
		
	       

static void printInsertUsage() {
  printf("invalid insert, insert is used as:\n");
  printf("insert [tablename] [rowkey] [columnFamily] [data]\n");
}

static void printCountUsage() {
  printf("invalid count, count is used as:\n");
  printf("count [tablename] [rowkey] [columnFamily]\n");
}

static void printGetUsage() {
  printf("invalid get, get is used as:\n");
  printf("get [tablename] [rowkey] [columnFamily]\n");
}

int main(int argc, char* argv[]) {
  string line;
  shared_ptr<TTransport> socket(new TSocket(THRIFT_HOST, THRIFT_PORT));
  shared_ptr<TTransport> transport(new TBufferedTransport(socket));
  shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
  CassandraClient client(protocol);

  try {
    transport->open();
    
    for (;;) {
      cout << "> ";
      cout.flush();
      if (!getline(cin,line))  // stop at end of input instead of looping forever
        break;
      
      vector<string> v = strsplit(line);
      string cmd = v[0];

      if (cmd == "quit")
	break;

      else if (cmd == "insert") {
	if (v.size() != 5) {
	  printInsertUsage();
	  continue;
	}
	cout << "doing: insert |"<<v[1]<<"| |"<<v[2]<<"| |"<<v[3]<<"| |"<<v[4]<<"|"<<endl;
	insertVal(client,v[1],v[2],v[3],v[4]);
	continue;
      }

      else if (cmd == "count") {
	if (v.size() != 4) {
	  printCountUsage();
	  continue;
	}
	cout << "count of "<<v[1]<<" "<<v[2]<<" "<<v[3]<<" is "<<countCols(client,v[1],v[2],v[3])<<endl;
      }

      else if (cmd == "get") {
	if (v.size() != 4) {
	  printGetUsage();
	  continue;
	}
	cout << "getting "<<v[1]<<" "<<v[2]<<" "<<v[3]<<" is:"<<endl;
	column_t col;
	getCol(client,&col,v[1],v[2],v[3]);
	cout << "Column Name:\t"<<col.columnName<<endl;
	cout << "Value:\t\t"<<col.value<<endl;
	cout << "Timestamp:\t"<<col.timestamp<<endl;
      }

      else {
	cout << "Invalid command: "<<line<<endl;
	continue;
      }
    
    }

    transport->close();

  } catch (TException &tx) {
    printf("ERROR: %s\n", tx.what());
  }
} 
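
The sample client above numbers its inserts with a counter that restarts at 0 every time nclient is launched, so a later run sends lower timestamps than an earlier one. If Cassandra uses the timestamp to decide which write is newest (an assumption here, not something verified against this development branch), a wall-clock timestamp avoids that. A minimal sketch, with hypothetical helper names nowMicros and nextTimestamp that are not part of the client or the generated thrift code:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: microseconds since the Unix epoch. The value needs
// an i64 timestamp field; it does not fit in an i32.
int64_t nowMicros() {
  using namespace std::chrono;
  return duration_cast<microseconds>(
             system_clock::now().time_since_epoch()).count();
}

// Hypothetical helper: returns a strictly increasing timestamp, even when
// called twice within the same microsecond or after the clock steps back.
// Not thread-safe; meant for a single-threaded client like nclient.
int64_t nextTimestamp() {
  static int64_t last = 0;
  int64_t now = nowMicros();
  last = (now > last) ? now : last + 1;
  return last;
}
```

In insertVal, the call would then pass nextTimestamp() instead of timestamp++. This only makes sense with the i64 version of the interface.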

Save the following to Makefile. You might have to change the CFLAGS and GEN_SRC variables if you put it somewhere else. Also set the THRIFT_DIR variable to the location of your thrift include files (probably /usr/local/include/thrift).

# Makefile for nclient

THRIFT_DIR = /usr/local/include/thrift
LIB_DIR = /usr/local/lib

CXX=g++
CFLAGS = -Wall -g  -I ../interface/gen-cpp -I${THRIFT_DIR}
LIBS = -L${LIB_DIR} -lstdc++ -lthrift

GEN_SRC = ../interface/gen-cpp/FacebookService.cpp ../interface/gen-cpp/fb303_types.cpp ../interface/gen-cpp/cassandra_types.cpp ../interface/gen-cpp/Cassandra.cpp

TARGET = nclient

OBJECTS = nclient.o

default: nclient

nclient: nclient.cpp
	$(CXX) -o $(TARGET) $(CFLAGS) nclient.cpp $(GEN_SRC) $(LIBS)

clean:
	@rm -f *.o $(TARGET)

Run make in the directory containing those files and you should get a binary called nclient that you can run. It accepts the commands 'insert', 'count', 'get', and 'quit'. If you didn't change the table defined in storage-conf.xml, you should be able to run:

  • insert Mytable myKey Test:myCol myData
  • get Mytable myKey Test:myCol

and get back myData.
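
In the commands above, the third argument (Test:myCol) is passed as the columnFamily_column parameter of the thrift calls. Judging from the parameter name and this example, it appears to be the ColumnFamily name and the column name joined by a colon. The server does the parsing itself; the sketch below, with a hypothetical helper splitColumnPath, only illustrates how such a string breaks down:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: splits "Family:column" at the first colon and
// returns (family, column). With no colon, the whole string is treated
// as the family and the column part is empty.
std::pair<std::string, std::string> splitColumnPath(const std::string& path) {
  std::string::size_type colon = path.find(':');
  if (colon == std::string::npos)
    return std::make_pair(path, std::string());
  return std::make_pair(path.substr(0, colon), path.substr(colon + 1));
}
```

Under that assumption, splitColumnPath("Test:myCol") gives the family "Test" and the column "myCol".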

Cassandra Caveats

  1. If you define a set of ColumnFamilies, add data to the system, and then try to add another ColumnFamily, inserts to the new ColumnFamily will silently fail. You need to remove your existing data and restart.
  2. If you do a batch_insert and any of the columns in your batch_mutation have empty values (because you didn't assign them), the insert will silently fail.
  3. If you insert a column whose name contains a space, dash, or colon (" ", "-", ":"), the insert will silently fail. (Other characters might cause this as well.)
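
Because the failures in caveats 2 and 3 are silent, it can help to check columns on the client side before calling insert or batch_insert. The following is a minimal sketch under stated assumptions: the Column struct is a stand-in for the generated column_t, the helper names are made up, and only the three characters listed above are rejected, so passing the check does not guarantee the insert succeeds.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the generated column_t (name and value only).
struct Column {
  std::string name;
  std::string value;
};

// Hypothetical helper: false if the name contains a space, dash, or colon,
// the characters caveat 3 reports as causing silent insert failures.
bool isSafeColumnName(const std::string& name) {
  return name.find_first_of(" -:") == std::string::npos;
}

// Hypothetical helper: index of the first column with an unsafe name
// (caveat 3) or an empty value (caveat 2), or -1 if none is found.
int firstBadColumn(const std::vector<Column>& cols) {
  for (std::vector<Column>::size_type i = 0; i < cols.size(); ++i) {
    if (!isSafeColumnName(cols[i].name) || cols[i].value.empty())
      return static_cast<int>(i);
  }
  return -1;
}
```

A client could call firstBadColumn on each column list in a batch_mutation_t and report the offending column instead of sending a request that will be dropped.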