Projects/Running Cassandra
From RAD Lab
Getting Cassandra
We have been told by the Cassandra devs to use the development branch going forward. To check out the code using svn do:
svn checkout http://the-cassandra-project.googlecode.com/svn/branches/development/ cassandra-dev
This checks out the code into a directory called cassandra-dev.
Building Cassandra
How to build Cassandra on Linux
- Building the Cassandra Sources
- Get the Java JDK 1.7 from http://download.java.net/jdk7/binaries/
- Install the JDK and add its bin directory to your PATH
- Run 'ant jar' (if you don't have ant, you should be able to just 'apt-get install ant')
- Building the thrift interface
- First, you'll need to install thrift
- Download thrift from Apache incubator
- The following is a superset of the libraries you'll need. (ruby, python, et al. are needed before compiling thrift to get the appropriate language library)
- sudo apt-get install libboost*
- sudo apt-get install libevent
- sudo apt-get install autoconf
- sudo apt-get install automake
- sudo apt-get install libtool
- sudo apt-get install gcc
- sudo apt-get install g++
- sudo apt-get install flex
- sudo apt-get install byacc
- sudo apt-get install ruby1.8-dev
- sudo apt-get install python-dev
- sudo apt-get install bison
- sudo apt-get install pkg-config
- sudo apt-get install ruby
- sudo apt-get install python
- cd into the thrift directory
- ./bootstrap
- ./configure
- make
- sudo make install
- The development branch ships with only the Java interface defined, and it is missing some of the required support .thrift files. To build the C++ code, use the file below. (Other languages will have to modify it; this should only require adding the --gen [lang] entry to the thrift call and adding a namespace declaration for that language.)
- You will also need fb303.thrift and reflection_limited.thrift. You can get them at http://the-cassandra-project.googlecode.com/svn/trunk/interface/fb303.thrift and http://the-cassandra-project.googlecode.com/svn/trunk/interface/reflection_limited.thrift. Put them in the interface directory as well.
- cd into the interface directory and run 'thrift --gen [lang] cassandra.thrift && thrift --gen [lang] fb303.thrift'.
- So for example, to build the cpp interface you just do 'thrift --gen cpp cassandra.thrift && thrift --gen cpp fb303.thrift'
Replace interface/cassandra.thrift with the following:
#!/usr/local/bin/thrift --gen java --gen cpp
#
# Interface definition for peer storage
#
include "fb303.thrift"
namespace java com.facebook.infrastructure.service
namespace cpp facebook.infrastructure.service
struct column_t {
  1: string columnName,
  2: string value,
  3: i64 timestamp,
}
typedef map< string, list<column_t> > column_family_map
struct batch_mutation_t {
  1: string table,
  2: string key,
  3: column_family_map cfmap,
}
struct superColumn_t {
  1: string name,
  2: list<column_t> columns,
}
typedef map< string, list<superColumn_t> > superColumn_family_map
struct batch_mutation_super_t {
  1: string table,
  2: string key,
  3: superColumn_family_map cfmap,
}
service Cassandra extends fb303.FacebookService {
  list<column_t> get_slice(string tablename,string key,string columnFamily_column, i32 start = -1 , i32 count = -1),
  column_t get_column(string tablename,string key,string columnFamily_column),
  i32 get_column_count(string tablename,string key,string columnFamily_column),
  async void insert(string tablename,string key,string columnFamily_column, string cellData,i64 timestamp),
  async void batch_insert(batch_mutation_t batchMutation),
  bool batch_insert_blocking(batch_mutation_t batchMutation),
  async void remove(string tablename,string key,string columnFamily_column),
  list<column_t> get_columns_since(string tablename, string key, string columnFamily_column, i64 timeStamp),
  list<superColumn_t> get_slice_super(string tablename, string key, string columnFamily_superColumnName, i32 start = -1 , i32 count = -1),
  superColumn_t get_superColumn(string tablename,string key,string columnFamily),
  async void batch_insert_superColumn(batch_mutation_super_t batchMutationSuper),
  bool batch_insert_superColumn_blocking(batch_mutation_super_t batchMutationSuper),
}
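To make the nesting in the interface file concrete, here is a minimal C++ sketch of how a batch_mutation_t is laid out. The structs in the sketch namespace are hand-written stand-ins that mirror the IDL so the example compiles on its own; real code would use the types that thrift generates into gen-cpp/cassandra_types.h. The helper makeBatch is a hypothetical name, not part of Cassandra or thrift. (Thrift's C++ generator maps list<> to std::vector and map<> to std::map.)

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for the generated column_t (assumption: same three fields as the IDL).
struct column_t {
    std::string columnName;
    std::string value;
    int64_t timestamp;
};

// Mirrors: typedef map< string, list<column_t> > column_family_map
typedef std::map<std::string, std::vector<column_t> > column_family_map;

// Stand-in for the generated batch_mutation_t.
struct batch_mutation_t {
    std::string table;
    std::string key;
    column_family_map cfmap;
};

// Hypothetical helper: gather (name, value) pairs for one column family of
// one row into a single mutation, giving every column the same timestamp.
batch_mutation_t makeBatch(
        const std::string& table,
        const std::string& key,
        const std::string& family,
        const std::vector<std::pair<std::string, std::string> >& cells,
        int64_t timestamp) {
    batch_mutation_t batch;
    batch.table = table;
    batch.key = key;
    // operator[] creates the (empty) column list for this family.
    std::vector<column_t>& columns = batch.cfmap[family];
    for (std::size_t i = 0; i < cells.size(); ++i) {
        column_t col;
        col.columnName = cells[i].first;
        col.value = cells[i].second;
        col.timestamp = timestamp;
        columns.push_back(col);
    }
    return batch;
}

}  // namespace sketch
```

With the generated types, a mutation built this way would be what gets passed to batch_insert or batch_insert_blocking.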
It looks like the only thing that's changed in the development version is in:
struct column_t {
  1: string columnName,
  2: string value,
  3: i32 timestamp,
}
where timestamp went from i64 (in the version on the website) to i32. (fb303.thrift and reflection_limited.thrift are the same.)
We could probably just copy the entire interface directory from the non-dev version back in... I'm working on seeing if this is true.
We might want to ask them about it.
-jesse 9/18/08
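The width of the timestamp field matters in practice. As a rough illustration (not taken from the Cassandra sources), the sketch below checks whether a timestamp fits in a signed 32-bit field. Seconds since the Unix epoch for 18 September 2008 (1221696000) fit in an i32, but the same instant in milliseconds (1221696000000) does not. The names fitsInI32 and toI32 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if the value can be stored in a Thrift i32 (signed 32-bit) without loss.
bool fitsInI32(int64_t timestamp) {
    return timestamp >= std::numeric_limits<int32_t>::min() &&
           timestamp <= std::numeric_limits<int32_t>::max();
}

// Narrow a timestamp for an i32 field. In debug builds the assert fires
// instead of letting an oversized value wrap silently.
int32_t toI32(int64_t timestamp) {
    assert(fitsInI32(timestamp));
    return static_cast<int32_t>(timestamp);
}
```

The sample client further down this page uses a small counter as its timestamp, so it works with either width.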
Setting up Cassandra
- Create the logging directory, which is set in conf/logging.props
- The default directory is /var/cassandra/logs
- Look at conf/storage-conf.xml and set the various data directories
- For information on the configuration parameters, see http://code.google.com/p/the-cassandra-project/wiki/ConfReference
Running Cassandra
You should now be able to run the bin/start-server script (you might have to chmod 750 bin/start-server first). This starts your Cassandra node, which begins listening on the thrift interface.
Talking to Cassandra
You should now be able to talk to Cassandra using the thrift interface. You can find a sample C++ program and Makefile below. Just put these two files in their own subdirectory within the cassandra directory.
Save the following to nclient.cpp. You will need to set THRIFT_HOST and THRIFT_PORT to the host and port where your Cassandra instance is running.
// simple client
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <sstream>
#include <iostream>
#include <string>
#include <vector>
#include "Cassandra.h"
#include <protocol/TBinaryProtocol.h>
#include <transport/TSocket.h>
#include <transport/TTransportUtils.h>
using namespace std;
using namespace facebook::thrift;
using namespace facebook::thrift::protocol;
using namespace facebook::thrift::transport;
// Uncomment the next line if your cassandra.thrift declares
// "namespace cpp facebook.infrastructure.service"
//using namespace facebook::infrastructure::service;
using namespace facebook::fb303;
using namespace boost;
// Counter used as the timestamp for successive inserts
static int timestamp = 0;
// Set these to the location and port of your thrift interface
#define THRIFT_HOST "localhost"
#define THRIFT_PORT 9160
// Split a line into tokens on single spaces
static vector<string> strsplit(string str) {
    vector<string> ret;
    string::size_type loc,loc2;
    loc = 0;
    loc2 = str.find(" ");
    while(loc2 != string::npos) {
        ret.push_back(str.substr(loc,loc2-loc));
        loc = loc2+1;
        loc2 = str.find(" ",loc);
    }
    // loc2 is npos here, so take everything after the last space
    ret.push_back(str.substr(loc));
    return ret;
}
// The helpers take the client by reference so it is not copied on every call
static void insertVal(CassandraClient& client,
                      string tablename,
                      string rowkey,
                      string colFam,
                      string data) {
    client.insert(tablename,rowkey,colFam,data,timestamp++);
    cout << "called insert"<<endl;
}
static int countCols(CassandraClient& client,
                     string tablename,
                     string rowkey,
                     string colFam) {
    return client.get_column_count(tablename,rowkey,colFam);
}
void getCol(CassandraClient& client,
            column_t* col,
            string tablename,
            string rowkey,
            string colfam) {
    client.get_column(*col,tablename,rowkey,colfam);
}
static void printInsertUsage() {
    printf("invalid insert, insert is used as:\n");
    printf("insert [tablename] [rowkey] [columnFamily] [data]\n");
}
static void printCountUsage() {
    printf("invalid count, count is used as:\n");
    printf("count [tablename] [rowkey] [columnFamily]\n");
}
static void printGetUsage() {
    printf("invalid get, get is used as:\n");
    printf("get [tablename] [rowkey] [columnFamily]\n");
}
int main(int argc, char** argv) {
    string line;
    boost::shared_ptr<TTransport> socket(new TSocket(THRIFT_HOST, THRIFT_PORT));
    boost::shared_ptr<TTransport> transport(new TBufferedTransport(socket));
    boost::shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
    CassandraClient client(protocol);
    try {
        transport->open();
        for (;;) {
            cout << "> ";
            cout.flush();
            // Stop at end of input instead of looping forever
            if (!getline(cin,line))
                break;
            vector<string> v = strsplit(line);
            string cmd = v[0];
            if (cmd == "quit")
                break;
            else if (cmd == "insert") {
                if (v.size() != 5) {
                    printInsertUsage();
                    continue;
                }
                cout << "doing: insert |"<<v[1]<<"| |"<<v[2]<<"| |"<<v[3]<<"| |"<<v[4]<<"|"<<endl;
                insertVal(client,v[1],v[2],v[3],v[4]);
                continue;
            }
            else if (cmd == "count") {
                if (v.size() != 4) {
                    printCountUsage();
                    continue;
                }
                cout << "count of "<<v[1]<<" "<<v[2]<<" "<<v[3]<<" is "<<countCols(client,v[1],v[2],v[3])<<endl;
            }
            else if (cmd == "get") {
                if (v.size() != 4) {
                    printGetUsage();
                    continue;
                }
                cout << "getting "<<v[1]<<" "<<v[2]<<" "<<v[3]<<" is:"<<endl;
                column_t col;
                getCol(client,&col,v[1],v[2],v[3]);
                cout << "Column Name:\t"<<col.columnName<<endl;
                cout << "Value:\t\t"<<col.value<<endl;
                cout << "Timestamp:\t"<<col.timestamp<<endl;
            }
            else {
                cout << "Invalid command: "<<line<<endl;
                continue;
            }
        }
        transport->close();
    } catch (TException &tx) {
        printf("ERROR: %s\n", tx.what());
    }
    return 0;
}
Save the following to Makefile. You might have to change the CFLAGS and GEN_SRC vars if you put it somewhere else. Also, set the THRIFT_DIR variable to the location of your thrift include files.
- Probably, THRIFT_DIR=/usr/local/include/thrift
# Makefile for nclient
THRIFT_DIR = /usr/local/include/thrift
LIB_DIR = /usr/local/lib
CXX=g++
CFLAGS = -Wall -g -I ../interface/gen-cpp -I${THRIFT_DIR}
LIBS = -L${LIB_DIR} -lstdc++ -lthrift
GEN_SRC = ../interface/gen-cpp/FacebookService.cpp ../interface/gen-cpp/fb303_types.cpp ../interface/gen-cpp/cassandra_types.cpp ../interface/gen-cpp/Cassandra.cpp
TARGET = nclient
OBJECTS = nclient.o
default: nclient
# Libraries go after the sources so the linker can resolve their symbols
nclient: nclient.cpp
	$(CXX) -o $(TARGET) $(CFLAGS) nclient.cpp $(GEN_SRC) $(LIBS)
clean:
	@rm -f *.o $(TARGET)
Run make in the directory where you put those files and you should get a binary called nclient that you can run. It accepts the commands 'insert', 'count', 'get', and 'quit'. If you didn't change the table defined in storage-conf.xml you should be able to run:
- insert Mytable myKey Test:myCol myData
- get Mytable myKey Test:myCol
and get back myData.
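In the commands above, the third argument (Test:myCol) is what the client passes as the columnFamily_column parameter; it appears to name a column family and a column joined by a colon. Here is a minimal sketch of helpers for building and splitting such a path, assuming that family:column reading is correct. makeColumnPath and splitColumnPath are hypothetical names, not part of the thrift interface.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Join a column family and a column name, e.g. "Test" + "myCol" -> "Test:myCol".
// An empty column yields just the family name.
std::string makeColumnPath(const std::string& family, const std::string& column) {
    assert(!family.empty());  // a path without a family makes no sense here
    return column.empty() ? family : family + ":" + column;
}

// Split a path at its first colon into (family, column).
// If there is no colon, the whole string is the family and the column is empty.
std::pair<std::string, std::string> splitColumnPath(const std::string& path) {
    std::string::size_type colon = path.find(':');
    if (colon == std::string::npos)
        return std::make_pair(path, std::string());
    return std::make_pair(path.substr(0, colon), path.substr(colon + 1));
}
```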
Cassandra Caveats
- If you've defined some set of ColumnFamilies, added data to the system, and then try to add another ColumnFamily, inserts to the new ColumnFamily will silently fail. You need to remove your existing data and restart.
- If you are trying to do a batch_insert and any of the columns in your batch_mutation have empty values (because you just didn't assign them) the insert will silently fail.
- If you're trying to insert a column and the column name has any of the characters space, dash, or colon (" ","-",":") the insert will silently fail. (Other characters might cause this as well)
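Because these failures are silent, it can help to reject suspicious input on the client side before calling insert or batch_insert. The sketch below covers only the last two caveats. Column is a stand-in for the generated column_t, the function names are hypothetical, and the character list contains just the three characters named above, so other problem characters would still get through.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the generated column_t (only the fields the checks need).
struct Column {
    std::string columnName;
    std::string value;
};

// True if the column name (the part after the column family, not the whole
// "family:column" path) avoids space, dash, and colon.
bool isSafeColumnName(const std::string& name) {
    return name.find_first_of(" -:") == std::string::npos;
}

// True if no column in the batch has an empty value or an unsafe name.
bool isSafeBatch(const std::vector<Column>& columns) {
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (columns[i].value.empty() || !isSafeColumnName(columns[i].columnName))
            return false;
    }
    return true;
}
```

A caller could run isSafeBatch on each column list and report an error instead of sending a mutation that would be dropped without notice.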
