How To: RCassandra?

By on 22 Sep 2014

Cassandra is widely adopted around the world, largely because of how well it scales.

  • This tutorial assumes that you have Cassandra and R installed and configured correctly.

Why would you want to use a database like Cassandra with R?

I find it very easy to convert raw data into processed data with R. However, there are times when I have a large number of tables to process and not enough memory to keep them all as objects at the same time. What I do instead is clean them up and put them into the database one by one. I personally think that most analysts spend the large majority of their time simply turning raw data into nicely formatted, quickly fathomable data, and that this pursuit, mundane as it may be, is an important part of the process for the sake of future analysis.
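
As a rough illustration of the clean-then-store workflow described above, the sketch below handles tables one at a time so that only one raw table is held in memory at once. The names `clean_table` and `store_tables` and the `writer` argument are hypothetical; in a real session `writer` might wrap a database write such as `RC.write.table`, and the cleaning step would be whatever your data needs.

```r
# Hypothetical cleaning step: drop incomplete rows and lower-case column names.
clean_table <- function(df) {
  df <- df[complete.cases(df), , drop = FALSE]
  names(df) <- tolower(names(df))
  df
}

# Process tables one at a time. `loaders` is a named list of zero-argument
# functions, so each raw table only exists while it is being handled.
# `writer(name, df)` is an assumption: it stands in for a database write.
store_tables <- function(loaders, writer) {
  for (name in names(loaders)) {
    raw <- loaders[[name]]()        # load a single raw table
    writer(name, clean_table(raw))  # clean it and hand it to the writer
    rm(raw)                         # release it before loading the next one
  }
  invisible(names(loaders))
}

# Usage with an in-memory stand-in for the database.
stored <- list()
store_tables(
  list(cars = function() mtcars, air = function() airquality),
  writer = function(name, df) stored[[name]] <<- df
)
sapply(stored, nrow)
```

Passing loader functions rather than the data frames themselves is the design choice that keeps memory use down: nothing is read until its turn comes.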

Limitations of RCassandra

RCassandra doesn't support creating or deleting keyspaces, creating or deleting column families, deleting a row, or deleting an individual entry in a column. It also offers far fewer functions than comparable packages such as Clojure's Cassandra client alia or R's MongoDB packages.

Creating keyspace and tables in Cassandra's single-node cluster on localhost

cassandra-cli -host 127.0.0.1 -port 9160

First, create a keyspace with

CREATE KEYSPACE rcass with placement_strategy = 'SimpleStrategy' and strategy_options = {replication_factor:1};

Then, to connect to this keyspace, type

USE rcass;

Installation and usage of RCassandra

install.packages("RCassandra")

Load the RCassandra package into your environment with

library(RCassandra)

Now connect to your database with

connect.handle <- RC.connect(host="127.0.0.1", port=9160)

Cassandra listens on port 9160 by default, but you can change the port to match your configuration. To show the cluster name, type the following at your prompt:

RC.cluster.name(connect.handle)
[1] "Test Cluster"
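
Since an open connection holds resources on both sides, it may be worth closing it when you are done. Here is a minimal sketch, assuming the package's `RC.close` function and a hypothetical wrapper named `with_cassandra` (not part of RCassandra). The `connect_fn` and `close_fn` arguments exist only so the wrapper can be tried without a running cluster.

```r
# Hypothetical helper: open a connection, run `f(conn)`, and always close it.
# The defaults are evaluated lazily, so RCassandra is only needed if they are used.
with_cassandra <- function(f, host = "127.0.0.1", port = 9160L,
                           connect_fn = RCassandra::RC.connect,
                           close_fn = RCassandra::RC.close) {
  conn <- connect_fn(host = host, port = port)
  on.exit(close_fn(conn), add = TRUE)  # runs even if f() raises an error
  f(conn)
}

# With a live cluster this would be:
#   with_cassandra(function(conn) RCassandra::RC.cluster.name(conn))

# Dry run with stand-in functions, no Cassandra required.
closed <- FALSE
result <- with_cassandra(
  function(conn) paste("connected to", conn$host),
  connect_fn = function(host, port) list(host = host, port = port),
  close_fn = function(conn) closed <<- TRUE
)
```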

To inspect your keyspace, call RC.describe.keyspace. It returns a list describing the keyspace:

RC.describe.keyspace(connect.handle, 'rcass')
$name
[1] "rcass"
$strategy_class
...

Using the R's datasets library to create a column family

library(datasets)
head(mtcars, 3)
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

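# Select the keyspace, then write the data frame to the "cars" column family.
# The column family must already exist, since RCassandra cannot create one.
RC.use(connect.handle, 'rcass')
RC.write.table(connect.handle, "cars", mtcars)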
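
Assuming, as the RCassandra documentation describes, that `RC.write.table` uses the data frame's row names as row keys, it can help to look at those names before writing. The sketch below needs no Cassandra connection; `has_explicit_keys` is a hypothetical helper that only illustrates the check.

```r
library(datasets)

# Row names double as row keys (the assumption stated above).
keys <- rownames(mtcars)
length(keys)  # 32
keys[3]       # "Datsun 710"

# Hypothetical helper: TRUE when a data frame has explicit row names rather
# than the automatic "1", "2", ... that R assigns by default.
has_explicit_keys <- function(df) .row_names_info(df) > 0

has_explicit_keys(mtcars)               # TRUE
has_explicit_keys(data.frame(x = 1:3))  # FALSE
```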

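To get every column of a single row, pass its key. RC.write.table uses the data frame's row names as keys, so the third row of mtcars is keyed by its car name:

RC.get.range(connect.handle, "cars", "Datsun 710")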

To get specific columns of a single row:

RC.get(connect.handle, "cars", "Datsun 710", c("mpg", "gear", "carb"))

To query a range of keys and a range of columns:

cars_slice <- RC.get.range.slices(connect.handle, "cars")

The resulting cars_slice is a list, so you can access its elements with

cars_slice[[1]]

To read the table into R from the db use:

mycars <- RC.read.table(connect.handle, "cars")
head(mycars)
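
When checking a round trip like this, keep in mind that the rows read back may not arrive in the order they were written, because Cassandra returns rows in partitioner order rather than insertion order. A small sketch of one way to compare: `align_to` is a hypothetical helper, and the shuffled copy below merely simulates what a read might return.

```r
library(datasets)

# Hypothetical helper: reorder the rows and columns of `fetched` to match
# `reference`, matching rows by row name (the row key) and columns by name.
align_to <- function(reference, fetched) {
  fetched[rownames(reference), names(reference), drop = FALSE]
}

# Simulate a fetched table with rows and columns in a different order.
set.seed(1)
fetched <- mtcars[sample(nrow(mtcars)), rev(names(mtcars))]

aligned <- align_to(mtcars, fetched)
isTRUE(all.equal(aligned, mtcars))
```

With a live connection you would pass `mycars` as `fetched`. Value types may also need checking, which this sketch does not attempt.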

Now let's create a data frame for storing the name, email, password, designation:

employee <- data.frame(name="Mr. Foo", designation="coder",
                       email="foo@example.com", password="123")

Now to write this data frame into a table in Cassandra:

RC.write.table(connect.handle, "employees", employee)
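
Note that `employee` as built above carries the automatic row name "1". Assuming row names become row keys when a data frame is written with `RC.write.table`, you may prefer to key the row by the employee's name, so that rows inserted later under a name key line up with it. A sketch of that variant (the name `employee_keyed` is illustrative only):

```r
# Same record, but keyed by name through the row names instead of a `name` column.
employee_keyed <- data.frame(
  designation = "coder",
  email = "foo@example.com",
  password = "123",
  stringsAsFactors = FALSE  # keep values as plain character strings
)
rownames(employee_keyed) <- "Mr. Foo"

rownames(employee_keyed)  # "Mr. Foo"
```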

Now to read this table:

RC.read.table(connect.handle, "employees")

Insert a row into the table:

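# RC.insert writes one value per call: (connection, column family, key, column, value)
RC.insert(connect.handle, "employees", "Mr. Moo", "designation", "tester")
RC.insert(connect.handle, "employees", "Mr. Moo", "email", "moo@example.com")
RC.insert(connect.handle, "employees", "Mr. Moo", "password", "345")
RC.insert(connect.handle, "employees", "Boo", "designation", "HR")
RC.insert(connect.handle, "employees", "Boo", "email", "boo@example.com")
RC.insert(connect.handle, "employees", "Boo", "password", "333")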
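
According to the package documentation, `RC.insert` takes a single column name and value per call, in the form `RC.insert(conn, c.family, key, column, value)`. Under that assumption, a small wrapper can insert a whole row from a named list. `insert_row` is a hypothetical helper, not part of RCassandra, and the `insert_fn` argument exists only so the loop can be tried without a running cluster.

```r
# Hypothetical helper: write each element of `values` as one column of row `key`.
# The default for insert_fn is evaluated lazily, so RCassandra is only needed
# when no replacement is supplied.
insert_row <- function(conn, c.family, key, values,
                       insert_fn = RCassandra::RC.insert) {
  stopifnot(!is.null(names(values)), all(nzchar(names(values))))
  for (column in names(values)) {
    insert_fn(conn, c.family, key, column, values[[column]])
  }
  invisible(key)
}

# With a live connection this would be:
#   insert_row(connect.handle, "employees", "Mr. Moo",
#              list(designation = "tester", email = "moo@example.com",
#                   password = "345"))

# Dry run that records the calls instead of contacting Cassandra.
calls <- list()
record <- function(conn, c.family, key, column, value) {
  calls[[length(calls) + 1]] <<- c(key = key, column = column, value = value)
}
insert_row(NULL, "employees", "Boo",
           list(designation = "HR", email = "boo@example.com", password = "333"),
           insert_fn = record)
length(calls)
```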

Now, to see the changes you made:

RC.read.table(connect.handle, "employees")