In this post, I will show you how to import some data into your cluster. There is a very cool program called CDM, written by the very talented Jon Haddad, but it was written for importing test data into a single-node localhost installation of Cassandra, which is not what we have 🙂
In our case, we will create the schema and import the data manually.

Step 1)

Start up both C* nodes and start the C* process. If you have followed the previous posts, all you should need to do is type “cassandra” at the command prompt. Do one node at a time.

Run “nodetool status” and you should see both nodes in your cluster.

$ nodetool status
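
If you would rather check this from a script than by eye, here is a minimal sketch. It assumes the usual nodetool status layout, where each node line starts with a two-letter state code and UN means Up/Normal. The function name count_up_nodes is hypothetical, not part of Cassandra.

```shell
#!/bin/sh
# count_up_nodes: hypothetical helper. Reads `nodetool status` output on
# stdin and prints how many node lines start with "UN" (Up/Normal).
count_up_nodes() {
    # n + 0 forces a numeric 0 when no line matched
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}
```

You would call it as "nodetool status | count_up_nodes". On the two-node cluster from this series it should print 2 once both nodes have joined.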

Step 2)

Download the necessary files to your server.

cd /home/cassandra


Step 3)

Now start cqlsh, create the schema, and import the data.

3.1 Start cqlsh

cqlsh cass-node-1

3.2 Create the movielens schema

source '/home/cassandra/schema.txt';

3.3 Check the schema

desc keyspace movielens ;

3.4 Now copy the data from the CSV files into our database

COPY movielens.movies FROM 'movies.csv' WITH DELIMITER=',';
COPY movielens.ratings_by_movie FROM 'ratings_by_movie.csv' WITH DELIMITER=',';
COPY movielens.ratings_by_user FROM 'ratings_by_user.csv' WITH DELIMITER=',';
COPY movielens.users FROM 'users.csv' WITH DELIMITER=',';
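
The COPY commands above use relative file names, which cqlsh looks up in the directory it was started from, so it is worth confirming that the four CSV files are actually there before you start cqlsh. A minimal sketch, assuming the files sit in /home/cassandra as in Step 2. The function name check_csv_files is hypothetical.

```shell
#!/bin/sh
# check_csv_files DIR: hypothetical helper. Verifies that the four CSV files
# named in the COPY commands exist and are readable in DIR, printing each
# file's line count. Returns non-zero if any file is missing.
check_csv_files() {
    dir=${1:-/home/cassandra}   # assumed location, taken from Step 2
    missing=0
    for name in movies ratings_by_movie ratings_by_user users; do
        file="$dir/$name.csv"
        if [ -r "$file" ]; then
            # Reading from stdin makes wc print only the number;
            # tr strips the padding some wc versions add.
            lines=$(wc -l < "$file" | tr -d ' ')
            echo "$name.csv: $lines lines"
        else
            echo "$name.csv: missing or unreadable" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run "check_csv_files /home/cassandra" from the shell. The line counts also give you a rough figure to compare against the number of rows COPY reports as imported.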

3.5 Quickly verify the data

select * from movielens.movies limit 10;
select * from movielens.ratings_by_movie limit 10;
select * from movielens.ratings_by_user limit 10;
select * from movielens.users limit 10;

Step 4)

Flush the data to disk.

nodetool flush

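Calling nodetool flush ensures our memtables have been written to disk as SSTables. If we didn’t do this, the newly imported data would still be sitting in memtables in memory, and compaction only works on data that has been written to disk. Note that nodetool flush only acts on the node it is run against, so run it on each node.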
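
To see that the flush really produced files on disk, you can count the SSTable data files, whose names end in -Data.db. A minimal sketch follows. The path /var/lib/cassandra/data/movielens is an assumption (a common default data directory), so adjust it to your own data_file_directories setting. The function name count_sstables is hypothetical.

```shell
#!/bin/sh
# count_sstables DIR: hypothetical helper. Counts SSTable data files
# (names ending in -Data.db) anywhere below DIR.
count_sstables() {
    dir=${1:-/var/lib/cassandra/data/movielens}   # assumed default data path
    find "$dir" -type f -name '*-Data.db' 2>/dev/null | wc -l | tr -d ' '
}
```

Run "count_sstables" on each node before and after the flush. The number should go up after the flush if that node had unflushed data for the keyspace.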


We now have a Cassandra database with test application data in it, which we can use for testing and learning.

