Corruption happens, and it can happen for any number of reasons, from faulty hardware to a network glitch. If you are going to manage Cassandra, you need to know how to deal with it. In this post I will show you how to deliberately introduce corruption on a test system and how to fix it. This is just one way to handle corruption; there are a number of other ways, which we will go through in other posts.

Step 1)

Make sure your cluster is up and running.

[cassandra@cass-node-1 ~]$ nodetool status
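Every node should be listed as UN (Up/Normal). If you want to script this check, a minimal sketch is a small bash function that reads the nodetool status output on stdin. The name all_nodes_up is hypothetical, and the function assumes each node line begins with the two-letter status/state code (UN, DN, UJ and so on), as it does in Cassandra 3.11.

```shell
# Hypothetical helper: succeeds (exit 0) only if every node line in the
# `nodetool status` output on stdin starts with UN (Up/Normal).
# Assumes node lines begin with a two-letter status/state code such as UN or DN.
all_nodes_up() {
  awk '
    /^[UD][NLJM] / { total++; if ($1 == "UN") up++ }   # node lines only
    END { exit !(total > 0 && up == total) }            # 0 = all UN, 1 = not
  '
}

# Usage on a real node:
#   nodetool status | all_nodes_up && echo "all nodes are UN"
```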

Step 2)

Log into cqlsh and have a look at the “movies” table. If you do not have this table, check back to my post on how to import data into Cassandra here.

[cassandra@cass-node-1 ~]$ cqlsh cass-node-1
Connected to Phils-Cool-Cluster at cass-node-1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
cqlsh> use movielens ;
cqlsh:movielens> desc table movies;

Step 3)

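Now exit cqlsh and run sstableverify to make sure there is no existing corruption. This is the command to use: sstableverify --verbose movielens movies (Strictly speaking, sstableverify is an offline tool: the Cassandra documentation says the node should be stopped before it is run.)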

[cassandra@cass-node-1 ~]$ sstableverify --verbose movielens movies
Verifying BigTableReader(path='/data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db') (239.256KiB)
Deserializing sstable metadata for BigTableReader(path='/data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db')
Checking computed hash of BigTableReader(path='/data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db')

Looks good: our table is there and the file is not corrupted.
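If you want to run this check from a script, a minimal sketch is a bash function that filters the sstableverify output for the line it prints when a file is damaged. The name corrupt_sstables is hypothetical, and it assumes the “Corrupted SSTable : <path>” wording that sstableverify prints in Cassandra 3.11.3 (you will see it in Step 7); other versions may word this differently.

```shell
# Hypothetical helper: reads sstableverify output on stdin and prints the path
# of every SSTable reported as corrupted. Returns 0 if none were reported and 1
# otherwise. Assumes the "Corrupted SSTable : <path>" line format of 3.11.3.
corrupt_sstables() {
  local found
  found=$(sed -n 's/^Corrupted SSTable : //p')
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
    return 1
  fi
  return 0
}

# Usage:
#   sstableverify --verbose movielens movies 2>&1 | corrupt_sstables \
#     && echo "no corruption reported"
```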

Step 4)

Log back into cqlsh and select from movies, limiting the output to the first 10 rows. Make sure to set “CONSISTENCY ONE” first, so that a single replica, normally the local node you are connected to, answers the read.

cqlsh:movielens> consistency one;
Consistency level set to ONE.
cqlsh:movielens> select * from movies limit 10;

Step 5)

Now let’s introduce some corruption into the file.
Open the data file that sstableverify reported in vi, and add some text and a line break at the top of the file.

vi /data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db
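If you would rather not edit a binary file in vi, a non-interactive alternative is to prepend a line of text with a small bash function. This is a sketch: the name corrupt_file_prefix and the junk text are made up, and like the vi edit it should only ever be run against a throwaway test node.

```shell
# Hypothetical helper, for TEST NODES ONLY: prepends a line of junk text to the
# given file, roughly what the vi edit above does. The new contents are written
# to a temporary file next to the original, which is then renamed over it.
corrupt_file_prefix() {
  local file=$1 tmp
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  tmp=$(mktemp "$file.XXXXXX") || return 1
  if { printf 'this is not sstable data\n'; cat -- "$file"; } > "$tmp"; then
    mv -- "$tmp" "$file"
  else
    rm -f -- "$tmp"
    return 1
  fi
}

# Usage (as the cassandra user, on a test node):
#   corrupt_file_prefix /data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db
```

The renamed file gets mktemp's restrictive permissions (owner read/write only), which is why the usage assumes you run it as the cassandra user.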

Step 6)

Now log back into cqlsh and run the same select, again with CONSISTENCY ONE so that we are reading from the local node. Will we hit the corruption?

cqlsh:movielens> consistency one;
Consistency level set to ONE.
cqlsh:movielens> select * from movies limit 10;


Ten rows are returned, as if there were no corruption.

Step 7)

Run a nodetool flush, then check the file with sstableverify again.

[cassandra@cass-node-1 ~]$ nodetool flush

[cassandra@cass-node-1 ~]$ sstableverify --verbose movielens movies
Verifying BigTableReader(path='/data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db') (239.256KiB)
Deserializing sstable metadata for BigTableReader(path='/data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db')
Checking computed hash of BigTableReader(path='/data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db')
Corrupted SSTable : /data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db
Error verifying BigTableReader(path='/data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db'): Corrupted: /data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db

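A nodetool flush writes the contents of the memtables out to SSTables on disk, but it does not make Cassandra reread SSTables it already has open. The running node opened mc-1-big-Data.db at startup and keeps serving reads through that file handle and its caches, which is most likely why we can still select from the movies table even though sstableverify, which reads the file fresh from disk, reports it as corrupted.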
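Why can a running node keep returning rows from a file that sstableverify reports as corrupt? One plausible mechanism, assuming the editor saved by writing a new file and renaming it over the old one (in Vim this depends on the 'backupcopy' option), is that any process that already had the old file open keeps reading the old contents. The sketch below demonstrates that behaviour with plain bash and a temporary file; it does not touch Cassandra, and stale_read_demo is a made-up name.

```shell
# Demonstrates that a process holding a file open keeps reading the ORIGINAL
# contents after the path is replaced by a new file (write + rename), while a
# fresh open of the same path sees the new contents.
stale_read_demo() {
  local dir f held fresh
  dir=$(mktemp -d) || return 1
  f="$dir/mc-1-big-Data.db"             # stand-in file, not a real SSTable
  printf 'original\n' > "$f"
  exec 3< "$f"                           # open and hold, as a long-running process would
  printf 'junk\noriginal\n' > "$f.new"   # "edit" into a new file...
  mv -- "$f.new" "$f"                    # ...and rename it over the old path
  IFS= read -r held <&3                  # read via the handle opened earlier
  IFS= read -r fresh < "$f"              # read via a fresh open of the path
  exec 3<&-                              # close the held handle
  rm -rf -- "$dir"
  printf 'held handle: %s\nfresh open: %s\n' "$held" "$fresh"
}

# Usage:
#   stale_read_demo
```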

We need to restart Cassandra for the service to pick up the corruption.
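How you restart depends on how Cassandra was installed. As a sketch, assuming a systemd unit named cassandra and that cqlsh connects to cass-node-1 on port 9042 as above, you can drain the node, restart it, and wait for the CQL port to accept connections before moving on. Both function names are hypothetical, and wait_for_port relies on bash's /dev/tcp redirection.

```shell
# Polls host:port once a second until a TCP connection succeeds (return 0)
# or the timeout in seconds expires (return 1). Uses bash's /dev/tcp feature.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60} i
  for ((i = 0; i < timeout; i++)); do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Hypothetical restart helper. Assumes a systemd unit called "cassandra";
# adjust for your install (init script, tarball, etc.).
restart_cassandra() {
  nodetool drain &&                        # flush memtables, stop accepting writes
    sudo systemctl restart cassandra &&
    wait_for_port cass-node-1 9042 120     # wait until cqlsh can connect again
}
```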

After the restart, select from movies again using CONSISTENCY ONE and see what happens. Bingo, corruption spotted!

If we check the log, we can now see the corruption being reported:

WARN  [ReadStage-2] 2018-11-14 14:06:55,280 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-2,5,main]: {}
java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2601) ~[apache-cassandra-3.11.3.jar:3.11.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_191]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.3.jar:3.11.3]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) [apache-cassandra-3.11.3.jar:3.11.3]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.11.3.jar:3.11.3]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db
        at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator.computeNext(BigTableScanner.java:405) ~[apache-cassandra-3.11.3.jar:3.11.3]
        at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator.computeNext(BigTableScanner.java:306) ~[apache-cassandra-3.11.3.jar:3.11.3]
        ...

Step 8)

Fixing the corruption
It might sound dangerous, but the fix is to delete the corrupted data file and restart the Cassandra service on this node.

rm -f /data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db
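Data.db is only one of the files that make up an SSTable; the same SSTable (mc-1-big-...) also has component files such as Index.db, Summary.db, Filter.db, Statistics.db and TOC.txt. In this walkthrough the directory listing in Step 10 shows none of them left after the restart, but if you want to see, or remove with the node stopped, everything belonging to one SSTable, a minimal sketch is to match on the file name prefix. The name sstable_components is hypothetical.

```shell
# Hypothetical helper: given the path of an SSTable's Data.db file, prints every
# component file of the same SSTable (same "<version>-<generation>-<format>-" prefix).
sstable_components() {
  local prefix=${1%Data.db} f
  for f in "$prefix"*; do
    [ -e "$f" ] && printf '%s\n' "$f"    # skip the literal pattern if nothing matched
  done
  return 0
}

# Usage, with Cassandra stopped on this node:
#   data=/data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be/mc-1-big-Data.db
#   sstable_components "$data"                    # review the list first
#   sstable_components "$data" | xargs rm -f --   # then remove the whole SSTable
```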

Step 9)

Log back into cqlsh and select from movies using CONSISTENCY ONE.

Connected to Phils-Cool-Cluster at cass-node-1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> use movielens ;
cqlsh:movielens> consistency one;
Consistency level set to ONE.
cqlsh:movielens> select * from movies limit 10;

We see the table's columns but no data, which makes sense because we deleted the data file. The table definition itself comes from the system_schema keyspace, not from the data file.

Step 10)

Check the data directory to see what’s there. Do we have any data files?

[cassandra@cass-node-1 ~]$ cd /data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be
[cassandra@cass-node-1 movies-477dce50a45611e890116d9c24d5c4be]$ ll
total 0
drwxrwxr-x. 2 cassandra cassandra  6 Aug 20 08:51 backups
drwxrwxr-x. 4 cassandra cassandra 68 Oct 17 11:47 snapshots

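As we can see, there are no data files.
So the local node currently has no data for this table: there is no data file on disk, and after the restart there is nothing in the memtables either. To fix this we can force a read repair.
All we have to do is log into cqlsh, select from the table with a consistency level higher than ONE (we use ALL), and then flush the memtables to disk. The coordinator sees that the local replica is missing data, pulls the rows from the other nodes and writes them back to the local node, and the flush then writes them to disk.
Do not limit the select to 10 rows: read repair only fixes the rows that are actually read, so the select has to cover the whole table (cqlsh pages large result sets, so page through all of them or turn paging off with PAGING OFF). This works for us because our table only has 1600 rows; on a much larger database you would run (or schedule) a repair with nodetool repair instead.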

Connected to Phils-Cool-Cluster at cass-node-1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> USE movielens ;
cqlsh:movielens> consistency all;
Consistency level set to ALL.
cqlsh:movielens> select * from movies;
cqlsh:movielens> exit
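The same forced read can be run non-interactively. A minimal sketch, assuming cqlsh's -e option and its CONSISTENCY and PAGING commands: turning paging off makes cqlsh fetch the whole table rather than stopping after the first page, which matters because only rows that are read get repaired. The function names are hypothetical; for anything bigger than a small demo table, nodetool repair remains the proper tool.

```shell
# Hypothetical helper: prints the cqlsh commands that read a whole table at
# CONSISTENCY ALL with paging turned off.
read_all_cql() {
  local keyspace=$1 table=$2
  printf 'CONSISTENCY ALL; PAGING OFF; SELECT * FROM %s.%s;' "$keyspace" "$table"
}

# Hypothetical helper: runs that read against a node and discards the rows;
# only the side effect (read repair of any replica that is missing data) matters.
force_read_repair() {
  local host=$1 keyspace=$2 table=$3
  cqlsh "$host" -e "$(read_all_cql "$keyspace" "$table")" > /dev/null
}

# Usage:
#   force_read_repair cass-node-1 movielens movies && nodetool flush
# For larger tables, run a proper repair instead, for example:
#   nodetool repair movielens movies
```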

Step 11)

Do a nodetool flush to write all the data you just read to local disk.

[cassandra@cass-node-1 movies-477dce50a45611e890116d9c24d5c4be]$ nodetool flush

Now check the data directory again.
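A quick way to check this from a script is to count the Data.db files directly in the table directory, ignoring the backups and snapshots subdirectories. This is a sketch with a hypothetical function name; after the flush you would expect the count to be at least one.

```shell
# Hypothetical helper: counts the *-Data.db files directly inside a table
# directory. -maxdepth 1 skips the backups/ and snapshots/ subdirectories.
count_data_files() {
  find "$1" -maxdepth 1 -type f -name '*-Data.db' | wc -l | tr -d ' '
}

# Usage:
#   count_data_files /data/cassandra/data/movielens/movies-477dce50a45611e890116d9c24d5c4be
```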

And there we have it, we have just recovered from a datafile corruption.
