What is node repair doing in Cassandra

Here's what happens when a node repair is requested (nodetool repair).

Definition:

Neighbors: the node(s) that hold replicas of the data another node holds. For example, say you have a 7-node cluster with tokens evenly distributed (that is, each node holds 1/7 of the token range), and you set the replication factor to 3. That means any data you write will have 3 replicas. Under the default strategy (SimpleStrategy), the other two replicas are placed on the next two nodes along the ring. So if the data you write lands on node 4, nodes 5 and 6 will each also hold a replica. Likewise, node 4 holds one replica of node 3's data and one replica of node 2's data. So nodes 2, 3, 4, 5 and 6 all share data with node 4, and node 4's neighbors are the other four: nodes 2, 3, 5 and 6. Using the same logic, node 6's neighbors are nodes 4, 5, 7 and 1 (remember it's a ring: when you reach the end, the next node is the first one).
You can do the same calculation for other nodes or other replication factors.
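To check the neighbor calculation for any cluster size or replication factor, here is a minimal Python sketch. It assumes SimpleStrategy-style placement (the primary node plus the next RF-1 nodes clockwise) and nodes numbered 1..N with evenly distributed tokens; the function names `replica_nodes` and `neighbors` are made up for illustration. It returns only the other nodes, so add the node itself back if you want the full set of nodes sharing its data (nodes 2 through 6 for node 4).

```python
from __future__ import annotations


def replica_nodes(primary: int, num_nodes: int, rf: int) -> list[int]:
    """Nodes holding the range whose primary owner is `primary` (nodes are numbered 1..num_nodes)."""
    # Assumed SimpleStrategy-style placement: primary node plus the next rf-1 nodes along the ring.
    return [((primary - 1 + i) % num_nodes) + 1 for i in range(rf)]


def neighbors(node: int, num_nodes: int, rf: int) -> list[int]:
    """Other nodes that share at least one replicated range with `node`."""
    shared: set[int] = set()
    # `node` holds replicas of the ranges whose primary is node, node-1, ..., node-(rf-1).
    for back in range(rf):
        primary = ((node - 1 - back) % num_nodes) + 1
        shared.update(replica_nodes(primary, num_nodes, rf))
    shared.discard(node)  # a node is not its own neighbor
    return sorted(shared)


print(neighbors(4, 7, 3))  # [2, 3, 5, 6]
print(neighbors(6, 7, 3))  # [1, 4, 5, 7]
```

Note the wrap-around: `replica_nodes(6, 7, 3)` gives `[6, 7, 1]`, which is why node 1 shows up as a neighbor of node 6.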

Steps:

for each keyspace in the Cassandra DB, do the following:
        skip it if it's a system keyspace
        run force table repair on the keyspace, which will:
                make sure all neighbors are up, or quit
                send a build-hash-tree request to all neighbors (at the same time)                <--- see below for hash tree definition
                        on receiving the request, each node does the following for each column family in the keyspace:
                                flush memtables and trigger a read-only compaction,              <---- possible huge physical write
                                which builds the hash tree by reading all rows                   <---- possible huge physical read
                        send the hash tree back to the requesting node
                after receiving the hash trees from all neighbors, the requesting node compares each one (per column family) with its own local hash tree,
                and if they differ, asks the remote node for the SSTable (data file) data covering the differing ranges and repairs from it (compares the rows, updating local row(s) with the most recently updated row(s))      <--- possible huge physical read on remote and read/write on local
        wait until it finishes or fails, then go to the next keyspace
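The steps above can be sketched as a toy, single-process Python simulation. Everything here is hypothetical: `Node`, `build_hash_tree`, `repair_keyspace` and `repair` are illustrative names, not Cassandra APIs. The "hash tree" is flattened to a fixed number of leaf buckets (see the hash tree links below for the real structure), and all neighbors are assumed to hold the same ranges as the local node. Following the description above, only the local node is updated, keeping the row with the newest timestamp.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

ColumnFamily = dict[str, tuple[int, str]]  # hypothetical model: row key -> (timestamp, value)
SYSTEM_KEYSPACES = {"system"}  # assumption: keyspaces that repair skips
NUM_BUCKETS = 4  # assumption: number of leaves in the flattened hash tree


@dataclass
class Node:
    name: str
    up: bool = True
    data: dict[str, dict[str, ColumnFamily]] = field(default_factory=dict)  # keyspace -> CF -> rows


def bucket_of(key: str) -> int:
    # Stand-in for "which token range does this row key fall in".
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % NUM_BUCKETS


def build_hash_tree(cf: ColumnFamily) -> list[str]:
    # Stand-in for the read-only compaction: read every row and hash it into its bucket.
    hashers = [hashlib.sha256() for _ in range(NUM_BUCKETS)]
    for key in sorted(cf):
        ts, value = cf[key]
        hashers[bucket_of(key)].update(f"{key}|{ts}|{value}".encode())
    return [hs.hexdigest() for hs in hashers]


def repair_keyspace(local: Node, neighbors: list[Node], keyspace: str) -> None:
    if not all(n.up for n in neighbors):
        raise RuntimeError(f"a neighbor is down, quitting repair of {keyspace}")
    for cf_name, local_cf in local.data.get(keyspace, {}).items():
        local_tree = build_hash_tree(local_cf)
        for remote in neighbors:
            remote_cf = remote.data.get(keyspace, {}).get(cf_name, {})
            remote_tree = build_hash_tree(remote_cf)
            out_of_sync = {i for i, (x, y) in enumerate(zip(local_tree, remote_tree)) if x != y}
            # Only rows in out-of-sync buckets are "transferred" and compared.
            for key, row in remote_cf.items():
                if bucket_of(key) not in out_of_sync:
                    continue
                if key not in local_cf or row[0] > local_cf[key][0]:
                    local_cf[key] = row  # keep the most recently updated row


def repair(local: Node, neighbors: list[Node]) -> None:
    for keyspace in local.data:
        if keyspace in SYSTEM_KEYSPACES:
            continue
        repair_keyspace(local, neighbors, keyspace)  # finishes or raises before the next keyspace


n4 = Node("node4", data={"system": {"local": {}},
                         "app": {"users": {"k1": (1, "old"), "k2": (5, "x")}}})
n5 = Node("node5", data={"app": {"users": {"k1": (3, "new"), "k2": (5, "x")}}})
repair(n4, [n5])
print(n4.data["app"]["users"])  # {'k1': (3, 'new'), 'k2': (5, 'x')}
```

The local tree is built once per column family, mirroring the "all neighbors at the same time" step, rather than being rebuilt after each neighbor is merged.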

So if you have the write consistency level set to ALL, or you never delete any records, then you don't have to run node repair at all. (If you don't delete, inserted/updated data that is out of sync will be synced when you read it, which is called read repair: http://wiki.apache.org/cassandra/ReadRepair )
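To make the read repair idea concrete, here is a minimal Python sketch under simplifying assumptions: the reader contacts every replica synchronously, each row carries a write timestamp, and the highest timestamp wins. `read_with_repair` is a hypothetical name; Cassandra's real read path differs in detail. Note that, in this model, only keys that are actually read get repaired.

```python
from __future__ import annotations

Replica = dict[str, tuple[int, str]]  # hypothetical model: row key -> (write timestamp, value)


def read_with_repair(replicas: list[Replica], key: str) -> str | None:
    """Read `key` from every replica, return the newest value, and fix stale replicas."""
    versions = [replica.get(key) for replica in replicas]
    present = [v for v in versions if v is not None]
    if not present:
        return None  # no replica has the row
    newest = max(present, key=lambda v: v[0])  # highest timestamp wins
    for replica, version in zip(replicas, versions):
        if version is None or version[0] < newest[0]:
            replica[key] = newest  # write the newest version back to the stale replica
    return newest[1]


r1, r2, r3 = {"k": (2, "v2")}, {"k": (1, "v1")}, {}
print(read_with_repair([r1, r2, r3], "k"))  # v2
print(r2, r3)  # {'k': (2, 'v2')} {'k': (2, 'v2')}
```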

A hash tree (also called a Merkle tree) is what Cassandra uses to efficiently determine which parts of the data are out of sync among nodes. You can find more here:
http://en.wikipedia.org/wiki/Hash_tree
and
http://wiki.apache.org/cassandra/AntiEntropy
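Here is a minimal, generic binary Merkle tree in Python to show why the comparison is efficient: two nodes compare root hashes first and only descend into subtrees whose hashes differ, so matching ranges are skipped without looking at their rows. This is not Cassandra's own implementation; it assumes each leaf is the serialized contents of one token range and that the number of leaves is a power of two. The helper names `h`, `build_merkle` and `diff` are illustrative.

```python
from __future__ import annotations

import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle(leaves: list[bytes]) -> list[list[bytes]]:
    """Return the tree as a list of levels, leaf hashes first and the root last."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("number of leaves must be a power of two")
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent hashes the concatenation of its two children.
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff(a: list[list[bytes]], b: list[list[bytes]],
         level: int | None = None, index: int = 0) -> list[int]:
    """Indices of leaves (ranges) that differ, descending only into mismatching subtrees."""
    if level is None:
        level = len(a) - 1  # start at the root
    if a[level][index] == b[level][index]:
        return []  # identical subtree: nothing below it needs to be compared
    if level == 0:
        return [index]
    return diff(a, b, level - 1, 2 * index) + diff(a, b, level - 1, 2 * index + 1)


leaves_a = [b"range 0 rows", b"range 1 rows", b"range 2 rows", b"range 3 rows"]
leaves_b = list(leaves_a)
leaves_b[2] = b"range 2 rows (stale)"
print(diff(build_merkle(leaves_a), build_merkle(leaves_b)))  # [2]
```

With one stale range, the comparison needs about two hash checks per tree level rather than one check per leaf, which is what makes it cheap to exchange and compare trees between nodes.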

You can observe node repair progress by setting the logging level to DEBUG for class org.apache.cassandra.service.AntiEntropyService (in log4j.properties if you're using the default log4j setup), and observe compaction progress by setting the logging level to DEBUG for class org.apache.cassandra.db.compaction.CompactionManagerMBean.

mingbo's tech tips
