
What is node repair doing in Cassandra

Here's what happens when a node repair is requested (nodetool repair).

Definition:

   Neighbors: the nodes that hold replicas of data another node also holds. For example, suppose you have a 7-node cluster with tokens evenly distributed (that is, each node owns 1/7 of the ring), and you set the replication factor to 3. That means any data you write has 3 replicas; under the default strategy (SimpleStrategy), the two extra replicas are placed on the next two nodes along the ring. So if the data you write lands on node 4, then nodes 5 and 6 each also hold one replica. Node 4 in turn holds one replica for node 3's range and one for node 2's. The neighbors of node 4 are therefore nodes 2, 3, 4, 5 and 6. By the same logic, node 6's neighbors are 4, 5, 6, 7 and 1 (remember it's a ring: when you reach the end, the next node is the first one).
You can do the same calculation for any other node or any other replication factor.
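The neighbor calculation above can be sketched in a few lines of Python. This is a hypothetical helper for illustration, not Cassandra's actual code: nodes are numbered 1..n around the ring, and with replication factor rf, data owned by node i is replicated on the next rf-1 nodes clockwise.

```python
def neighbors(node, n, rf):
    """Return the set of nodes that share at least one replica with
    `node` (including the node itself), under SimpleStrategy."""
    result = set()
    # `node` holds replicas for its own range and the rf-1 ranges
    # owned by the nodes just before it on the ring
    for owner_offset in range(rf):
        owner = (node - 1 - owner_offset) % n + 1
        # each owner's range is replicated on the owner and the
        # next rf-1 nodes clockwise
        for replica_offset in range(rf):
            result.add((owner - 1 + replica_offset) % n + 1)
    return result

print(sorted(neighbors(4, 7, 3)))  # -> [2, 3, 4, 5, 6]
print(sorted(neighbors(6, 7, 3)))  # -> [1, 4, 5, 6, 7]
```

Running it for the 7-node, RF=3 example reproduces the neighbor sets worked out above.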

Steps:

for each keyspace in the Cassandra DB, do the following:
        skip it if it's a system keyspace
        run a forced table repair on the keyspace:
                make sure all neighbors are up, or quit
                send a build-hash-tree request to all neighbors (at the same time)                <--- see below for the hash tree definition
                        on receiving the request, each node does the following for each column family in the keyspace:
                                trigger a read-only compaction by flushing memtables            <---- possibly a huge physical write
                                build a hash tree by reading all rows                                          <---- possibly a huge physical read
                        send the hash tree result back to the requesting node
                after receiving hash trees from all neighbors, the requesting node compares each result (per column family) with its local result,
                and if they differ, asks the remote node for SSTables (data files) to repair from (comparing all rows and updating local row(s) with the latest updated row(s))      <--- possibly a huge physical read on the remote node and reads/writes locally
        wait until it finishes or fails, then go to the next keyspace
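The comparison-and-repair step at the heart of the flow above can be simulated in miniature. This is a toy sketch, not Cassandra's real implementation (the actual logic lives in its anti-entropy service): each "node" is just a dict mapping row key to a (timestamp, value) pair, and a per-row hash stands in for the hash tree.

```python
import hashlib

def row_hashes(rows):
    """Stand-in for a hash tree: hash each row so we can tell which
    rows differ between two replicas without comparing raw values."""
    return {k: hashlib.md5(repr(v).encode()).hexdigest()
            for k, v in rows.items()}

def repair_cf(local, remote):
    """Compare hashes and, for each differing row, keep the version
    with the newest timestamp on both replicas."""
    lh, rh = row_hashes(local), row_hashes(remote)
    for key in set(lh) | set(rh):
        if lh.get(key) != rh.get(key):
            lv = local.get(key, (0, None))
            rv = remote.get(key, (0, None))
            newest = max(lv, rv)  # (timestamp, value): newest wins
            local[key] = remote[key] = newest

node4 = {"a": (1, "x"), "b": (5, "new")}
node5 = {"a": (1, "x"), "b": (2, "old"), "c": (3, "y")}
repair_cf(node4, node5)
print(node4 == node5)  # -> True: both replicas now hold the newest rows
```

The "latest updated row wins" rule is the same one described in the steps: whichever replica has the newer timestamp for a row is treated as correct.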

 

So if your write consistency level is set to ALL, or you never delete any records, then you don't have to run node repair at all. (If you never delete, inserted/updated data will be synced when you access it, which is called read repair: http://wiki.apache.org/cassandra/ReadRepair )
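Read repair itself can be illustrated with the same toy replica model, a dict of row key to (timestamp, value). This is only a sketch of the idea (the real logic is in Cassandra's read path, not this function): the coordinator reads a row from every replica, returns the newest version, and writes it back to any replica holding a stale copy.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, and
    repair any replica holding a stale or missing copy."""
    versions = [(r.get(key, (0, None)), r) for r in replicas]
    newest = max(v for v, _ in versions)  # newest (timestamp, value)
    for v, r in versions:
        if v != newest:
            r[key] = newest  # fix the stale replica in the read path
    return newest[1]

r1 = {"k": (2, "new")}
r2 = {"k": (1, "old")}
print(read_with_repair([r1, r2], "k"))  # -> new
print(r2["k"])                          # -> (2, 'new')
```

This is why, absent deletes, data converges as it is read: each read of an out-of-sync row also repairs it.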


A hash tree (Merkle tree) is what Cassandra uses to efficiently determine which parts of the data are out of sync between nodes. You can find more here:
http://en.wikipedia.org/wiki/Hash_tree
and
http://wiki.apache.org/cassandra/AntiEntropy
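A minimal hash tree can be built in a few lines, which shows why the comparison is cheap: if the roots match, the data matches, and if they differ you descend only into the differing branches instead of scanning everything. This sketch assumes a power-of-two number of leaves and uses MD5 purely for illustration.

```python
import hashlib

def h(data):
    """Hash bytes and return the digest as bytes, so levels can be
    concatenated and hashed again."""
    return hashlib.md5(data).hexdigest().encode()

def merkle_levels(leaves):
    """Build every level of the tree bottom-up; the root level is last.
    Assumes len(leaves) is a power of two."""
    levels = [[h(l) for l in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels

a = merkle_levels([b"row1", b"row2", b"row3", b"row4"])
b = merkle_levels([b"row1", b"row2", b"rowX", b"row4"])
print(a[-1] != b[-1])  # -> True: roots differ, so some range is out of sync
# only the differing leaf needs repair:
print([i for i, (x, y) in enumerate(zip(a[0], b[0])) if x != y])  # -> [2]
```

Comparing roots costs one hash comparison regardless of data size; only ranges whose subtree hashes differ need to be streamed and repaired.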


You can observe node repair progress by setting the logging level to DEBUG (in log4j.properties, if you're using the default log4j) for the class org.apache.cassandra.service.AntiEntropyService, and observe compaction progress by setting the logging level to DEBUG for the class org.apache.cassandra.db.compaction.CompactionManagerMBean.
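In log4j 1.x properties syntax, that configuration would look something like the fragment below (a sketch; check the class/package names against your Cassandra version, as they have moved between releases):

```properties
# Repair (anti-entropy) progress
log4j.logger.org.apache.cassandra.service.AntiEntropyService=DEBUG
# Compaction progress
log4j.logger.org.apache.cassandra.db.compaction.CompactionManagerMBean=DEBUG
```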
