
dynamic snitch and replicationStrategy in Cassandra


What is EndPointSnitch:

   Cassandra is a distributed database, and data can live on any node or nodes (depending on the replication factor). But a single request from a client hits only one Cassandra node at a time, so Cassandra needs to locate where the data is (on the local node or on a remote one), proxy the request to that node, wait for the result, and then return the result to the client. The EndPointSnitch is what determines the sorted list of nodes Cassandra should internally proxy a request to (the best node is the first one in the list). Cassandra ships with 5 EndPointSnitch implementations. The EndPointSnitch also provides "datacenter" and "rack" info that the ReplicationStrategy uses to decide where to place replicas.
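
A rough sketch of that contract, modeled on Cassandra's IEndpointSnitch interface (method names and signatures vary slightly between versions, so treat this as illustrative):

import java.net.InetAddress;
import java.util.List;

// Illustrative sketch of what a snitch provides (modeled on
// org.apache.cassandra.locator.IEndpointSnitch; not the exact interface).
public interface EndPointSnitchSketch
{
    // topology info consumed by the ReplicationStrategy
    String getDatacenter(InetAddress endpoint);
    String getRack(InetAddress endpoint);

    // proximity sorting used when proxying a request: the best
    // candidate ends up first in the list
    void sortByProximity(InetAddress address, List<InetAddress> addresses);
}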

SimpleSnitch:
  First it gets the host list from the ReplicationStrategy (defined for each keyspace, see below), excludes dead nodes, and then returns the node list. This snitch always reports "datacenter1" as the datacenter and "rack1" as the rack.
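
As a minimal sketch (illustrative class, not the real implementation), its topology answers are just constants:

import java.net.InetAddress;

// Sketch of SimpleSnitch's view of topology: one flat datacenter/rack.
public class SimpleSnitchSketch
{
    public String getDatacenter(InetAddress endpoint) { return "datacenter1"; }
    public String getRack(InetAddress endpoint)       { return "rack1"; }
}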
 
RackInferringSnitch  
                If the local node has replica data, it goes at the beginning of the node list; then any node (picked randomly) on the same rack that has replica data; then any node in the same datacenter but not on the same rack that has replica data; then any node in another datacenter that has replica data.
                The datacenter is taken from the 2nd octet of the IP address. For example, for 10.20.30.40 the datacenter number is 20.
                The rack is taken from the 3rd octet of the IP address. For example, for 10.20.30.40 the rack number is 30.
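
A sketch of that octet-based inference (illustrative class name; the real snitch lives in org.apache.cassandra.locator):

import java.net.InetAddress;

// Derive datacenter and rack from the 2nd and 3rd IP octets.
public class RackInferringSketch
{
    // 10.20.30.40 -> datacenter "20"
    public String getDatacenter(InetAddress endpoint)
    {
        return Integer.toString(endpoint.getAddress()[1] & 0xFF);
    }

    // 10.20.30.40 -> rack "30"
    public String getRack(InetAddress endpoint)
    {
        return Integer.toString(endpoint.getAddress()[2] & 0xFF);
    }
}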

PropertyFileSnitch
                you define the rack info in the file cassandra-topology.properties, with entries in a format like:
                                10.0.0.13=DC1:RAC2
                                10.21.119.14=DC3:RAC2
                                10.20.114.15=DC2:RAC2
                                default=DC1:r1
                If a node's IP address matches one defined in the properties file, the datacenter/rack info from that entry is used; otherwise the default entry is used. The snitch then goes through the same steps as RackInferringSnitch to find the node list.
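
A sketch of the lookup, assuming the file is plain java.util.Properties format as the entries above suggest (illustrative code, not the actual snitch):

import java.io.FileReader;
import java.util.Properties;

// Map an IP to "DC:RACK" from cassandra-topology.properties,
// falling back to the "default" entry.
public class PropertyFileSketch
{
    private final Properties topology = new Properties();

    public PropertyFileSketch(String path) throws Exception
    {
        try (FileReader in = new FileReader(path)) { topology.load(in); }
    }

    private String[] entryFor(String ip)
    {
        String value = topology.getProperty(ip, topology.getProperty("default"));
        return value.split(":");  // [0] = datacenter, [1] = rack
    }

    public String getDatacenter(String ip) { return entryFor(ip)[0]; }
    public String getRack(String ip)       { return entryFor(ip)[1]; }
}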

Ec2Snitch:
                grabs the EC2 availability zone info by doing an HTTP GET to http://169.254.169.254/latest/meta-data/placement/availability-zone, which returns a string like "us-east-1a" or "us-east-1b"; for "us-east-1a" it sets the datacenter to "us-east-1" and the rack to "a". After that, it goes through the same steps as RackInferringSnitch to find the node list.
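
A sketch of that bootstrap step (illustrative; real Cassandra versions differ in exactly where they split the zone string):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Fetch the availability zone from the EC2 metadata service and split it
// into datacenter and rack the way described above.
public class Ec2SnitchSketch
{
    private static final String AZ_URL =
        "http://169.254.169.254/latest/meta-data/placement/availability-zone";

    public static void main(String[] args) throws Exception
    {
        HttpURLConnection conn = (HttpURLConnection) new URL(AZ_URL).openConnection();
        String az;
        try (InputStream in = conn.getInputStream())
        {
            az = new String(in.readAllBytes()).trim();        // e.g. "us-east-1a"
        }
        String datacenter = az.substring(0, az.length() - 1); // "us-east-1"
        String rack = az.substring(az.length() - 1);          // "a"
        System.out.println(datacenter + " / " + rack);
    }
}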

               
DynamicEndpointSnitch:
                uses the same algorithm as the dynamic load-balancing policy to rank nodes. The latency info is collected by StorageProxy (which handles both local and remote requests, i.e. all requests from clients). All parameters are configured in the Cassandra configuration file. You still need to define a non-dynamic snitch (any of the above) as the base, so the initial list of servers comes from that underlying snitch. For more about the algorithm, see: http://mingbowan.blogspot.ca/2012/08/how-cassandra-hector-client-load.html
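
A sketch of the idea (this is not Cassandra's actual scoring code; the exponentially-weighted average below is an assumption for illustration):

import java.net.InetAddress;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keep a per-node latency score, updated as request timings are observed,
// and re-rank the base snitch's node list by that score.
public class DynamicRankingSketch
{
    private static final double ALPHA = 0.75; // weight of the newest sample (assumed)
    private final Map<InetAddress, Double> scores = new ConcurrentHashMap<>();

    // called once per completed request with the measured latency
    public void receiveTiming(InetAddress host, double latencyMillis)
    {
        scores.merge(host, latencyMillis,
                     (old, sample) -> ALPHA * sample + (1 - ALPHA) * old);
    }

    // re-rank the list produced by the underlying (non-dynamic) snitch
    public void sortByProximity(List<InetAddress> fromBaseSnitch)
    {
        fromBaseSnitch.sort(Comparator.comparingDouble(
            (InetAddress host) -> scores.getOrDefault(host, 0.0)));
    }
}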

               

What is ReplicationStrategy:

   When you define a keyspace, you need to tell Cassandra how many replicas you want, and the ReplicationStrategy then decides which nodes to put those replicas on. 4 implementations come with Cassandra, but one of them, "OldNetworkTopologyStrategy", is obsolete. Below are the details for the other 3.

SimpleStrategy  
                always returns the next node(s) on the ring. For example, with 4 nodes A, B, C and D and a replication factor of 3, data owned by B also gets replicas on C and D; data owned by C gets replicas on D and A.
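
A sketch of that ring walk (illustrative, with node names standing in for tokens):

import java.util.ArrayList;
import java.util.List;

// Starting from the node that owns the data, walk clockwise around the
// ring until replicationFactor distinct nodes are collected.
public class SimpleStrategySketch
{
    public static List<String> replicasFor(List<String> ring, int ownerIndex, int rf)
    {
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < rf && i < ring.size(); i++)
            replicas.add(ring.get((ownerIndex + i) % ring.size()));
        return replicas;
    }

    public static void main(String[] args)
    {
        List<String> ring = List.of("A", "B", "C", "D");
        System.out.println(replicasFor(ring, 1, 3)); // data on B -> [B, C, D]
        System.out.println(replicasFor(ring, 2, 3)); // data on C -> [C, D, A]
    }
}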

LocalStrategy
                always returns the local node.
               
NetworkTopologyStrategy
                First, when creating or updating the keyspace, define how many replicas each datacenter should have (in strategy_options). For example, if the keyspace replication factor is 6, the per-datacenter replication factors could be 3, 2, and 1: 3 replicas in one datacenter, 2 in another, and 1 in another, for a total of 6. So the statement looks like:
CREATE KEYSPACE test
WITH placement_strategy = 'NetworkTopologyStrategy'
AND strategy_options=[{us-east:3, us-west:2, eu-west:1}];
               
                After that, it uses the EndPointSnitch to find the datacenter and rack info. Then, for each datacenter, it first tries to find node(s) on different racks, then node(s) on any rack if the replica count is still not enough. This repeats until the replication factor is satisfied, or an exception is thrown.
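
A sketch of that per-datacenter placement loop (the Node type and ring representation are simplified assumptions, not Cassandra's internals):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// For one datacenter: walk the ring, prefer nodes on racks not used yet,
// then fill from any rack if the per-DC replica count still isn't met.
public class NtsPlacementSketch
{
    record Node(String name, String dc, String rack) {}

    public static List<Node> replicasForDc(List<Node> ring, String dc, int rf)
    {
        List<Node> replicas = new ArrayList<>();
        Set<String> usedRacks = new HashSet<>();
        // first pass: at most one replica per distinct rack
        for (Node n : ring)
            if (replicas.size() < rf && n.dc().equals(dc) && usedRacks.add(n.rack()))
                replicas.add(n);
        // second pass: fill from already-used racks if still short
        for (Node n : ring)
            if (replicas.size() < rf && n.dc().equals(dc) && !replicas.contains(n))
                replicas.add(n);
        if (replicas.size() < rf)
            throw new IllegalStateException("not enough nodes in " + dc);
        return replicas;
    }
}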
               
                
