What is EndPointSnitch:
Cassandra is a distributed database, and
data can be on any node or nodes (depending on the replication factor).
But a single request from a client can hit only one Cassandra node at
a time, so Cassandra needs to locate where the data is (on the local node or on
a remote one), proxy the request to that node, wait for the result and then return
the result to the client. The EndPointSnitch is used here to determine the sorted
list of nodes Cassandra should internally proxy the request to (the best node is the
first one in the list). Cassandra ships with 5 EndPointSnitch
implementations. The EndPointSnitch also needs to provide "datacenter"
and "rack" info for the ReplicationStrategy to determine where to put
replicas.
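To make that contract concrete, here is a minimal sketch of what a snitch has to answer, loosely modeled on Cassandra's IEndpointSnitch interface (the interface name below and the exact signatures are approximations, not the real API):

import java.net.InetAddress;
import java.util.List;

// Approximate snitch contract: where does a node live, and which nodes are "closest"?
public interface EndpointSnitchSketch
{
    // Which rack is this endpoint in?
    String getRack(InetAddress endpoint);

    // Which datacenter is this endpoint in?
    String getDatacenter(InetAddress endpoint);

    // Sort candidate replica endpoints by proximity to the given address,
    // so the coordinator proxies to the "best" node first.
    List<InetAddress> sortByProximity(InetAddress address, List<InetAddress> unsortedEndpoints);
}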
SimpleSnitch:
First get the host list from the ReplicationStrategy (defined
for each keyspace, see below), exclude dead nodes and then return the node list.
This snitch will always return "datacenter1" as the datacenter and
"rack1" as the rack.
RackInferringSnitch:
If the local node has replica data, put the local node at the beginning of the node list;
then any node (picked randomly) on the same rack that has replica data;
then any node in the same datacenter but not on the same rack that has replica data;
then any node in another datacenter that has replica data.
The datacenter is defined by the 2nd octet of the IP. For example, for 10.20.30.40 the datacenter
number is 20.
The rack is defined by the 3rd octet of the IP. For example, for 10.20.30.40 the rack number is
30.
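A minimal sketch of reading the datacenter and rack straight out of the address octets (hypothetical class name; it mirrors the octet rule above):

import java.net.InetAddress;

// Infer topology from the IPv4 address: for 10.20.30.40 the datacenter is "20" and the rack is "30".
public class RackInferringSnitchSketch
{
    public String getDatacenter(InetAddress endpoint)
    {
        // 2nd octet (bytes are signed in Java, so mask to 0-255)
        return Integer.toString(endpoint.getAddress()[1] & 0xFF);
    }

    public String getRack(InetAddress endpoint)
    {
        // 3rd octet
        return Integer.toString(endpoint.getAddress()[2] & 0xFF);
    }
}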
PropertyFileSnitch:
You define the rack info in the file cassandra-topology.properties, with a format like:
10.0.0.13=DC1:RAC2
10.21.119.14=DC3:RAC2
10.20.114.15=DC2:RAC2
default=DC1:r1
If a node's IP address matches an address defined in the properties
file, use that datacenter/rack info; otherwise use the default. Then go through
the same steps as for RackInferringSnitch to find the node list.
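A minimal sketch of that lookup (hypothetical class; the real PropertyFileSnitch does more, such as reloading the file when it changes):

import java.io.FileReader;
import java.net.InetAddress;
import java.util.Properties;

// Look up "DC:RACK" for an endpoint in cassandra-topology.properties,
// falling back to the "default" entry when the address is not listed.
public class PropertyFileSnitchSketch
{
    private final Properties topology = new Properties();

    public PropertyFileSnitchSketch(String path) throws Exception
    {
        try (FileReader reader = new FileReader(path))
        {
            topology.load(reader);
        }
    }

    private String[] lookup(InetAddress endpoint)
    {
        String value = topology.getProperty(endpoint.getHostAddress(),
                                            topology.getProperty("default"));
        return value.split(":");   // [datacenter, rack]
    }

    public String getDatacenter(InetAddress endpoint) { return lookup(endpoint)[0]; }
    public String getRack(InetAddress endpoint)       { return lookup(endpoint)[1]; }
}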
Ec2Snitch:
Grab the EC2 availability zone info by doing an HTTP GET to http://169.254.169.254/latest/meta-data/placement/availability-zone,
which returns a string like "us-east-1a", "us-east-1b", etc.
Then define the datacenter as "us-east-1" and the rack as "a"
for "us-east-1a". After that, go through the same steps as for RackInferringSnitch
to find the node list.
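A minimal sketch of fetching the zone and splitting it as described above (hypothetical class; exactly where Cassandra cuts the string between datacenter and rack may differ between versions):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Ask the EC2 instance metadata service for the availability zone,
// then derive datacenter/rack from it ("us-east-1a" -> "us-east-1" / "a").
public class Ec2SnitchSketch
{
    private static final String AZ_URL =
        "http://169.254.169.254/latest/meta-data/placement/availability-zone";

    public static void main(String[] args) throws Exception
    {
        HttpURLConnection conn = (HttpURLConnection) new URL(AZ_URL).openConnection();
        String az;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream())))
        {
            az = in.readLine();                               // e.g. "us-east-1a"
        }
        String datacenter = az.substring(0, az.length() - 1); // "us-east-1"
        String rack = az.substring(az.length() - 1);          // "a"
        System.out.println(datacenter + " / " + rack);
    }
}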
DynamicEndpointSnitch:
Uses the same algorithm as the dynamic load balancing policy to rank nodes. The
latency info is collected by StorageProxy (which handles both local and remote
requests, that is, all requests from clients). All parameters are configured in the
Cassandra configuration file. You still need to define a non-dynamic snitch
(any of the above) as the base, so the initial list of servers comes from the underlying snitch. For more about the algorithm, see here: http://mingbowan.blogspot.ca/2012/08/how-cassandra-hector-client-load.html
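As a rough illustration only (this is not the exact scoring formula Cassandra uses; see the link above), ranking nodes by an observed average latency could look like this:

import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keep a running average latency per host and sort candidates by it,
// so the lowest-latency replica is tried first.
public class LatencyRankerSketch
{
    private final Map<InetAddress, Double> avgLatencyMillis = new ConcurrentHashMap<>();

    // Called every time a response comes back (exponentially weighted moving average).
    public void recordLatency(InetAddress host, double millis)
    {
        avgLatencyMillis.merge(host, millis, (old, cur) -> 0.75 * old + 0.25 * cur);
    }

    // Hosts with no samples yet score 0 so they still get probed.
    public List<InetAddress> sortByScore(List<InetAddress> hostsFromBaseSnitch)
    {
        List<InetAddress> sorted = new ArrayList<>(hostsFromBaseSnitch);
        sorted.sort(Comparator.comparingDouble((InetAddress h) -> avgLatencyMillis.getOrDefault(h, 0.0)));
        return sorted;
    }
}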
What is ReplicationStrategy:
When you define a keyspace, you need to let
Cassandra know how many replicas you want, and the ReplicationStrategy then
decides which nodes to put those replicas on. 4 implementations come with
Cassandra, but one of them, "OldNetworkTopologyStrategy", is obsolete. Below are the
details for the other 3.
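The core job of every strategy can be sketched as a single function: given a position on the ring, return the endpoints that should hold replicas (hypothetical interface; in Cassandra this corresponds roughly to calculateNaturalEndpoints on the strategy classes):

import java.net.InetAddress;
import java.util.List;

// Approximate strategy contract: map a ring position to replica endpoints.
public interface ReplicationStrategySketch
{
    // Return the nodes that should store replicas for the given token,
    // in priority order; the list length equals the replication factor.
    List<InetAddress> calculateNaturalEndpoints(long token);
}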
SimpleStrategy:
Always returns the next node(s) on the ring. For example, if we have 4 nodes A, B, C
and D and we set the replication factor to 3, then for data on node B, nodes C and D
will hold the additional replicas. If the data is on C, then D and A will hold the
additional replicas.
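A minimal sketch of that ring walk (hypothetical code; it assumes the node list is already ordered by ring position):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Walk the ring clockwise starting at the node owning the data and take RF nodes in total.
public class SimpleStrategySketch
{
    public static List<String> replicasFor(List<String> ring, int ownerIndex, int replicationFactor)
    {
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < replicationFactor && i < ring.size(); i++)
            replicas.add(ring.get((ownerIndex + i) % ring.size()));
        return replicas;
    }

    public static void main(String[] args)
    {
        List<String> ring = Arrays.asList("A", "B", "C", "D");
        System.out.println(replicasFor(ring, 1, 3)); // data on B -> [B, C, D]
        System.out.println(replicasFor(ring, 2, 3)); // data on C -> [C, D, A]
    }
}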
LocalStrategy:
Always returns the local node.
NetworkTopologyStrategy:
First, define how many replicas each datacenter should have and the placement
options (set in strategy_options) when creating or updating the keyspace.
For example, if the keyspace replication factor is 6, the
per-datacenter replication factors could be 3, 2 and 1, so 3 replicas in
one datacenter, 2 in another and 1 in another - 6 in total.
The statement looks like:
CREATE KEYSPACE test
WITH placement_strategy = 'NetworkTopologyStrategy'
AND strategy_options=[{us-east:3, us-west:2, eu-west:1}];
After that, NetworkTopologyStrategy uses the EndPointSnitch to find out the datacenter and rack info.
Then, for each datacenter, it first tries to find node(s) on different racks, then
node(s) on any rack if the replica count is not yet enough. This repeats until the
replication factor is satisfied or an exception is thrown.
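A minimal sketch of that per-datacenter, rack-aware selection (hypothetical code; the real strategy walks the token ring, so this only illustrates the "prefer distinct racks first" idea):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Pick "replicas" nodes inside one datacenter, preferring one node per rack,
// and reusing racks only when there are not enough distinct racks.
public class RackAwarePlacementSketch
{
    public static class Node
    {
        final String name;
        final String rack;
        Node(String name, String rack) { this.name = name; this.rack = rack; }
    }

    public static List<Node> pickReplicas(List<Node> dcNodes, int replicas)
    {
        List<Node> chosen = new ArrayList<>();
        Set<String> usedRacks = new HashSet<>();

        // Pass 1: one replica per distinct rack.
        for (Node node : dcNodes)
        {
            if (chosen.size() == replicas) return chosen;
            if (usedRacks.add(node.rack)) chosen.add(node);
        }
        // Pass 2: racks exhausted, fill up with any remaining nodes.
        for (Node node : dcNodes)
        {
            if (chosen.size() == replicas) return chosen;
            if (!chosen.contains(node)) chosen.add(node);
        }
        if (chosen.size() < replicas)
            throw new IllegalStateException("not enough nodes in this datacenter");
        return chosen;
    }
}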