How Cassandra Hector client load balancing work

Concept:
hector connection pool contains other connection pools. The "other connection pool" here is the connection pool to each Cassandra host. Let's call the master hector pool as hector pool and other pools as host pool. So, for example, your cluster has 3 hosts, then the hector pool will contain 3 host connection pools, each host pool has several connections to one of the 3 hosts.

Details:
come with hector client library, there are 3 load balancing policies (will call them LBP).
1. RoundRobinBalancingPolicy
this LBP always return next host pool in hector pool. So if you have 3 hosts as A, B and C. Then when you request new connection pool, this LBP will give you A, B, C, A, B, C, A ….

2. LeastActiveBalancingPolicy
This LBP keeps track of how many active connections within each host pool. And when you request a new one, it will return then host pool with least active connections . If multiple host pools have same active connections, randomly return one. So if you have 3 hosts A, B and C, and A has 30 active connections, B has 40 and C has 10, then you will get C when request new pool, and until those connections was returned/released by your client (so they will became inactive), you will likely get 20 C before getting one from A.

3. DynamicLoadBalancingPolicy (DLBP hereafter)
This is the most complicated one, the algorithm used here is the same as dynamic snitch within Cassandra cluster. Below is the step when DLBP determine with host pool it should return when request

        3.1 For each client request you made to Cassandra, the total response time will be recorded within each host pool, and the response time will be used to compute score for the pool
                3.1.1 score will be reset by DLBP every 20 seconds
                3.1.2 only keep last 100 response time and at most update 1000 times before reset
                3.1.3 score = (-1) * Math.log10(1 - Math.pow(Math.E, (-1) * (0.768) / avg(response time)))
        3.2     DLBP score will be updated every 0.1 second using above step’s formula
        3.3 randomly pick one host pool and compare the score it has (s1) with all others (sn)
                3.3.1 return node with smallest score if exists (s1-sn)/s1 > DYNAMIC_BADNESS_THRESHOLD     <---- DYNAMIC_BADNESS_THRESHOLD is a configurable value, default =0.1
                3.3.2 otherwise, return the one picked in step 3.3

Below is chart for average response Time vs. Score. You may/should/need to adjust the DYNAMIC_BADNESS_THRESHOLD for your specific work load.

If you’re using AbstractColumnFamilyTemplate or any class inherent it (like ThriftColumnFamilyTemplate) or using HectorTemplateImpl or any other class which will eventually call HConnectionManager.operateWithFailover() for request to Cassandra, here’s step for each request:

1. hector will request a host pool from hector pools based on load balancing policy and exclude down host (details in step 4)
2. from the host pool, borrow (request) a new connection
3. try to completed request on this connection.
4. if for some reason, there's transportation exception, mark the host as down and go back to 1
5. release connection back to host pool after step 1.3 or 1.4
6. if no host pool is available, throw exception to client

mingbo's tech tips

Search This Blog

How Cassandra Hector client load balancing work

Comments

Post a Comment

Popular posts from this blog

enable special character support in Graphite metric name

How to send command / input to multiple Putty window simultaneously

easily convert RSA SecurID Software Token URL between iPhone and Andriod