Redis Cluster principle ?
Under Cluster ,key How to address ?
What are the addressing algorithms ?
Talk about consistency hash?
Again, he was bombarded by interviewers from large factories !

<>1 Cluster overview

Redis Continuous development of clusters , Multiple machines can be realized , Deploying multiple instances , Each instance stores part of the data .
At the same time, each instance can be brought with it Redis From instance , Guarantee if Redis The main instance is hung , Switch to auto redis From instance .

in olden days , A lot of people use it codis Such clients support clustering . The current version is used by everyone Redis cluster, That is, the official cluster mode , It's time to go into it .

<>2 Keep reading and writing separate + Under high availability architecture , Lateral expansion support 1T+ mass data

* redis single master The capacity bottleneck of Architecture

* redis How to pass master Lateral expansion support 1T+ Data volume
<>3 Data distribution algorithm :hash+ uniformity hash+redis cluster Of hash slot

Explain the core algorithm of distributed data storage , Algorithm of data distribution

hash algorithm -> uniformity hash algorithm (memcached) -> redis cluster,hash slot algorithm

With different algorithms , It's up to you to be in more than one master Node time , How is the data distributed to these nodes , Solve this problem

<>4 Redis cluster

* Automatic fragmentation of data , each master Put some data up
* Provides built-in high availability support , part master When not available , It can continue to work
stay Redis cluster Under the framework , each Redis To open two ports , For example, one is 6379, The other is to add 10000 Port number of , such as 16379

16379 Ports are used for communication between nodes , that is cluster bus Cluster bus
cluster bus Communication for , Used to detect faults , Configuration update , Fail over authorization

cluster bus Another binary protocol was used - gossip, It is mainly used for efficient data exchange between nodes , Less network bandwidth and processing time

* The most rustic hash Algorithms and drawbacks ( Massive cache rebuild )

3, uniformity hash algorithm ( Automatic cache migration )+ Virtual node ( Automatic load balancing )

* uniformity hash Explanation and advantages of the algorithm

* uniformity hash The virtual node of the algorithm realizes load balancing

4,redis cluster Of hash slot algorithm

redis cluster There are fixed ones 16384 individual hash slot, For each key calculation CRC16 value , And then yes 16384 Mould taking , Available key Corresponding hash slot

redis cluster Each of them master They will hold some slot, For example, there are 3 individual master, Then maybe each master hold 5000 Multiple hash slot

hash slot Give Way node It's easy to add and remove , Add one more master, The other master Of hash
slot Moving parts past , One less master, Just take it hash slot Move to other master up

move hash slot The cost is very low

Client's api, It can be used for the specified data , Let them go the same way hash slot, adopt hash tag To achieve

<>5 Internal communication mechanism between nodes

<>5.1 Basic communication principle

Metadata for maintaining clusters

<>5.1.1 Centralized

Centralized storage and maintenance of cluster metadata

Cluster metadata ( Node information , Failure, etc ) Centralized storage in a node

Updating and reading metadata , Good timeliness , Once the metadata changes , Immediately update to centralized storage , Other nodes are immediately aware as they read

All the pressure of updating metadata is concentrated in one place , It may cause pressure on the storage of metadata

Redis cluster The other method adopted between nodes is called gossip Agreement of
Keep communicating with each other , Keep the data of all nodes in the whole cluster intact

gossip: The advantage is that , The update of metadata is scattered , Not in one place , Renewal requests will continue , Call all nodes to update , There is a certain delay , It reduces the pressure ;
shortcoming , Metadata update has delay , Some operations of the cluster may be delayed

We just did it reshard, To do another operation , You'll find out ,configuration error, Reach an agreement

(2)10000 port

Each node has a dedicated port for communication between nodes , It is the port number of the service provided by itself +10000, such as 7001, So what is used for communication between nodes is 17001 port

Each node sends messages to several other nodes at regular intervals ping news , At the same time, several other points have been received ping Then return pong

(3) Information exchanged

Fault information , Adding and removing nodes ,hash slot information , wait

<>5.1.2 gossip agreement

* gossip Protocol maintenance cluster metadata

The protocol contains multiple messages

Send by a node meet For newly added nodes , Let the new node join the cluster , The new node then begins to communicate with other nodes
redis-trib.rb add-node
In fact, it sent a gossip meet Message to new node , Notify the node to join the cluster


Each node sends messages to other nodes frequently ping, It contains its own state and cluster metadata maintained by itself , Through each other ping Exchange metadata

ping Very often , And carry some metadata , The possible burden of the network will be increased

Each node per s Will execute 10 second ping, Every time I choose 5 Other nodes that have not communicated for the longest time . Of course, if it is found that the communication delay of a node has reached
cluster_node_timeout / 2

Then send it immediately ping, Avoid long data exchange delay , It's too long behind . for instance , Between the two nodes 10 Minutes without data exchange , Then the whole cluster is in a serious metadata inconsistency situation , There will be problems . therefore
cluster_node_timeout It can be adjusted , If the adjustment is large , That will reduce the transmission frequency

every time ping, One is to bring their own node information , And take it with you 1/10 Information of other nodes , Send it out , Data exchange . At least include 3 Information about other nodes , Up to total nodes -2 Information about other nodes


return ping and meet, Contains your own status and other information , It can also be used for information broadcasting and updating


One node judges another fail after , Just send it fail To other nodes , Notify other nodes , The specified node is down !

<>6 Cluster oriented Jedis Internal implementation principle

development Jedis,Redis Of Java client

jedis cluster api And redis cluster Some basic principles of cluster interaction

<>6.1 Redirection based client

redis-cli -c, redirect automatically

<>6.1.1 request redirections

The client may pick any one Redis Instance to send the command , Each instance receives a command , Can calculate key Corresponding hash slot

If it is local, it will be handled locally , Otherwise return moved To client , Let client redirect
cluster keyslot mykey
You can view one key Corresponding hash slot What is it?

use redis-cli When , Can be added -c parameter , Support automatic request redirection ,redis-cli Received moved after , Will automatically redirect to the corresponding node to execute the command

<>6.1.2 calculation hash slot

calculation hash slot The algorithm of , It's the basis key calculation CRC16 value , And then yes 16384 Mould taking , Get the corresponding hash slot

use hash tag It can be specified manually key Corresponding slot, The same hash tag Under the key, All will be in one hash slot in , such as set
mykey1:{100} and set mykey2:{100}

<>6.1.3 hash slot lookup

Through between nodes gossip Protocol data exchange , I know every one hash slot On which node

<>6.2 smart jedis

<>6.2.1 What is? smart jedis

Redirection based client , It's very network consuming IO, Because most of the time , There may be a request redirection , To find the right node

So most of the clients , such as java redis client , namely jedis, All of them smart Of

Local maintenance hashslot -> node Mapping table for , cache , Most of the time , Go directly to the local cache hashslot ->
node, It does not need to be done through nodes moved redirect

<>6.2.2 JedisCluster How it works

stay JedisCluster When initializing , You'll randomly choose one node, initialization hashslot -> node Mapping table , Create one for each node at the same time JedisPool Connection pool

Based on JedisCluster Perform the operation , first JedisCluster All will be calculated locally key Of hashslot, Then find the corresponding node in the local mapping table

If that node It happens to be holding that hashslot, Well, then ok;
If it did reshard Such operation , probably hashslot It's not there anymore node Yes , Will return moved

If JedisCluter API Return when the corresponding node is found moved, Then use the metadata of this node , Update local hashslot -> node Mapping table cache

Repeat the above steps , Until the corresponding node is found , If retrying exceeds 5 second , Then the report is wrong ,JedisClusterMaxRedirectionException

jedis Old version , It may occur when a node in the cluster fails and the automatic handover recovery has not been completed , Frequent updates hash slot, frequently ping Node check active , Lead to a large number of networks IO expenses

jedis Latest version , For these excessive hash slot Update and ping, Both have been optimized , Similar problems are avoided

<>6.2.3 hashslot Migration and ask redirect

If hash slot Migrating , Will return then ask Redirect to jedis

jedis Received ask After redirection , It will be relocated to the target node for execution , But because ask happen to hash slot During migration , therefore JedisCluster
API received ask It will not be updated hashslot Local cache

It is certain that ,hashslot It's done ,moved Will update local hashslot->node Mapping table cached

<>7 High availability and the principle of active standby switching

principle , It's almost like a sentry

<>7.1 Judge node downtime

If one node thinks the other is down , Namely pfail - Subjective downtime
If more than one node thinks that the other node is down , Namely fail - Objective downtime
It's almost the same as the sentry ,sdown - odown

stay cluster-node-timeout within , A node has never returned pong, Then it is considered that pfail

If a node thinks of a node as a node pfail, Then it will be in gossip ping In the news ,ping To other nodes , If more than half of the nodes agree that pfail, Then it will become fail

<>7.2 Filter from node

For down time master node, From its ownership slave node in , Select one to switch to master node

Check each slave node And master node Time to disconnect , If it exceeds
cluster-node-timeout * cluster-slave-validity-factor
Then you are not eligible to switch to master, This is the same as the sentry , Steps to filter from node timeout

<>7.3 Election node

sentry : Sort all slave nodes ,slave priority,offset,run id

Each slave node , According to their own right master Replication of data offset, Set an election time ,offset The bigger it is ( More data to replicate ) Slave node of , The earlier the election time , Give priority to elections

be-all master node start slave Voting in elections , For those to be elected slave vote , If most of them
master node(N/2 + 1)
All voted for a slave node , Then the election is passed , The slave node can be switched to master

The slave node performs the master-slave switch , Switch from node to master node

<>7.4 Compared with the sentry

The whole process is compared to the sentry , Very similar , So ,redis cluster Powerful , Directly integrated replication and sentinal Function of