This article takes an in-depth look at Kafka's architecture: how Kafka processes data, its core components, brokers and clusters, the relationship between consumers and topics, how messages are distributed, and how consumers are load balanced.

1. Kafka data processing steps
1.1 The producer generates a message and sends it to a broker;
1.2 The broker acting as leader receives the message and writes it to the corresponding topic;
1.3 After receiving the message, the leader broker passes it to the brokers acting as followers, which store it as a replica;
1.4 The consumer consumes the messages stored on the broker.
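
The four steps above can be sketched as a small in-memory simulation. This is only an illustration and does not use any Kafka client library: the class names `Broker` and `Leader`, the broker names, and the topic name `orders` are all made up for the example, and replication here is a plain synchronous copy.

```python
class Broker:
    """A toy broker that keeps one append-only log per topic."""

    def __init__(self, name):
        self.name = name
        self.logs = {}  # topic -> list of messages; the list index is the offset

    def append(self, topic, message):
        log = self.logs.setdefault(topic, [])
        log.append(message)
        return len(log) - 1  # offset assigned to this message


class Leader(Broker):
    """A toy leader: writes locally, then copies the message to its followers."""

    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def produce(self, topic, message):
        offset = self.append(topic, message)   # step 1.2: leader writes to the topic
        for follower in self.followers:        # step 1.3: followers keep a replica
            follower.append(topic, message)
        return offset

    def fetch(self, topic, offset):
        """Step 1.4: return every message from `offset` onward."""
        return self.logs.get(topic, [])[offset:]


follower = Broker("broker-2")
leader = Leader("broker-1", [follower])

leader.produce("orders", "created")            # step 1.1: producer sends a message
leader.produce("orders", "paid")

print(leader.fetch("orders", 0))               # what a consumer would read
print(follower.logs["orders"])                 # the follower's replica
```

The list index doubles as the offset, so a consumer that remembers the last offset it read can resume from that point.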

2. Kafka core components
2.1 Producer: the message producer; the messages it generates are sent to a topic;
2.2 Consumer: the message consumer; the messages it consumes come from a topic;
2.3 Topic: messages are categorized by topic. A topic is essentially a directory: messages on the same subject are grouped into the same directory;
2.4 Broker: each Kafka instance (or each Kafka server node) is a broker, and one broker can host multiple topics;
2.5 Zookeeper: the Zookeeper cluster is not a Kafka component, but Kafka relies on it to store metadata, so it is listed here because of its importance.

3. Brokers and clusters

A standalone Kafka server is called a broker. A broker receives messages from producers, assigns an offset to each message, and commits the message to disk. It also serves consumers: it responds to requests to read partitions and returns messages that have been committed to disk. Depending on the hardware and its performance characteristics, a single broker can easily handle thousands of partitions and millions of messages per second.

A broker is part of a cluster. In each cluster, one broker also acts as the cluster controller (elected automatically from the active members of the cluster). The controller handles administrative work, including assigning partitions to brokers and monitoring brokers. Within a cluster, a partition is owned by one broker, which is called the leader of the partition. A partition can also be assigned to multiple brokers, in which case the partition is replicated. Replication gives the partition's messages redundancy: if one broker fails, another broker can take over leadership. However, the affected consumers and producers must then reconnect to the new leader.
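
The failover behaviour described above can be illustrated with a very small sketch. It makes a simplifying assumption that is not from the text: the new leader is simply the first replica in the list that is still alive. The function name `elect_leader` and the broker ids are hypothetical, and a real controller applies more conditions (for example, whether a replica is in sync) before promoting it.

```python
def elect_leader(replicas, alive):
    """Return the first replica that is still alive, or None if none is.

    replicas: ordered list of broker ids holding a copy of the partition
    alive:    set of broker ids that are currently running
    """
    for broker_id in replicas:
        if broker_id in alive:
            return broker_id
    return None


replicas = [1, 2, 3]                          # the partition is replicated on three brokers
alive = {1, 2, 3}

leader = elect_leader(replicas, alive)        # broker 1 leads at first
alive.discard(1)                              # broker 1 fails
new_leader = elect_leader(replicas, alive)    # another replica takes over

print(leader, new_leader)
```

Producers and consumers that were talking to the old leader would then look up the new leader and reconnect to it.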

4. The relationship between consumers and topics
Kafka supports only topics.
• Each group can have more than one consumer, and each consumer belongs to exactly one consumer group. Normally a group contains multiple consumers, which not only increases the concurrency with which a topic's messages are consumed but also improves fault tolerance: if one consumer in the group fails, the partitions it was consuming are automatically taken over by the other consumers.

• A given message in a topic is consumed by exactly one consumer in each group subscribed to that topic; the message is never sent to multiple consumers in the same group. So the consumers in a group together consume the whole topic, and message consumption in different groups is independent. A group can be thought of as one "subscriber".
• In Kafka, the messages in a partition are consumed by only one consumer in a group (at any given time);

Each partition of a topic is consumed by only one consumer within a given "subscriber" (group), but one consumer can consume messages from multiple partitions at the same time.

• It follows from Kafka's design that, for a given topic, a group cannot have more consumers than partitions consuming at the same time; any extra consumers would receive no messages and sit idle.

Kafka guarantees only that the messages within a single partition are consumed in order. From the point of view of the whole topic, when there are multiple partitions, messages are still not globally ordered.
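
The rules in this section can be checked with a short sketch. It is not Kafka's assignment logic: the `deliver` function, the group names, and the modulo rule that picks a partition's owner are assumptions made for illustration. What it does show is that every group receives every message, that a partition has a single owner inside a group, and that order is kept per partition only.

```python
def deliver(partitions, groups):
    """Simulate which consumer in each group receives which messages.

    partitions: dict of partition index -> ordered list of messages
    groups:     dict of group name -> list of consumer names
    returns:    dict of group name -> dict of consumer name -> messages
    """
    result = {}
    for group, consumers in groups.items():
        received = {consumer: [] for consumer in consumers}
        for pid in sorted(partitions):
            # Exactly one owner per partition inside a group (illustrative rule).
            owner = consumers[pid % len(consumers)]
            received[owner].extend(partitions[pid])
        result[group] = received
    return result


partitions = {0: ["a1", "a2"], 1: ["b1", "b2"], 2: ["c1"]}
groups = {"billing": ["C0", "C1"], "audit": ["C0"]}

out = deliver(partitions, groups)
print(out["billing"])   # the partitions are split between C0 and C1
print(out["audit"])     # a single consumer reads every partition

# A group with more consumers than partitions leaves one consumer idle.
crowded = deliver(partitions, {"g": ["C0", "C1", "C2", "C3"]})
print(crowded["g"]["C3"])
```

Within each consumer's list, messages from the same partition stay in their original order, but nothing orders `a1` relative to `b1`.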

5. Kafka message distribution
• The producer client is responsible for distributing messages

• Any broker in the Kafka cluster can provide metadata to the producer. This metadata includes the list of live servers in the cluster, the list of partition leaders, and so on;
• Once the producer has obtained the metadata, it keeps a socket connection to the leader of every partition under the topic;

• Messages are sent by the producer directly to the broker over the socket, with no "routing layer" in between. Which partition a message is routed to is decided by the producer client, for example using "random", "key-hash", or "round-robin";
• If a topic has multiple partitions, balanced message distribution has to be implemented on the producer side;
• Developers can specify the partition routing method in the producer's configuration file.
• Producer acknowledgment mechanism for message sending
This setting controls whether the producer requires an acknowledgment from the server when sending data. It has three values: 0, 1, -1
– 0: the producer does not wait for an ack from the broker
– 1: the leader sends an ack once it has received the message
– -1: an ack is sent once all in-sync followers have successfully replicated the message
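
The three routing strategies and the three acknowledgment settings described in this section can be sketched as follows. None of this is a Kafka client API: the function names are hypothetical, `zlib.crc32` merely stands in for whatever hash a real client uses, and `send_complete` only models when the producer may treat a send as finished.

```python
import itertools
import random
import zlib


def random_partition(num_partitions):
    """"random": pick any partition with equal probability."""
    return random.randrange(num_partitions)


def key_hash_partition(key, num_partitions):
    """"key-hash": the same key always maps to the same partition."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def round_robin(num_partitions):
    """"round-robin": an endless iterator over 0, 1, ..., n-1, 0, 1, ..."""
    return itertools.cycle(range(num_partitions))


def send_complete(acks, leader_written, followers_synced):
    """Return True when the producer may treat the send as finished."""
    if acks == 0:
        return True                                   # does not wait for any ack
    if acks == 1:
        return leader_written                         # leader has the message
    if acks == -1:
        return leader_written and followers_synced    # followers have it too
    raise ValueError("acks must be 0, 1 or -1")


rr = round_robin(3)
print([next(rr) for _ in range(5)])
print(key_hash_partition("user-42", 4) == key_hash_partition("user-42", 4))
print(send_complete(-1, leader_written=True, followers_synced=False))
```

Key hashing keeps all messages for one key in one partition, which is what makes per-key ordering possible; random and round-robin spread load more evenly but give up that property.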

6. Consumer load balancing
When a consumer joins or leaves a group, a partition rebalance is triggered. The ultimate goal of rebalancing is to increase the topic's concurrent consumption capacity. The steps are as follows:
1. Suppose topic1 has the following partitions: P0, P1, P2, P3
2. Group A has the following consumers: C0, C1
3. First, sort the partitions by partition index: P0, P1, P2, P3
4. Sort the consumers by consumer.id: C0, C1
5. Compute the multiple: M = [P0,P1,P2,P3].size / [C0,C1].size, which gives M = 2 in this example (rounded up)
6. Then assign partitions in turn: C0 = [P0,P1], C1 = [P2,P3], that is, Ci = [P(i * M), ..., P((i + 1) * M - 1)]
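
The six steps above translate almost directly into code. The sketch below follows the formula as written in this article, with partitions represented by their integer index; the function name `assign` is hypothetical, and the assignors shipped with Kafka may split partitions differently when the division is not exact.

```python
import math


def assign(partitions, consumers):
    """Assign sorted partitions to sorted consumers in blocks of size M.

    partitions: iterable of partition indexes, e.g. [0, 1, 2, 3]
    consumers:  iterable of consumer ids, e.g. ["C0", "C1"]
    """
    partitions = sorted(partitions)                   # step 3
    consumers = sorted(consumers)                     # step 4
    m = math.ceil(len(partitions) / len(consumers))   # step 5, rounded up
    # Step 6: consumer i gets P(i * M) through P((i + 1) * M - 1).
    # Slicing past the end simply yields fewer (or no) partitions.
    return {c: partitions[i * m:(i + 1) * m] for i, c in enumerate(consumers)}


print(assign([0, 1, 2, 3], ["C0", "C1"]))
print(assign([0, 1, 2], ["C0", "C1", "C2", "C3"]))
```

In the second call there are more consumers than partitions, so the last consumer receives an empty list and stays idle, as described in section 4.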