Apache Kafka@IT·大数据消息中间件

总结kafka的consumer消费能力很低的情况下的处理方案

2016-10-21  本文已影响32835人  LOC_Thomas

简介

由于项目中需要使用kafka作为消息队列,并且项目是基于spring-boot来进行构建的,所以项目采用了spring-kafka作为原生kafka的一个扩展库进行使用。先说明一下版本:

用过kafka的人都知道,对于使用kafka来说,producer的使用相对简单一些,只需要把数据按照指定的格式发送给kafka中某一个topic就可以了。本文主要是针对spring-kafka的consumer端上的使用进行简单一些分析和总结。

kafka的速度是很快,所以一般来说producer的生产消息的逻辑速度都会比consumer的消费消息的逻辑速度快。

具体案例

之前在项目中遇到了一个案例是,consumer消费一条数据平均需要200ms的时间,并且在某个时刻,producer会在短时间内产生大量的数据丢进kafka的broker里面(假设平均1s中内丢入了5w条需要消费的消息,这个情况会持续几分钟)。

对于这种情况,kafka的consumer的行为会是:

[rhllor] Tue Oct 18 21:39:16 CST 2016 INFO [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator coordinatorDead 529: kafka-example|NTI|Marking the coordinator 2147483646 dead.
[rhllor] Tue Oct 18 21:39:16 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator sendGroupMetadataRequest 465: kafka-example|NTI|Issuing group metadata request to broker 1
[rhllor] Tue Oct 18 21:39:16 CST 2016 ERROR [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator handle 550: kafka-example|NTI|Error ILLEGAL_GENERATION occurred while committing offsets for group new-message-1
[rhllor] Tue Oct 18 21:39:16 CST 2016 WARN [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator onComplete 424: kafka-example|NTI|Auto offset commit failed: Commit cannot be completed due to group rebalance
[rhllor] Tue Oct 18 21:39:16 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator run 408: kafka-example|NTI|Cannot auto-commit offsets now since the coordinator is unknown, will retry after backoff
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator handleGroupMetadataResponse 478: kafka-example|NTI|Group metadata response ClientResponse(receivedTimeMs=1476797957072, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@1d3d7e6, request=RequestSend(header={api_key=10,api_version=0,correlation_id=20,client_id=consumer-1}, body={group_id=new-message-1}), createdTimeMs=1476797956485, sendTimeMs=1476797956485), responseBody={error_code=0,coordinator={node_id=1,host=10.10.44.124,port=9092}})
[rhllor] Tue Oct 18 21:39:17 CST 2016 ERROR [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator handle 550: kafka-example|NTI|Error ILLEGAL_GENERATION occurred while committing offsets for group new-message-1
[rhllor] Tue Oct 18 21:39:17 CST 2016 WARN [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator maybeAutoCommitOffsetsSync 445: kafka-example|NTI|Auto offset commit failed: 
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator onJoinPrepare 247: kafka-example|NTI|Revoking previously assigned partitions [rhllor-log-0, rhllor-log-1, rhllor-log-2]
[rhllor] Tue Oct 18 21:39:17 CST 2016 INFO [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.springframework.kafka.listener.KafkaMessageListenerContainer onPartitionsRevoked 244: kafka-example|NTI|partitions revoked:[rhllor-log-0, rhllor-log-1, rhllor-log-2]
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator performGroupJoin 309: kafka-example|NTI|(Re-)joining group new-message-1
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator performGroupJoin 318: kafka-example|NTI|Issuing request (JOIN_GROUP: {group_id=new-message-1,session_timeout=15000,member_id=consumer-1-64063d04-9d4e-45af-a927-17ccf31c6ec1,protocol_type=consumer,group_protocols=[{protocol_name=range,protocol_metadata=java.nio.HeapByteBuffer[pos=0 lim=22 cap=22]}]}) to coordinator 2147483646
[rhllor] Tue Oct 18 21:39:17 CST 2016 INFO [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator handle 354: kafka-example|NTI|Attempt to join group new-message-1 failed due to unknown member id, resetting and retrying.
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator performGroupJoin 309: kafka-example|NTI|(Re-)joining group new-message-1
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator performGroupJoin 318: kafka-example|NTI|Issuing request (JOIN_GROUP: {group_id=new-message-1,session_timeout=15000,member_id=,protocol_type=consumer,group_protocols=[{protocol_name=range,protocol_metadata=java.nio.HeapByteBuffer[pos=0 lim=22 cap=22]}]}) to coordinator 2147483646
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator handle 336: kafka-example|NTI|Joined group: {error_code=0,generation_id=1,group_protocol=range,leader_id=consumer-1-d3f30611-5788-4b81-bf0d-e779a11093d2,member_id=consumer-1-d3f30611-5788-4b81-bf0d-e779a11093d2,members=[{member_id=consumer-1-d3f30611-5788-4b81-bf0d-e779a11093d2,member_metadata=java.nio.HeapByteBuffer[pos=0 lim=22 cap=22]}]}
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator performAssignment 225: kafka-example|NTI|Performing range assignment for subscriptions {consumer-1-d3f30611-5788-4b81-bf0d-e779a11093d2=org.apache.kafka.clients.consumer.internals.PartitionAssignor$Subscription@1dbca7d4}
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator performAssignment 229: kafka-example|NTI|Finished assignment: {consumer-1-d3f30611-5788-4b81-bf0d-e779a11093d2=org.apache.kafka.clients.consumer.internals.PartitionAssignor$Assignment@4826f394}
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator onJoinLeader 397: kafka-example|NTI|Issuing leader SyncGroup (SYNC_GROUP: {group_id=new-message-1,generation_id=1,member_id=consumer-1-d3f30611-5788-4b81-bf0d-e779a11093d2,group_assignment=[{member_id=consumer-1-d3f30611-5788-4b81-bf0d-e779a11093d2,member_assignment=java.nio.HeapByteBuffer[pos=0 lim=38 cap=38]}]}) to coordinator 2147483646
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.AbstractCoordinator handle 423: kafka-example|NTI|Received successful sync group response for group new-message-1: {error_code=0,member_assignment=java.nio.HeapByteBuffer[pos=0 lim=38 cap=38]}
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator onJoinComplete 191: kafka-example|NTI|Setting newly assigned partitions [rhllor-log-0, rhllor-log-1, rhllor-log-2]
[rhllor] Tue Oct 18 21:39:17 CST 2016 INFO [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.springframework.kafka.listener.KafkaMessageListenerContainer onPartitionsAssigned 249: kafka-example|NTI|partitions assigned:[rhllor-log-0, rhllor-log-1, rhllor-log-2]
[rhllor] Tue Oct 18 21:39:17 CST 2016 DEBUG [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-kafka-consumer-1] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator sendOffsetFetchRequest 581: kafka-example|NTI|Fetching committed offsets for partitions: [rhllor-log-0, rhllor-log-1, rhllor-log-2]

日志的意思大概是coordinator挂掉了,然后自动提交offset失败,然后重新分配partition给客户端

解决方案

遇到了这个问题之后, 我们做了一些步骤:

/**
     * The ack mode to use when auto ack (in the configuration properties) is false.
     * <ul>
     * <li>RECORD: Ack after each record has been passed to the listener.</li>
     * <li>BATCH: Ack after each batch of records received from the consumer has been
     * passed to the listener</li>
     * <li>TIME: Ack after this number of milliseconds; (should be greater than
     * {@code #setPollTimeout(long) pollTimeout}.</li>
     * <li>COUNT: Ack after at least this number of records have been received</li>
     * <li>MANUAL: Listener is responsible for acking - use a
     * {@link AcknowledgingMessageListener}.
     * </ul>
     */
    private AbstractMessageListenerContainer.AckMode ackMode = AckMode.BATCH;

这些策略保证了在一批消息没有完成消费的情况下,也能提交offset,从而避免了完全提交不上而导致永远重复消费的问题。

分析

那么问题来了,为什么spring-kafka的提交offset的策略能够解决spring-kafka的auto-commit的带来的重复消费的问题呢?下面通过分析spring-kafka的关键源码来解析这个问题。

if (isRunning() && this.definedPartitions != null) { 
      initPartitionsIfNeeded();      
 // we start the invoker here as there will be no rebalance calls to       
// trigger it, but only if the container is not set to autocommit       
// otherwise we will process records on a separate thread      
     if (!this.autoCommit) {        
            startInvoker();     
     }
 }

上面可以看到,如果auto.commit关掉的话,spring-kafka会启动一个invoker,这个invoker的目的就是启动一个线程去消费数据,他消费的数据不是直接从kafka里面直接取的,那么他消费的数据从哪里来呢?他是从一个spring-kafka自己创建的阻塞队列里面取的。

最后

上一篇 下一篇

猜你喜欢

热点阅读