Some other notes on Kafka: consumer lag (backlog), retention, auto.offset.reset

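The producer is thread-safe, and sharing a single producer instance across multiple threads is generally faster overall than using multiple producers.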

From the command line you can check the offset of every consumer in a consumer group and how far behind each one is. In other words, you can see how much data has piled up in Kafka. The following is from the official documentation:

Sometimes it's useful to see the position of your consumers. We have a tool that will show the position of all consumers in a consumer group as well as how far behind the end of the log they are. To run this tool on a consumer group named my-group consuming a topic named my-topic would look like this:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group

Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).

TOPIC     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST        CLIENT-ID
my-topic  0          2               4               2    consumer-1-029af89c-873c-4751-a720-cefd41a669d6  /127.0.0.1  consumer-1
my-topic  1          2               3               1    consumer-1-029af89c-873c-4751-a720-cefd41a669d6  /127.0.0.1  consumer-1
my-topic  2          2               3               1    consumer-2-42c1abd4-e3b2-425d-a8bb-e1ea49b29bb2  /127.0.0.1  consumer-2

This tool also works with ZooKeeper-based consumers:

bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --describe --group my-group

Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).

TOPIC     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID
my-topic  0          2               4               2    my-group_consumer-1
my-topic  1          2               3               1    my-group_consumer-1
my-topic  2          2               3               1    my-group_consumer-2
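
The LAG column is simply LOG-END-OFFSET minus CURRENT-OFFSET for each partition. As a minimal sketch of how that output could be post-processed, the snippet below parses the sample text shown above and sums the lag per topic. The function names parse_lag and total_lag_per_topic are made up for illustration and are not part of any Kafka tooling; the sketch also assumes every row carries numeric offsets, which real output does not guarantee.

```python
from collections import defaultdict

# Sample copied from the --describe output above (whitespace-separated columns).
SAMPLE = """\
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
my-topic 0 2 4 2 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1
my-topic 1 2 3 1 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1
my-topic 2 2 3 1 consumer-2-42c1abd4-e3b2-425d-a8bb-e1ea49b29bb2 /127.0.0.1 consumer-2
"""


def parse_lag(describe_output):
    """Return {(topic, partition): lag}, with lag = LOG-END-OFFSET - CURRENT-OFFSET.

    Assumption: every data row has numeric offsets. Rows with placeholder
    values are not handled in this sketch.
    """
    lines = [ln.split() for ln in describe_output.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    col = {name: i for i, name in enumerate(header)}
    lag = {}
    for row in rows:
        key = (row[col["TOPIC"]], int(row[col["PARTITION"]]))
        end = int(row[col["LOG-END-OFFSET"]])
        current = int(row[col["CURRENT-OFFSET"]])
        lag[key] = end - current
    return lag


def total_lag_per_topic(lag):
    """Sum the per-partition lag into one backlog number per topic."""
    totals = defaultdict(int)
    for (topic, _partition), value in lag.items():
        totals[topic] += value
    return dict(totals)


if __name__ == "__main__":
    per_partition = parse_lag(SAMPLE)
    print(per_partition)
    print(total_lag_per_topic(per_partition))
```

A per-topic total like this is a convenient single number to watch, but the per-partition values are usually more telling, since one stuck partition can hide behind a small total.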

retention

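Kafka keeps messages after they have been consumed, but not forever: by default, data is deleted automatically after 7 days. The retention period can be changed in the broker config (i.e. server.properties) via log.retention.ms, log.retention.minutes, or log.retention.hours. If more than one is set, the finer-grained unit wins: log.retention.ms takes precedence over log.retention.minutes, which takes precedence over log.retention.hours. The default is log.retention.hours=168.

There is a second way to configure retention: log.retention.bytes, also set in server.properties, which defines the maximum size a partition may grow to before old data is deleted. Data is deleted as soon as either of the two limits is reached.

Kafka deletes data by segment: each deletion removes one or more whole segments, never part of a segment.

Retention can also be configured per topic; see the official documentation at http://kafka.apache.org/documentation/ for details.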
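
To make the precedence between the three time settings concrete, here is a minimal sketch that resolves an effective retention period from a dict of broker settings. It assumes log.retention.ms overrides log.retention.minutes, which in turn overrides log.retention.hours, as the broker configuration reference describes. The function name effective_retention_ms is made up for this example, and special values such as -1 are not handled.

```python
DEFAULT_RETENTION_HOURS = 168  # 7 days, the default mentioned above


def effective_retention_ms(broker_config):
    """Resolve the time-based retention in milliseconds.

    Assumed precedence: log.retention.ms, then log.retention.minutes,
    then log.retention.hours (falling back to the 168-hour default).
    """
    if "log.retention.ms" in broker_config:
        return int(broker_config["log.retention.ms"])
    if "log.retention.minutes" in broker_config:
        return int(broker_config["log.retention.minutes"]) * 60 * 1000
    hours = int(broker_config.get("log.retention.hours", DEFAULT_RETENTION_HOURS))
    return hours * 60 * 60 * 1000


if __name__ == "__main__":
    print(effective_retention_ms({}))
    print(effective_retention_ms({"log.retention.minutes": "10",
                                  "log.retention.hours": "1"}))
```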
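
Segment-level deletion can be illustrated with a small simulation. This is a simplified model rather than Kafka's actual implementation: the Segment class and segments_to_delete function are invented for the example, the newest (active) segment is assumed never to be deleted, and a segment's age is taken from the timestamp of its newest record.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    last_timestamp_ms: int  # timestamp of the newest record in the segment


def segments_to_delete(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return base offsets of segments a retention pass would remove.

    `segments` is ordered oldest first; the last entry is treated as the
    active segment and is kept (a simplifying assumption of this sketch).
    """
    if not segments:
        return []
    active = segments[-1]
    remaining = list(segments[:-1])
    deleted = []

    # Time limit: drop whole segments, oldest first, while they are too old.
    if retention_ms is not None:
        while remaining and now_ms - remaining[0].last_timestamp_ms > retention_ms:
            deleted.append(remaining.pop(0))

    # Size limit: drop the oldest segments while the partition would still
    # hold at least retention_bytes after removing them.
    if retention_bytes is not None:
        total = sum(s.size_bytes for s in remaining) + active.size_bytes
        while remaining and total - remaining[0].size_bytes >= retention_bytes:
            total -= remaining[0].size_bytes
            deleted.append(remaining.pop(0))

    return [s.base_offset for s in deleted]


if __name__ == "__main__":
    log = [
        Segment(0, 100, 0),
        Segment(100, 100, 1000),
        Segment(200, 100, 2000),
        Segment(300, 50, 3000),  # active segment
    ]
    print(segments_to_delete(log, now_ms=3500, retention_ms=2000))
    print(segments_to_delete(log, now_ms=3500, retention_bytes=200))
```

Because only whole segments are removed, a partition can hold data somewhat older or larger than the configured limit until the segment containing it becomes eligible as a whole.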

See http://kafka.apache.org/documentation/#brokerconfigs for the full list of log.retention.*, log.roll.* and log.segment.* configs.

auto.offset.reset

What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.

default: latest
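
The rules above boil down to a small decision function. The sketch below is only a model of that decision, not the consumer's real code: resolve_position and NoValidOffsetError are hypothetical names, and the sketch treats a committed offset as valid when it lies between the earliest available offset and the log end offset.

```python
class NoValidOffsetError(Exception):
    """Hypothetical stand-in for the exception the consumer would see."""


def resolve_position(committed, log_start, log_end, auto_offset_reset="latest"):
    """Pick the offset a consumer starts reading from for one partition.

    committed: the group's committed offset, or None if there is none.
    log_start: earliest offset still on the broker (older data was deleted).
    log_end:   offset of the next record to be written.
    """
    # A committed offset that still points into the log is used as-is.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise there is no usable offset, so the reset policy decides.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    # "none" and any other value: surface the problem to the caller.
    raise NoValidOffsetError(
        f"no valid offset and auto.offset.reset={auto_offset_reset!r}"
    )


if __name__ == "__main__":
    print(resolve_position(5, 0, 10))                 # valid committed offset
    print(resolve_position(None, 3, 10))              # new group, default policy
    print(resolve_position(1, 3, 10, "earliest"))     # offset deleted by retention
```

The third call shows how retention and auto.offset.reset interact: once retention has removed the data a committed offset pointed at, the reset policy decides where the consumer resumes.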

Offsets are recorded per topic, and each offset belongs to a consumer group: a group has exactly one offset for each partition.
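
One way to picture this is a table keyed by (group, topic, partition). The toy class below is purely illustrative (OffsetStore is not a Kafka API) and only shows that two groups reading the same partition keep independent positions.

```python
class OffsetStore:
    """Toy model of committed offsets: one value per (group, topic, partition)."""

    def __init__(self):
        self._offsets = {}

    def commit(self, group, topic, partition, offset):
        # A new commit overwrites the group's previous offset for that partition.
        self._offsets[(group, topic, partition)] = offset

    def committed(self, group, topic, partition):
        # None means the group has never committed for this partition.
        return self._offsets.get((group, topic, partition))


if __name__ == "__main__":
    store = OffsetStore()
    store.commit("group-a", "my-topic", 0, 2)
    store.commit("group-b", "my-topic", 0, 4)
    store.commit("group-a", "my-topic", 0, 3)  # replaces the earlier value
    print(store.committed("group-a", "my-topic", 0))
    print(store.committed("group-b", "my-topic", 0))
    print(store.committed("group-a", "my-topic", 1))
```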
