记一次Kafka warning排查过程

Posted 2023-03-10 AlaGeek

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了记一次Kafka warning排查过程相关的知识，希望对你有一定的参考价值。

1、前因

在配合测试某个需求的时候，正好看到控制台打印了个报错，如下：

2023-03-06 17:05:58,565[325651ms][pool-28-thread-1][org.apache.kafka.common.utils.AppInfoParser][WARN] - Error registering AppInfo mbean
javax.management.InstanceAlreadyExistsException: kafka.producer:type=app-info,id=producer-1
        at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
        at org.apache.kafka.common.utils.AppInfoParser.registerAppInfo(AppInfoParser.java:64)
        at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:426)
        at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:287)
        at org.springframework.kafka.core.DefaultKafkaProducerFactory.createKafkaProducer(DefaultKafkaProducerFactory.java:406)
        at org.springframework.kafka.core.DefaultKafkaProducerFactory.createProducer(DefaultKafkaProducerFactory.java:392)
        at org.springframework.kafka.core.KafkaTemplate.getTheProducer(KafkaTemplate.java:463)
        at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:401)
        at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:216)

很明显是Kafka在创建Producer实例的时候重复了，正好趁着有空排查排查，不然谁知道后面会因为这个导致什么问题。

2、BUG定位

根据堆栈信息，找到与Kafka有关的报错代码，进到类 AppInfoParser 的 registerAppInfo方法中，代码如下：

public static synchronized void registerAppInfo(String prefix, String id, Metrics metrics, long nowMs) 
    try 
        ObjectName name = new ObjectName(prefix + ":type=app-info,id=" + Sanitizer.jmxSanitize(id));
        AppInfo mBean = new AppInfo(nowMs);
        ManagementFactory.getPlatformMBeanServer().registerMBean(mBean, name);

        registerMetrics(metrics, mBean); // prefix will be added later by JmxReporter
     catch (JMException e) 
        log.warn("Error registering AppInfo mbean", e);

从方法名可以推测，应当是 Kafka 在创建 Producer 实例时，会按 Producer 的 id 构造一个 AppInfo，并注册到一个公共的类似Map的东西中，而我们的代码创建了多个实例，并且 id 重复了，基于这个猜测来看Kafka的配置文件（已脱敏）：

<!-- 定义producer1的参数 -->
<bean id="producerProperties1" class="java.util.HashMap">
  <constructor-arg>
    <map>
      <entry key="bootstrap.servers" value="localhost:9092"/>
      <entry key="retries" value="3"/>
      <entry key="batch.size" value="4096"/>
      <entry key="linger.ms" value="10"/>
      <entry key="buffer.memory" value="40960"/>
      <entry key="acks" value="all"/>
      <entry key="key.serializer" value="org.apache.kafka.common.serialization.StringSerializer"/>
      <entry key="value.serializer" value="org.apache.kafka.common.serialization.StringSerializer"/>
    </map>
  </constructor-arg>
</bean>

<!-- 定义producer2的参数 -->
<bean id="producerProperties2" class="java.util.HashMap">
  <constructor-arg>
    <map>
      <entry key="bootstrap.servers" value="localhost:9092"/>
      <entry key="retries" value="3"/>
      <entry key="batch.size" value="4096"/>
      <entry key="linger.ms" value="10"/>
      <entry key="buffer.memory" value="40960"/>
      <entry key="acks" value="all"/>
      <entry key="key.serializer" value="org.apache.kafka.common.serialization.StringSerializer"/>
      <entry key="value.serializer" value="org.apache.kafka.common.serialization.StringSerializer"/>
    </map>
  </constructor-arg>
</bean>

可以看到项目中配置了两个 Kafka 的 Producer，并且都未指定 Producer 的 id，符合我们的猜测，那么我们要怎么修复，如果我们指定了 id，Producer 在多线程的情况下，每个线程的 id 是否又会重复。
基于几个问题，进到类 KafkaProducer 的构造方法中，来看 AppInfoParser.registerAppInfo() 方法调用语句：

AppInfoParser.registerAppInfo(JMX_PREFIX, clientId, metrics, time.milliseconds());

可以看到前面说的 Producer 的 id 实际上是 clientId，往前找到 clientId 的赋值语句：

this.clientId = buildClientId(config.getString(ProducerConfig.CLIENT_ID_CONFIG), transactionalId);

进到 buildClientId 里面：

private static String buildClientId(String configuredClientId, String transactionalId) 
    if (!configuredClientId.isEmpty())
        return configuredClientId;

    if (transactionalId != null)
        return "producer-" + transactionalId;

    return "producer-" + PRODUCER_CLIENT_ID_SEQUENCE.getAndIncrement();

可知如果 configuredClientId 和 transactionalId 都为空，那么clientId就会自动生成，继续往上追溯，来看 transactionalId 的赋值语句：

String transactionalId = userProvidedConfigs.containsKey(ProducerConfig.TRANSACTIONAL_ID_CONFIG) ?
                (String) userProvidedConfigs.get(ProducerConfig.TRANSACTIONAL_ID_CONFIG) : null;

其中 ProducerConfig.TRANSACTIONAL_ID_CONFIG 值为 transactional.id，可见 transactionalId 的值取得是用户配置（userProvidedConfigs）中的 transactional.id 的值，而 configuredClientId 值并不是直接获取的用户配置（userProvidedConfigs）的 client.id，而是拿的构造方法中传入的config中的 client.id 对应的值，说明 config 很有可能是在用户配置（userProvidedConfigs）的基础上进行了些许处理。
继续往上追溯，进到 DefaultKafkaProducerFactory.createKafkaProducer 方法中：

protected Producer<K, V> createKafkaProducer() 
    if (this.clientIdPrefix == null) 
        return new KafkaProducer<>(this.configs, this.keySerializerSupplier.get(),
                this.valueSerializerSupplier.get());
    
    else 
        Map<String, Object> newConfigs = new HashMap<>(this.configs);
        newConfigs.put(ProducerConfig.CLIENT_ID_CONFIG,
                this.clientIdPrefix + "-" + this.clientIdCounter.incrementAndGet());
        return new KafkaProducer<>(newConfigs, this.keySerializerSupplier.get(),
                this.valueSerializerSupplier.get());

可以看到如果 clientIdPrefix 不为空的情况下，会在 config 中放入 client.id 的键值对，很明显这种情况下不会有我们所说的 clientId 重复的情况发生，因此我们只需要保证 clientIdPrefix 不为空即可。在 DefaultKafkaProducerFactory 构造方法中找到 clientIdPrefix 的赋值语句：

if (this.clientIdPrefix == null && configs.get(ProducerConfig.CLIENT_ID_CONFIG) instanceof String) 
    this.clientIdPrefix = (String) configs.get(ProducerConfig.CLIENT_ID_CONFIG);

其中 ProducerConfig.CLIENT_ID_CONFIG 值为 client.id，所以只需要在用户配置中添加 client.id 的值，那么 KafkaProducer 在创建时，就会在自动生成的 clientId 中添加前缀字符串，从而避免不同的 KafkaProducer 的 id 冲突。

3、BUG修复

将上述Kafka配置文件修改如下：

<!-- 定义producer1的参数 -->
<bean id="producerProperties1" class="java.util.HashMap">
  <constructor-arg>
    <map>
      <entry key="bootstrap.servers" value="localhost:9092"/>
      <entry key="client.id" value="a"/>
      <entry key="retries" value="3"/>
      <entry key="batch.size" value="4096"/>
      <entry key="linger.ms" value="10"/>
      <entry key="buffer.memory" value="40960"/>
      <entry key="acks" value="all"/>
      <entry key="key.serializer" value="org.apache.kafka.common.serialization.StringSerializer"/>
      <entry key="value.serializer" value="org.apache.kafka.common.serialization.StringSerializer"/>
    </map>
  </constructor-arg>
</bean>

<!-- 定义producer2的参数 -->
<bean id="producerProperties2" class="java.util.HashMap">
  <constructor-arg>
    <map>
      <entry key="bootstrap.servers" value="localhost:9092"/>
      <entry key="client.id" value="b"/>
      <entry key="retries" value="3"/>
      <entry key="batch.size" value="4096"/>
      <entry key="linger.ms" value="10"/>
      <entry key="buffer.memory" value="40960"/>
      <entry key="acks" value="all"/>
      <entry key="key.serializer" value="org.apache.kafka.common.serialization.StringSerializer"/>
      <entry key="value.serializer" value="org.apache.kafka.common.serialization.StringSerializer"/>
    </map>
  </constructor-arg>
</bean>

以上是关于记一次Kafka warning排查过程的主要内容，如果未能解决你的问题，请参考以下文章

TPS低，CPU高--记一次storm压测问题排查过程

记一次生产kafka消息消费的事故

记一次OGG数据写入HBase的丢失数据原因分析

干货！一次 kafka 卡顿事故排查过程

记一次kafka故障

troubleshooting记一次Kafka集群重启导致消息重复消费问题处理记录