Kafka cannot connect to ZooKeeper: error "Timed out waiting for connection while in state: CONNECTING"

[Title]: Kafka not able to connect with zookeeper with error "Timed out waiting for connection while in state: CONNECTING"
[Published]: 2019-02-08 06:56:53
[Question]:

I am trying to run Kafka and ZooKeeper in Kubernetes pods.

Here is my zookeeper-service.yaml:

apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.1.0 (36652f6)
  creationTimestamp: null
  labels:
    io.kompose.service: zookeeper-svc
  name: zookeeper-svc
spec:
  ports:
  - name: "2181"
    port: 2181
    targetPort: 2181
  selector:
    io.kompose.service: zookeeper
status:
  loadBalancer: {}

Below is zookeeper-deployment.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.1.0 (36652f6)
  creationTimestamp: null
  labels:
    io.kompose.service: zookeeper
  name: zookeeper
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: zookeeper
    spec:
      containers:
      - image: wurstmeister/zookeeper
        name: zookeeper
        ports:
        - containerPort: 2181
        resources: {}
      restartPolicy: Always
status: {}

kafka-deployment.yaml is as follows:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert -f docker-compose.yml
    kompose.version: 1.1.0 (36652f6)
  creationTimestamp: null
  labels:
    io.kompose.service: kafka
  name: kafka
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: kafka
    spec:
      containers:
      - env:
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: kafka
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zookeeper:2181
        - name: KAFKA_PORT
          value: "9092"
        - name: KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS
          value: "60000"
        image: wurstmeister/kafka
        name: kafka
        ports:
        - containerPort: 9092
        resources: {}
      restartPolicy: Always
status: {}

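I first start the zookeeper service and deployment. Once zookeeper is up and kubectl get pods shows it in the Running state, I start the kafka deployment. Because restartPolicy is Always, the Kafka deployment starts failing and restarts again and again. When I checked the logs from the kafka container, I found that it cannot connect to the zookeeper service and the connection times out. Here are the logs from the kafka container: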

[2018-09-03 07:06:06,670] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:230)
at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.zookeeper.ZooKeeperClient.kafka$zookeeper$ZooKeeperClient$$waitUntilConnected(ZooKeeperClient.scala:226)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:95)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1580)
at kafka.server.KafkaServer.kafka$server$KafkaServer$$createZkClient$1(KafkaServer.scala:348)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:372)
at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2018-09-03 07:06:06,671] INFO shutting down (kafka.server.KafkaServer)
[2018-09-03 07:06:06,673] WARN  (kafka.utils.CoreUtils$)
java.lang.NullPointerException
at kafka.server.KafkaServer$$anonfun$shutdown$5.apply$mcV$sp(KafkaServer.scala:579)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:579)
at kafka.server.KafkaServer.startup(KafkaServer.scala:329)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2018-09-03 07:06:06,676] INFO shut down completed (kafka.server.KafkaServer)
[2018-09-03 07:06:06,677] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2018-09-03 07:06:06,678] INFO shutting down (kafka.server.KafkaServer)

What could be the reason for this, and how can it be fixed?

Edit: logs from the zookeeper pod:

2018-09-03 10:32:39,562 [myid:] - INFO  [main:ZooKeeperServerMain@96] - Starting server
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.9-1757313, built on 08/23/2016 06:50 GMT
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=zookeeper-7594d99b-sgm6p
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.7.0_65
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Oracle Corporation
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/usr/lib/jvm/java-7-openjdk-amd64/jre
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/opt/zookeeper-3.4.9/bin/../build/classes:/opt/zookeeper-3.4.9/bin/../build/lib/*.jar:/opt/zookeeper-3.4.9/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.9/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.9/bin/../lib/netty-3.10.5.Final.jar:/opt/zookeeper-3.4.9/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper-3.4.9/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.9/bin/../zookeeper-3.4.9.jar:/opt/zookeeper-3.4.9/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.9/bin/../conf:
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - Server environment:java.compiler=<NA>
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - Server environment:os.name=Linux
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - Server environment:os.arch=amd64
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - Server environment:os.version=4.15.0-20-generic
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - Server environment:user.name=root
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - Server environment:user.home=/root
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - Server environment:user.dir=/opt/zookeeper-3.4.9
2018-09-03 10:32:39,570 [myid:] - INFO  [main:ZooKeeperServer@815] - tickTime set to 2000
2018-09-03 10:32:39,571 [myid:] - INFO  [main:ZooKeeperServer@824] - minSessionTimeout set to -1
2018-09-03 10:32:39,571 [myid:] - INFO  [main:ZooKeeperServer@833] - maxSessionTimeout set to -1
2018-09-03 10:32:39,578 [myid:] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181

Edit: startup logs from the kafka container:

Excluding KAFKA_HOME from broker config
[Configuring] 'advertised.host.name' in '/opt/kafka/config/server.properties'
[Configuring] 'port' in '/opt/kafka/config/server.properties'
[Configuring] 'broker.id' in '/opt/kafka/config/server.properties'
Excluding KAFKA_VERSION from broker config
[Configuring] 'zookeeper.connect' in '/opt/kafka/config/server.properties'
[Configuring] 'log.dirs' in '/opt/kafka/config/server.properties'
[Configuring] 'zookeeper.connect.timeout.ms' in '/opt/kafka/config/server.properties'
[2018-09-05 10:47:22,036] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2018-09-05 10:47:23,145] INFO starting (kafka.server.KafkaServer)
[2018-09-05 10:47:23,148] INFO Connecting to zookeeper on zookeeper:2181 (kafka.server.KafkaServer)
[2018-09-05 10:47:23,288] INFO [ZooKeeperClient] Initializing a new session to zookeeper:2181. (kafka.zookeeper.ZooKeeperClient)
[2018-09-05 10:47:23,300] INFO Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,300] INFO Client environment:host.name=kafka-757dc6c47b-zpzfz (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,300] INFO Client environment:java.version=1.8.0_171 (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,301] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,301] INFO Client environment:java.home=/usr/lib/jvm/java-1.8-openjdk/jre (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,301] INFO Client environment:java.class.path=/opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.5.jar:/opt/kafka/bin/../libs/connect-api-2.0.0.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.0.0.jar:/opt/kafka/bin/../libs/connect-file-2.0.0.jar:/opt/kafka/bin/../libs/connect-json-2.0.0.jar:/opt/kafka/bin/../libs/connect-runtime-2.0.0.jar:/opt/kafka/bin/../libs/connect-transforms-2.0.0.jar:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.6.jar:/opt/kafka/bin/../libs/jackson-core-2.9.6.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.6.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.6.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-client-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-http-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-io-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-security-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-server-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-util-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients-2.0.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.11-2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.0.0.jar:/opt/kafka/bin/../libs/kafka-tools-2.0.0.jar:/opt/kafka/bin/../libs/kafka_2.11-2.0.0-sources.jar:/opt/kafka/bin/../libs/kafka_2.11-2.0.0.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.4.1.jar:/opt/kafka/bin/../libs/maven-artifact-3.5.3.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.7.3.jar:/opt/kafka/bin/../libs/scala-library-2.11.12.jar:/opt/kafka/bin/../libs/scala-logging_2.11-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.11.12.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.7.1.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.10.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar (org.apache.zookeeper.ZooKeeper)

Output of kubectl get svc -o wide:

NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE       SELECTOR
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP    50m       <none>
zookeeper    ClusterIP   10.98.180.138   <none>        2181/TCP   48m       io.kompose.service=zookeeper

Output of kubectl get pods -o wide:

NAME                       READY     STATUS             RESTARTS   AGE       IP           NODE
kafka-757dc6c47b-zpzfz     0/1       CrashLoopBackOff   15         1h        10.32.0.17   administrator-thinkpad-l480
zookeeper-7594d99b-784n9   1/1       Running            0          1h        10.32.0.19   administrator-thinkpad-l480

Edit: output of kubectl describe pod kafka-757dc6c47b-zpzfz:

Name:           kafka-757dc6c47b-zpzfz
Namespace:      default
Node:           administrator-thinkpad-l480/10.11.17.86
Start Time:     Wed, 05 Sep 2018 16:17:06 +0530
Labels:         io.kompose.service=kafka
                pod-template-hash=3138727036
Annotations:    <none>
Status:         Running
IP:             10.32.0.17
Controlled By:  ReplicaSet/kafka-757dc6c47b
Containers:
  kafka:
    Container ID:   docker://2bdc06d876ae23437c61f4e95539a67903cdb61e88fd9c68377b47c7705293a3
    Image:          wurstmeister/kafka
    Image ID:       docker-pullable://wurstmeister/kafka@sha256:2e3ff64e70ea983530f590282f36991c0a1b105350510f53cc3d1a0279b83c28
    Port:           9092/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 05 Sep 2018 17:29:06 +0530
      Finished:     Wed, 05 Sep 2018 17:29:14 +0530
    Ready:          False
    Restart Count:  18
    Environment:
      KAFKA_ADVERTISED_HOST_NAME:          kafka
      KAFKA_ZOOKEEPER_CONNECT:             zookeeper:2181
      KAFKA_PORT:                          9092
      KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS:  160000
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-nhb9z (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-nhb9z:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-nhb9z
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                From                                  Message
  ----     ------   ----               ----                                  -------
  Warning  BackOff  3m (x293 over 1h)  kubelet, administrator-thinkpad-l480  Back-off restarting failed container

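[Comments]:

What do the zookeeper logs report?

Updated the question with the zookeeper logs.

Zookeeper looks fine... Have you tried the Confluent Helm Charts?

@cricket_007 No, I have not used Helm charts, since I had never heard of them (I am also new to Kubernetes). A quick look suggests they ship their own Kafka image, which I do not want to use anyway. Is there another way?

Well, Confluent is the enterprise company behind Kafka and employs its main developers, so I am not sure what your concern with their Kafka image is.

[Answer 1]:

In my case, I was trying to start the Kafka server before starting ZooKeeper.

So the correct order is to start ZooKeeper first, then start the Kafka server.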
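
In a Kubernetes setup like the one in the question, one way to enforce this ordering is an init container that blocks the Kafka pod until the ZooKeeper port answers. The fragment below is only a sketch: the init container name wait-for-zookeeper and the busybox image are illustrative choices that do not appear in the original manifests, and it assumes the nc in that image supports the -z and -w flags.

```yaml
# Fragment to merge into kafka-deployment.yaml (sketch, not a complete manifest).
spec:
  template:
    spec:
      initContainers:
      # Hypothetical helper: loops until a TCP connection to the ZooKeeper
      # address succeeds, then exits so the kafka container can start.
      - name: wait-for-zookeeper
        image: busybox
        command:
        - sh
        - -c
        # Assumption: this busybox build's nc accepts -z (probe only) and -w (timeout).
        - 'until nc -z -w 2 zookeeper 2181; do echo waiting for zookeeper; sleep 2; done'
      containers:
      - name: kafka
        image: wurstmeister/kafka
```

The host and port in the loop should match whatever KAFKA_ZOOKEEPER_CONNECT points at. If the name never resolves, the pod stays in the Init state, which makes a naming problem easier to spot than a crash loop.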

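[Comments]:

[Answer 2]:

What could be the reason for this, and the solution?

The reason is hidden behind the following log line:

INFO Connecting to zookeeper on zookeeper:2181 (kafka.server.KafkaServer)

Kafka is looking for a host named zookeeper, whereas it should be looking for your service name, zookeeper-svc.

The solution is simple: rename your ZooKeeper service from zookeeper-svc to zookeeper in your zookeeper-service.yaml, like this: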

apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.1.0 (36652f6)
  creationTimestamp: null
  labels:
    io.kompose.service: zookeeper
  name: zookeeper
spec:
  ports:
  - name: "2181"
    port: 2181
    targetPort: 2181
  selector:
    io.kompose.service: zookeeper
status:
  loadBalancer: {}

Or, cleaner, keep the service name as it is and reconfigure Kafka to look for zookeeper-svc instead of zookeeper.
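
For this second option, a minimal sketch of the change is shown below. It assumes the rest of kafka-deployment.yaml stays exactly as posted in the question and that the Service is still named zookeeper-svc in the same namespace.

```yaml
# Fragment of kafka-deployment.yaml: only the ZooKeeper address changes.
spec:
  template:
    spec:
      containers:
      - name: kafka
        image: wurstmeister/kafka
        env:
        - name: KAFKA_ZOOKEEPER_CONNECT
          # Must match metadata.name of the ZooKeeper Service, not the Deployment name.
          value: zookeeper-svc:2181
```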

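Note: your minimal example was enough to reproduce the issue (thank you, it is nice to see a minimal example that works!). Even once the pod is up and running (not in an error state), it still shows: java.io.IOException: Can't resolve address: kafka:9092. That is related to a different problem (no Service covers the Kafka deployment) and is outside the scope of this question; I mention it only so that you know.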
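
A possible way to address that separate problem is to add a Service in front of the Kafka deployment so that the name kafka resolves inside the cluster. The manifest below is a sketch modeled on the zookeeper Service in the question; the file name kafka-service.yaml is hypothetical, and the selector assumes the io.kompose.service: kafka label from the posted deployment.

```yaml
# kafka-service.yaml (hypothetical file name)
apiVersion: v1
kind: Service
metadata:
  labels:
    io.kompose.service: kafka
  # This name is what KAFKA_ADVERTISED_HOST_NAME (kafka) has to resolve to.
  name: kafka
spec:
  ports:
  - name: "9092"
    port: 9092
    targetPort: 9092
  selector:
    # Must match the pod labels in kafka-deployment.yaml.
    io.kompose.service: kafka
```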

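[Comments]:

I tried it after changing zookeeper-svc to zookeeper, but I still get the same error.

Strange. I ran your exact manifests on docker edge with only the service renamed (as in the answer), and it started correctly and has been up and running for 2h+ (the error mentioned in the note is taken from the logs there). Are you sure you recreated the right service manifest? Can you make sure you have the right manifests and the right creation order? Can you get a helper container to verify that zookeeper:2181 is reachable, up, and running?

Actually, to be precise: 1) I ran your exact manifests to reproduce the error, 2) I deleted the zookeeper service, 3) edited it to replace zookeeper-svc with zookeeper in two places, 4) recreated the service, and 5) deleted and recreated kafka-deployment. After these exact steps, everything still runs fine...

I followed the same steps but still face the same issue. Also, what do you mean by having a helper container verify that zookeeper:2181 is reachable?

Start any other container (busybox or anything else) in the same namespace, then exec into it to verify that you can reach zookeeper:2181 from there. A connection problem indicates that you cannot reach the required host. Initially that was service-related; just make sure you have that covered and that the zookeeper host is reachable from inside the cluster.

[Answer 3]: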

I am using microk8s, which reported this warning:

WARNING: IPtables FORWARD policy is DROP

Consider enabling traffic forwarding with:

sudo iptables -P FORWARD ACCEPT

Fixing that worked for me.

[Comments]:

[Answer 4]:

I ran into the same problem when running Atlas with the zookeeper Docker image. Increasing the timeouts helped me.

atlas.kafka.zookeeper.session.timeout.ms=40000
atlas.kafka.zookeeper.connection.timeout.ms=20000

[Comments]:
