Kafka 无法与 Zookeeper 连接,出现错误“处于状态时等待连接超时:CONNECTING”
Posted
技术标签:
【中文标题】Kafka 无法与 Zookeeper 连接,出现错误“处于状态时等待连接超时:CONNECTING”【英文标题】:Kafka not able to connect with zookeeper with error "Timed out waiting for connection while in state: CONNECTING" 【发布时间】:2019-02-08 06:56:53 【问题描述】:我正在尝试在 kubernetes pod 中运行我的 kafka 和 zookeeper。
这是我的zookeeper-service.yaml
:
apiVersion: v1
kind: Service
metadata:
annotations:
kompose.cmd: kompose convert
kompose.version: 1.1.0 (36652f6)
creationTimestamp: null
labels:
io.kompose.service: zookeeper-svc
name: zookeeper-svc
spec:
ports:
- name: "2181"
port: 2181
targetPort: 2181
selector:
io.kompose.service: zookeeper
status:
loadBalancer:
下面是zookeeper-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
kompose.cmd: kompose convert
kompose.version: 1.1.0 (36652f6)
creationTimestamp: null
labels:
io.kompose.service: zookeeper
name: zookeeper
spec:
replicas: 1
strategy:
template:
metadata:
creationTimestamp: null
labels:
io.kompose.service: zookeeper
spec:
containers:
- image: wurstmeister/zookeeper
name: zookeeper
ports:
- containerPort: 2181
resources:
restartPolicy: Always
status:
kafka-deployment.yaml
如下:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
kompose.cmd: kompose convert -f docker-compose.yml
kompose.version: 1.1.0 (36652f6)
creationTimestamp: null
labels:
io.kompose.service: kafka
name: kafka
spec:
replicas: 1
strategy:
template:
metadata:
creationTimestamp: null
labels:
io.kompose.service: kafka
spec:
containers:
- env:
- name: KAFKA_ADVERTISED_HOST_NAME
value: kafka
- name: KAFKA_ZOOKEEPER_CONNECT
value: zookeeper:2181
- name: KAFKA_PORT
value: "9092"
- name: KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS
value: "60000"
image: wurstmeister/kafka
name: kafka
ports:
- containerPort: 9092
resources:
restartPolicy: Always
status:
我首先启动zookeeper服务和部署。一旦 zookeeper 启动并且kubectl get pods
显示它处于运行状态,我就开始 kafka 部署。由于 restartPolicy 一如既往,Kafka 部署开始失败并一次又一次地重新启动。当我检查来自 kafka docker 的日志时,我发现它无法连接到 zookeeper 服务并且连接超时。这是来自 kafka 容器的日志。
[2018-09-03 07:06:06,670] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
atkafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$ waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:230)
at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.zookeeper.ZooKeeperClient.kafka$zookeeper$ZooKeeperClient$$waitUntilConnected(ZooKeeperClient.scala:226)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:95)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1580)
at kafka.server.KafkaServer.kafka$server$KafkaServer$$createZkClient$1(KafkaServer.scala:348)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:372)
at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2018-09-03 07:06:06,671] INFO shutting down (kafka.server.KafkaServer)
[2018-09-03 07:06:06,673] WARN (kafka.utils.CoreUtils$)
java.lang.NullPointerException
atkafka.server.KafkaServer$$anonfun$shutdown$5.apply$mcV$sp(KafkaServer.scala:579)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:579)
at kafka.server.KafkaServer.startup(KafkaServer.scala:329)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2018-09-03 07:06:06,676] INFO shut down completed
(kafka.server.KafkaServer)
[2018-09-03 07:06:06,677] ERROR Exiting Kafka.
(kafka.server.KafkaServerStartable)
[2018-09-03 07:06:06,678] INFO shutting down
(kafka.server.KafkaServer)
这可能是什么原因?和解决方案?
编辑:来自 zookeeper pod 的日志:
2018-09-03 10:32:39,562 [myid:] - INFO
[main:ZooKeeperServerMain@96] - Starting server
2018-09-03 10:32:39,567 [myid:] - INFO [main:Environment@100] -
Server environment:zookeeper.version=3.4.9-1757313, built on
08/23/2016 06:50 GMT
2018-09-03 10:32:39,567 [myid:] - INFO [main:Environment@100] -
Server environment:host.name=zookeeper-7594d99b-sgm6p
2018-09-03 10:32:39,567 [myid:] - INFO [main:Environment@100] -
Server environment:java.version=1.7.0_65
2018-09-03 10:32:39,567 [myid:] - INFO [main:Environment@100] -
Server environment:java.vendor=Oracle Corporation
2018-09-03 10:32:39,567 [myid:] - INFO [main:Environment@100] -
Server environment:java.home=/usr/lib/jvm/java-7-openjdk-amd64/jre
2018-09-03 10:32:39,567 [myid:] - INFO [main:Environment@100] -
Server environment:java.class.path=/opt/zookeeper-
3.4.9/bin/../build/classes:/opt/zookeeper-
3.4.9/bin/../build/lib/*.jar:/opt/zookeeper-3.4.9/bin/../lib/slf4j-
log4j12-1.6.1.jar:/opt/zookeeper-3.4.9/bin/../lib/slf4j-api-1.6.
1.ja r:/opt/zookeeper-3.4.9/bin/../lib/netty-
3.10.5.Final.jar:/opt/zookeeper-3.4.9/bin/../lib/log4j-
1.2.16.jar:/opt/zookeeper-3.4.9/bin/../lib/jline-
0.9.94.jar:/opt/zookeeper-3.4.9/bin/../zookeeper-
3.4.9.jar:/opt/zookeeper-
3.4.9/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.9/bin/../conf:
2018-09-03 10:32:39,567 [myid:] - INFO [main:Environment@100] -
Server environment:java.io.tmpdir=/tmp
2018-09-03 10:32:39,569 [myid:] - INFO [main:Environment@100] -
Server environment:java.compiler=<NA>
2018-09-03 10:32:39,569 [myid:] - INFO [main:Environment@100] -
Server environment:os.name=Linux
2018-09-03 10:32:39,569 [myid:] - INFO [main:Environment@100] -
Server environment:os.arch=amd64
2018-09-03 10:32:39,569 [myid:] - INFO [main:Environment@100] -
Server environment:os.version=4.15.0-20-generic
2018-09-03 10:32:39,569 [myid:] - INFO [main:Environment@100] -
Server environment:user.name=root
2018-09-03 10:32:39,569 [myid:] - INFO [main:Environment@100] -
Server environment:user.home=/root
2018-09-03 10:32:39,569 [myid:] - INFO [main:Environment@100] -
Server environment:user.dir=/opt/zookeeper-3.4.9
2018-09-03 10:32:39,570 [myid:] - INFO [main:ZooKeeperServer@815]
-
tickTime set to 2000
2018-09-03 10:32:39,571 [myid:] - INFO [main:ZooKeeperServer@824]
-
minSessionTimeout set to -1
2018-09-03 10:32:39,571 [myid:] - INFO [main:ZooKeeperServer@833]
-
maxSessionTimeout set to -1
2018-09-03 10:32:39,578 [myid:] - INFO
[main:NioserverCnxnFactory@89]
- binding to port 0.0.0.0/0.0.0.0:2181
编辑: 从 kafka 容器启动日志:
Excluding KAFKA_HOME from broker config
[Configuring] 'advertised.host.name' in
'/opt/kafka/config/server.properties'
[Configuring] 'port' in '/opt/kafka/config/server.properties'
[Configuring] 'broker.id' in '/opt/kafka/config/server.properties'
Excluding KAFKA_VERSION from broker config
[Configuring] 'zookeeper.connect' in
'/opt/kafka/config/server.properties'
[Configuring] 'log.dirs' in '/opt/kafka/config/server.properties'
[Configuring] 'zookeeper.connect.timeout.ms' in
'/opt/kafka/config/server.properties'
[2018-09-05 10:47:22,036] INFO Registered
kafka:type=kafka.Log4jController MBean
(kafka.utils.Log4jControllerRegistration$)
[2018-09-05 10:47:23,145] INFO starting (kafka.server.KafkaServer)
[2018-09-05 10:47:23,148] INFO Connecting to zookeeper on
zookeeper:2181 (kafka.server.KafkaServer)
[2018-09-05 10:47:23,288] INFO [ZooKeeperClient] Initializing a new
session to zookeeper:2181. (kafka.zookeeper.ZooKeeperClient)
[2018-09-05 10:47:23,300] INFO Client
environment:zookeeper.version=3.4.13-
2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39
GMT (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,300] INFO Client environment:host.name=kafka
-757dc6c47b-zpzfz (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,300] INFO Client
environment:java.version=1.8.0_171 (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,301] INFO Client
environment:java.vendor=Oracle Corporation
(org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,301] INFO Client
environment:java.home=/usr/lib/jvm/java-1.8-openjdk/jre
(org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,301] INFO Client
environment:java.class.path=/opt/kafka/bin/../libs/activation-
1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-
b42.jar:/opt/kafka/bin/../libs/argparse4j-
0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-
0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-
3.5.jar:/opt/kafka/bin/../libs/connect-api-
2.0.0.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-
2.0.0.jar:/opt/kafka/bin/../libs/connect-file-
2.0.0.jar:/opt/kafka/bin/../libs/connect-json-
2.0.0.jar:/opt/kafka/bin/../libs/connect-runtime-
2.0.0.jar:/opt/kafka/bin/../libs/connect-transforms-
2.0.0.jar:/opt/kafka/bin/../libs/guava-
20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-
b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-
b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-
b42.jar:/opt/kafka/bin/../libs/jackson-annotations-
2.9.6.jar:/opt/kafka/bin/../libs/jackson-core-
2.9.6.jar:/opt/kafka/bin/../libs/jackson-databind-
2.9.6.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-
2.9.6.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-
CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-
1.2.jar:/opt/kafka/bin/../libs/javax.inject-
1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-
b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-
3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-
2.1.jar:/opt/kafka/bin/../libs/jaxb-api-
2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-
2.27.jar:/opt/kafka/bin/../libs/jersey-common-
2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet
-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-
2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-
2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-
2.27.jar:/opt/kafka/bin/../libs/jersey-server
-2.27.jar:/opt/kafka/bin/../libs/jetty-client
-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-continuation-
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-http-
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-io-
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-security-
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-server-
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-servlet-
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-servlets-
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-util-
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jopt-simple-
5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients-
2.0.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-
2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-
2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples-
2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.11-
2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-
2.0.0.jar:/opt/kafka/bin/../libs/kafka-tools-
2.0.0.jar:/opt/kafka/bin/../libs/kafka_2.11-2.0.0-
sources.jar:/opt/kafka/bin/../libs/kafka_2.11-2
.0.0.jar:/opt/kafka/bin/../libs/log4j
1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-
1.4.1.jar:/opt/kafka/bin/../libs/maven-artifact-
3.5.3.jar:/opt/kafka/bin/../libs/metrics-core-
2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-
1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-
3.1.0.jar:/opt/kafka/bin/../libs/reflections-
0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-
5.7.3.jar:/opt/kafka/bin/../libs/scala-library-
2.11.12.jar:/opt/kafka/bin/../libs/scala-logging_2.11-
3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-
2.11.12.jar:/opt/kafka/bin/../libs/slf4j-api-
1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-
1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-
1.1.7.1.jar:/opt/kafka/bin/../libs/validation-api-
1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-
0.10.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar
(org.apache.zookeeper.ZooKeeper)
kubectl get svc -o wide
的输出如下:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 50m <none>
zookeeper ClusterIP 10.98.180.138 <none> 2181/TCP 48m io.kompose.service=zookeeper
kubectl get pods -o wide
的输出:
NAME READY STATUS RESTARTS AGE IP NODE
kafka-757dc6c47b-zpzfz 0/1 CrashLoopBackOff 15 1h 10.32.0.17 administrator-thinkpad-l480
zookeeper-7594d99b-784n9 1/1 Running 0 1h 10.32.0.19 administrator-thinkpad-l480
编辑:
kubectl describe pod kafka-757dc6c47b-zpzfz
的输出:
Name: kafka-757dc6c47b-zpzfz
Namespace: default
Node: administrator-thinkpad-l480/10.11.17.86
Start Time: Wed, 05 Sep 2018 16:17:06 +0530
Labels: io.kompose.service=kafka
pod-template-hash=3138727036
Annotations: <none>
Status: Running
IP: 10.32.0.17
Controlled By: ReplicaSet/kafka-757dc6c47b
Containers:
kafka:
Container ID:docker://2bdc06d876ae23437c61f4e95539a67903cdb61e88fd9c68377b47c7705293a3
Image: wurstmeister/kafka
Image ID: docker-pullable://wurstmeister/kafka@sha256:2e3ff64e70ea983530f590282f36991c0a1b105350510f53cc3d1a0279b83c28
Port: 9092/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 05 Sep 2018 17:29:06 +0530
Finished: Wed, 05 Sep 2018 17:29:14 +0530
Ready: False
Restart Count: 18
Environment:
KAFKA_ADVERTISED_HOST_NAME: kafka
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
KAFKA_PORT: 9092
KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS: 160000
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-nhb9z (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-nhb9z:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-nhb9z
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 3m (x293 over 1h) kubelet, administrator-thinkpad-l480 Back-off restarting failed container
【问题讨论】:
zookeeper 日志报告什么? 用 zookeeper 日志更新了问题 Zookeeper 很好...您尝试过使用 Confluent Helm Charts 吗? @cricket_007 不,我没有使用 helm charts..因为我从来没有听说过它(我也是 kubernetes 的新手)。但是一个简短的概述暗示它有它自己的卡夫卡图像..我无论如何都不想使用它。还有其他方法吗? 嗯,Confluent 是 Kafka 的企业公司和主要开发人员,所以我不确定您对他们的 Kafka 图像有什么看法 【参考方案1】:在我的例子中,我试图在启动 Zookeeper 之前启动 Kafka 服务器。
所以正确的顺序是启动 Zookeeper,然后启动 Kafka 服务器。
【讨论】:
【参考方案2】:这可能是什么原因?和解决方案?
原因隐藏在以下日志行后面:
INFO Connecting to zookeeper on zookeeper:2181 (kafka.server.KafkaServer)
Kafka 正在搜索 zookeeper
,而它应该搜索您的服务名称 zookeeper-svc
。
解决方案很简单:将您的 Zookeeper 服务从 zookeeper-svc
重命名为 zookeeper
在您的 zookeeper-service.yaml
中,如下所示:
apiVersion: v1
kind: Service
metadata:
annotations:
kompose.cmd: kompose convert
kompose.version: 1.1.0 (36652f6)
creationTimestamp: null
labels:
io.kompose.service: zookeeper
name: zookeeper
spec:
ports:
- name: "2181"
port: 2181
targetPort: 2181
selector:
io.kompose.service: zookeeper
status:
loadBalancer:
或者,更简洁,保持服务名称不变并重新配置 Kafka 以搜索 zookeeper-svc
而不是 zookeeper
。
注意:由于您的最小示例足以重现该问题(谢谢,很高兴看到最小示例有效!),尽管 pod 已启动并运行(未处于错误状态)它仍然显示:java.io.IOException: Can't resolve address: kafka:9092
与另一个问题(服务不涵盖 Kafka 部署)相关,并且超出了此问题的范围,只是为了让您知道。
【讨论】:
我在将zookeeper-svc
更改为zookeeper
后尝试过,但仍然遇到同样的错误。
奇怪。我已经运行了您的确切清单,仅在 docker edge 上重命名了服务(来自答案),并且它正确启动并且已启动并运行 2h+(注释中的错误从那里的日志中列出)。您确定您重新创建了正确的服务清单吗?你能确保你有正确的清单和正确的创建顺序吗?您能否获取帮助容器来验证 zookeeper:2181 是否可访问、已启动并正在运行?
实际上,要正确:1)我已经运行了您的确切清单以重现错误。 2)我删除了zookeeper-service,然后3)编辑它以在两个地方用zookeeper替换zookeeper-svc,4)重新创建服务和5)删除并创建kafka-deployment。在这些确切的步骤之后,一切仍然正常运行......
我遵循了相同的步骤,但仍然面临同样的问题。而且,让辅助容器验证 zookeeper:2181 是否可访问是什么意思?
在同一命名空间中启动任何其他容器(busybox 或其他),然后执行它以验证您可以从那里访问 zookeeper:2181。连接问题表明您无法访问所需的主机,这最初是与服务相关的,只需确保您已覆盖该问题并且在集群中您对 zookeeper 主机很好。【参考方案3】:
我正在使用带有此警告的 microk8s
警告:IPtables FORWARD 策略是 DROP
考虑启用流量转发:
sudo iptables -P FORWARD ACCEPT
修复它对我有用
【讨论】:
【参考方案4】:我在使用 zookeeper docker 镜像运行 atlas 时遇到了同样的问题。 增加超时对我有帮助。
atlas.kafka.zookeeper.session.timeout.ms=40000
atlas.kafka.zookeeper.connection.timeout.ms=20000
【讨论】:
以上是关于Kafka 无法与 Zookeeper 连接,出现错误“处于状态时等待连接超时:CONNECTING”的主要内容,如果未能解决你的问题,请参考以下文章
避免Kafka客户端无法连接Docker上运行的Kafka,又名:Docker如何添加hosts映射
启动zookeeper和kafka时 kafka无法启动或者闪退