Storm supervisor not starting when adding 3 nodes [closed]


Posted: 2014-07-16 11:34:29

Question:

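I am trying to test a Storm+Kafka+Trident job on a multi-node Storm cluster.

When I run the job on one machine, it runs and processes records. When I run it after adding a second worker node, it also runs without any problems.

The problem starts when I add a third worker node to the cluster. I get the following in the worker log: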

2014-07-16 16:47:56 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6701... [29]
2014-07-16 16:47:56 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6703... [30]
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6702... [30]
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6700... [29]
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6701... [30]
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-cassandra1/10.201.221.139:6703
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-cassandra1/10.201.221.139:6703..., timeout: 600000ms, pendings: 0
2014-07-16 16:47:58 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-cassandra1/10.201.221.139:6702
2014-07-16 16:47:58 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-cassandra1/10.201.221.139:6702..., timeout: 600000ms, pendings: 0
2014-07-16 16:47:58 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6700... [30]
2014-07-16 16:48:31 s.k.KafkaUtils [INFO] Metrics Tick: Not enough data to calculate spout lag.
2014-07-16 16:48:34 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-172.144.96.66.static.eigbox.net/66.96.144.172:6701... [6]
2014-07-16 16:48:34 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-172.144.96.66.static.eigbox.net/66.96.144.172:6703... [6]

In the supervisor log, I get the following messages:

2014-07-16 16:47:26 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:27 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:27 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:28 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:28 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:29 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:29 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:30 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started

The job does not run at all. My storm.yaml configuration looks like this:

storm.zookeeper.servers:
- "10.201.32.79"
# 
nimbus.host: "10.201.32.79"
storm.local.dir: "/home/hadoop/stormtmp"
java.library.path: "/opt/java7/lib"
#supervisor.slots.ports:
#    - 6700
#    - 6701
#    - 6702
#    - 6703
worker.childopts: "-Xmx2048m -XX:NewSize=1000m -XX:MaxNewSize=1000m"
nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
ui.port: 8084
ui.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"

Comments:

Can anyone help?

Answer 1:

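Basically, it says the supervisor is unable to start the worker. Look in the supervisor log for a line like b.s.d.supervisor [INFO] Launching worker with command: java -server ..... Then copy that command and run it by hand on the supervisor machine to see whether it produces any errors. If it does, you may need to adjust your storm.yaml accordingly.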

Comments:

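I was able to resolve it by using Storm 0.9.2. Storm 0.9.1 has a known bug that causes failures like this (JIRA-187), which is fixed in Storm 0.9.2. In addition, I increased the netty min wait ms to 4000ms and the max wait ms to 10000ms. That seemed to do the trick. Thanks anyway.

storm.messaging.netty.max_retries=100 and storm.messaging.netty.max_wait_ms=1200000 solved my problem. Netty is very sensitive to timeouts; if they are not handled properly, the worker crashes and the supervisor restarts it.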
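For reference, the Netty settings mentioned in the comments above could be written in storm.yaml roughly as follows. This is only a sketch: the key storm.messaging.netty.min_wait_ms is assumed to be what the first comment means by "netty min wait ms", and the values are the ones quoted in the thread, not tuned recommendations. It also combines values from two different comments; the second commenter used a much larger max_wait_ms of 1200000.

```yaml
# Sketch of the Netty reconnect settings quoted in the comments.
# Add these to storm.yaml on every node, then restart the daemons.
storm.messaging.netty.max_retries: 100     # value from the second comment
storm.messaging.netty.min_wait_ms: 4000    # value from the first comment (key name assumed)
storm.messaging.netty.max_wait_ms: 10000   # first comment; the second comment used 1200000
```

The worker log in the question shows the reconnect counter stopping at [30] just before the client is closed, which is what one would expect if the retry limit were 30 at the time. Raising the retry count and the wait times gives slow-starting workers longer to come up before their peers give up.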
