CDH6 Failover Controller fails to start; zkfc command reports an error


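Reference A: In a cluster with Kerberos authentication enabled, HDFS running in HA mode reports errors and does not run normally. The Failover Controller is shown in red and has not started, and Cloudera reports the Failover Controller process status as exited.

Error ① shown for the Failover Controller in Cloudera:

Error ② reported after running hdfs zkfc -formatZK on the corresponding host: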

Failover Cluster

Install Windows Server Failover Clustering (WSFC)

Install the Windows Server HA (High Availability) feature, Windows Server Failover Clustering (WSFC).

WSFC requires 2 or more nodes, and all of them must be joined to an Active Directory domain.

The nodes also need 2 or more shared storage volumes, one for data and one for quorum.

Given these requirements, this example uses one AD DS server, one iSCSI target, and two cluster nodes. The layout is shown in the diagram under "Configure Windows Server Failover Clustering (WSFC)".
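The requirements above can be written down as a small pre-flight check. This is only a sketch in Python, not part of WSFC or any Microsoft tooling: `ClusterPlan` and `check_prerequisites` are hypothetical names, and the domain name `srv.world` used in the usage note is an assumption inferred from the host names in this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned WSFC environment."""
    nodes: Dict[str, str]         # hostname -> AD domain the node has joined
    shared_disks: Dict[str, str]  # disk label -> purpose ("data" or "quorum")


def check_prerequisites(plan: ClusterPlan) -> List[str]:
    """Return a list of unmet requirements; an empty list means the plan looks OK."""
    problems = []
    # WSFC needs 2 or more nodes.
    if len(plan.nodes) < 2:
        problems.append("need at least 2 nodes")
    # All nodes must be members of the same Active Directory domain.
    if len(set(plan.nodes.values())) > 1:
        problems.append("nodes are joined to different domains")
    # One shared volume for data and one for quorum.
    purposes = set(plan.shared_disks.values())
    for required in ("data", "quorum"):
        if required not in purposes:
            problems.append(f"no shared disk for {required}")
    return problems
```

For the environment in this example, `check_prerequisites(ClusterPlan({"rx-8": "srv.world", "rx-9": "srv.world"}, {"disk1": "data", "disk2": "quorum"}))` returns an empty list.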

[1]Configure AD DS, then join the WSFC nodes to the AD DS domain.
[2]Install the iSCSI target and configure 2 storage volumes, one for data and one for quorum.
Also configure the WSFC nodes as iSCSI initiators of the target.

[3]Configure the WSFC nodes. Apply the same settings on every node, as follows.
Run Server Manager and click [Add roles and features].
[4]Click the [Next] button.
[5]Select [Role-based or feature-based installation].
[6]Select the host to which you'd like to add services.
[7]Go next without checking any boxes.
[8]Check the [Failover Clustering] and [Multipath I/O] boxes and go next.
[9]Click the [Install] button.
[10]After the installation finishes, click the [Close] button.

Configure Windows Server Failover Clustering (WSFC)

Configure Windows Server Failover Clustering (WSFC) on the nodes where the [Failover Clustering] and [Multipath I/O] features are installed.

This example is based on the following environment.

                                   |
+----------------------+           |           +----------------------+
|  [      AD DS     ]  |10.0.0.100 | 10.0.0.110|  [  iSCSI Target  ]  |
|     fd3s.srv.world   +-----------+-----------+   target.srv.world   |
|                      |           |           |                      |
+----------------------+           |           +----------------------+
                                   |
+----------------------+           |           +----------------------+
|  [ Cluster Node#1 ]  |10.0.0.101 | 10.0.0.102|  [ Cluster Node#2 ]  |
|    rx-8.srv.world    +-----------+-----------+    rx-9.srv.world    |
|                      |                       |                      |
+----------------------+                       +----------------------+

[1]Configure all WSFC nodes as iSCSI initiators of the iSCSI target.
In the initiator settings, check the [Enable multi-path] box.
[2]On all WSFC nodes, configure Multipath I/O. Open [Tools] - [MPIO].
[3]Move to the [Discover Multi-Paths] tab, check the [Add support for iSCSI devices] box, and click the [Add] button.
[4]When prompted to restart the computer, click [Yes].
[5]After restarting, make sure iSCSI support has been added.
[6]Configure WSFC. From this step on, work on one of the nodes that will form the cluster.
Open [Tools] - [Failover Cluster Manager].
[7]Right-click [Failover Cluster Manager] in the left pane and select [Create Cluster].
[8]Click the [Next] button.
[9]Enter a hostname or IP address in the [Enter server name] field and click the [Add] button. After adding all WSFC nodes, go next.
[10]Running the validation tests on the initial configuration is recommended. Click the [Next] button.
[11]Click the [Next] button.
[12]Click the [Next] button.
[13]Click the [Next] button.
[14]After the tests finish, the results are shown. If they are OK, click the [Finish] button.
[15]Enter any cluster name in the [Cluster Name] field, and enter the cluster's IP address in the address field.
[16]Click the [Next] button.
[17]Click the [Finish] button.
[18]The cluster status can be viewed in the [Failover Cluster Manager] tool.
That completes the configuration.

Make sure the shared storage is mounted on one node. Also verify that failover works by rebooting or shutting down the active node that has the shared storage mounted.

If you need to shut down all nodes, for example for machine maintenance, shut them down and start them in the following sequence.

(1) Shut down the passive nodes.
(2) Shut down the active node.
(3) Start the active node (that is, the node that was most recently active).
(4) Start the passive nodes.
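The shutdown and startup ordering above can be expressed as a short helper. This is a minimal sketch in Python that only computes the order; it does not talk to the cluster. The function name `maintenance_sequence` is hypothetical, and which node is currently active has to be supplied by the operator (for example, after checking Failover Cluster Manager).

```python
from typing import List, Tuple


def maintenance_sequence(nodes: List[str], active: str) -> Tuple[List[str], List[str]]:
    """Return (shutdown_order, startup_order) for a full-cluster shutdown."""
    if active not in nodes:
        raise ValueError(f"{active!r} is not one of the cluster nodes")
    passive = [n for n in nodes if n != active]
    # Steps (1) and (2): passive nodes go down first, the active node last.
    shutdown_order = passive + [active]
    # Steps (3) and (4): the most recently active node comes up first.
    startup_order = [active] + passive
    return shutdown_order, startup_order
```

With the two nodes of this example and rx-8 as the active node, `maintenance_sequence(["rx-8", "rx-9"], active="rx-8")` gives the shutdown order rx-9, rx-8 and the startup order rx-8, rx-9.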

Add WSFC Nodes

Add nodes to an existing WSFC cluster.

This example is based on the following environment.

[Node#3] is added to the existing cluster.

                                   |
+----------------------+           |           +----------------------+
|  [      AD DS     ]  |10.0.0.100 | 10.0.0.110|  [  iSCSI Target  ]  |
|     fd3s.srv.world   +-----------+-----------+   target.srv.world   |
|                      |           |           |                      |
+----------------------+           |           +----------------------+
                                   |
+----------------------+           |           +----------------------+
|  [ Cluster Node#1 ]  |10.0.0.101 | 10.0.0.102|  [ Cluster Node#2 ]  |
|    rx-8.srv.world    +-----------+-----------+    rx-9.srv.world    |
|                      |           |           |                      |
+----------------------+           |           +----------------------+
                                   |
+----------------------+           |
|  [ Cluster Node#3 ]  |10.0.0.103 |
|    rx-10.srv.world   +-----------+
|                      |
+----------------------+

[1]On the new node, configure the iSCSI initiator for the iSCSI target.
[2]On the new node, install the [Failover Clustering] and [Multipath I/O] features.
[3]Move to an existing node and open [Tools] - [Failover Cluster Manager].
[4]Right-click the cluster name and select [Add Node].
[5]Click the [Next] button.
[6]Enter a hostname or IP address in the [Enter server name] field and click the [Add] button. After adding all new nodes, go next.
[7]To run the validation tests, go next with the default setting.
[8]Click the [Next] button.
[9]Click the [Next] button.
[10]Click the [Next] button.
[11]After the tests finish, the results are shown. If they are OK, click the [Finish] button.
[12]Click the [Next] button.
[13]Click the [Finish] button.
[14]That completes the procedure. Make sure the new node has been added in Failover Cluster Manager.
