[Ambari Deployment] Recovering from a Failed HDFS HA Enablement


The shameless are invincible: learn to create problems, then solve them.

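While HDFS HA was being enabled, the process was interrupted, either deliberately or by accident, so the Secondary NameNode had not yet been deleted, as marked by the small red box in the figure below. It is like bringing the mistress home before the divorce is final: in the end everyone dropped dead overnight... and nobody knows what happened that night.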

Asking around turned up the following role layout:
hdc-data1: namenode, datanode, journalnode
hdc-data2: namenode, datanode, journalnode, secondarynamenode
hdc-data3: namenode, datanode, journalnode

Check the roles

Clean up ZKFC

Clean up JOURNALNODE

Clean up the extra NAMENODE

Ambari HDFS-HA rollback

curl -u admin:admin -H "X-Requested-By: ambari" -X GET http://zwshen86:8080/api/v1/clusters/bigdata/services/STORM

 
In this command, zwshen86 is the hostname of the Ambari Server (the port defaults to 8080), bigdata is the cluster name, and STORM is the service name.
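To make the three variable parts of that URL explicit, here is a minimal sketch of a URL builder. The function name ambari_service_url is my own invention for illustration, not part of Ambari; it only assembles the string and does not contact any server.

```shell
# Hypothetical helper (not an Ambari tool): builds the REST URL for one service.
# Arguments: Ambari Server hostname, cluster name, service name, optional port.
ambari_service_url() (
  server="$1"
  cluster="$2"
  service="$3"
  port="${4:-8080}"   # 8080 is the default Ambari Server port
  printf 'http://%s:%s/api/v1/clusters/%s/services/%s\n' \
    "$server" "$port" "$cluster" "$service"
)

# Prints the URL used in the STORM example above.
ambari_service_url zwshen86 bigdata STORM
```

The result can be passed straight to curl, for example: curl -u admin:admin -H "X-Requested-By: ambari" -X GET "$(ambari_service_url node01 ocdp HDFS)"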

Check the HDFS service information

curl -u admin:admin -H "X-Requested-By: ambari" -X GET http://node01:8080/api/v1/clusters/ocdp/services/HDFS

 
node01: datanode, journalnode
node02: datanode, journalnode, SECONDARY_NAMENODE
node03: datanode, journalnode
node04: namenode
node05: namenode


Rollback steps:

sudo su -l hdfs -c 'hdfs dfsadmin -safemode enter'
sudo su -l hdfs -c 'hdfs dfsadmin -saveNamespace'
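Saving the namespace only makes sense once the NameNode is really in safe mode, so a check between the two commands can help. The sketch below assumes the status line printed by hdfs dfsadmin -safemode get contains the text "Safe mode is ON"; the helper name safemode_is_on is hypothetical.

```shell
# Hypothetical helper: succeeds when the given status text reports safe mode as ON.
# Assumes the text comes from `hdfs dfsadmin -safemode get`.
safemode_is_on() {
  case "$1" in
    *"Safe mode is ON"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Intended use on a cluster node (commented out, needs a running HDFS):
#   status="$(sudo su -l hdfs -c 'hdfs dfsadmin -safemode get')"
#   if safemode_is_on "$status"; then
#     sudo su -l hdfs -c 'hdfs dfsadmin -saveNamespace'
#   fi
```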

 

Check which component roles each host carries

curl -u admin:admin -i "http://node01:8080/api/v1/clusters/ocdp/host_components?HostRoles/component_name=NAMENODE"

curl -u admin:admin -i "http://node01:8080/api/v1/clusters/ocdp/host_components?HostRoles/component_name=SECONDARY_NAMENODE"

curl -u admin:admin -i "http://node01:8080/api/v1/clusters/ocdp/host_components?HostRoles/component_name=JOURNALNODE"

curl -u admin:admin -i "http://node01:8080/api/v1/clusters/ocdp/host_components?HostRoles/component_name=ZKFC"
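The four queries differ only in the component name, so they can be generated in a loop. This is a rough sketch that prints the curl commands instead of running them; host_components_url is a made-up helper name, and node01 and ocdp are the server and cluster names from this post.

```shell
# Hypothetical helper: prints the host_components query URL for one component.
host_components_url() (
  server="$1"
  cluster="$2"
  component="$3"
  printf 'http://%s:8080/api/v1/clusters/%s/host_components?HostRoles/component_name=%s\n' \
    "$server" "$cluster" "$component"
)

# Print one curl command per HA-related component. The URL is quoted because
# it contains "?", which the shell could otherwise treat as a glob character.
for c in NAMENODE SECONDARY_NAMENODE JOURNALNODE ZKFC; do
  url="$(host_components_url node01 ocdp "$c")"
  echo "curl -u admin:admin -i \"$url\""
done
```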

 



Delete ZKFC

curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE  http://node01:8080/api/v1/clusters/ocdp/hosts/node05/host_components/ZKFC
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE  http://node01:8080/api/v1/clusters/ocdp/hosts/node04/host_components/ZKFC
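As far as I understand, Ambari will not delete a host component that is still running, so it is safer to move it to the INSTALLED (stopped) state first, using the same PUT pattern this post uses for the Secondary NameNode. The sketch below only prints the two calls per host so they can be reviewed before running; print_remove_component is a hypothetical name, and the "Stop ..." context string is an arbitrary label I chose.

```shell
# Hypothetical helper: prints (does not run) the stop and delete calls
# for one host component. node01 and ocdp are taken from this post.
print_remove_component() (
  host="$1"
  component="$2"
  base="http://node01:8080/api/v1/clusters/ocdp/hosts/$host/host_components/$component"
  auth='curl -u admin:admin -H "X-Requested-By: ambari"'
  # Step 1: ask Ambari to put the component into INSTALLED, i.e. stopped.
  printf '%s -X PUT -d %s %s\n' "$auth" \
    "'{\"RequestInfo\":{\"context\":\"Stop $component\"},\"Body\":{\"HostRoles\":{\"state\":\"INSTALLED\"}}}'" \
    "$base"
  # Step 2: remove the component from the host.
  printf '%s -X DELETE %s\n' "$auth" "$base"
)

for h in node04 node05; do
  print_remove_component "$h" ZKFC
done
```

The same helper would cover the JOURNALNODE and NAMENODE deletions further down by changing the host and component arguments.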

 



Enable SECONDARY_NAMENODE

curl -u admin:admin -H "X-Requested-By: ambari" -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"SECONDARY_NAMENODE"}}] }' "http://node01:8080/api/v1/clusters/ocdp/hosts?Hosts/host_name=node02"

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Enable Secondary NameNode"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://node01:8080/api/v1/clusters/ocdp/hosts/node02/host_components/SECONDARY_NAMENODE

curl -u admin:admin -H "X-Requested-By: ambari" -X GET "http://node01:8080/api/v1/clusters/ocdp/host_components?HostRoles/component_name=SECONDARY_NAMENODE&fields=HostRoles/state"
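The last GET returns JSON, and the only part of interest here is the state value. A small filter can pull it out without extra tools. This sketch assumes the response is pretty-printed with one field per line, which is how I have seen Ambari format it; extract_states is a hypothetical name, and a proper JSON parser would be more robust.

```shell
# Hypothetical helper: reads an Ambari JSON response on stdin and prints
# the value of every "state" field it finds, one per line.
# Assumes one field per line; "desired_state" is not matched because the
# pattern requires a quote directly before the word state.
extract_states() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
}

# Intended use (commented out, needs a reachable Ambari Server):
#   curl -s -u admin:admin -H "X-Requested-By: ambari" -X GET \
#     "http://node01:8080/api/v1/clusters/ocdp/host_components?HostRoles/component_name=SECONDARY_NAMENODE&fields=HostRoles/state" \
#     | extract_states
```

Once this prints INSTALLED for node02, the Secondary NameNode has been registered and installed on that host.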

 


Delete JOURNALNODE

curl -u admin:admin -H "X-Requested-By: ambari" -X GET "http://node01:8080/api/v1/clusters/ocdp/host_components?HostRoles/component_name=JOURNALNODE"
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://node01:8080/api/v1/clusters/ocdp/hosts/node01/host_components/JOURNALNODE
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://node01:8080/api/v1/clusters/ocdp/hosts/node02/host_components/JOURNALNODE
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://node01:8080/api/v1/clusters/ocdp/hosts/node03/host_components/JOURNALNODE

 


Delete the extra NAMENODE:

curl -u admin:admin -H "X-Requested-By: ambari" -X GET "http://node01:8080/api/v1/clusters/ocdp/host_components?HostRoles/component_name=NAMENODE"
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://node01:8080/api/v1/clusters/ocdp/hosts/node05/host_components/NAMENODE

Restart the services




 
