013 Hadoop High Availability – NameNode Automatic Failover


Editor's note: this article was compiled by the editors at cha138.com. It introduces Hadoop high availability and NameNode automatic failover, and we hope you find it a useful reference.


013 Hadoop High Availability – Namenode Automatic Failover

Before Hadoop 2.0, that is in Hadoop 1.0, the NameNode was a single point of failure (SPOF): if the NameNode failed, the entire system stopped functioning, and manual intervention, with the help of the Secondary NameNode, was needed to bring the Hadoop cluster back up, resulting in overall downtime. Hadoop 2.0 introduced a single standby NameNode to enable automatic failover, and Hadoop 3.0, which supports multiple standby NameNodes, has made the system even more highly available. In this tutorial, we will talk about Hadoop high availability, look at the types of failover, and discuss in detail how the Zookeeper components provide automatic failover.


Hadoop High Availability – Automatic Failover

With Hadoop 2.0 we got support for a second, standby NameNode, and with Hadoop 3.0 we can have multiple standby NameNodes. This overcomes the SPOF (Single Point Of Failure) issue by using an extra NameNode (a passive standby NameNode) that takes over through automatic failover. This is high availability in Hadoop.

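The standby NameNode and automatic failover described above are enabled through configuration. The following is a minimal sketch of the relevant entries; the nameservice `mycluster`, the NameNode ids `nn1`/`nn2`, and all host names are placeholders for illustration, not values from this article:

```xml
<!-- hdfs-site.xml: one nameservice with an active and a standby NameNode -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>master2.example.com:8020</value>
</property>
<!-- turn on automatic failover via ZKFC -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: the Zookeeper quorum used for failure detection and election -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```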

Failover is the process by which a system transfers control to a secondary system when a failure occurs.


There are two types of failover:

- **Graceful failover** – initiated manually by an administrator, for example for routine maintenance.
- **Automatic failover** – triggered by the system itself when it detects that the active NameNode has failed.


Automatic failover in Hadoop adds two new components to a Hadoop HDFS deployment, described below: a Zookeeper quorum and the ZKFailoverController (ZKFC) process.


A Zookeeper quorum is a centralized service for maintaining small amounts of data for coordination, configuration, and naming. It provides group services and synchronization, keeps clients informed about changes in the data, and tracks client failures. The implementation of automatic HDFS failover relies on Zookeeper for detecting the failure of the active NameNode and for electing a new active NameNode.


ZKFC is a Zookeeper client that monitors and manages the NameNode's status, so each machine that runs the NameNode service also runs a ZKFC.


ZKFC handles:


**Health Monitoring** – ZKFC periodically pings the active NameNode with a health-check command, and if the NameNode does not respond in time, ZKFC marks it as unhealthy. This can happen because the NameNode has crashed or frozen.

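The health-monitoring loop can be pictured with a toy sketch. This is not the real ZKFC code; it only illustrates the two failure modes the paragraph mentions, a crashed NameNode (the check raises an error) and a frozen one (the check answers too late):

```python
import time

HEALTHY, UNHEALTHY = "HEALTHY", "UNHEALTHY"

def check_health(health_check, timeout_s=1.0):
    """Run one health-check round; a failing or slow check means unhealthy."""
    start = time.monotonic()
    try:
        ok = health_check()          # stands in for the health-check RPC
    except Exception:                # crashed NameNode: the RPC raises
        return UNHEALTHY
    elapsed = time.monotonic() - start
    if not ok or elapsed > timeout_s:  # frozen NameNode: answers too late
        return UNHEALTHY
    return HEALTHY

# A responsive NameNode stub passes; a frozen one (sleeps past the deadline) fails.
print(check_health(lambda: True))                                        # HEALTHY
print(check_health(lambda: (time.sleep(0.2), True)[1], timeout_s=0.1))   # UNHEALTHY
```

In the real ZKFC this loop runs periodically, and the result drives the session management and election steps described next.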

**Zookeeper Session Management** – While the local NameNode is healthy, ZKFC keeps a session open in Zookeeper. If the local NameNode is active, it also holds a special lock znode. If the session expires, the lock znode is deleted automatically.


**Zookeeper-based Election** – If the local NameNode is healthy and ZKFC sees that no other node currently holds the lock znode, ZKFC itself will try to acquire the lock. If it succeeds, it has won the election and becomes responsible for running a failover. This failover is similar to a manual failover: first, the previously active node is fenced if required, and then the local node becomes the active node.

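The interaction of the lock znode with session expiry can be modelled with a toy in-memory stand-in for a Zookeeper ensemble. The lock is "ephemeral", i.e. tied to its creator's session, so an expired session deletes the lock and lets a healthy peer win the next election (the znode path and session names below are illustrative):

```python
class MiniZookeeper:
    """Toy model of the ephemeral lock znode used for ZKFC leader election."""

    LOCK = "/hadoop-ha/mycluster/ActiveStandbyElectorLock"  # illustrative path

    def __init__(self):
        self.znodes = {}  # znode path -> owning session id

    def try_acquire(self, session):
        """Try to create the ephemeral lock znode; returns True iff we hold it."""
        if self.LOCK in self.znodes:
            return self.znodes[self.LOCK] == session
        self.znodes[self.LOCK] = session
        return True

    def expire_session(self, session):
        """Session timeout: every ephemeral znode of that session vanishes."""
        self.znodes = {p: s for p, s in self.znodes.items() if s != session}

zk = MiniZookeeper()
assert zk.try_acquire("zkfc-nn1")      # nn1's ZKFC wins the election, nn1 is active
assert not zk.try_acquire("zkfc-nn2")  # nn2's ZKFC loses, nn2 stays standby
zk.expire_session("zkfc-nn1")          # nn1 freezes; its session expires
assert zk.try_acquire("zkfc-nn2")      # nn2 now wins and runs the failover
```

A real Zookeeper ensemble additionally notifies the waiting ZKFCs through watches, so the standby does not have to poll for the lock.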

Hence, in this Hadoop High Availability article, we saw that Zookeeper daemons are typically configured to run on three or five nodes. Since Zookeeper has modest resource requirements, it can run on the same nodes as the active or standby HDFS NameNode, and many operators choose to deploy the third Zookeeper process on the same node as the YARN ResourceManager. It is advisable, however, to keep the Zookeeper data separate from the HDFS metadata, i.e. on different disks, as this gives the best performance and isolation.

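A three-node ensemble of this kind could be sketched in `zoo.cfg` as follows; the host names and the data path are placeholders, with `dataDir` pointing at a disk separate from the HDFS metadata directories, as advised above:

```properties
# zoo.cfg — sketch of a 3-node Zookeeper ensemble
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
# keep Zookeeper data on its own disk, away from HDFS metadata
dataDir=/zk-disk/zookeeper/data
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```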

You can also check the latest Hadoop interview questions for your upcoming interview.


If you still have any doubts regarding Hadoop high availability, ask in the comments and we will definitely get back to you.


https://data-flair.training/blogs/hadoop-high-availability
