ZooKeeper Dynamic Reconfiguration

Posted by 徐同学呀

Preface: This article was compiled by the editors of 小常识网 (cha138.com) and introduces ZooKeeper Dynamic Reconfiguration. We hope it serves as a useful reference.

Overview

Prior to the 3.5.0 release, the membership and all other configuration parameters of ZooKeeper were static: loaded during boot and immutable at runtime. Operators resorted to “rolling restarts”, a manually intensive and error-prone method of changing the configuration that has caused data loss and inconsistency in production.

Starting with 3.5.0, “rolling restarts” are no longer needed! ZooKeeper comes with full support for automated configuration changes: the set of ZooKeeper servers, their roles (participant / observer), all ports, and even the quorum system can be changed dynamically, without service interruption and while maintaining data consistency. Reconfigurations are performed immediately, just like other operations in ZooKeeper. Multiple changes can be done using a single reconfiguration command. The dynamic reconfiguration functionality does not limit operation concurrency, does not require client operations to be stopped during reconfigurations, has a very simple interface for administrators and adds no complexity to other client operations.

New client-side features allow clients to find out about configuration changes and to update the connection string (list of servers and their client ports) stored in their ZooKeeper handle. A probabilistic algorithm is used to rebalance clients across the new configuration servers while keeping the extent of client migrations proportional to the change in ensemble membership.

This document provides the administrator manual for reconfiguration. For a detailed description of the reconfiguration algorithms, performance measurements, and more, please see our paper:

Shraer, A., Reed, B., Malkhi, D., Junqueira, F. Dynamic Reconfiguration of Primary/Backup Clusters. In USENIX Annual Technical Conference (ATC) (2012), 425-437. Links: paper (pdf), slides (pdf), video, Hadoop Summit slides

Note: Starting with 3.5.3, the dynamic reconfiguration feature is disabled by default and has to be explicitly turned on via the reconfigEnabled configuration option.

Changes to Configuration Format

Specifying the client port

A client port of a server is the port on which the server accepts client connection requests. Starting with 3.5.0 the clientPort and clientPortAddress configuration parameters should no longer be used. Instead, this information is now part of the server keyword specification, which becomes as follows:

server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>

The client port specification is to the right of the semicolon. The client port address is optional, and if not specified it defaults to “0.0.0.0”. As usual, role is also optional, it can be participant or observer (participant by default).

Examples of legal server statements:

server.5 = 125.23.63.23:1234:1235;1236
server.5 = 125.23.63.23:1234:1235:participant;1236
server.5 = 125.23.63.23:1234:1235:observer;1236
server.5 = 125.23.63.23:1234:1235;125.23.63.24:1236
server.5 = 125.23.63.23:1234:1235:participant;125.23.63.23:1236
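
To make the grammar above concrete, here is a minimal Java sketch that parses one server statement in the 3.5.0+ format. `ServerLine` and its fields are hypothetical names chosen for illustration, not part of the ZooKeeper API. The sketch handles a single quorum address per server and no IPv6 literals.

```java
import java.util.Locale;

// Hypothetical helper, not part of ZooKeeper: parses one line of the form
//   server.<id> = <address>:<port1>:<port2>[:role];[<client port address>:]<client port>
// Single quorum address only; IPv6 literals are not handled.
final class ServerLine {
    final long id;
    final String quorumHost;
    final int quorumPort;       // <port1>
    final int electionPort;     // <port2>
    final String role;          // "participant" unless a role is given
    final String clientAddress; // "0.0.0.0" when omitted
    final int clientPort;       // -1 when there is no ";..." part (pre-3.5.0 style)

    private ServerLine(long id, String quorumHost, int quorumPort, int electionPort,
                       String role, String clientAddress, int clientPort) {
        this.id = id;
        this.quorumHost = quorumHost;
        this.quorumPort = quorumPort;
        this.electionPort = electionPort;
        this.role = role;
        this.clientAddress = clientAddress;
        this.clientPort = clientPort;
    }

    static ServerLine parse(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) throw new IllegalArgumentException("missing '=': " + line);
        String key = line.substring(0, eq).trim();
        String value = line.substring(eq + 1).trim();
        if (!key.startsWith("server.")) throw new IllegalArgumentException("not a server line: " + line);
        long id = Long.parseLong(key.substring("server.".length()));
        if (id <= 0) throw new IllegalArgumentException("server id must be positive: " + id);

        // Everything to the right of the semicolon is the client port specification.
        int semi = value.indexOf(';');
        String quorumPart = semi >= 0 ? value.substring(0, semi) : value;
        String clientPart = semi >= 0 ? value.substring(semi + 1).trim() : null;

        String[] q = quorumPart.split(":");
        if (q.length != 3 && q.length != 4) throw new IllegalArgumentException("bad quorum part: " + quorumPart);
        String role = q.length == 4 ? q[3].toLowerCase(Locale.ROOT) : "participant";
        if (!role.equals("participant") && !role.equals("observer")) {
            throw new IllegalArgumentException("unknown role: " + role);
        }

        String clientAddress = "0.0.0.0";
        int clientPort = -1;
        if (clientPart != null) {
            int colon = clientPart.lastIndexOf(':');
            if (colon >= 0) {
                clientAddress = clientPart.substring(0, colon);
                clientPort = Integer.parseInt(clientPart.substring(colon + 1));
            } else {
                clientPort = Integer.parseInt(clientPart);
            }
        }
        return new ServerLine(id, q[0], Integer.parseInt(q[1]), Integer.parseInt(q[2]),
                              role, clientAddress, clientPort);
    }
}
```

For instance, parsing the fourth example above yields client address 125.23.63.24 and client port 1236. A line with no semicolon (the pre-3.5.0 style) yields clientPort -1.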

Specifying multiple server addresses

Since ZooKeeper 3.6.0 it is possible to specify multiple addresses for each ZooKeeper server (see ZOOKEEPER-3188, https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3188). This helps to increase availability and adds network-level resiliency to ZooKeeper. When multiple physical network interfaces are used for the servers, ZooKeeper is able to bind on all interfaces and switch at runtime to a working interface in case of a network error. The different addresses can be specified in the config using a pipe ('|') character.

Examples of valid configurations using multiple addresses:

server.2=zoo2-net1:2888:3888|zoo2-net2:2889:3889;2188
server.2=zoo2-net1:2888:3888|zoo2-net2:2889:3889|zoo2-net3:2890:3890;2188
server.2=zoo2-net1:2888:3888|zoo2-net2:2889:3889;zoo2-net1:2188
server.2=zoo2-net1:2888:3888:observer|zoo2-net2:2889:3889:observer;2188
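
The following Java sketch shows how the '|'-separated quorum part of such a line breaks down into one endpoint per network interface. `MultiAddressServer` is a hypothetical name, not ZooKeeper code, and rejecting entries with mixed roles is an assumption of this sketch (the examples above always repeat the same role), not a documented ZooKeeper rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not ZooKeeper code: parses a 3.6.0+ server line whose quorum
// part lists several <address>:<port1>:<port2>[:role] entries separated by '|'.
// Assumption of this sketch: all entries of one server must carry the same role.
final class MultiAddressServer {
    static final class Endpoint {
        final String host;
        final int quorumPort;
        final int electionPort;
        Endpoint(String host, int quorumPort, int electionPort) {
            this.host = host;
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
        }
    }

    final List<Endpoint> endpoints = new ArrayList<>();
    String role;       // "participant" unless every entry says ":observer"
    String clientSpec; // text after ';' (client port, optionally with address), or null

    static MultiAddressServer parse(String line) {
        String value = line.substring(line.indexOf('=') + 1).trim();
        MultiAddressServer s = new MultiAddressServer();
        int semi = value.indexOf(';');
        String quorumPart = semi >= 0 ? value.substring(0, semi) : value;
        s.clientSpec = semi >= 0 ? value.substring(semi + 1).trim() : null;
        // '|' is a regex metacharacter, so it must be escaped for split().
        for (String entry : quorumPart.split("\\|")) {
            String[] f = entry.trim().split(":");
            if (f.length != 3 && f.length != 4) throw new IllegalArgumentException("bad address: " + entry);
            String role = f.length == 4 ? f[3] : "participant";
            if (s.role != null && !s.role.equals(role)) {
                throw new IllegalArgumentException("mixed roles in: " + line);
            }
            s.role = role;
            s.endpoints.add(new Endpoint(f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2])));
        }
        return s;
    }
}
```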

The standaloneEnabled flag

Prior to 3.5.0, one could run ZooKeeper in Standalone mode or in a Distributed mode. These are separate implementation stacks, and switching between them during run time is not possible. By default (for backward compatibility) standaloneEnabled is set to true. The consequence of using this default is that if started with a single server the ensemble will not be allowed to grow, and if started with more than one server it will not be allowed to shrink to contain fewer than two participants.

Setting the flag to false instructs the system to run the Distributed software stack even if there is only a single participant in the ensemble. To achieve this the (static) configuration file should contain:

standaloneEnabled=false

With this setting it is possible to start a ZooKeeper ensemble containing a single participant and to dynamically grow it by adding more servers. Similarly, it is possible to shrink an ensemble so that just a single participant remains, by removing servers.

Since running the Distributed mode allows more flexibility, we recommend setting the flag to false. We expect that the legacy Standalone mode will be deprecated in the future.

The reconfigEnabled flag

Starting with 3.5.0 and prior to 3.5.3, there is no way to disable the dynamic reconfiguration feature. We would like to offer the option of disabling it because, with reconfiguration enabled, there is a security concern: a malicious actor can make arbitrary changes to the configuration of a ZooKeeper ensemble, including adding a compromised server to the ensemble. We prefer to leave it to the user's discretion whether to enable it, and to make sure that appropriate security measures are in place. So in 3.5.3 the reconfigEnabled configuration option is introduced so that the reconfiguration feature can be completely disabled: by default, any attempt to reconfigure a cluster through the reconfig API, with or without authentication, will fail unless reconfigEnabled is set to true.

To set the option to true, the configuration file (zoo.cfg) should contain:

reconfigEnabled=true

Dynamic configuration file

Starting with 3.5.0 we’re distinguishing between dynamic configuration parameters, which can be changed during runtime, and static configuration parameters, which are read from a configuration file when a server boots and don’t change during its execution. For now, the following configuration keywords are considered part of the dynamic configuration: server, group and weight.

Dynamic configuration parameters are stored in a separate file on the server (which we call the dynamic configuration file). This file is linked from the static config file using the new dynamicConfigFile keyword.

Example 1

zoo_replicated1.cfg

tickTime=2000
dataDir=/zookeeper/data/zookeeper1
initLimit=5
syncLimit=2
dynamicConfigFile=/zookeeper/conf/zoo_replicated1.cfg.dynamic

zoo_replicated1.cfg.dynamic

server.1=125.23.63.23:2780:2783:participant;2791
server.2=125.23.63.24:2781:2784:participant;2792
server.3=125.23.63.25:2782:2785:participant;2793

When the ensemble configuration changes, the static configuration parameters remain the same. The dynamic parameters are pushed by ZooKeeper and overwrite the dynamic configuration files on all servers. Thus, the dynamic configuration files on the different servers are usually identical (they can only differ momentarily when a reconfiguration is in progress, or if a new configuration hasn't propagated yet to some of the servers). Once created, the dynamic configuration file should not be manually altered. Changes are only made through the new reconfiguration commands outlined below. Note that changing the config of an offline cluster could result in an inconsistency with respect to configuration information stored in the ZooKeeper log (and the special configuration znode, populated from the log) and is therefore highly discouraged.

Example 2

Users may prefer to initially specify a single configuration file. The following is thus also legal:

zoo_replicated1.cfg

tickTime=2000
dataDir=/zookeeper/data/zookeeper1
initLimit=5
syncLimit=2
clientPort=

The configuration files on each server will be automatically split into dynamic and static files, if they are not already in this format. So the configuration file above will be automatically transformed into the two files in Example 1. Note that the clientPort and clientPortAddress lines (if specified) will be automatically removed during this process, if they are redundant (as in the example above). The original static configuration file is backed up (in a .bak file).
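
The split described above can be sketched in Java as follows. This is not ZooKeeper's implementation: `ConfigSplit` is a hypothetical name, and the sketch assumes one key=value per line, drops blank lines and '#' comments, treats server/group/weight as the dynamic keywords (per the list earlier in this section), and considers clientPort/clientPortAddress redundant exactly when the server.&lt;myid&gt; line already carries a client port.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only, not ZooKeeper's code: splits a single-file configuration into
// static and dynamic parts. Assumptions are listed in the text above.
final class ConfigSplit {
    final List<String> staticLines = new ArrayList<>();
    final List<String> dynamicLines = new ArrayList<>();

    private static String key(String line) {
        int eq = line.indexOf('=');
        return eq < 0 ? line : line.substring(0, eq).trim();
    }

    private static boolean isDynamicKey(String key) {
        return key.startsWith("server.") || key.startsWith("group.") || key.startsWith("weight.");
    }

    static ConfigSplit split(List<String> lines, long myId, String dynamicFilePath) {
        // clientPort is redundant if this server's own line specifies a client port.
        boolean clientPortRedundant = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (key(line).equals("server." + myId) && line.contains(";")) clientPortRedundant = true;
        }
        ConfigSplit out = new ConfigSplit();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            String k = key(line);
            if (isDynamicKey(k)) {
                out.dynamicLines.add(line);
            } else if (clientPortRedundant && (k.equals("clientPort") || k.equals("clientPortAddress"))) {
                continue; // dropped: server.<myId> already carries this information
            } else {
                out.staticLines.add(line);
            }
        }
        // The static file links to the dynamic one.
        out.staticLines.add("dynamicConfigFile=" + dynamicFilePath);
        return out;
    }
}
```

With the legacy file from the Backward compatibility section below and myId 1, "clientPort=2791" stays in the static part because server.1 has no client port. Once server.1 ends in ";2791", it is dropped.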

Backward compatibility

We still support the old configuration format. For example, the following configuration file is acceptable (but not recommended):

zoo_replicated1.cfg

tickTime=2000
dataDir=/zookeeper/data/zookeeper1
initLimit=5
syncLimit=2
clientPort=2791
server.1=125.23.63.23:2780:2783:participant
server.2=125.23.63.24:2781:2784:participant
server.3=125.23.63.25:2782:2785:participant

During boot, a dynamic configuration file is created and contains the dynamic part of the configuration as explained earlier. In this case, however, the line “clientPort=2791” will remain in the static configuration file of server 1 since it is not redundant: it was not specified as part of “server.1=…” using the format explained in the section Changes to Configuration Format. If a reconfiguration is invoked that sets the client port of server 1, we remove “clientPort=2791” from the static configuration file (the dynamic file now contains this information as part of the specification of server 1).

Upgrading to 3.5.0

Upgrading a running ZooKeeper ensemble to 3.5.0 should be done only after upgrading your ensemble to the 3.4.6 release. Note that this is only necessary for rolling upgrades (if you’re fine with shutting down the system completely, you don’t have to go through 3.4.6). If you attempt a rolling upgrade without going through 3.4.6 (for example from 3.4.5), you may get the following error:

2013-01-30 11:32:10,663 [myid:2] - INFO [localhost/127.0.0.1:2784:QuorumCnxManager$Listener@498] - Received connection request /127.0.0.1:60876
2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/127.0.0.1:2784:QuorumCnxManager@349] - Invalid server id: -65536

During a rolling upgrade, each server is taken down in turn and rebooted with the new 3.5.0 binaries. Before starting the server with 3.5.0 binaries, we highly recommend updating the configuration file so that all server statements “server.x=…” contain client ports (see the section Specifying the client port). As explained earlier you may leave the configuration in a single file, as well as leave the clientPort/clientPortAddress statements (although if you specify client ports in the new format, these statements are now redundant).

Dynamic Reconfiguration of the ZooKeeper Ensemble

The ZooKeeper Java and C APIs were extended with getConfig and reconfig commands that facilitate reconfiguration. Both commands have a synchronous (blocking) variant and an asynchronous one. We demonstrate these commands here using the Java CLI, but note that you can similarly use the C CLI or invoke the commands directly from a program just like any other ZooKeeper command.

API

There are two sets of APIs for both Java and C client.

  • Reconfiguration API: The reconfiguration API is used to reconfigure the ZooKeeper cluster. Starting with 3.5.3, the reconfiguration Java APIs have moved from the ZooKeeper class into the ZooKeeperAdmin class, and use of this API requires ACL setup and user authentication (see the Security section for more information).

  • Get Configuration API: The get configuration APIs are used to retrieve the ZooKeeper cluster configuration information stored in the /zookeeper/config znode. Use of this API does not require specific setup or authentication, because /zookeeper/config is readable by any user.

Security

Prior to 3.5.3, there is no enforced security mechanism over reconfig, so any ZooKeeper client that can connect to the ZooKeeper server ensemble has the ability to change the state of a ZooKeeper cluster via reconfig. A malicious client could thus change the ensemble's membership, e.g., add a compromised server or remove legitimate servers. Depending on the deployment, cases like these can be security vulnerabilities.

To address this security concern, we introduced access control over reconfig starting from 3.5.3 such that only a specific set of users can use reconfig commands or APIs, and these users need to be configured explicitly. In addition, the setup of the ZooKeeper cluster must enable authentication so ZooKeeper clients can be authenticated.

We also provide an escape hatch for users who operate and interact with a ZooKeeper ensemble in a secured environment (e.g., behind a company firewall). Users who want to use the reconfiguration feature but don't want the overhead of configuring an explicit list of authorized users for reconfig access checks can set “skipACL” to “yes”, which skips the ACL check and allows any user to reconfigure the cluster.

Overall, ZooKeeper provides flexible configuration options for the reconfiguration feature, allowing users to choose based on their security requirements. We leave it to the user's discretion to decide whether appropriate security measures are in place.

  • Access Control: The dynamic configuration is stored in a special znode ZooDefs.CONFIG_NODE = /zookeeper/config. By default this node is read-only for all users, except the super user and users that are explicitly configured for write access. Clients that need to use reconfig commands or the reconfig API should be configured as users that have write access to CONFIG_NODE. By default, only the super user has full control, including write access to CONFIG_NODE. Additional users can be granted write access through the super user by setting an ACL that has write permission associated with the specified user. A few examples of how to set up ACLs and use the reconfiguration API with authentication can be found in ReconfigExceptionTest.java and TestReconfigServer.cc.

  • Authentication : Authentication of users is orthogonal to the access control and is delegated to existing authentication mechanism supported by ZooKeeper’s pluggable authentication schemes. See ZooKeeper and SASL for more details on this topic.

  • Disable ACL check: ZooKeeper supports the “skipACL” option: if skipACL is set to “yes”, ACL checks are skipped completely. In such cases any unauthenticated user can use the reconfig API.

Retrieving the current dynamic configuration

The dynamic configuration is stored in a special znode ZooDefs.CONFIG_NODE = /zookeeper/config. The new config CLI command reads this znode (currently it is simply a wrapper to get /zookeeper/config). As with normal reads, to retrieve the latest committed value you should do a sync first.

[zk: 127.0.0.1:2791(CONNECTED) 3] config
server.1=localhost:2780:2783:participant;localhost:2791
server.2=localhost:2781:2784:participant;localhost:2792
server.3=localhost:2782:2785:participant;localhost:2793
version=400000003

Notice the last line of the output. This is the configuration version. The version equals the zxid of the reconfiguration command which created this configuration. The version of the first established configuration equals the zxid of the NEWLEADER message sent by the first successfully established leader. When a configuration is written to a dynamic configuration file, the version automatically becomes part of the filename and the static configuration file is updated with the path to the new dynamic configuration file. Configuration files corresponding to earlier versions are retained for backup purposes.

During boot time the version (if it exists) is extracted from the filename. The version should never be altered manually by users or the system administrator. It is used by the system to know which configuration is most up-to-date. Manipulating it manually can result in data loss and inconsistency.
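
Because the version is a zxid, it can be decoded mechanically. The Java sketch below (hypothetical class `ConfigVersion`, not ZooKeeper code) splits a version into the leader epoch (high 32 bits of the zxid) and the per-epoch counter (low 32 bits), and compares versions numerically. The "&lt;dynamic file&gt;.&lt;hex version&gt;" file-name scheme it uses is an assumption for illustration.

```java
// Sketch, not ZooKeeper code. A zxid is a 64-bit number: the high 32 bits are the
// leader epoch and the low 32 bits a per-epoch counter. The configuration version
// is that zxid printed in hexadecimal. The "<base>.<hex version>" file-name scheme
// used here is an assumption of this sketch.
final class ConfigVersion {
    static long parse(String hexVersion) {
        return Long.parseUnsignedLong(hexVersion, 16);
    }

    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    static String fileName(String dynamicBase, long version) {
        return dynamicBase + "." + Long.toHexString(version);
    }

    // Recover the version from a file name: this is why it must never be edited by hand.
    static long versionOf(String fileName) {
        return parse(fileName.substring(fileName.lastIndexOf('.') + 1));
    }

    // A higher zxid means a more recent configuration.
    static boolean isNewer(String candidateHex, String currentHex) {
        return Long.compareUnsigned(parse(candidateHex), parse(currentHex)) > 0;
    }
}
```

For the version 400000003 shown above, epoch() returns 4 and counter() returns 3.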

Just like a get command, the config CLI command accepts the -w flag for setting a watch on the znode, and -s flag for displaying the Stats of the znode. It additionally accepts a new flag -c which outputs only the version and the client connection string corresponding to the current configuration. For example, for the configuration above we would get:

[zk: 127.0.0.1:2791(CONNECTED) 17] config -c
400000003 localhost:2791,localhost:2793,localhost:2792

Note that when using the API directly, this command is called getConfig.
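
To show how the compact `config -c` line relates to the full znode content, here is a Java sketch (hypothetical `ConfigSummary`, not the CLI's code) that derives "&lt;version&gt; &lt;host:port,...&gt;" from the configuration text. It makes several assumptions. The version line is taken to have the form "version=&lt;hex&gt;". A server whose client part has no address is reached through its quorum host. Servers are listed in input order, which may differ from the CLI's ordering shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the CLI's implementation: builds the
// "<version> <client connection string>" summary from the config znode text.
// Assumptions are listed in the text above.
final class ConfigSummary {
    static String summarize(List<String> configLines) {
        String version = null;
        List<String> clients = new ArrayList<>();
        for (String raw : configLines) {
            String line = raw.trim();
            if (line.startsWith("version=")) {
                version = line.substring("version=".length());
            } else if (line.startsWith("server.")) {
                String value = line.substring(line.indexOf('=') + 1).trim();
                int semi = value.indexOf(';');
                if (semi < 0) continue; // no client port: nothing for clients to connect to
                String clientPart = value.substring(semi + 1).trim();
                if (clientPart.contains(":")) {
                    clients.add(clientPart); // explicit client address
                } else {
                    // Assumption: fall back to the quorum host when no client address is given.
                    clients.add(value.substring(0, value.indexOf(':')) + ":" + clientPart);
                }
            }
        }
        if (version == null) throw new IllegalArgumentException("no version line found");
        return version + " " + String.join(",", clients);
    }
}
```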

Like any read command, it returns the configuration known to the follower to which your client is connected, which may be slightly out of date. One can use the sync command for stronger guarantees. For example, using the Java API:

zk.sync(ZooDefs.CONFIG_NODE, void_callback, context);
zk.getConfig(watcher, callback, context);

Note: in 3.5.0 it doesn’t really matter which path is passed to the sync() command as all the server’s state is brought up to date with the leader (so one could use a different path instead of ZooDefs.CONFIG_NODE). However, this may change in the future.

Modifying the current dynamic configuration

Modifying the configuration is done through the reconfig command. There are two modes of reconfiguration: incremental and non-incremental (bulk). The non-incremental simply specifies the new dynamic configuration of the system. The incremental specifies changes to the current configuration. The reconfig command returns the new configuration.
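
The difference between the two modes can be modeled with a few lines of Java. This is a conceptual sketch only (hypothetical `ReconfigModes`, not ZooKeeper's code): the dynamic configuration is a map from server id to the text after "server.&lt;id&gt;=". Rejecting an id that is both added and removed is a choice of this sketch, not documented ZooKeeper behavior.

```java
import java.util.Collection;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Conceptual model only, not ZooKeeper's code: a configuration is a map from
// server id to the text after "server.<id>=".
final class ReconfigModes {
    // Incremental: start from the current config, drop removed ids, then add or
    // overwrite entries. Overwriting an existing id models changing its parameters.
    static SortedMap<Long, String> incremental(Map<Long, String> current,
                                               Collection<Long> remove,
                                               Map<Long, String> add) {
        for (Long id : remove) {
            if (add.containsKey(id)) {
                // Sketch choice: reject an ambiguous request instead of guessing.
                throw new IllegalArgumentException("server " + id + " is both added and removed");
            }
        }
        SortedMap<Long, String> next = new TreeMap<>(current); // never mutate the input
        for (Long id : remove) next.remove(id);
        next.putAll(add);
        return next;
    }

    // Non-incremental (bulk): the caller supplies the complete new membership.
    static SortedMap<Long, String> bulk(Map<Long, String> members) {
        return new TreeMap<>(members);
    }
}
```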

A few examples are in: ReconfigTest.java, ReconfigRecoveryTest.java and TestReconfigServer.cc.

General

Removing servers: Any server can be removed, including the leader (although removing the leader will result in a short unavailability, see Figures 6 and 8 in the paper). The server will not be shut down automatically. Instead, it becomes a “non-voting follower”. This is somewhat similar to an observer in that its votes don't count towards the quorum of votes necessary to commit operations. However, unlike a non-voting follower, an observer doesn't actually see any operation proposals and does not ACK them. Thus a non-voting follower has a more significant negative effect on system throughput compared to an observer. Non-voting follower mode should only be used as a temporary mode, before shutting the server down, or adding it as a follower or as an observer to the ensemble. We do not shut the server down automatically for two main reasons. The first reason is that we do not want all the clients connected to this server to be immediately disconnected, causing a flood of connection requests to other servers. Instead, it is better if each client decides when to migrate independently. The second reason is that removing a server may sometimes (rarely) be necessary in order to change it from “observer” to “participant” (this is explained in the section Additional comments).

Note that the new configuration should have some minimal number of participants in order to be considered legal. If the proposed change would leave the cluster with fewer than 2 participants and standalone mode is enabled (standaloneEnabled=true, see the section The standaloneEnabled flag), the reconfig will not be processed (BadArgumentsException). If standalone mode is disabled (standaloneEnabled=false) then it's legal to remain with 1 or more participants.
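
The legality rule above reduces to a participant count. The Java sketch below (hypothetical `ParticipantCheck`, not ZooKeeper's code) takes a proposed configuration as a map from server id to role. It throws IllegalArgumentException where ZooKeeper would report BadArgumentsException, since the real exception type belongs to the ZooKeeper library.

```java
import java.util.Map;

// Sketch of the minimum-participant rule, not ZooKeeper's code.
// IllegalArgumentException stands in for ZooKeeper's BadArgumentsException.
final class ParticipantCheck {
    static void validate(Map<Long, String> roles, boolean standaloneEnabled) {
        long participants = roles.values().stream().filter("participant"::equals).count();
        // standaloneEnabled=true: at least 2 participants; false: at least 1.
        long minimum = standaloneEnabled ? 2 : 1;
        if (participants < minimum) {
            throw new IllegalArgumentException("need at least " + minimum
                + " participant(s), proposed config has " + participants);
        }
    }
}
```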

Adding servers: Before a reconfiguration is invoked, the administrator must make sure that a quorum (majority) of participants from the new configuration are already connected and synced with the current leader. To achieve this we need to connect a new joining server to the leader before it is officially part of the ensemble. This is done by starting the joining server using an initial list of servers which is technically not a legal configuration of the system but (a) contains the joiner, and (b) gives sufficient information to the joiner in order for it to find and connect to the current leader. We list a few different options of doing this safely.

  1. Initial configuration of joiners is comprised of servers in the last committed configuration and one or more joiners, where joiners are listed as observers. For example, if servers D and E are added at the same time to (A, B, C) and server C is being removed, the initial configuration of D could be (A, B, C, D) or (A, B, C, D, E), where D and E are listed as observers. Similarly, the configuration of E could be (A, B, C, E) or (A, B, C, D, E), where D and E are listed as observers. Note that listing the joiners as observers will not actually make them observers - it will only prevent them from accidentally forming a quorum with other joiners. Instead, they will contact the servers in the current configuration and adopt the last committed configuration (A, B, C), where the joiners are absent. Configuration files of joiners are backed up and replaced automatically as this happens. After connecting to the current leader, joiners become non-voting followers until the system is reconfigured and they are added to the ensemble (as participant or observer, as appropriate).

  2. Initial configuration of each joiner is comprised of servers in the last committed configuration + the joiner itself, listed as a participant. For example, to add a new server D to a configuration consisting of servers (A, B, C), the administrator can start D using an initial configuration file consisting of servers (A, B, C, D). If both D and E are added at the same time to (A, B, C), the initial configuration of D could be (A, B, C, D) and the configuration of E could be (A, B, C, E). Similarly, if D is added and C is removed at the same time, the initial configuration of D could be (A, B, C, D). Never list more than one joiner as participant in the initial configuration (see warning below).

  3. Whether listing the joiner as an observer or as a participant, it is also fine not to list all the current configuration servers, as long as the current leader is in the list. For example, when adding D we could start D with a configuration file consisting of just (A, D) if A is the current leader. However, this is more fragile: if A fails before D officially joins the ensemble, D doesn't know anyone else, and the administrator will have to intervene and restart D with another server list.

Warning

Never specify more than one joining server in the same initial configuration as participants. Currently, the joining servers don’t know that they are joining an existing ensemble; if multiple joiners are listed as participants they may form an independent quorum creating a split-brain situation such as processing operations independently from your main ensemble. It is OK to list multiple joiners as observers in an initial config.
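
The joiner options and this warning can be sketched together in Java. `JoinerInitialConfig` is a hypothetical helper, not ZooKeeper code. It builds option 1 (last committed configuration plus joiners listed as observers) and rejects any initial configuration that lists more than one joiner as a participant. It assumes single-address "server.&lt;id&gt;=host:port1:port2[:role][;client]" lines, with no role yet on the joiner lines passed to withJoinersAsObservers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not ZooKeeper code. Assumes single-address server lines
// and joiner lines without a role.
final class JoinerInitialConfig {
    static long idOf(String line) {
        return Long.parseLong(line.substring("server.".length(), line.indexOf('=')).trim());
    }

    static boolean isObserver(String line) {
        String value = line.substring(line.indexOf('=') + 1);
        int semi = value.indexOf(';');
        String quorumPart = semi >= 0 ? value.substring(0, semi) : value;
        return quorumPart.trim().endsWith(":observer");
    }

    // Insert ":observer" at the end of the quorum part, before any client spec.
    static String asObserver(String joinerLine) {
        int semi = joinerLine.indexOf(';');
        return semi >= 0
            ? joinerLine.substring(0, semi) + ":observer" + joinerLine.substring(semi)
            : joinerLine + ":observer";
    }

    // Option 1: last committed configuration plus every joiner, listed as an observer.
    static List<String> withJoinersAsObservers(List<String> committed, List<String> joiners) {
        List<String> config = new ArrayList<>(committed);
        for (String j : joiners) config.add(asObserver(j));
        return config;
    }

    // The warning: two joining participants could form an independent quorum.
    static void checkAtMostOneJoiningParticipant(List<String> initialConfig, Set<Long> committedIds) {
        int joiningParticipants = 0;
        for (String line : initialConfig) {
            if (!committedIds.contains(idOf(line)) && !isObserver(line)) joiningParticipants++;
        }
        if (joiningParticipants > 1) {
            throw new IllegalStateException(joiningParticipants
                + " joiners listed as participants: they could form a separate quorum");
        }
    }
}
```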

If the configuration of existing servers changes or they become unavailable before the joiner succeeds to connect and learn about configuration changes, the joiner may need to be restarted with an updated configuration file in order to be able to connect.

Finally, note that once connected to the leader, a joiner adopts the last committed configuration, in which it is absent (the initial config of the joiner is backed up before being rewritten). If the joiner restarts in this state, it will not be able to boot since it is absent from its configuration file. In order to start it you’ll once again have to specify an initial configuration.

Modifying server parameters: One can modify any of the ports of a server, or its role (participant/observer) by adding it to the ensemble with different parameters. This works in both the incremental and the bulk reconfiguration modes. It is not necessary to remove the server and then add it back; just specify the new parameters as if the server is not yet in the system. The server will detect the configuration change and perform the necessary adjustments. See an example in the section Incremental mode and an exception to this rule in the section Additional comments.

It is also possible to change the Quorum System used by the ensemble (for example, change the Majority Quorum System to a Hierarchical Quorum System on the fly). This, however, is only allowed using the bulk (non-incremental) reconfiguration mode. In general, incremental reconfiguration only works with the Majority Quorum System. Bulk reconfiguration works with both Hierarchical and Majority Quorum Systems.

Performance Impact: There is practically no performance impact when removing a follower, since it is not being automatically shut down (the effect of removal is that the server’s votes are no longer being counted). When adding a server, there is no leader change and no noticeable performance disruption. For details and graphs please see Figures 6, 7 and 8 in the paper.

The most significant disruption will happen when a leader change is caused, in one of the following cases:

  1. Leader is removed from the ensemble.
  2. Leader’s role is changed from participant to observer.
  3. The port used by the leader to send transactions to others (quorum port) is modified.
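
The three cases can be expressed as a predicate over the leader's entry in the old and new configurations. This is a hypothetical sketch: `LeaderHandoffCheck` and its fields are illustrative names, and `null` stands for "the leader is absent from the new configuration". It encodes only the list above, so changes outside it, such as a new client port, are not treated as causing a hand-off.

```java
/** Hypothetical sketch of the three leader-change cases listed above. */
final class LeaderHandoffCheck {

    /** The leader's entry in one configuration (simplified). */
    static final class Entry {
        final int quorumPort;      // port the leader uses to send transactions to the others
        final int electionPort;
        final int clientPort;
        final boolean participant; // false means observer

        Entry(int quorumPort, int electionPort, int clientPort, boolean participant) {
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
            this.clientPort = clientPort;
            this.participant = participant;
        }
    }

    /**
     * @param before the leader's entry in the current configuration
     * @param after  the leader's entry in the new configuration, or null if it was removed
     */
    static boolean needsHandoff(Entry before, Entry after) {
        if (after == null) {
            return true; // case 1: leader removed from the ensemble
        }
        if (before.participant && !after.participant) {
            return true; // case 2: leader changed from participant to observer
        }
        return before.quorumPort != after.quorumPort; // case 3: quorum port modified
    }
}
```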

In these cases we perform a leader hand-off where the old leader nominates a new leader. The resulting unavailability is usually shorter than when a leader crashes since detecting leader failure is unnecessary and electing a new leader can usually be avoided during a hand-off (see Figures 6 and 8 in the paper).

When the client port of a server is modified, it does not drop existing client connections. New connections to the server will have to use the new client port.

Progress guarantees: Up to the invocation of the reconfig operation, a quorum of the old configuration is required to be available and connected for ZooKeeper to be able to make progress. Once reconfig is invoked, a quorum of both the old and the new configurations must be available. The final transition happens once (a) the new configuration is activated, and (b) all operations scheduled before the new configuration is activated by the leader are committed. Once (a) and (b) happen, only a quorum of the new configuration is required. Note, however, that neither (a) nor (b) is visible to a client. Specifically, when a reconfiguration operation commits, it only means that an activation message was sent out by the leader. It does not necessarily mean that a quorum of the new configuration got this message (which is required in order to activate it) or that (b) has happened. If one wants to make sure that both (a) and (b) have already occurred (for example, in order to know that it is safe to shut down old servers that were removed), one can simply invoke an update (set-data, or some other quorum operation, but not a sync) and wait for it to commit. An alternative way to achieve this would have been to introduce another round to the reconfiguration protocol (which, for simplicity and compatibility with Zab, we decided to avoid).
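
Assuming both configurations use majority quorums, the progress rule can be modeled as a commit check over the set of servers that acknowledged an operation. `JointQuorum`, `canCommit`, and the `transitionComplete` flag (standing for "(a) and (b) have both happened") are hypothetical names for this sketch, not part of ZooKeeper.

```java
import java.util.Set;

/** Hypothetical model of the progress rule, assuming majority quorums in both configurations. */
final class JointQuorum {

    /** True if more than half of {@code voters} appear in {@code acks}. */
    static boolean isMajority(Set<Long> acks, Set<Long> voters) {
        int count = 0;
        for (Long id : voters) {
            if (acks.contains(id)) {
                count++;
            }
        }
        return 2 * count > voters.size();
    }

    /**
     * @param transitionComplete true once (a) the new configuration is activated and
     *                           (b) all operations scheduled before activation are committed
     */
    static boolean canCommit(Set<Long> acks, Set<Long> oldVoters, Set<Long> newVoters,
                             boolean transitionComplete) {
        if (transitionComplete) {
            return isMajority(acks, newVoters); // only the new configuration matters
        }
        // Reconfiguration in flight: both the old and the new majority must agree.
        return isMajority(acks, oldVoters) && isMajority(acks, newVoters);
    }
}
```

Before reconfig is invoked, the same check applies with `newVoters` equal to `oldVoters`, which reduces it to a plain majority of the old configuration. With old voters {1, 2, 3} and new voters {3, 4, 5}, acknowledgements from {4, 5} are not enough while the reconfiguration is in flight, but suffice once the transition is complete.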

Incremental mode

The incremental mode allows adding servers to, and removing servers from, the current configuration. Multiple changes are allowed in a single command. For example:

> reconfig -remove 3 -add server.5=125.23.63.23:1234:1235;1236

Both the add and the remove options take a list of comma-separated arguments (no spaces):

> reconfig -remove 3,4 -add server.5=localhost:2111:2112;2113,6=localhost:2114:2115:observer;2116
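
If such commands are generated by a script, the "comma-separated, no spaces" rule is easy to enforce in code. The following Java helper is a hypothetical sketch (`ReconfigCommand.build` is not part of ZooKeeper or its CLI); it only assembles the command text shown above and rejects arguments that contain whitespace.

```java
import java.util.List;

/** Hypothetical helper that assembles the text of an incremental reconfig command. */
final class ReconfigCommand {

    static String build(List<String> removeIds, List<String> addStatements) {
        StringBuilder cmd = new StringBuilder("reconfig");
        if (!removeIds.isEmpty()) {
            cmd.append(" -remove ").append(joinNoSpaces(removeIds));
        }
        if (!addStatements.isEmpty()) {
            cmd.append(" -add ").append(joinNoSpaces(addStatements));
        }
        return cmd.toString();
    }

    /** Joins the arguments with commas, refusing any argument that contains whitespace. */
    private static String joinNoSpaces(List<String> items) {
        for (String item : items) {
            if (item.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException("argument must not contain spaces: " + item);
            }
        }
        return String.join(",", items);
    }
}
```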

The format of the server statement is exactly the same as described in the section Specifying the client port and includes the client port. Notice that here, instead of “server.5=”, you can just write “5=”. In the example above, if server 6 is already in the system but has different ports or is not an observer, it is updated and, once the configuration commits, becomes an observer and starts using these new ports. This is an easy way to turn participants into observers and vice versa, or to change any of their ports, without rebooting the server.
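
The server statement format can be sketched as a small parser. This assumes the form `[server.]<id>=<address>:<quorum port>:<election port>[:<role>];[<client address>:]<client port>` from the section Specifying the client port. `ServerStatement` is a hypothetical class, the client port part is required here, and IPv6 literals are deliberately not handled.

```java
/**
 * Hypothetical parser for a server statement of the form
 * [server.]<id>=<address>:<quorum port>:<election port>[:<role>];[<client address>:]<client port>
 * Simplifications: IPv6 literals are not handled and the client port part is required.
 */
final class ServerStatement {
    final long id;
    final String address;
    final int quorumPort;
    final int electionPort;
    final String role;          // "participant" when no role is given
    final String clientAddress; // null when only a client port is given
    final int clientPort;

    private ServerStatement(long id, String address, int quorumPort, int electionPort,
                            String role, String clientAddress, int clientPort) {
        this.id = id;
        this.address = address;
        this.quorumPort = quorumPort;
        this.electionPort = electionPort;
        this.role = role;
        this.clientAddress = clientAddress;
        this.clientPort = clientPort;
    }

    static ServerStatement parse(String statement) {
        int eq = statement.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("missing '=': " + statement);
        }
        String key = statement.substring(0, eq).trim();
        if (key.startsWith("server.")) {
            key = key.substring("server.".length()); // "server.5=" and "5=" are equivalent
        }
        long id = Long.parseLong(key);

        String value = statement.substring(eq + 1).trim();
        int semi = value.indexOf(';');
        if (semi < 0) {
            throw new IllegalArgumentException("missing ';<client port>': " + statement);
        }
        String[] server = value.substring(0, semi).split(":");
        if (server.length != 3 && server.length != 4) {
            throw new IllegalArgumentException("expected address:port:port[:role]: " + statement);
        }
        String role = server.length == 4 ? server[3] : "participant";
        if (!role.equals("participant") && !role.equals("observer")) {
            throw new IllegalArgumentException("unknown role: " + role);
        }

        String client = value.substring(semi + 1);
        int colon = client.lastIndexOf(':');
        String clientAddress = colon < 0 ? null : client.substring(0, colon);
        int clientPort = Integer.parseInt(client.substring(colon + 1));

        return new ServerStatement(id, server[0], Integer.parseInt(server[1]),
                Integer.parseInt(server[2]), role, clientAddress, clientPort);
    }
}
```

Applied to the statements in the examples above, `server.5=125.23.63.23:1234:1235;1236` parses as a participant with client port 1236, and `6=localhost:2114:2115:observer;2116` as an observer with client port 2116.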

ZooKeeper supports two types of Quorum Systems – the simple Majority system (where the leader commits operations after receiving ACKs from a majority of voters) and a more complex Hierarchical system, where votes of different servers have different weights and servers are divided into voting groups. Currently, incremental reconfiguration is allowed only if the last proposed configuration known to the leader uses a Majority Quorum System (BadArgumentsException is thrown otherwise).
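
The two quorum tests can be sketched as follows. The majority check follows directly from the description above. The hierarchical check is a simplified reading of "weighted votes split into voting groups": a group counts as satisfied when the acknowledging servers hold more than half of its total weight, and the set is a quorum when more than half of the non-zero-weight groups are satisfied. `QuorumSketch`, the default weight of 1, and the map-based inputs are assumptions of this sketch rather than ZooKeeper's own code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of majority and (simplified) hierarchical quorum checks. */
final class QuorumSketch {

    /** Majority system: more than half of the voters acknowledged. */
    static boolean majority(Set<Long> acks, Set<Long> voters) {
        int count = 0;
        for (Long id : voters) {
            if (acks.contains(id)) {
                count++;
            }
        }
        return 2 * count > voters.size();
    }

    /**
     * Hierarchical system (simplified). groupOf maps server id -> group id; weightOf maps
     * server id -> weight, and a server missing from weightOf is assumed to weigh 1.
     */
    static boolean hierarchical(Set<Long> acks, Map<Long, Long> groupOf, Map<Long, Long> weightOf) {
        Map<Long, Long> total = new HashMap<>();
        Map<Long, Long> acked = new HashMap<>();
        for (Map.Entry<Long, Long> e : groupOf.entrySet()) {
            long sid = e.getKey();
            long gid = e.getValue();
            long w = weightOf.getOrDefault(sid, 1L);
            total.merge(gid, w, Long::sum);
            if (acks.contains(sid)) {
                acked.merge(gid, w, Long::sum);
            }
        }
        int groups = 0;
        int satisfied = 0;
        for (Map.Entry<Long, Long> e : total.entrySet()) {
            if (e.getValue() == 0) {
                continue; // zero-weight groups do not take part in this sketch
            }
            groups++;
            if (2 * acked.getOrDefault(e.getKey(), 0L) > e.getValue()) {
                satisfied++; // acknowledging servers hold more than half of this group's weight
            }
        }
        return 2 * satisfied > groups;
    }
}
```

With nine equally weighted servers in three groups of three, acknowledgements from two servers in each of two groups form a hierarchical quorum, while all three servers of a single group do not, even though both sets are large.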

Incremental mode - examples using the Java API:

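import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.data.Stat;

// Example 1: remove servers 1 and 2. `zk` is an existing, connected client handle
// (in 3.5.3 and later the call is ZooKeeperAdmin.reconfigure). -1 skips the config-version check.
List<String> leavingServers = new ArrayList<String>();
leavingServers.add("1");
leavingServers.add("2");
byte[] config = zk.reconfig(null, leavingServers, null, -1, new Stat());

// Example 2 (a separate snippet): remove server 1 while adding joining servers.
List<String> leavingServers = new ArrayList<String>();
List<String> joiningServers = new ArrayList<String>();
leavingServers.add("1");
// ...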
