[Original] Migrating Elasticsearch nodes without downtime

Posted 2020-10-15



Official API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-filtering.html

Reference: https://jee-appy.blogspot.com/2016/09/replace-node-elasticsearch-cluster.html

I. Test environment
OS: CentOS 7 (virtual machine)
ES version: 5.0.0
Nodes: three, node-1 (port 9200), node-2 (port 9201), node-3 (port 9202)

Note: throughout this post the index is called "myindex" and the type "mytype".

II. Cluster state before the test
1. Index settings
http://localhost:9200/myindex/_settings?pretty

{
  "myindex": {
    "settings": {
      "index": {
        "creation_date": "1491993018773",
        "number_of_shards": "1",
        "number_of_replicas": "1",
        "uuid": "4GboegTyTQiPoRjbtbLXFA",
        "version": {
          "created": "5000099"
        },
        "provided_name": "myindex"
      }
    }
  }
}
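These settings decide how many shard copies exist. The small Python sketch below (the helper name `shard_copies` is hypothetical) computes primaries × (1 + replicas) from this `_settings` response. For myindex that is 1 × (1 + 1) = 2 copies spread over three nodes, so one node necessarily ends up holding no shard.

```python
import json

def shard_copies(settings_response, index):
    """Total shard copies (primaries + replicas) for `index`, read from a
    GET /<index>/_settings response body."""
    idx = settings_response[index]["settings"]["index"]
    # Elasticsearch returns these values as strings, e.g. "1".
    shards = int(idx["number_of_shards"])
    replicas = int(idx["number_of_replicas"])
    return shards * (1 + replicas)

# The response shown above, trimmed to the two fields used here.
resp = json.loads(
    '{"myindex": {"settings": {"index": '
    '{"number_of_shards": "1", "number_of_replicas": "1"}}}}'
)
print(shard_copies(resp, "myindex"))  # 2
```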

2. Per-node allocation

http://localhost:9200/_cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
     0           0b     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-2
     1        4.5mb     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-3
     1        4.5mb     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-1

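To check the allocation table from a script, a minimal Python sketch (hypothetical helper `parse_cat`) can split the `?v` output into one dict per row, keyed by the header names. It assumes every column has a value; rows with empty cells, such as unassigned shards in `_cat/shards`, would misalign, and in that case requesting the _cat API with `?format=json` is the safer route.

```python
def parse_cat(text):
    """Parse whitespace-aligned _cat output (requested with ?v) into dicts
    keyed by the header names. Assumes no empty cells."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]

sample = """
shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
     0           0b     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-2
     1        4.5mb     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-3
     1        4.5mb     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-1
"""

for row in parse_cat(sample):
    print(row["node"], row["shards"], row["disk.indices"])
```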
3. Querying through each of the three nodes
http://localhost:9200/myindex/mytype/_search?pretty
http://localhost:9201/myindex/mytype/_search?pretty
http://localhost:9202/myindex/mytype/_search?pretty

Result: all three nodes return the same total hit count, 13753, so every node is serving queries normally.
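The same consistency check can be scripted. The sketch below (hypothetical helpers `hit_total` and `all_nodes_agree`) takes the parsed JSON of each node's `_search` response. It accepts both the plain integer `hits.total` returned by 5.x and the `{"value": ..., "relation": ...}` object used from 7.0 on; on 7.x and later a total above 10,000 is reported exactly only when the search sets `track_total_hits` to true, otherwise it comes back as a lower bound, which the helper refuses to compare.

```python
def hit_total(search_response):
    """Total hit count from a parsed _search response body."""
    total = search_response["hits"]["total"]
    # 5.x/6.x: an integer; 7.x+: {"value": n, "relation": "eq" | "gte"}.
    if isinstance(total, dict):
        if total.get("relation") != "eq":
            raise ValueError("total is only a lower bound; search with track_total_hits=true")
        return total["value"]
    return int(total)

def all_nodes_agree(responses_by_port):
    """True when every node reports the same total hit count."""
    totals = {hit_total(r) for r in responses_by_port.values()}
    return len(totals) == 1

# Both response shapes appear here only to exercise the helper.
responses = {
    9200: {"hits": {"total": 13753}},
    9201: {"hits": {"total": 13753}},
    9202: {"hits": {"total": {"value": 13753, "relation": "eq"}}},
}
print(all_nodes_agree(responses))  # True
```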

4. Cluster settings

http://localhost:9200/_cluster/settings?pretty

{
  "persistent": {},
  "transient": {}
}
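Both scopes are empty here, so no allocation filter is active yet. When checking this from a script, requesting `/_cluster/settings?flat_settings=true` returns dotted keys instead of nested objects. The hypothetical helper below then looks up the exclude filter, giving the transient value precedence over the persistent one, which is how Elasticsearch resolves the two scopes.

```python
def active_exclude(cluster_settings, attr="_name"):
    """Effective cluster.routing.allocation.exclude.<attr> value from a
    GET /_cluster/settings?flat_settings=true body, or None if unset."""
    key = "cluster.routing.allocation.exclude." + attr
    # Transient settings override persistent ones.
    for scope in ("transient", "persistent"):
        value = cluster_settings.get(scope, {}).get(key)
        if value is not None:
            return value
    return None

print(active_exclude({"persistent": {}, "transient": {}}))  # None
print(active_exclude({
    "persistent": {},
    "transient": {"cluster.routing.allocation.exclude._name": "node-1"},
}))  # node-1
```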

5. Shard allocation

http://localhost:9200/_cat/shards?v

index shard prirep state docs store ip node

III. Removing a node

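There are currently three nodes: node-1, node-2, and node-3. Because the index has one primary shard and the replica count is 1 ("number_of_replicas": "1" in the index settings above), node-1 and node-3 each hold a full copy of the data.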

The plan is to remove node-1. Ideally its data moves to node-2 and node-3 so that node-1 holds nothing, after which node-1 is stopped.

1. Exclude the node

curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.exclude._name": "node-1"}}'

Notes:

① Cluster settings apply to the whole cluster, so the command can be sent to any one of the three nodes; in this example the port can be 9200, 9201, or 9202.

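② exclude can match on three attributes, "_name" (node name), "_ip", and "_host", and the values support wildcards (see the official API documentation). Each node's values are listed by "http://localhost:9200/_cat/allocation?v" in the "node", "ip", and "host" columns respectively. In this test all three nodes share the same IP and host, so "_name" is used to tell them apart; adapt this to your own setup.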

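③ Both exclude and include are supported, and the value can also be an array, e.g. {"transient" :{"cluster.routing.allocation.exclude._name" :["node-1","node-2"]}}. Running the command again overwrites the previous value; the current settings can be checked at "http://localhost:9200/_cluster/settings?pretty".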
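Notes ② and ③ boil down to building one settings body. The Python sketch below (hypothetical helpers `exclude_body` and `put_cluster_settings`, standard library only) builds it and sends the same PUT as the curl command. It joins a list of values into a comma-separated string, the form used in the official allocation-filtering documentation, rather than the JSON array from note ③.

```python
import json
import urllib.request

def exclude_body(values, attr="_name", scope="transient"):
    """Settings body that moves shards off nodes matching `values`.
    `attr` is "_name", "_ip" or "_host"; wildcards such as "node-*" are allowed."""
    value = values if isinstance(values, str) else ",".join(values)
    return {scope: {"cluster.routing.allocation.exclude." + attr: value}}

def put_cluster_settings(body, base_url="http://localhost:9200"):
    """PUT `body` to /_cluster/settings on any node and return the parsed reply."""
    req = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(json.dumps(exclude_body("node-1")))
# put_cluster_settings(exclude_body("node-1"))  # requires a running cluster
```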

2. Check the migration result

http://localhost:9200/_cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
     1        4.5mb     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-2
     1        4.5mb     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-3
     0           0b     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-1

Once both shards and disk.indices are 0 for node-1, it holds no data and the migration is complete; node-1 can now be stopped.
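Rather than re-running the request by hand, the wait can be scripted. In this sketch the helpers `is_drained` and `wait_until_drained` are hypothetical, and `fetch_rows` is any callable returning the current `_cat/allocation` rows as dicts of strings, as printed by `?v`. It applies the same rule as above: shards is 0 and disk.indices is 0b.

```python
import time

def is_drained(allocation_rows, node):
    """True once `node` holds no shards and no index data, per _cat/allocation rows."""
    for row in allocation_rows:
        if row.get("node") == node:
            return row["shards"] == "0" and row["disk.indices"] == "0b"
    # Node not listed: its state cannot be confirmed, so do not report it as drained.
    return False

def wait_until_drained(fetch_rows, node, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_rows() until `node` is drained; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if is_drained(fetch_rows(), node):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)

# Usage with a fake fetcher that reports node-1 drained on the second poll.
snapshots = iter([
    [{"node": "node-1", "shards": "1", "disk.indices": "4.5mb"}],
    [{"node": "node-1", "shards": "0", "disk.indices": "0b"}],
])
print(wait_until_drained(lambda: next(snapshots), "node-1", sleep=lambda s: None))  # True
```

Injecting `fetch_rows` and `sleep` keeps the polling logic independent of any HTTP client and lets it be exercised without a cluster.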

3. Shard allocation after the exclude

http://localhost:9200/_cat/shards?v

index   shard prirep state   docs  store ip        node
myindex 0     p      STARTED 13753 4.5mb 127.0.0.1 node-3
myindex 0     r      STARTED 13753 4.5mb 127.0.0.1 node-2
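The same conclusion can be read from `_cat/shards`: no copy is on node-1 and every copy is STARTED. A hypothetical `safe_to_stop` check over rows shaped like this table (parsed from the `?v` output or fetched with `?format=json`) might look like the following; copies that are relocating or unassigned have a state other than STARTED, so they make the check fail.

```python
def safe_to_stop(shard_rows, node):
    """True when no shard copy lives on `node` and every listed copy is STARTED."""
    for row in shard_rows:
        # Compare only the first token: a relocating copy's node column also names the target.
        current = (row.get("node") or "").split(" ")[0]
        if current == node or row.get("state") != "STARTED":
            return False
    return True

rows = [
    {"index": "myindex", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-3"},
    {"index": "myindex", "shard": "0", "prirep": "r", "state": "STARTED", "node": "node-2"},
]
print(safe_to_stop(rows, "node-1"))  # True
```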

4. Stop node-1, then check the allocation again

http://localhost:9200/_cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
     1        4.5mb     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-2
     1        4.5mb     5.8gb     11.8gb     17.6gb           33 127.0.0.1 127.0.0.1 node-3

5. Reset the settings
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.exclude._name": null}}'
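Setting the value to null removes the transient filter, so a node named node-1 would again be eligible for shards if it rejoined. In the Python sketch below (hypothetical helper `reset_exclude_body`), `None` serializes to JSON `null`, reproducing the body of the reset command; the scope must match the one the filter was set in.

```python
import json

def reset_exclude_body(attr="_name", scope="transient"):
    """Settings body that clears cluster.routing.allocation.exclude.<attr>."""
    # Python None becomes JSON null, which tells Elasticsearch to remove the setting.
    return {scope: {"cluster.routing.allocation.exclude." + attr: None}}

print(json.dumps(reset_exclude_body()))
# {"transient": {"cluster.routing.allocation.exclude._name": null}}
```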

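IV. Summary
Nodes can be retired one at a time: exclude a node, wait until its data has moved off, then stop it.
Alternatively, a single command can include all the nodes being kept and exclude all the nodes being removed; once the data has moved, stop the removed nodes.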
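The single-command variant can be sketched with a hypothetical `migrate_body` helper that sets both filters in one body. With include, shards may only be allocated to nodes matching at least one listed value; with exclude, they must not stay on any matching node. The helper rejects a name that appears in both lists, since that is almost certainly a mistake. Both keys would need to be reset to null afterwards, in the same way as in step 5.

```python
import json

def migrate_body(keep, remove, attr="_name", scope="transient"):
    """One settings body that keeps shards on `keep` nodes and moves them off `remove` nodes."""
    overlap = set(keep) & set(remove)
    if overlap:
        raise ValueError("nodes both kept and removed: " + ", ".join(sorted(overlap)))
    prefix = "cluster.routing.allocation."
    return {scope: {
        prefix + "include." + attr: ",".join(keep),
        prefix + "exclude." + attr: ",".join(remove),
    }}

print(json.dumps(migrate_body(["node-2", "node-3"], ["node-1"]), indent=2))
```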
