Elasticsearch 7.16: Summary of Failed Hot/Warm Data Tier Migrations

Posted 2022-05-01 zhenlq2015


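Background: We recently migrated the service logs of one of our products from local server storage to ELK and manage them with index lifecycle management (ILM). The log volume is large, about 500 GB per day, so to control cost we use Elasticsearch's hot/warm data tiers. Rarely queried data goes to the warm tier on ordinary hard disks and is kept for about 2 months, while frequently read and written logs stay in the hot tier, on 2 TB SSDs, for about 3 days. After configuring the role of each node (the configuration is simple and is not covered here; newer versions remove many steps, and it mainly comes down to defining the roles in node.roles), I watched the migration results and found that indices moving to the warm tier stayed in the migrate state and their data could not be moved to the warm nodes.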
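As a rough sanity check on the sizing described above, the retention figures can be turned into per-tier storage estimates. This is only a sketch under stated assumptions: on-disk index size is taken to equal raw log volume, "about 2 months" is taken as 60 days, and the replica count is a parameter because the text does not state it.

```python
# Back-of-the-envelope tier sizing from the figures in the text.
DAILY_GB = 500   # from the text: about 500 GB of logs per day
HOT_DAYS = 3     # from the text: the hot tier keeps about 3 days
WARM_DAYS = 60   # assumption: "about 2 months" taken as 60 days


def tier_size_gb(daily_gb, days, replicas=0):
    """Storage needed to hold `days` of data plus `replicas` extra copies."""
    return daily_gb * days * (1 + replicas)


if __name__ == "__main__":
    for replicas in (0, 1):
        hot = tier_size_gb(DAILY_GB, HOT_DAYS, replicas)
        warm = tier_size_gb(DAILY_GB, WARM_DAYS, replicas)
        print(f"replicas={replicas}: hot ~{hot} GB, warm ~{warm} GB")
```

Under these assumptions, three days of data without replicas already comes to about 1.5 TB, so hot-tier headroom can run out quickly if migration to the warm tier stalls.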

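In the Kibana UI, or with the API request GET <index-name>/_ilm/explain, you can see whether an index has migrated between the hot and warm tiers successfully. When the migration fails, messages like the following appear.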

Failure message 1:

Waiting for [1] shards to be allocated to nodes matching the given filters

Failure message 2:

lifecycle action [migrate] waiting for [2] shards to be moved to the [data_warm] tier (tier migration preference configuration is [data_warm,data_hot])
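The same check can be scripted when many indices are involved. The sketch below assumes the _ilm/explain response has already been fetched and decoded into a dict, and it relies only on the `indices`, `managed`, `action`, and `step_info` fields of that response. The sample data is hand-written for illustration: the index names are hypothetical, and only the message text is taken from the failure messages above.

```python
def stuck_in_migrate(ilm_explain):
    """Return {index_name: message} for managed indices in the migrate action."""
    stuck = {}
    for name, info in ilm_explain.get("indices", {}).items():
        if info.get("managed") and info.get("action") == "migrate":
            step_info = info.get("step_info") or {}
            stuck[name] = step_info.get("message", "")
    return stuck


# Hand-written sample shaped like an _ilm/explain response (hypothetical names).
SAMPLE_ILM_EXPLAIN = {
    "indices": {
        "logstash-a": {
            "index": "logstash-a",
            "managed": True,
            "phase": "warm",
            "action": "migrate",
            "step_info": {
                "message": "Waiting for [1] shards to be allocated "
                           "to nodes matching the given filters"
            },
        },
        "logstash-b": {
            "index": "logstash-b",
            "managed": True,
            "phase": "hot",
            "action": "rollover",
        },
    }
}

if __name__ == "__main__":
    for index_name, message in stuck_in_migrate(SAMPLE_ILM_EXPLAIN).items():
        print(f"{index_name}: {message}")
```

An index that appears in this result only briefly is probably just mid-migration; one that stays there across repeated checks is worth inspecting further.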

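The current allocation state of a shard can be inspected through the API. This is the troubleshooting method this article focuses on, and it is very useful:

GET /_cluster/allocation/explain
{
  "index": "logstash-xxx",
  "shard": 4,
  "primary": true
}

Here index is the index name and shard is the specific shard of that index to examine.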
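For use outside Kibana, the allocation explain request can be issued from a short script. This is a minimal sketch using only the Python standard library; the base URL is an assumption (a local cluster without authentication or TLS), and the helper names are made up for illustration. It sends the request as POST, which the allocation explain API accepts in addition to GET.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no auth, no TLS


def explain_body(index, shard, primary=True):
    """Build the request body for the cluster allocation explain API."""
    return {"index": index, "shard": shard, "primary": primary}


def allocation_explain(index, shard, primary=True, base_url=ES_URL):
    """Call _cluster/allocation/explain and return the decoded JSON response."""
    data = json.dumps(explain_body(index, shard, primary)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints only the request body; call allocation_explain() against a real cluster.
    print(json.dumps(explain_body("logstash-xxx", 4), indent=2))
```

A secured cluster would additionally need credentials and certificate handling, which are left out here.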

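In the returned result, look mainly at the value of the explanation field. It describes why the migration failed, and each reason generally has a corresponding fix: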

Error reason 1 (the node does not match the index's allocation filters):

node does not match index setting [index.routing.allocation.require] filters [box_type:"cold",_id:"so1n_id_3"]

Error reason 2 (node_concurrent_incoming_recoveries is set too low):

reached the limit of incoming shard recoveries [6], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=6] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])
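The text does not spell out the fixes, so the following is only a sketch of two commonly used remedies, not necessarily what was done here. For reason 1, setting an index-level require filter to null resets it; for reason 2, the incoming recovery limit is a dynamic cluster setting that can be raised. The helper names and the value 10 are illustrative assumptions, and the bodies are meant for PUT <index-name>/_settings and PUT _cluster/settings respectively.

```python
import json


def clear_require_filters_body(attributes):
    """Body for PUT <index>/_settings; a null value resets each require filter."""
    return {f"index.routing.allocation.require.{name}": None for name in attributes}


def incoming_recoveries_body(limit, persistent=True):
    """Body for PUT _cluster/settings that sets the incoming recovery limit."""
    scope = "persistent" if persistent else "transient"
    setting = "cluster.routing.allocation.node_concurrent_incoming_recoveries"
    return {scope: {setting: limit}}


if __name__ == "__main__":
    # Reason 1: drop the attribute-based filter that no warm node satisfies.
    print(json.dumps(clear_require_filters_body(["box_type"])))
    # Reason 2: raise the limit; 10 is an arbitrary illustrative value.
    print(json.dumps(incoming_recoveries_body(10)))
```

Before clearing a filter, it is worth checking where it came from: an index template or another process may set it again on new indices, and the _id filter in the message may have been applied by a different mechanism than box_type.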

Success message:

rebalance_explanation: cannot rebalance as no target node exists that can both allocate this shard and improve the cluster balance
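The explanation strings quoted above can be pulled out of an allocation explain response with a small helper. This sketch assumes the response has been decoded into a dict and reads the `current_node`, `rebalance_explanation`, and `node_allocation_decisions` fields; the sample response is hand-written and abbreviated, and its node names are hypothetical.

```python
def summarize(explain):
    """Collect the current node, the rebalance explanation, and NO/THROTTLE deciders."""
    negative = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") in ("NO", "THROTTLE"):
                negative.append(
                    (node.get("node_name"), decider.get("decider"), decider.get("explanation"))
                )
    return {
        "current_node": (explain.get("current_node") or {}).get("name"),
        "rebalance_explanation": explain.get("rebalance_explanation"),
        "negative_decisions": negative,
    }


# Hand-written, abbreviated sample (hypothetical node names).
SAMPLE_EXPLAIN = {
    "index": "logstash-xxx",
    "shard": 4,
    "primary": True,
    "current_node": {"name": "hot-1"},
    "node_allocation_decisions": [
        {
            "node_name": "warm-1",
            "deciders": [
                {
                    "decider": "throttling",
                    "decision": "THROTTLE",
                    "explanation": "reached the limit of incoming shard recoveries [6]",
                }
            ],
        }
    ],
}

if __name__ == "__main__":
    summary = summarize(SAMPLE_EXPLAIN)
    print("current node:", summary["current_node"])
    for node_name, decider, explanation in summary["negative_decisions"]:
        print(f"{node_name} [{decider}]: {explanation}")
```

Negative decisions for some nodes are normal even for a healthy shard, so the list is most useful when read together with the node the shard currently sits on.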

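Summary:

For personal reasons I did not validate this setup thoroughly during the testing phase, which nearly caused a serious mistake. Fortunately, the expired data has since been migrated as planned.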
