Troubleshooting Unassigned Shards in Elasticsearch

Posted 2023-03-17


es Unassigned Shards

Possibility 1

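An unassigned shard is a shard that exists in the Elasticsearch cluster state but has not been allocated to any node. This usually happens because no node in the cluster is able, or permitted, to host it. Before dealing with unassigned shards, check the following points:

Determine why the shards are unassigned. The following command shows the reason for each one:

curl -XGET "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason"

This lists every shard in the cluster with its state and, for unassigned shards, the reason. For a more detailed explanation of a single shard, GET /_cluster/allocation/explain reports why it cannot be allocated.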
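
When a cluster has many shards, it helps to group the unassigned ones by reason before deciding what to do. The following is a minimal sketch, assuming the text output of the _cat/shards command above has been saved to a string with the columns in the same order (index, shard, prirep, state, unassigned.reason) and no header row. The function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def unassigned_by_reason(cat_shards_text):
    """Group unassigned shards by reason.

    Expects one shard per line with whitespace-separated columns:
    index shard prirep state [unassigned.reason]
    """
    groups = defaultdict(list)
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Skip blank lines and shards that are not unassigned.
        if len(fields) < 4 or fields[3] != "UNASSIGNED":
            continue
        index, shard, prirep = fields[0], fields[1], fields[2]
        # The reason column is empty for assigned shards; guard anyway.
        reason = fields[4] if len(fields) > 4 else "UNKNOWN"
        groups[reason].append((index, int(shard), prirep))
    return dict(groups)


# Hypothetical sample output, not taken from a real cluster.
sample = """\
logs-2023.03 0 p STARTED
logs-2023.03 0 r UNASSIGNED NODE_LEFT
logs-2023.03 1 p UNASSIGNED ALLOCATION_FAILED
metrics 0 r UNASSIGNED NODE_LEFT
"""

for reason, shards in unassigned_by_reason(sample).items():
    print(reason, shards)
```

Unassigned primaries (prirep p) deserve attention first, since they are the ones that make data unavailable.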

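Cluster health: check the overall health of the cluster with

curl -XGET "http://localhost:9200/_cluster/health?pretty=true"

Green means every primary and replica shard is allocated. Yellow means all primaries are allocated but at least one replica is not. Red means at least one primary shard is unassigned, so some data is unavailable.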
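
The relationship between shard states and the health colour can be written down as a small function. This is a simplified sketch of the rule, not Elasticsearch's actual implementation: it assumes each shard is given as a (prirep, state) pair as printed by _cat/shards, and it treats only STARTED and RELOCATING as active.

```python
def health_color(shards):
    """Derive a health colour from (prirep, state) pairs.

    prirep is "p" for a primary and "r" for a replica.
    Simplification: a shard counts as active only when it is
    STARTED or RELOCATING.
    """
    active_states = {"STARTED", "RELOCATING"}
    primary_missing = any(
        prirep == "p" and state not in active_states for prirep, state in shards
    )
    replica_missing = any(
        prirep == "r" and state not in active_states for prirep, state in shards
    )
    if primary_missing:
        return "red"
    if replica_missing:
        return "yellow"
    return "green"


print(health_color([("p", "STARTED"), ("r", "UNASSIGNED")]))
```

A single unassigned primary is enough to turn the cluster red, no matter how many other shards are healthy.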

Confirm that all nodes are available:

GET /_cat/nodes?v

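Confirm free disk space: before reallocating unassigned shards, make sure each node has enough free disk space to hold them. The following command shows disk usage per node: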

GET /_cat/allocation?v
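
Elasticsearch stops allocating new shards to a node once its disk usage passes the low disk watermark (cluster.routing.allocation.disk.watermark.low, 85% by default). The sketch below flags such nodes. It assumes the allocation data was fetched as JSON (GET /_cat/allocation?format=json) and already parsed into a list of dicts; the function name and sample rows are hypothetical.

```python
def nodes_over_watermark(allocation_rows, low_watermark_percent=85):
    """Return (node, percent) for nodes above the low disk watermark.

    allocation_rows: parsed JSON from GET /_cat/allocation?format=json,
    where numeric values arrive as strings.
    low_watermark_percent: 85 matches the documented default; change it
    if your cluster overrides the setting.
    """
    flagged = []
    for row in allocation_rows:
        percent = row.get("disk.percent")
        # The summary row for unassigned shards has no disk figures.
        if percent is None:
            continue
        if int(percent) > low_watermark_percent:
            flagged.append((row["node"], int(percent)))
    return flagged


# Hypothetical rows for illustration only.
rows = [
    {"node": "node-1", "disk.percent": "42", "shards": "12"},
    {"node": "node-2", "disk.percent": "91", "shards": "3"},
    {"node": "UNASSIGNED", "disk.percent": None, "shards": "5"},
]

print(nodes_over_watermark(rows))
```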

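Check that the shard allocation settings are configured correctly. These settings control how shards are assigned to nodes, including what happens when a node goes down or a new node joins. For example, the following command shows the cluster settings that have been set explicitly, including any allocation settings: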

curl -X GET "http://localhost:9200/_cluster/settings?pretty=true"
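
One setting worth checking in that output is cluster.routing.allocation.enable: anything other than all restricts which shards may be allocated. The sketch below looks up the effective value, assuming the settings were requested with flat_settings=true (so keys are dotted strings) and parsed into a dict. Transient settings take precedence over persistent ones. The helper name and the sample dict are illustrative.

```python
def effective_setting(settings, key, default=None):
    """Return the effective value of a flat cluster setting.

    settings: parsed JSON from GET /_cluster/settings?flat_settings=true
    Transient values override persistent ones; if neither is set,
    the supplied default is returned.
    """
    for scope in ("transient", "persistent"):
        scope_settings = settings.get(scope, {})
        if key in scope_settings:
            return scope_settings[key]
    return default


# Hypothetical response for illustration only.
settings = {
    "persistent": {"cluster.routing.allocation.enable": "all"},
    "transient": {"cluster.routing.allocation.enable": "primaries"},
}

enable = effective_setting(settings, "cluster.routing.allocation.enable", "all")
if enable != "all":
    print(f"allocation is restricted: {enable}")
```

A value of primaries or none left over from a rolling restart is a common reason for replicas that never get allocated.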

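Depending on why the shards are unassigned and on the cluster's health, there are several ways to deal with them:

Add node capacity: if the shards are unassigned because the nodes are out of capacity (for example disk space), consider adding capacity.
Retry failed allocations: if a shard stays unassigned because its allocation failed too many times in a row, you can ask the cluster to try again once the underlying problem is fixed:

POST /_cluster/reroute?retry_failed=true

This retries allocation for shards that were blocked after repeated failures. It does not force a rebalance of all shards; the cluster rebalances on its own according to its allocation settings.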

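Repair failed nodes: if the shards are unassigned because a node has failed, bring the node back first and then let the cluster reallocate the shards.
Allocate shards manually: if none of the above helps, you can allocate a shard by hand with a cluster reroute command:

POST /_cluster/reroute
{"commands": [{"allocate_replica": {"index": "index", "shard": 0, "node": "node_name"}}]}

Here index is the name of the index the shard belongs to, shard is the numeric shard ID (0 in this example), and node_name is the node to allocate the shard to. allocate_replica only applies to replica shards. An unassigned primary needs allocate_stale_primary or allocate_empty_primary with "accept_data_loss": true, so manual allocation must be done with care: a wrong allocation can lose or corrupt data.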
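
Manual allocation in current Elasticsearch versions goes through the cluster reroute API (POST /_cluster/reroute), whose body is a list of commands. When several replicas need to be placed, generating that body is less error-prone than typing it. The sketch below only builds the JSON; it does not send anything. The function names, index names and node names are placeholders.

```python
import json


def allocate_replica_command(index, shard, node):
    """Build one allocate_replica command for the reroute API."""
    return {"allocate_replica": {"index": index, "shard": shard, "node": node}}


def reroute_body(commands):
    """Wrap a list of commands in the body expected by POST /_cluster/reroute."""
    return json.dumps({"commands": commands}, indent=2)


# Hypothetical shards and nodes for illustration only.
commands = [
    allocate_replica_command("logs-2023.03", 0, "node-1"),
    allocate_replica_command("metrics", 0, "node-2"),
]

print(reroute_body(commands))
```

The sketch deliberately covers only allocate_replica. The primary variants require "accept_data_loss": true and are better written out by hand, one shard at a time.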

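In short, before acting on unassigned shards, check the cluster's health and the reason the shards are unassigned, then pick the measure that fits.

Check shard state: once reallocation has finished, verify that the problem is resolved. The following command shows the state of every shard:

curl -XGET "http://localhost:9200/_cat/shards?h=index,shard,prirep,state"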

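Possibility 2

If an Elasticsearch index has no replicas and one of its primary shards cannot be allocated, there is no other copy of that shard anywhere in the cluster. Replicas are built by copying from an active primary, so nothing can take the primary's place and the shard stays unassigned.

In Elasticsearch, primaries and replicas are both shards, and each holds part of the index's data. The difference is that each primary is unique and is the authoritative owner of its data, while a replica is a copy of a primary that improves availability and fault tolerance.

So for an index with zero replicas, losing a primary leaves nothing to fall back on. To guard against this, set the index's replica count to a value greater than 0, so that when a node holding a primary becomes unavailable, a replica can be promoted and keep serving the data. Note that a replica can only be created while its primary is active, so this protects against future failures rather than recovering a primary that is already lost.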

Open the Kibana console, or use curl to talk to Elasticsearch.
Check the index's current replica count. The following curl command retrieves the index's current settings:

curl -X GET "http://localhost:9200/your_index_name/_settings?pretty"

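Replace your_index_name with the name of the index whose replica count you want to set.

If the current replica count is 0, use the following curl command to set it to a value greater than 0:

curl -X PUT "http://localhost:9200/your_index_name/_settings?pretty" -H 'Content-Type: application/json' -d '
{
  "index": {
    "number_of_replicas": 1
  }
}'

Again, replace your_index_name with the index name; this example sets the replica count to 1.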
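
The same settings update can be prepared from a script using only the Python standard library. This is a sketch under the same assumptions as the curl example (an unauthenticated cluster at http://localhost:9200 and a placeholder index name); it builds the request but leaves sending it to the caller, so it runs without a cluster.

```python
import json
import urllib.request


def replica_settings_request(base_url, index_name, replicas):
    """Build a PUT request that sets number_of_replicas on one index."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = replica_settings_request("http://localhost:9200", "your_index_name", 1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually apply the change against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```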

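Wait for Elasticsearch to allocate the replicas to other nodes. This can take some time, depending on the size of the index and the number of nodes in the cluster.

Once the replicas have been allocated, the index should no longer report unassigned shards. If unassigned shards remain, try restarting the affected Elasticsearch node or allocating the shards manually.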

Summary of fixes for unassigned ES shards

A few notes on the shard states you may see:

relocating_shards shows the number of shards that are currently moving from one node to another node (seen in production: restarting ES with kill -9 was the wrong approach, it took the node offline and the cluster reallocated its shards). This number is often zero, but can increase when Elasticsearch decides a cluster is not properly balanced, a new node is added, or a node is taken down, for example (our ES cluster has no replicas, so an unstable network most likely knocked a single node offline and triggered shard reallocation).
initializing_shards is a count of shards that are being freshly created. For example, when you first create an index, the shards will all briefly reside in initializing state. This is typically a transient event, and shards shouldn’t linger in initializing too long. You may also see initializing shards when a node is first restarted: as shards are loaded from disk, they start as initializing. (We have seen this in production.)
unassigned_shards are shards that exist in the cluster state, but cannot be found in the cluster itself. A common source of unassigned shards are unassigned replicas. For example, an index with five shards and one replica will have five unassigned replicas in a single-node cluster. Unassigned shards will also be present if your cluster is red (since primaries are missing).
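
The single-node example follows from one rule: Elasticsearch never places two copies of the same shard on the same node. A rough way to estimate how many replicas must stay unassigned is sketched below. It assumes that this rule is the only constraint (no disk watermarks, allocation filters or awareness settings), and the function name is made up for illustration.

```python
def unassigned_replicas(primaries, replicas, data_nodes):
    """Estimate replicas that cannot be placed for one index.

    primaries: number of primary shards in the index
    replicas: configured number_of_replicas
    data_nodes: number of data nodes available (at least 1)

    Assumes the only constraint is that copies of the same shard
    must live on different nodes.
    """
    if data_nodes < 1:
        raise ValueError("at least one data node is required")
    copies_wanted = 1 + replicas  # the primary plus its replicas
    copies_placeable = min(copies_wanted, data_nodes)
    return primaries * (copies_wanted - copies_placeable)


# Five shards with one replica on a single node, as in the text above.
print(unassigned_replicas(primaries=5, replicas=1, data_nodes=1))
```

Adding a second node, or setting number_of_replicas to 0 on a single-node cluster, brings the estimate to zero.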

Of these, unassigned_shards is the most troublesome. I have collected the solutions I found online for future reference:

The most complete write-up is https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/

For unassigned primary shards specifically, see http://blog.kiyanpro.com/2016/03/06/elasticsearch/reroute-unassigned-shards/ https://t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode.html http://www.wklken.me/posts/2015/05/23/elasticsearch-issues.html

For unassigned replica shards specifically, see https://z0z0.me/recovering-unassigned-shards-on-elasticsearch/ https://dpatil1410.wordpress.com/2016/09/24/its-red-how-do-i-recover-unassigned-elasticsearch-shards/
