Elasticsearch on AWS: How to fix unassigned shards?

Posted 2023-02-16

【Title】: Elasticsearch on AWS: How to fix unassigned shards? 【Posted】: 2018-01-09 18:38:47 【Question】:

I have an index on AWS Elasticsearch with a shard that is unassigned due to NODE_LEFT. Here is the output of _cat/shards:

rawindex-2017.07.04                     1 p STARTED    
rawindex-2017.07.04                     3 p UNASSIGNED NODE_LEFT
rawindex-2017.07.04                     2 p STARTED    
rawindex-2017.07.04                     4 p STARTED    
rawindex-2017.07.04                     0 p STARTED
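
For anyone who wants to script this check, here is a minimal sketch that picks the unassigned shards out of that text. It assumes the five whitespace-separated columns shown above (index, shard, primary/replica flag, state, unassigned reason) and no header row; the names `ShardRow`, `parse_cat_shards` and `unassigned` are made up for this example.

```python
from typing import List, NamedTuple


class ShardRow(NamedTuple):
    index: str
    shard: int
    prirep: str  # "p" for primary, "r" for replica
    state: str
    reason: str  # empty unless the row carries an unassigned reason


def parse_cat_shards(text: str) -> List[ShardRow]:
    """Parse _cat/shards rows laid out like the output above (assumed column order)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, malformed lines, and a header row if one is present.
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        reason = fields[4] if len(fields) > 4 else ""
        rows.append(ShardRow(fields[0], int(fields[1]), fields[2], fields[3], reason))
    return rows


def unassigned(rows: List[ShardRow]) -> List[ShardRow]:
    """Keep only the shards whose state is UNASSIGNED."""
    return [row for row in rows if row.state == "UNASSIGNED"]


# The rows from the question, copied as text.
SAMPLE = "\n".join([
    "rawindex-2017.07.04                     1 p STARTED",
    "rawindex-2017.07.04                     3 p UNASSIGNED NODE_LEFT",
    "rawindex-2017.07.04                     2 p STARTED",
    "rawindex-2017.07.04                     4 p STARTED",
    "rawindex-2017.07.04                     0 p STARTED",
])

for row in unassigned(parse_cat_shards(SAMPLE)):
    print(row.index, row.shard, row.reason)
```

If your _cat/shards call returns more or different columns, the field positions in `parse_cat_shards` need to be adjusted accordingly.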

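Under normal circumstances it would be easy to reassign these shards using the _cluster or _settings APIs. However, these are exactly the APIs that AWS does not allow. I get the following message: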


    Message: "Your request: '/_settings' is not allowed."

According to an answer to a very similar question, I can change an index's settings through the _index API, which AWS does allow. However, index.routing.allocation.disable_allocation does not seem to be valid for Elasticsearch 5.x, which I am running. I get the following error:


    {
        "error": {
            "root_cause": [
                {
                    "type": "remote_transport_exception",
                    "reason": "[enweggf][x.x.x.x:9300][indices:admin/settings/update]"
                }
            ],
            "type": "illegal_argument_exception",
            "reason": "unknown setting [index.routing.allocation.disable_allocation] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
        },
        "status": 400
    }

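I tried to prioritize recovery of the index with a high index.priority and set index.unassigned.node_left.delayed_timeout to 1 minute, but I still cannot get the shards reassigned.

Is there any way (dirty or elegant) to achieve this on AWS managed ES?

Thanks!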

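【Comments】:

Given AWS ES and its limited flexibility, one way I would work around this, provided a backup of the index already exists, is to delete the index and restore it from the backup. All shards will then be assigned.

【Answer 1】: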

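I ran into a similar problem on AWS Elasticsearch version 6.3: 2 shards failed to be assigned and the cluster status was RED. Running GET _cluster/allocation/explain showed that the reason was that they had exceeded the default maximum of 5 allocation retries.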
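
If you want to pull that reason out of the explain response programmatically, something along these lines may help. This is only a sketch: the field names (`node_allocation_decisions`, `deciders`, `decider`, `decision`, `explanation`) reflect my understanding of the 5.x/6.x response shape and should be checked against what your cluster actually returns, and the sample payload is hand-written and heavily trimmed, not real output. `blocking_deciders` is a name invented for this example.

```python
from typing import Dict, List


def blocking_deciders(explain: Dict) -> List[Dict[str, str]]:
    """Collect every allocation decider that answered NO, together with its node."""
    blockers = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blockers.append({
                    "node": node.get("node_name", ""),
                    "decider": decider.get("decider", ""),
                    "explanation": decider.get("explanation", ""),
                })
    return blockers


# Hand-written, trimmed payload for illustration only (assumed field names).
SAMPLE_EXPLAIN = {
    "index": "my-index",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "node_allocation_decisions": [
        {
            "node_name": "node-a",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "max_retry",
                    "decision": "NO",
                    "explanation": "shard has exceeded the maximum number of retries",
                }
            ],
        }
    ],
}

for blocker in blocking_deciders(SAMPLE_EXPLAIN):
    print(blocker["node"], blocker["decider"], blocker["explanation"])
```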

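Running the query GET <my-index-name>/_settings showed the few settings that can be changed per index. Note that all queries here are in the format used by Kibana, which comes out of the box with the AWS Elasticsearch Service. The following solved my problem: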

PUT <my-index-name>/_settings
{
  "index.allocation.max_retries": 6
}

Running GET _cluster/allocation/explain immediately afterwards returned an error containing "reason": "unable to find any unassigned shards to explain...", and after a while the problem was resolved.
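
Outside Kibana, the same settings update can be prepared from a script. The sketch below only builds the request with Python's standard library and does not send it. The endpoint `https://es.example.invalid` and the helper name `build_max_retries_request` are placeholders, and it assumes the domain's access policy accepts unsigned requests; with IAM-based access the request would additionally need to be signed.

```python
import json
import urllib.request


def build_max_retries_request(endpoint: str, index: str, max_retries: int) -> urllib.request.Request:
    """Build (but do not send) the PUT <index>/_settings request shown above."""
    body = json.dumps({"index.allocation.max_retries": max_retries}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder endpoint and index name; replace both with your own values.
req = build_max_retries_request("https://es.example.invalid", "my-index", 6)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually perform the call: urllib.request.urlopen(req)
```

Raising the limit by one gives the allocator one more attempt, which is why 6 was enough in my case; the right value for another cluster is a judgment call.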

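【Comments】:

Excellent. Solved my problem. Note: you can use the same GET _cluster/allocation/explain call to determine which index or indices contain the problematic unassigned shards.

Is there a way to set this globally?

I have not tried this specifically, but you can use the pseudo-index _all to change the settings of all indices, i.e. PUT _all/_settings

@villasv The error message from _cluster/allocation/explain indicates that you can retry allocation globally for all failed allocations by calling /_cluster/reroute?retry_failed=true. I did not try it myself because I was in a hurry to fix my cluster and only had a few failed indices, so I updated the retry count of each index instead.

【Answer 2】: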

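When the other solutions fail, there may be an alternative. If you have a managed Elasticsearch instance on AWS, chances are high that you can "just" restore a snapshot.

Check for failed indices.

You can use, for example: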

curl -X GET "https://<es-endpoint>/_cat/shards"

or

curl -X GET "https://<es-endpoint>/_cluster/allocation/explain"

Check the snapshots.

To find the snapshot repositories, run the following query:

curl -X GET "https://<es-endpoint>/_snapshot?pretty"

Next, let's look at all snapshots in the cs-automated repository:

curl -X GET "https://<es-endpoint>/_snapshot/cs-automated/_all?pretty"

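Look for a snapshot where failures: [ ] is empty, or where the index you want to restore is not in a failed state. Then delete the index you want to restore: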

curl -XDELETE 'https://<es-endpoint>/<index-name>'

...and restore the deleted index like this:

curl -XPOST 'https://<es-endpoint>/_snapshot/cs-automated/<snapshot-name>/_restore' -d '{"indices": "<index-name>"}' -H 'Content-Type: application/json'
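
Choosing the snapshot by eye gets tedious when the repository holds many of them, so here is a rough sketch of that selection step. It takes the parsed JSON of the `_snapshot/cs-automated/_all` listing and returns the newest snapshot that contains the index, is in state SUCCESS and has an empty failures list, which is stricter than the rule described above. The field names (`snapshots`, `snapshot`, `indices`, `state`, `failures`, `start_time_in_millis`) are what I expect in that listing; the sample data and the helper names `pick_snapshot` and `restore_body` are invented for illustration.

```python
from typing import Dict, Optional


def pick_snapshot(listing: Dict, index: str) -> Optional[str]:
    """Return the newest fully successful snapshot that contains `index`, or None."""
    candidates = [
        snap for snap in listing.get("snapshots", [])
        if index in snap.get("indices", [])
        and snap.get("state") == "SUCCESS"
        and not snap.get("failures")
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda snap: snap.get("start_time_in_millis", 0))
    return newest["snapshot"]


def restore_body(index: str) -> Dict[str, str]:
    """JSON body for the _restore call, limited to a single index."""
    return {"indices": index}


# Invented listing for illustration only; real listings carry more fields.
SAMPLE_LISTING = {
    "snapshots": [
        {"snapshot": "snap-old", "indices": ["my-index"], "state": "SUCCESS",
         "start_time_in_millis": 1000, "failures": []},
        {"snapshot": "snap-partial", "indices": ["my-index"], "state": "PARTIAL",
         "start_time_in_millis": 3000, "failures": [{"index": "my-index", "shard_id": 3}]},
        {"snapshot": "snap-good", "indices": ["my-index", "other"], "state": "SUCCESS",
         "start_time_in_millis": 2000, "failures": []},
    ]
}

print(pick_snapshot(SAMPLE_LISTING, "my-index"))
print(restore_body("my-index"))
```

The returned name is what goes in place of <snapshot-name> in the restore call, and `restore_body` produces the JSON passed with -d.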

There is also some good documentation here:

https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains-snapshots.html#es-managedomains-snapshot-restore
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-handling-errors.html#aes-handling-errors-red-cluster-status
