Analyzing Abnormal Elasticsearch Cluster Status (Red/Yellow)
Posted by 努力者Mr李
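Note: some of the concept descriptions are taken from online sources.
I. The three Elasticsearch cluster states:
Green - all data is available; every primary and replica shard is allocated.
Yellow - all data is available, but some replica shards are not yet allocated. Queries are unaffected, but recovery may be: if a node in the cluster fails, some data may be unavailable until that node is repaired.
Red - at least one primary shard is unallocated for some reason, so queries are affected.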
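The three states can be read as a simple rule over shard copies. Below is a minimal sketch in Python; the (is_primary, is_assigned) pair layout is a hypothetical input format chosen for illustration, not an Elasticsearch API response.

```python
def cluster_status(shard_copies):
    """Derive green/yellow/red from (is_primary, is_assigned) pairs.

    The pair layout is a hypothetical input format used for illustration.
    """
    primaries_ok = all(assigned for is_primary, assigned in shard_copies if is_primary)
    replicas_ok = all(assigned for is_primary, assigned in shard_copies if not is_primary)
    if not primaries_ok:
        return "red"      # some primary shard is unassigned
    if not replicas_ok:
        return "yellow"   # all primaries assigned, some replica is not
    return "green"        # every copy is assigned


# One primary assigned, its replica not yet assigned.
print(cluster_status([(True, True), (False, False)]))  # prints "yellow"
```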
II. Finding the cause of an index's Yellow status
1. Check cluster health, including per-index status
GET /_cluster/health?level=indices
{
  "cluster_name" : "elasticsearch-1",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  # number of active primary shards
  "active_primary_shards" : 28,
  # total number of active primary and replica shards
  "active_shards" : 55,
  # number of shards being relocated
  "relocating_shards" : 0,
  # number of shards being initialized
  "initializing_shards" : 0,
  # number of unassigned shards
  "unassigned_shards" : 3,
  # number of shards whose allocation is delayed by the timeout settings
  "delayed_unassigned_shards" : 0,
  # number of cluster-level changes that have not yet been executed
  "number_of_pending_tasks" : 0,
  # number of unfinished fetches
  "number_of_in_flight_fetch" : 0,
  # time the earliest queued task has been waiting to execute, in milliseconds
  "task_max_waiting_in_queue_millis" : 0,
  # ratio of active shards in the cluster, as a percentage
  "active_shards_percent_as_number" : 100.0,
  "indices" : {
    "elasticsearch-1" : {
      "status" : "green",
      "number_of_shards" : 3,
      "number_of_replicas" : 3,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 3
    }
  }
}
2. Check how shards are allocated across the nodes
GET /_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    19       86.7kb    36.9gb     95.2gb    132.2gb           27 127.0.0.1 127.0.0.1 master
    18       73.1kb    36.9gb     95.2gb    132.2gb           27 127.0.0.1 127.0.0.1 node-003
    18       67.8kb    36.9gb     95.2gb    132.2gb           27 127.0.0.1 127.0.0.1 node-002
     3                                                                               UNASSIGNED
# unassigned_shards=3 confirms that replica shards are unassigned, which is why the cluster is Yellow
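The check in steps 1 and 2 can also be done programmatically. The sketch below assumes the health response has already been fetched and parsed into a dict; the function name indices_with_unassigned and the trimmed sample are illustrative, not part of any Elasticsearch client.

```python
import json


def indices_with_unassigned(health):
    """Return {index_name: unassigned_shards} for indices that report any."""
    return {
        name: info["unassigned_shards"]
        for name, info in health.get("indices", {}).items()
        if info.get("unassigned_shards", 0) > 0
    }


# Trimmed sample shaped like a /_cluster/health?level=indices response.
sample = json.loads("""
{
  "unassigned_shards": 3,
  "indices": {
    "elasticsearch-1": {
      "number_of_shards": 3,
      "number_of_replicas": 3,
      "unassigned_shards": 3
    }
  }
}
""")

print(indices_with_unassigned(sample))  # prints {'elasticsearch-1': 3}
```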
3. Find out why the shards are unassigned
GET /_cluster/allocation/explain?pretty
{
  "index" : "elasticsearch-1",
  "shard" : 3,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2022-04-20T11:01:43.051Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  # reason for the failure
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "NfmBH4nSSpGmtf7aPNuvXQ",
      "node_name" : "master",
      "transport_address" : "127.0.0.1:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists"
        }
      ]
    }
  ]
}
Every node's decision gives the same reason: the node already holds a copy of this shard, so another copy cannot be allocated to it. With 3 data nodes and number_of_replicas set to 3, each shard needs 4 copies but only 3 can be placed, which leaves one replica per shard (3 in total) unassigned.
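The reading of the explain output above can be sketched in Python. Both helper names are hypothetical, and the arithmetic assumes that every data node is eligible for the index and that same_shard is the only decider blocking allocation.

```python
def blocking_deciders(explain):
    """Map each node name to the deciders that answered NO for it."""
    blocked = {}
    for node in explain.get("node_allocation_decisions", []):
        names = [d["decider"] for d in node.get("deciders", [])
                 if d.get("decision") == "NO"]
        if names:
            blocked[node["node_name"]] = names
    return blocked


def unplaceable_replicas(primary_shards, replicas, data_nodes):
    """Replica copies that cannot be placed when a node may hold
    at most one copy of a given shard (the same_shard rule)."""
    copies_wanted = replicas + 1  # one primary plus its replicas
    copies_placeable = min(copies_wanted, data_nodes)
    return primary_shards * (copies_wanted - copies_placeable)


# Trimmed sample shaped like the allocation explain response.
sample = {
    "index": "elasticsearch-1",
    "can_allocate": "no",
    "node_allocation_decisions": [
        {"node_name": "master", "node_decision": "no",
         "deciders": [{"decider": "same_shard", "decision": "NO"}]},
    ],
}

print(blocking_deciders(sample))      # prints {'master': ['same_shard']}
print(unplaceable_replicas(3, 3, 3))  # prints 3 (3 shards, 3 replicas, 3 nodes)
print(unplaceable_replicas(3, 2, 3))  # prints 0 (after lowering replicas to 2)
```

Under these assumptions, the second function shows why lowering number_of_replicas to 2 on a 3-node cluster leaves nothing unassigned.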
4. List all shards
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
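The text output of this request is easy to summarize. The following sketch counts unassigned shards per index, shard type, and reason; it assumes whitespace-separated columns in the order requested above with no header row (the request does not pass v), and the sample rows are hypothetical.

```python
from collections import Counter


def unassigned_by_reason(cat_shards_text):
    """Count UNASSIGNED rows, keyed by (index, prirep, reason)."""
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Columns: index, shard, prirep, state, unassigned.reason (empty when assigned)
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            reason = fields[4] if len(fields) > 4 else "UNKNOWN"
            counts[(fields[0], fields[2], reason)] += 1
    return counts


# Hypothetical rows for illustration only.
sample_rows = """\
elasticsearch-1 0 p STARTED
elasticsearch-1 0 r UNASSIGNED CLUSTER_RECOVERED
elasticsearch-1 1 p STARTED
elasticsearch-1 1 r UNASSIGNED CLUSTER_RECOVERED
elasticsearch-1 2 p STARTED
elasticsearch-1 2 r UNASSIGNED CLUSTER_RECOVERED
"""

for key, count in unassigned_by_reason(sample_rows).items():
    print(key, count)
```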
5. Change the index's replica count
PUT /elasticsearch-1/_settings
{
  "number_of_replicas": 2
}
6. Query again after the change
GET /_cluster/health?level=indices
"unassigned_shards" : 0
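III. Summary (Red, Yellow)
When the cluster turns Red or Yellow, troubleshoot at the following levels:
Cluster level: curl -s 172.31.30.28:9200/_cat/nodes or GET /_cluster/health
Index level: GET /_cluster/health?pretty&level=indices
Shard level: GET /_cluster/health?pretty&level=shards
Recovery progress: GET /_recovery?pretty
1. Troubleshooting unassigned shards:
Diagnose first: GET /_cluster/allocation/explain
Then reallocate: POST /_cluster/reroute
If the shards still cannot be allocated, rebuild the index: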
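The three health levels differ only in the level query parameter. A small sketch that builds the corresponding URLs is shown here; health_url is a hypothetical helper, and the base address is the example host from the list above.

```python
from urllib.parse import urlencode

VALID_LEVELS = ("cluster", "indices", "shards")


def health_url(base_url, level="cluster"):
    """Build a /_cluster/health URL for the given detail level."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    query = urlencode({"level": level, "pretty": "true"})
    return f"{base_url.rstrip('/')}/_cluster/health?{query}"


for level in VALID_LEVELS:
    print(health_url("http://172.31.30.28:9200", level))
```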
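1.1. Create a backup index:
curl -XPUT 'http://xxxx:9200/a_index_copy/' -H 'Content-Type: application/json' -d '{ "settings": { "index": { "number_of_shards": 3, "number_of_replicas": 1 } } }'
1.2. Copy the data from a_index to a_index_copy with the reindex API:
POST _reindex
{ "source": { "index": "a_index" }, "dest": { "index": "a_index_copy", "op_type": "create" } }
1.3. Delete the a_index index. This must be done before the next step; otherwise the alias cannot be added, because an alias cannot have the same name as an existing index.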
curl -XDELETE 'http://xxxx:9200/a_index'
1.4. Add the alias a_index to a_index_copy
curl -XPOST 'http://xxxx:9200/_aliases' -H 'Content-Type: application/json' -d '{ "actions": [ { "add": { "index": "a_index_copy", "alias": "a_index" } } ] }'
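The ordering of steps 1.1 to 1.4 matters, so it can help to write the sequence down as data before sending anything. The sketch below only builds the requests and sends nothing; rebuild_steps is a hypothetical helper, and the shard and replica counts default to the values used in step 1.1.

```python
import json


def rebuild_steps(index, shards=3, replicas=1):
    """Return the (method, path, body) requests for rebuilding an index, in order."""
    copy = f"{index}_copy"
    return [
        # 1.1 create the backup index
        ("PUT", f"/{copy}", {"settings": {"index": {
            "number_of_shards": shards, "number_of_replicas": replicas}}}),
        # 1.2 copy the documents into it
        ("POST", "/_reindex", {"source": {"index": index},
                               "dest": {"index": copy, "op_type": "create"}}),
        # 1.3 delete the original so its name is free for the alias
        ("DELETE", f"/{index}", None),
        # 1.4 point the old name at the copy
        ("POST", "/_aliases", {"actions": [
            {"add": {"index": copy, "alias": index}}]}),
    ]


for method, path, body in rebuild_steps("a_index"):
    print(method, path, json.dumps(body) if body is not None else "")
```

Before running the delete in step 1.3 it is worth confirming that the reindex completed, since the original data is gone afterwards.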