Elasticsearch: an index shows red status and the cluster reports Unassigned Shards
Problem: an index in Elasticsearch shows red status
Error: Unassigned Shards 4
1.1.1. Check the cluster status
GET /_cluster/health?pretty
The result looks like this:
{
"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 426,
"active_shards" : 851,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 4,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
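As shown above, the cluster status is red and unassigned_shards is 4. A red status means at least one primary shard is unassigned, so the cause is an index whose shards could not be allocated.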
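As a small illustration of how to read the health response, the following Python sketch turns the status and unassigned_shards fields into a one-line summary. The function name summarize_health is made up for this example; it only inspects a response body that has already been parsed into a dict.

```python
# Meaning of each cluster health status value.
STATUS_MEANING = {
    "green": "all primary and replica shards are assigned",
    "yellow": "all primaries are assigned, but at least one replica is not",
    "red": "at least one primary shard is not assigned",
}


def summarize_health(health):
    """Return a one-line summary of a parsed _cluster/health response."""
    status = health["status"]
    return "{}: {} ({} unassigned shard(s))".format(
        status, STATUS_MEANING[status], health["unassigned_shards"]
    )


if __name__ == "__main__":
    # Values taken from the response shown above.
    print(summarize_health({"status": "red", "unassigned_shards": 4}))
```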
1.1.2. Check the index status
GET /_cat/indices?v
The result looks like this:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open test_test_operation_to pyz9euqTQQ6GF0ulPsnX4g 3 1 1 0 9.2kb 4.6kb
green open cba1 r5fvWeeAQ7uxQMRtNJxuwA 5 1 1 0 10.2kb 5.1kb
green open positiveinfo 97r4mnToS1OVx04QVzF5Rw 3 1 3091 7 993.4kb 496.7kb
green open dc_rep_pub_issue_output_month lLfxHpsZR8GPecqMLTvLsg 5 1 311845 87 163.3mb 81.5mb
close emplyee_test bsVDbqFWS0uYekpFI4Wnng
green open .monitoring-kibana-6-2021.05.27 cqlsx2crQyuc0WtSc_74zw 1 1 2711 0 1.8mb 959.1kb
green open filtertableinfo KGoc6kxqRtuZxPHG7Z6oXw 3 1 67 1 171kb 85.5kb
red open sg_house_rent_info_prod fAVmV5aqTROVbHjqw0GRKg 5 1 60313716 16540955 19.7gb 10.2gb
Looking for rows whose health is red pinpoints the problem index: sg_house_rent_info_prod
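On a cluster with hundreds of indices, scanning the table by eye is tedious. The sketch below, assuming the text output of GET /_cat/indices?v has been saved into a string, picks out the red rows. The function name red_indices is hypothetical, and the parsing relies on the column order shown above.

```python
def red_indices(cat_indices_text):
    """Return the names of indices whose health column is 'red'."""
    names = []
    lines = cat_indices_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row produced by ?v
        cols = line.split()
        # A closed index has no health value, so its first field is the
        # status ("close") and it is never mistaken for a red index.
        if len(cols) >= 3 and cols[0] == "red":
            names.append(cols[2])
    return names


if __name__ == "__main__":
    sample = """health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open cba1 r5fvWeeAQ7uxQMRtNJxuwA 5 1 1 0 10.2kb 5.1kb
close emplyee_test bsVDbqFWS0uYekpFI4Wnng
red open sg_house_rent_info_prod fAVmV5aqTROVbHjqw0GRKg 5 1 60313716 16540955 19.7gb 10.2gb
"""
    print(red_indices(sample))
```

The _cat/indices API also accepts a health query parameter (for example GET /_cat/indices?v&health=red), which filters on the server side instead.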
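1.1.3. Check the number of shards allocated to each node and their disk usage
GET _cat/allocation?v shows how many shards are allocated to each node and how much disk space each node is using: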
GET _cat/allocation?v
The result looks like this:
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
284 88.2gb 218gb 2.7tb 2.9tb 7 hadoop1 xxx.xxx.xxx.xxx hadoop1
284 104.7gb 248.9gb 2.7tb 2.9tb 8 hadoop3 xxx.xxx.xxx.xxx hadoop3
283 96.6gb 234.6gb 2.8tb 3tb 7 hadoop2 xxx.xxx.xxx.xxx hadoop2
4 UNASSIGNED
The last row shows that 4 shards are in the UNASSIGNED state.
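The same check can be scripted. This is a minimal sketch, assuming the text output of GET _cat/allocation?v is available as a string; unassigned_count is a made-up helper name, and it relies on the UNASSIGNED row having the shard count as its first field, as in the output above.

```python
def unassigned_count(cat_allocation_text):
    """Return the shard count from the UNASSIGNED row, or 0 if there is none."""
    for line in cat_allocation_text.strip().splitlines()[1:]:  # skip header
        cols = line.split()
        if cols and cols[-1] == "UNASSIGNED":
            return int(cols[0])
    return 0


if __name__ == "__main__":
    sample = """shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
284 88.2gb 218gb 2.7tb 2.9tb 7 hadoop1 xxx.xxx.xxx.xxx hadoop1
4 UNASSIGNED
"""
    print(unassigned_count(sample))
```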
Next, check the cluster health with GET /_cat/health?v. For a healthy cluster, the output looks like this:
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1622101651 07:47:31 elasticsearch green 3 3 851 426 0 0 0 0 - 100.0%
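1.1.4. How to fix it
First, pinpoint exactly which shards are unassigned:
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
The Kibana console cannot pipe output to grep. To filter the rows, run the same request from a shell with curl (the host and port below are assumed defaults; adjust them for your cluster):
curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED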
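Where grep is not available, the filtering step can be approximated in Python. The sketch below assumes the response text uses exactly the five columns requested above (index, shard, prirep, state, unassigned.reason) with no header row; the function name unassigned_shards and the sample rows are illustrative only, not output from the author's cluster.

```python
def unassigned_shards(cat_shards_text):
    """Return one dict per UNASSIGNED row of the _cat/shards output."""
    rows = []
    for line in cat_shards_text.strip().splitlines():
        cols = line.split()
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            rows.append({
                "index": cols[0],
                "shard": int(cols[1]),
                "prirep": cols[2],  # "p" = primary, "r" = replica
                # The reason column is empty for assigned shards.
                "reason": cols[4] if len(cols) > 4 else "",
            })
    return rows


if __name__ == "__main__":
    # Hypothetical sample rows for illustration.
    sample = """sg_house_rent_info_prod 2 p UNASSIGNED ALLOCATION_FAILED
sg_house_rent_info_prod 2 r UNASSIGNED ALLOCATION_FAILED
cba1 0 p STARTED
"""
    for row in unassigned_shards(sample):
        print(row)
```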
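Then use the following request to see the specific cause. With no request body, it explains an arbitrary unassigned shard: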
GET _cluster/allocation/explain?pretty
The result in my case was:
{
"index" : "sg_house_rent_info_prod",
"shard" : 2,
"primary" : true,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2021-05-24T20:47:04.790Z",
"failed_allocation_attempts" : 5,
"details" : "failed shard on node [w__xIKWBT5KJZg1CEcmFGA]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[sg_house_rent_info_prod][2]: obtaining shard lock timed out after 5000ms]; ",
"last_allocation_status" : "no"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
"node_allocation_decisions" : [
{
"node_id" : "BOIgtPqgQSyIfAICLDuEfQ",
"node_name" : "hadoop1",
"transport_address" : "xxx.xxx.xxx.xxx:9300",
"node_attributes" : {
"ml.machine_memory" : "269924302848",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
},
"node_decision" : "no",
"store" : {
"in_sync" : false,
"allocation_id" : "AxRuU1gfS3yimqXUd7SoJw"
}
},
{
"node_id" : "eGv9Jjs_S8GcNLKzkCxzMA",
"node_name" : "hadoop2",
"transport_address" : "xxx.xxx.xxx.xxx:9300",
"node_attributes" : {
"ml.machine_memory" : "269924302848",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true",
"ml.enabled" : "true"
},
"node_decision" : "no",
"store" : {
"in_sync" : false,
"allocation_id" : "wjR9jkfjQ-28OBKl_xFi1A",
"store_exception" : {
"type" : "file_not_found_exception",
"reason" : "no segments* file found in SimpleFSDirectory@/home/admin/es/esdata/nodes/0/indices/fAVmV5aqTROVbHjqw0GRKg/2/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@473b450: files: [write.lock]"
}
}
},
{
"node_id" : "w__xIKWBT5KJZg1CEcmFGA",
"node_name" : "hadoop3",
"transport_address" : "xxx.xxx.xxx.xxx:9300",
"node_attributes" : {
"ml.machine_memory" : "269924302848",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true",
"ml.enabled" : "true"
},
"node_decision" : "no",
"store" : {
"in_sync" : true,
"allocation_id" : "PQWOPdxnQDqVfQbRLgh32A",
"store_exception" : {
"type" : "shard_lock_obtain_failed_exception",
"reason" : "[sg_house_rent_info_prod][2]: obtaining shard lock timed out after 5000ms",
"index_uuid" : "fAVmV5aqTROVbHjqw0GRKg",
"shard" : "2",
"index" : "sg_house_rent_info_prod"
}
},
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-05-24T20:47:04.790Z], failed_attempts[5], delayed=false, details[failed shard on node [w__xIKWBT5KJZg1CEcmFGA]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[sg_house_rent_info_prod][2]: obtaining shard lock timed out after 5000ms]; ], allocation_status[deciders_no]]]"
}
]
}
]
}
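The explain response is long, so it can help to reduce it to one line per node. The following sketch assumes the response has been parsed into a dict; summarize_explain is a hypothetical helper that reads only fields present in the output above.

```python
def summarize_explain(explain):
    """Return (node_name, in_sync, store_exception_type, deciders) per node."""
    summary = []
    for node in explain.get("node_allocation_decisions", []):
        store = node.get("store", {})
        exception_type = store.get("store_exception", {}).get("type", "none")
        deciders = [d["decider"] for d in node.get("deciders", [])]
        summary.append(
            (node["node_name"], store.get("in_sync"), exception_type, deciders)
        )
    return summary


if __name__ == "__main__":
    # Trimmed copy of the response shown above.
    explain = {
        "node_allocation_decisions": [
            {"node_name": "hadoop1", "store": {"in_sync": False}},
            {"node_name": "hadoop2", "store": {
                "in_sync": False,
                "store_exception": {"type": "file_not_found_exception"}}},
            {"node_name": "hadoop3", "store": {
                "in_sync": True,
                "store_exception": {"type": "shard_lock_obtain_failed_exception"}},
             "deciders": [{"decider": "max_retry", "decision": "NO"}]},
        ]
    }
    for row in summarize_explain(explain):
        print(row)
```

Read this way, the output above shows that only hadoop3 holds an in-sync copy of shard 2, and that the max_retry decider is blocking it after 5 failed attempts to obtain the shard lock. The decider's own explanation suggests calling /_cluster/reroute?retry_failed=true.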
Searching online for the message failed to create shard, failure IOException[failed to obtain in-memory shard lock] leads to the following fix.
Solution:
Run the following command in Kibana:
POST /_cluster/reroute?retry_failed=true
retry_failed: (optional, boolean) If true, retries allocation of shards that are blocked because of too many subsequent allocation failures.
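Outside Kibana, the same request can be sent from a script. This is a minimal sketch using only the Python standard library, assuming an unsecured cluster reachable at a base URL such as http://localhost:9200 (an assumption; a cluster with authentication or TLS needs more setup). Both function names are made up for this example.

```python
import json
import urllib.request


def build_retry_request(base_url):
    """Build the POST request for /_cluster/reroute?retry_failed=true."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, method="POST")


def retry_failed_allocations(base_url):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(build_retry_request(base_url)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the request; call retry_failed_allocations() to send it.
    req = build_retry_request("http://localhost:9200")
    print(req.get_method(), req.full_url)
```

After the retry, GET /_cluster/health?pretty should show whether unassigned_shards has dropped back to 0.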
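Appendix: common reasons a shard is unassigned in Elasticsearch:
1) INDEX_CREATED: unassigned as a result of an API call that created the index.
2) CLUSTER_RECOVERED: unassigned as a result of a full cluster recovery.
3) INDEX_REOPENED: unassigned as a result of opening a closed index.
4) DANGLING_INDEX_IMPORTED: unassigned as a result of importing a dangling index.
5) NEW_INDEX_RESTORED: unassigned as a result of restoring into a new index.
6) EXISTING_INDEX_RESTORED: unassigned as a result of restoring into a closed index.
7) REPLICA_ADDED: unassigned as a result of explicitly adding a replica shard.
8) ALLOCATION_FAILED: unassigned as a result of a failed allocation of the shard.
9) NODE_LEFT: unassigned because the node hosting the shard left the cluster.
10) REINITIALIZED: unassigned because the shard moved from started back to initializing (for example, with shadow replicas).
11) REROUTE_CANCELLED: unassigned as a result of an explicit cancel reroute command.
12) REALLOCATED_REPLICA: a better replica location was identified, so the existing replica allocation was cancelled and the shard became unassigned.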
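The list above can be kept as a lookup table so that reason codes from the unassigned.reason column are printed with a short explanation. This is only a convenience sketch; describe_reason is a made-up name and the descriptions are condensed from the list above.

```python
UNASSIGNED_REASONS = {
    "INDEX_CREATED": "an API call created the index",
    "CLUSTER_RECOVERED": "full cluster recovery",
    "INDEX_REOPENED": "a closed index was opened",
    "DANGLING_INDEX_IMPORTED": "a dangling index was imported",
    "NEW_INDEX_RESTORED": "restore into a new index",
    "EXISTING_INDEX_RESTORED": "restore into a closed index",
    "REPLICA_ADDED": "a replica was explicitly added",
    "ALLOCATION_FAILED": "allocation of the shard failed",
    "NODE_LEFT": "the node hosting the shard left the cluster",
    "REINITIALIZED": "the shard moved from started back to initializing",
    "REROUTE_CANCELLED": "an explicit cancel reroute command",
    "REALLOCATED_REPLICA": "a better replica location was identified",
}


def describe_reason(code):
    """Return a short description for an unassigned.reason code."""
    return UNASSIGNED_REASONS.get(code.strip().upper(), "unknown reason code")


if __name__ == "__main__":
    print(describe_reason("ALLOCATION_FAILED"))
```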
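Another good blog post on this topic:
Elasticsearch 集群和索引健康状态及常见错误说明 (Elasticsearch cluster and index health status, with explanations of common errors)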