A Postmortem of an ES Incident
Posted by bohu83
From the alerts, the business side reported API timeouts, and at the same time the ES error log showed:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TcpTransport$RequestHandler@6fbaf20b on EsThreadPoolExecutor[search, queue capacity
= 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@1f159058[Running, pool size = 49, active threads = 49, queued tasks = 1000, completed tasks = 23765969104]]
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50) ~[elasticsearch-5.2.2.jar:5.2.2]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) ~[?:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) ~[?:1.8.0_71]
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.doExecute(EsThreadPoolExecutor.java:94) ~[elasticsearch-5.2.2.jar:5.2.2]
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:89) ~[elasticsearch-5.2.2.jar:5.2.2]
at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1445) [elasticsearch-5.2.2.jar:5.2.2]
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1329) [elasticsearch-5.2.2.jar:5.2.2]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-5.2.2.jar:5.2.2]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) ~[netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) ~[netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) ~[netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:527) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:481) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.7.Final.jar:4.1.7.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_71]
The key line in the log: the search pool had all 49 threads busy and 1,000 tasks queued, so any further search request was rejected.
Routine response:
We restarted ES and the ES client services, but it did not hold for long before the errors started again.
Monitoring showed cluster load at 70% (normally it stays below 10%) and the cluster status was red, meaning some primary shards were unavailable.
As an emergency measure we degraded the service: after excluding the large shards holding several hundred GB of historical data, the cluster recovered.
We run ES 5.2; the client is a TransportClient used essentially as a singleton, which ruled out client configuration as the cause.
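For context, here is a minimal sketch of the kind of singleton TransportClient setup described above, against the 5.x Java API. The cluster name and host are placeholders, not our real configuration:

import java.net.InetAddress;
import java.net.UnknownHostException;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public final class EsClientHolder {
    private static volatile TransportClient client;

    private EsClientHolder() {}

    // Lazily build one TransportClient and share it across the whole process.
    public static TransportClient get() throws UnknownHostException {
        if (client == null) {
            synchronized (EsClientHolder.class) {
                if (client == null) {
                    Settings settings = Settings.builder()
                            .put("cluster.name", "my-cluster") // placeholder cluster name
                            .build();
                    client = new PreBuiltTransportClient(settings)
                            .addTransportAddress(new InetSocketTransportAddress(
                                    InetAddress.getByName("es-node-1"), 9300)); // placeholder host
                }
            }
        }
        return client;
    }
}

A single shared client is the right pattern here: the TransportClient keeps its own connections and thread pools, so creating one per request would multiply resource usage without improving throughput.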
Reflection:
We never pinned down which business index, and which specific queries, corresponded to the high load; our grasp of the lower layers is still not deep enough. The hot threads API and the search slowlog, sketched below, are the tools for that.
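A hedged sketch of what that investigation could look like with the same 5.x Java client: the hot threads API shows whether search is what is burning CPU on each node, and the search slowlog captures the offending queries. The index name and thresholds below are hypothetical:

import org.elasticsearch.action.admin.cluster.node.hotthreads.NodesHotThreadsResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;

public class EsDiagnostics {

    // Dump the busiest threads on every node to confirm whether search is the hot spot.
    public static void printHotThreads(TransportClient client) {
        NodesHotThreadsResponse resp = client.admin().cluster()
                .prepareNodesHotThreads() // all nodes
                .setThreads(3)            // top 3 hottest threads per node
                .get();
        resp.getNodes().forEach(node -> System.out.println(node.getHotThreads()));
    }

    // Enable the search slowlog on a hypothetical index so slow queries get logged.
    public static void enableSearchSlowlog(TransportClient client) {
        client.admin().indices().prepareUpdateSettings("orders-2019") // hypothetical index
                .setSettings(Settings.builder()
                        .put("index.search.slowlog.threshold.query.warn", "2s")
                        .put("index.search.slowlog.threshold.query.info", "1s")
                        .build())
                .get();
    }
}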
Supplementary notes:
1. Why you should not casually tune ES thread pool parameters
Under heavy concurrent query traffic, once requests exceed what a single Elasticsearch instance can process, the server triggers a protective mechanism and rejects the overflow. The pool sizes are derived from the number of CPU cores, so blindly raising them to several hundred would most likely crash the node rather than help. Instead of raising limits, watch the core pools (a monitoring sketch follows this list):
index: indexing and deleting documents
search: get, count, and search operations
bulk: bulk operations against an index
refresh: refresh operations (making recent writes searchable)
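A minimal sketch of reading those rejection counters through the 5.x Java client; a steadily growing rejected count on the search pool is the same saturation signal as the exception in the log above:

import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.threadpool.ThreadPoolStats;

public class ThreadPoolWatcher {

    // Print queue depth and cumulative rejections for every pool on every node.
    public static void printRejections(TransportClient client) {
        NodesStatsResponse resp = client.admin().cluster()
                .prepareNodesStats()
                .setThreadPool(true) // make sure thread pool stats are included
                .get();
        resp.getNodes().forEach(node -> {
            for (ThreadPoolStats.Stats s : node.getThreadPool()) {
                System.out.printf("%s %s queue=%d rejected=%d%n",
                        node.getNode().getName(), s.getName(),
                        s.getQueue(), s.getRejected());
            }
        });
    }
}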
The official documentation (quoted here from a recent version; a few of these pools do not exist yet in 5.x) describes the thread pools as follows:
A node uses several thread pools to manage memory consumption. Queues associated with many of the thread pools enable pending requests to be held instead of discarded.
There are several thread pools, but the important ones include:
generic
For generic operations (for example, background node discovery). Thread pool type is scaling.
search
For count/search/suggest operations. Thread pool type is fixed with a size of int((# of allocated processors * 3) / 2) + 1, and queue_size of 1000.
search_throttled
For count/search/suggest/get operations on search_throttled indices. Thread pool type is fixed with a size of 1, and queue_size of 100.
search_coordination
For lightweight search-related coordination operations. Thread pool type is fixed with a size of a max of min(5, (# of allocated processors) / 2), and queue_size of 1000.
get
For get operations. Thread pool type is fixed with a size of # of allocated processors, queue_size of 1000.
analyze
For analyze requests. Thread pool type is fixed with a size of 1, queue size of 16.
write
For single-document index/delete/update and bulk requests. Thread pool type is fixed with a size of # of allocated processors, queue_size of 10000. The maximum size for this pool is 1 + # of allocated processors.
snapshot
For snapshot/restore operations. Thread pool type is scaling with a keep-alive of 5m and a max of min(5, (# of allocated processors) / 2).
snapshot_meta
For snapshot repository metadata read operations. Thread pool type is scaling with a keep-alive of 5m and a max of min(50, (# of allocated processors * 3)).
warmer
For segment warm-up operations. Thread pool type is scaling with a keep-alive of 5m and a max of min(5, (# of allocated processors) / 2).
refresh
For refresh operations. Thread pool type is scaling with a keep-alive of 5m and a max of min(10, (# of allocated processors) / 2).
fetch_shard_started
For listing shard states. Thread pool type is scaling with keep-alive of 5m and a default maximum size of 2 * # of allocated processors.
fetch_shard_store
For listing shard stores. Thread pool type is scaling with keep-alive of 5m and a default maximum size of 2 * # of allocated processors.
flush
For flush and translog fsync operations. Thread pool type is scaling with a keep-alive of 5m and a default maximum size of min(5, (# of allocated processors) / 2).
force_merge
For force merge operations. Thread pool type is fixed with a size of 1 and an unbounded queue size.
management
For cluster management. Thread pool type is scaling with a keep-alive of 5m and a default maximum size of 5.
system_read
For read operations on system indices. Thread pool type is fixed with a default maximum size of min(5, (# of allocated processors) / 2).
system_write
For write operations on system indices. Thread pool type is fixed with a default maximum size of min(5, (# of allocated processors) / 2).
system_critical_read
For critical read operations on system indices. Thread pool type is fixed with a default maximum size of min(5, (# of allocated processors) / 2).
system_critical_write
For critical write operations on system indices. Thread pool type is fixed with a default maximum size of min(5, (# of allocated processors) / 2).
watcher
For watch executions. Thread pool type is fixed with a default maximum size of min(5 * (# of allocated processors), 50) and queue_size of 1000.
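To see how these types and sizes resolve on a concrete cluster, the nodes info API reports each pool's configured type, size bounds, and queue. A sketch against the 5.x Java client; treat the exact accessor names as an assumption to verify against your client version:

import org.elasticsearch.action.admin.cluster.node.info.NodesInfoResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.threadpool.ThreadPool;

public class ThreadPoolConfigDump {

    // Print the configured type, size bounds, and queue size of every thread pool.
    public static void dump(TransportClient client) {
        NodesInfoResponse resp = client.admin().cluster()
                .prepareNodesInfo()
                .setThreadPool(true) // only need thread pool configuration
                .get();
        resp.getNodes().forEach(node -> {
            for (ThreadPool.Info info : node.getThreadPool()) {
                System.out.printf("%s %s type=%s min=%d max=%d queue=%s%n",
                        node.getNode().getName(), info.getName(),
                        info.getThreadPoolType().getType(),
                        info.getMin(), info.getMax(), info.getQueueSize());
            }
        });
    }
}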