Elasticsearch Advanced | Notes from a hands-on session running DSL scripts in Kibana
Posted by 每天译点晓知识
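1. Elasticsearch Script History: a short history of the script engines in the distributed full-text search engine
Early versions of ES used MVEL for scripting; Groovy scripting was introduced to address MVEL's security risks.
Groovy then brought security vulnerabilities and memory leaks of its own, so Painless was officially announced with ES 5.0. That was several years ago, and Painless has been the script language in front of developers ever since.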
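2. Elasticsearch Script Application Scenarios: where the script engine is used
We all know Elasticsearch as a full-text search engine. Every release line ships a rich DSL for create, read, update and delete; this article uses the 6.x line, specifically 6.8.6, as its example.
That DSL comfortably covers CRUD in more than 80% of business scenarios, but some scenarios are more complex:
updating multiple fields with custom logic, custom reindex, dynamically appending to array fields, and so on.
Of course, these can also be implemented by hand-writing a plugin on top of the script engine.
The name Painless suggests a script language that is "pain-free" and free of the earlier vulnerabilities, but two precautions still matter: do not start ES as root, and do not expose the ES path to other users.
Judging from the official introduction to scripting, the first concern is performance and the second is whether the business scenario fits. eBay's performance-tuning write-up (English version) makes the same point; the Chinese version is noted here as well.
For the 80%+ of everyday scenarios, see the author's collection "Elasticsearch+Kibana+Dsl-Crud大全":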
GET _search
{
  "query": {
    "match_all": {}
  }
}
# Node information
GET _cat/nodes?v
# Disk usage and shard allocation per node
GET _cat/allocation?v
# Index information
GET _cat/indices?v
# Shard information
GET _cat/shards?v
# Register a snapshot repository (shared filesystem)
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/home/user/yxd179/es/backup"
  }
}
# View the repository
GET /_snapshot/my_backup?pretty
# List the registered snapshot repositories
GET _snapshot
# Create a snapshot named snapshot_yd in the my_backup repository.
# Without "indices" every open index is backed up; here only index 809iJpOmSI2ZmJrUqKRR0Q is included.
# By default the call returns immediately and the snapshot runs in the background.
# To make a script block until the snapshot finishes, add the wait_for_completion flag
# (a large snapshot can take a long time to return).
PUT /_snapshot/my_backup/snapshot_yd?wait_for_completion=true
{
  "indices": "809iJpOmSI2ZmJrUqKRR0Q",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "phr",
    "taken_because": "backup before upgrading"
  }
}
# View a snapshot
GET /_snapshot/my_backup/snapshot_yd
# View all snapshots
GET /_snapshot/my_backup/_all
# Delete a snapshot
DELETE /_snapshot/my_backup/snapshot_yd
# Monitor snapshot creation or restore progress
GET /_snapshot/my_backup/snapshot_yd/_status
# Restore a snapshot
POST /_snapshot/my_backup/snapshot_yd/_restore
# Index template with dynamic mapping
PUT /_template/yxd179_tpl
{
  "index_patterns": [
    "yxd179-2021*"
  ],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "yd": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "index": true,
              "copy_to": "full_context",
              "analyzer": "ik_max_word",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "full_context": {
          "type": "text",
          "analyzer": "ik_max_word",
          "fielddata": true,
          "store": true
        }
      }
    }
  }
}
# Set the number of replica shards
PUT /yxd179-2021/_settings
{
  "number_of_replicas": "1"
}
# Paginated query
GET /yxd179-2021/yd/_search
{
  "from": 0,
  "size": 30
}
# Get by ID
GET /yxd179-2021/yd/647461503271768064
# bool query DSL
GET /yxd179-2021/yd/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "regNumber": "20203030651" } }
            ]
          }
        },
        { "term": { "status": "1" } }
      ]
    }
  },
  "sort": [
    { "createTime": { "order": "desc" } }
  ],
  "from": 0,
  "size": 10
}
# Raise the maximum result window (the upper bound on from + size)
PUT /yxd179-2021/_settings
{
  "index": {
    "max_result_window": 13000000
  }
}
# Inspect how a field's analyzer tokenizes a text
POST /yxd179-2021/_analyze
{
  "field": "regNumber",
  "text": "国械标准20203030651号"
}
# Wildcard (fuzzy) match
GET /yxd179-2021/yd/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "wildcard": { "regNumber.keyword": "*20203030651*" } }
            ]
          }
        },
        { "term": { "status": "1" } }
      ]
    }
  },
  "sort": [
    { "createTime": { "order": "desc" } }
  ],
  "from": 0,
  "size": 10
}
# Query a field with a specific analyzer
GET /yxd179-2021/yd/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "hdsd0001004": {
              "query": "1828551417",
              "analyzer": "char_analyzer"
            }
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 30
}
# Wildcard (fuzzy) match
GET /yxd179-2021/yd/_search
{
  "query": {
    "bool": {
      "must": [
        { "wildcard": { "hdsd0001002.keyword": "*yxd179*" } }
      ]
    }
  },
  "from": 0,
  "size": 30
}
# Close the index
POST yxd179-2021/_close
# Open the index
POST yxd179-2021/_open
# Set the analyzer for a specific field
PUT /yxd179-2021/_mapping/yd
{
  "properties": {
    "hdsd0001004": {
      "type": "text",
      "analyzer": "char_analyzer"
    }
  }
}
# View the mapping
GET yxd179-2021/_mapping
# Define the custom analyzer (analysis settings can only be changed while the index is closed)
PUT yxd179-2021/_settings
{
  "analysis": {
    "analyzer": {
      "char_analyzer": {
        "tokenizer": "char_tokenizer",
        "filter": "lowercase"
      }
    },
    "tokenizer": {
      "char_tokenizer": {
        "type": "pattern",
        "pattern": "|"
      }
    }
  }
}
# minimum_should_match (query_string takes default_operator, not operator)
GET /yxd179-2021/yd/_search
{
  "query": {
    "query_string": {
      "query": "182855141y7",
      "type": "phrase",
      "default_operator": "AND",
      "minimum_should_match": "100%",
      "fields": [
        "hdsd0001004"
      ]
    }
  }
}
# Return only selected fields
GET /yxd179-2021/yd/_search
{
  "_source": {
    "includes": [
      "id",
      "productId"
    ]
  },
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "productId": [
              636654265306419462
            ]
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 30
}
# Highlighted query
GET /yxd179-2021/yd/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": []
          }
        },
        { "term": { "status": "1" } },
        { "term": { "id": 636662671736099971 } }
      ]
    }
  },
  "sort": [
    { "id": { "order": "asc" } }
  ],
  "highlight": {
    "pre_tags": [
      "<span class='title-key'>"
    ],
    "post_tags": [
      "</span>"
    ],
    "fields": {
      "commonName": {
        "type": "plain"
      }
    }
  },
  "from": 0,
  "size": 10
}
# Clear the read_only_allow_delete block
PUT /yxd179-2021/_settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "false"
    }
  }
}
# List index templates
GET /_template
# Search every index that matches the template pattern
GET /yxd179-2021*/yd/_search
{
  "from": 0,
  "size": 30
}
# bool query on a single field
GET /yxd179-2021/yd/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "id": "636651493706133509" } }
      ]
    }
  },
  "from": 0,
  "size": 30
}
# Bulk indexing (one action line followed by one document line, each a single-line JSON object)
POST /_bulk
{"index":{"_index":"yxd179-2021","_type":"yd","_id":"65965969996688"}}
{"id":"65965969996688","HDSD0001002":"sdff","HDSD0001008":"fsdf","HDSD0001006":"000000000000000000","create_time":"2021-07-29","cancel_flag":0}
{"index":{"_index":"yxd179-2021","_type":"yd","_id":"66049829996688"}}
{"id":"66049829996688","HDSD0001002":"sdgsdg","HDSD0001008":"fsdfsdf","HDSD0001006":"000000000000000000","create_time":"2021-07-29","cancel_flag":1}
# Outer intersection (must) query
GET /yxd179-2021/yd/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "regNumber": "国sd20182642128" } }
            ]
          }
        },
        { "term": { "status": "1" } }
      ]
    }
  },
  "sort": [
    { "createTime": { "order": "desc" } }
  ],
  "from": 0,
  "size": 10
}
# Complex bool query with boosts and score explanation
GET /yxd179-2021/yd/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              { "term": { "cancelFlag": { "value": "0", "boost": 1 } } }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        },
        {
          "bool": {
            "should": [
              {
                "match": {
                  "yhe": {
                    "query": "张",
                    "operator": "OR",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "auto_generate_synonyms_phrase_query": true,
                    "boost": 1
                  }
                }
              },
              {
                "match": {
                  "yhr": {
                    "query": "张",
                    "operator": "OR",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "auto_generate_synonyms_phrase_query": true,
                    "boost": 1
                  }
                }
              },
              {
                "match": {
                  "yht": {
                    "query": "张",
                    "operator": "OR",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "auto_generate_synonyms_phrase_query": true,
                    "boost": 1
                  }
                }
              },
              {
                "match": {
                  "yhg": {
                    "query": "张",
                    "operator": "OR",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "auto_generate_synonyms_phrase_query": true,
                    "boost": 1
                  }
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "explain": true,
  "sort": [
    { "id": { "order": "desc" } }
  ]
}
# Profile query timing
GET /yxd179-2021/yd/_search
{
  "profile": true,
  "query": {
    "term": {
      "tu": 6583120
    }
  }
}
# Update by ID
POST /yxd179-2021/yd/b00e89b652484b0b8da16e090302e012/_update
{
  "doc": {
    "fd": "1"
  }
}
# Update with _update_by_query and a Painless script
POST /yxd179-2021/_update_by_query
{
  "query": {
    "term": {
      "fdh": 6583120
    }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.cancelFlag=params.cancelFlag;ctx._source.updateTime=params.updateTime",
    "params": {
      "cancelFlag": "0",
      "updateTime": "2021-07-28T01:17:36.000Z"
    }
  }
}
# Intersection query (every must clause has to match)
GET /yxd179-2021/yd/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "cancelFlag": "0" } },
        {
          "bool": {
            "must": [
              { "wildcard": { "hdsd0001002.keyword": "*yxd179*" } },
              { "match": { "hdsd0001003": "2" } }
            ]
          }
        }
      ]
    }
  },
  "sort": [
    { "id": { "order": "desc" } }
  ],
  "highlight": {
    "pre_tags": [
      "<span class='title-key'>"
    ],
    "post_tags": [
      "</span>"
    ],
    "fields": {
      "hdsd0001002": {
        "type": "plain"
      }
    }
  },
  "from": 0,
  "size": 30
}
# Outer intersection query with a nested intersection query
GET /yxd179-2021/yd/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              { "term": { "cancelFlag": { "value": "0", "boost": 1 } } }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        },
        {
          "bool": {
            "must": [
              {
                "match": {
                  "hdsd0001002": {
                    "query": "张",
                    "operator": "OR",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "auto_generate_synonyms_phrase_query": true,
                    "boost": 1
                  }
                }
              },
              {
                "match": {
                  "hdsd0001003": {
                    "query": "2",
                    "operator": "OR",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "auto_generate_synonyms_phrase_query": true,
                    "boost": 1
                  }
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "explain": true
}
# Union query (a required must clause plus an optional should clause)
GET /yxd179-2021/yd/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        { "term": { "cancelFlag": { "value": "0", "boost": 1 } } }
      ],
      "should": [
        {
          "match": {
            "hdsd0001002": {
              "query": "张",
              "operator": "OR",
              "prefix_length": 0,
              "max_expansions": 50,
              "fuzzy_transpositions": true,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": true,
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "explain": true
}
# Union query that returns only selected fields
GET /yxd179-2021/yd/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "cancelFlag": {
              "query": "0",
              "operator": "AND",
              "prefix_length": 0,
              "max_expansions": 50,
              "fuzzy_transpositions": true,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": true,
              "boost": 1
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "hdsd0001002": {
              "query": "张",
              "operator": "OR",
              "prefix_length": 0,
              "max_expansions": 50,
              "fuzzy_transpositions": true,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": true,
              "boost": 1
            }
          }
        },
        {
          "match": {
            "hdsd0001002.pinyin": {
              "query": "zhang",
              "operator": "OR",
              "prefix_length": 0,
              "max_expansions": 50,
              "fuzzy_transpositions": true,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": true,
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "explain": true,
  "_source": {
    "includes": [
      "id",
      "th001Id",
      "createTime",
      "updateTime",
      "hdsd0001001",
      "hdsd0001002",
      "cancelFlag"
    ],
    "excludes": []
  }
}
# To make recent writes searchable right away, force a refresh through the ES API
POST /yxd179-2021/_refresh
# Delete by ID
DELETE /yxd179-2021/yd/ud6-5XkBwVbB7HKjg5k0
# Delete the index
DELETE /yxd179-2021
# Delete the template (dynamic mapping)
DELETE /_template/yxd179_tpl
# Sorting
GET /yxd179-2021/yd/_search
{
  "sort": [
    { "createTime": { "order": "desc" } }
  ],
  "from": 0,
  "size": 30
}
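3. Elasticsearch Script Actual Combat: the script engine in practice
Only Update-By-Query is used as the example here (see the _update_by_query request in the DSL collection above):
In that request, lang selects the script engine (painless), source holds the script body, and params holds the values passed to the script.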
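Values are passed through params because that avoids hitting the ES limit on script compilations. The limit itself can also be raised with the request below; on 6.x the setting is named script.max_compilations_rate (script.max_compilations_per_minute is the 5.x name):
PUT /_cluster/settings
{
  "transient": {
    "script.max_compilations_rate": "40/1m"
  }
}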
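Important: ES has to compile every distinct script it receives. During a bulk update, if each request carries a script with different source text, each one is compiled on the spot and the limit is reached very quickly. To understand why, note that ES compiles a given script only the first time it sees it and reuses the compiled form afterwards. When constant values are hard-coded into the script, every new value produces new source text and therefore a new compilation. Passing the varying values in through the script's params keeps the source text identical across requests, which removes the compilation limit problem at its root.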
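The effect described above can be illustrated with a minimal Python sketch. It assumes, as a simplification, that the compile cache is keyed on the script language and source text only; the helper names `inline_script`, `parameterized_script` and `count_compilations` are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict, List


def inline_script(cancel_flag: str, update_time: str) -> Dict[str, Any]:
    """Anti-pattern: values are baked into the source, so each call yields new source text."""
    return {
        "lang": "painless",
        "source": (
            f"ctx._source.cancelFlag='{cancel_flag}';"
            f"ctx._source.updateTime='{update_time}'"
        ),
    }


def parameterized_script(cancel_flag: str, update_time: str) -> Dict[str, Any]:
    """Recommended: the source stays constant and the values travel in params."""
    return {
        "lang": "painless",
        "source": (
            "ctx._source.cancelFlag=params.cancelFlag;"
            "ctx._source.updateTime=params.updateTime"
        ),
        "params": {"cancelFlag": cancel_flag, "updateTime": update_time},
    }


def count_compilations(scripts: List[Dict[str, Any]]) -> int:
    """Count distinct (lang, source) pairs, a stand-in for a cache keyed on script text."""
    return len({(s["lang"], s["source"]) for s in scripts})


# Sixty updates that differ only in the timestamp value.
updates = [("0", f"2021-07-28T01:17:{sec:02d}.000Z") for sec in range(60)]
inline_count = count_compilations([inline_script(*u) for u in updates])
param_count = count_compilations([parameterized_script(*u) for u in updates])
print(inline_count, param_count)  # 60 1
```

With inlined values every update counts as a new script, while the parameterized form is compiled once no matter how many values are sent.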
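Next, how do we build a TCP (transport) client in Java against version 6.8.6 to run Painless scripts?
Additional notes: a call to the updateByQuery API starts by taking a snapshot of the index and then indexes every document it finds using internal versioning.
If a document changes between the time the snapshot is taken and the time its index request is processed, a version conflict occurs. When the versions match, updateByQuery updates the document and increments its version number. To keep a version conflict from aborting updateByQuery, you can also set abortOnVersionConflict(false).
This is reasonable when the goal is to pick up an online mapping change: a version conflict means the conflicting document was updated between the start of updateByQuery and the moment it tried to update that document, and that other update will already have picked up the mapping change. updateByQuery can also use an ingest node by specifying a pipeline. The UpdateByQueryRequestBuilder API supports filtering the documents to update, limiting the total number of documents updated, updating documents with a script, refreshing right after the update, setting the number of retries, and so on.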
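As a rough REST-level counterpart to the Java options above (not the Java transport client code itself), the sketch below builds the path and body of an `_update_by_query` request in Python. `conflicts=proceed` is the REST parameter that corresponds to `abortOnVersionConflict(false)`; the helper name `build_update_by_query` and its arguments are hypothetical, and sending the request is left to whatever HTTP client is in use.

```python
import json
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlencode


def build_update_by_query(
    index: str,
    query: Dict[str, Any],
    script: Dict[str, Any],
    proceed_on_conflict: bool = True,
    refresh: bool = False,
    pipeline: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (path, json_body) for POST /<index>/_update_by_query."""
    params: Dict[str, str] = {}
    if proceed_on_conflict:
        # Keep going on version conflicts instead of aborting the whole request.
        params["conflicts"] = "proceed"
    if refresh:
        # Refresh the affected shards once the request completes.
        params["refresh"] = "true"
    if pipeline:
        # Route the updated documents through an ingest pipeline.
        params["pipeline"] = pipeline
    path = f"/{index}/_update_by_query"
    if params:
        path += "?" + urlencode(params)
    return path, json.dumps({"query": query, "script": script})


path, body = build_update_by_query(
    "yxd179-2021",
    {"term": {"fdh": 6583120}},
    {
        "lang": "painless",
        "source": "ctx._source.cancelFlag=params.cancelFlag",
        "params": {"cancelFlag": "0"},
    },
    refresh=True,
)
print(path)
```

The printed path is what would be pasted after `POST` in the Kibana console, with `body` as the request body.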
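Retry:
Suppose clients A and B fetch the same document at almost the same time, and both receive its _version; assume _version=1 at this point.
Client A then modifies part of the document and writes the change to the index. When Elasticsearch writes to the index, it compares the version submitted by client A (still 1) with the version of the stored document (also 1), finds them equal, performs the write, and sets _version=2.
Client B also modifies part of the document, but its write reaches the index slightly later. The same check runs: ES finds that client B submitted version 1 while the stored document is at version 2, so a conflict occurs and this partial update fails and is retried.
Concurrency control strategy: partial updates rely on optimistic locking.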
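The A/B scenario above can be replayed with a small in-memory simulation. This is only a sketch of the optimistic-locking idea, not how Elasticsearch is implemented; `VersionedStore`, `VersionConflict` and `update_with_retry` are hypothetical names introduced for the illustration.

```python
from typing import Any, Dict, Tuple


class VersionConflict(Exception):
    """Raised when the submitted version differs from the stored one."""


class VersionedStore:
    """Toy document store that checks a version number on every update."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def index(self, doc_id: str, source: Dict[str, Any]) -> None:
        self._docs[doc_id] = (1, dict(source))

    def get(self, doc_id: str) -> Tuple[int, Dict[str, Any]]:
        version, source = self._docs[doc_id]
        return version, dict(source)

    def update(self, doc_id: str, partial: Dict[str, Any], expected_version: int) -> int:
        version, source = self._docs[doc_id]
        if version != expected_version:
            raise VersionConflict(f"expected {expected_version}, stored {version}")
        source.update(partial)
        self._docs[doc_id] = (version + 1, source)
        return version + 1


def update_with_retry(store: VersionedStore, doc_id: str,
                      partial: Dict[str, Any], retries: int = 3) -> int:
    """Re-read the current version and try again after each conflict."""
    for _attempt in range(retries + 1):
        version, _source = store.get(doc_id)
        try:
            return store.update(doc_id, partial, version)
        except VersionConflict:
            continue
    raise VersionConflict("retries exhausted")


store = VersionedStore()
store.index("1", {"cancelFlag": "1"})
version_a, _ = store.get("1")  # client A reads version 1
version_b, _ = store.get("1")  # client B reads version 1
after_a = store.update("1", {"cancelFlag": "0"}, version_a)  # A wins, version becomes 2
try:
    store.update("1", {"fd": "1"}, version_b)  # B still holds the stale version 1
    b_conflicted = False
except VersionConflict:
    b_conflicted = True
after_retry = update_with_retry(store, "1", {"fd": "1"})  # B retries with a fresh read
final_version, final_doc = store.get("1")
print(after_a, b_conflicted, after_retry, final_doc)
```

Client B's first write is rejected because it carries the stale version; the retry re-reads the document and succeeds, so both changes end up in the final document.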
A quick warm-up case: how can the script engine update several fields in one go?
Method No.1:
ctx._source.putAll(params)
Method No.2:
for (k in params.keySet()) { if (!k.equals('ctx')) { ctx._source.put(k, params.get(k)) } }
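To make the two methods concrete, here is a hedged Python sketch that builds an `_update_by_query` body around Method No.1 and emulates locally what Method No.2 does to a document: it copies every parameter into the source except a key named `ctx`. The function names `build_multi_field_update` and `emulate_filtered_put` are hypothetical, and the emulation only mirrors the merge semantics; it does not run Painless.

```python
from typing import Any, Dict

PUT_ALL_SOURCE = "ctx._source.putAll(params)"


def build_multi_field_update(query: Dict[str, Any], fields: Dict[str, Any]) -> Dict[str, Any]:
    """Build a request body whose script copies all params into the document."""
    return {
        "query": query,
        "script": {
            "lang": "painless",
            "source": PUT_ALL_SOURCE,
            "params": dict(fields),
        },
    }


def emulate_filtered_put(source: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Local emulation of Method No.2: copy each param except a key named 'ctx'."""
    merged = dict(source)
    for key, value in params.items():
        if key != "ctx":
            merged[key] = value
    return merged


body = build_multi_field_update(
    {"term": {"fdh": 6583120}},
    {"cancelFlag": "0", "updateTime": "2021-07-28T01:17:36.000Z"},
)
doc = emulate_filtered_put({"id": "1", "cancelFlag": "1"}, body["script"]["params"])
print(doc)
```

Fields named in params overwrite existing values and fields not named are left untouched, so adding one more field to update only requires one more entry in params, with no change to the script source.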