Notes on basic Elasticsearch usage
Posted by 泛舟五湖之间
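In the previous post we installed Elasticsearch (ES) and Kibana. In this post we use Kibana to work with ES hands-on. The versions used are:
es7.9.3
kibana7.9.3
Before using ES we need to load a batch of data, so that the usages shown later can be checked against it.
This post follows the basic ES concepts: the usage goes from the cluster, to indices, to documents.
More advanced usage is left for later posts.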
1. Preparing the ES sample data
First download the sample data, which I keep in the helloes project on gitee:
https://gitee.com/xiezuozhen/hello-world/tree/master/HelloEs/src/test/resources/exampledata
Feel free to download and use it.
I use accounts.json here. Part of the sample data is shown below; each document takes two lines, an action line followed by the document source:
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}
{"index":{"_id":"18"}}
{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD"}
{"index":{"_id":"20"}}
{"account_number":20,"balance":16418,"firstname":"Elinor","lastname":"Ratliff","age":36,"gender":"M","address":"282 Kings Place","employer":"Scentric","email":"elinorratliff@scentric.com","city":"Ribera","state":"WA"}
{"index":{"_id":"25"}}
{"account_number":25,"balance":40540,"firstname":"Virginia","lastname":"Ayala","age":39,"gender":"F","address":"171 Putnam Avenue","employer":"Filodyne","email":"virginiaayala@filodyne.com","city":"Nicholson","state":"PA"}
{"index":{"_id":"32"}}
{"account_number":32,"balance":48086,"firstname":"Dillard","lastname":"Mcpherson","age":34,"gender":"F","address":"702 Quentin Street","employer":"Quailcom","email":"dillardmcpherson@quailcom.com","city":"Veguita","state":"IN"}
{"index":{"_id":"37"}}
{"account_number":37,"balance":18612,"firstname":"Mcgee","lastname":"Mooney","age":39,"gender":"M","address":"826 Fillmore Place","employer":"Reversus","email":"mcgeemooney@reversus.com","city":"Tooleville","state":"OK"}
{"index":{"_id":"44"}}
{"account_number":44,"balance":34487,"firstname":"Aurelia","lastname":"Harding","age":37,"gender":"M","address":"502 Baycliff Terrace","employer":"Orbalix","email":"aureliaharding@orbalix.com","city":"Yardville","state":"DE"}
{"index":{"_id":"49"}}
{"account_number":49,"balance":29104,"firstname":"Fulton","lastname":"Holt","age":23,"gender":"F","address":"451 Humboldt Street","employer":"Anocha","email":"fultonholt@anocha.com","city":"Sunriver","state":"RI"}
{"index":{"_id":"51"}}
{"account_number":51,"balance":14097,"firstname":"Burton","lastname":"Meyers","age":31,"gender":"F","address":"334 River Street","employer":"Bezal","email":"burtonmeyers@bezal.com","city":"Jacksonburg","state":"MO"}
{"index":{"_id":"56"}}
{"account_number":56,"balance":14992,"firstname":"Josie","lastname":"Nelson","age":32,"gender":"M","address":"857 Tabor Court","employer":"Emtrac","email":"josienelson@emtrac.com","city":"Sunnyside","state":"UT"}
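Once we have the sample data, we can log in to the host machine and import it with the bulk API:
curl -H 'Content-Type:application/x-ndjson' -XPOST '192.168.47.210:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
accounts.json must end with a newline (leave an empty line at the end of the file), otherwise the import fails. In 7.x the type segment of the path (account) is deprecated; the typeless path bank/_bulk works as well.
The data can also be imported through Kibana's UI, but the data I imported that way had some problems, so that route is not shown here.
The data is now ready and we can start using it.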
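The trailing-newline requirement is easy to get wrong when the bulk file is generated by hand. As a minimal sketch (the helper name build_bulk_body and the two shortened documents are my own illustration, not part of the original data set), a bulk body in the same two-lines-per-document layout can be built like this:

```python
import json

def build_bulk_body(docs):
    """Return an NDJSON _bulk body: an action line plus a source line per document."""
    lines = []
    for doc in docs:
        # Use account_number as the document id, like the sample lines above do.
        lines.append(json.dumps({"index": {"_id": str(doc["account_number"])}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline character.
    return "\n".join(lines) + "\n"

# Shortened, illustrative documents (not the full accounts.json records).
docs = [
    {"account_number": 1, "balance": 39225, "firstname": "Amber"},
    {"account_number": 6, "balance": 5686, "firstname": "Hattie"},
]
body = build_bulk_body(docs)
print(body, end="")
```

Writing body to a file and passing it to curl with --data-binary, as in the command above, should then import cleanly.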
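2. Viewing ES cluster information
There are two main families of APIs: _cat and _cluster.
2.1 Using _cat
Only some commonly used commands are listed here; for the rest of the cat APIs see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html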
GET /_cat/nodes?v&pretty #list all nodes
GET /_cat/shards?v&pretty #details of each shard
GET /_cat/master?v&pretty #the master node
GET /_cat/indices?v&pretty #details of every index in the cluster
GET /_cat/segments?v&pretty #segment details per index: segment name, owning shard, memory (disk) size, whether it has been committed to disk
GET /_cat/count?v&pretty #document count of the cluster
GET /_cat/recovery?v&pretty #recovery progress of each shard, for example after replicas are adjusted
GET /_cat/health?v&pretty #current cluster status: red, yellow or green
GET /_cat/pending_tasks?v&pretty #pending tasks of the cluster
GET /_cat/aliases?v&pretty #all aliases in the cluster, including routing configuration
GET /_cat/thread_pool?v&pretty #statistics of the different thread pools on each node
GET /_cat/plugins?v&pretty #plugins installed on each node
GET /_cat/fielddata?v&pretty #fielddata memory usage on each node
GET /_cat/nodeattrs?v&pretty #custom node attributes
GET /_cat/repositories?v&pretty #snapshot repositories registered in the cluster
GET /_cat/templates?v&pretty #existing index templates
I will walk through a few that are used most in production:
2.1.1 Node information
GET /_cat/nodes?v lists all nodes:
ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.47.210           64          91   2    0.05    0.10     0.13 dilmrt    *      es3
192.168.47.210           58          91   2    0.05    0.10     0.13 dilmrt    -      es1
192.168.47.210           70          91   2    0.05    0.10     0.13 dilmrt    -      es2
heap.percent: percentage of heap memory in use
ram.percent: percentage of RAM in use
cpu: CPU usage percentage
master: * marks the node that is the current master of the cluster
name: node name
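When the _cat output has to be processed by a script rather than read by eye, the column layout can be parsed directly. The sketch below is a rough illustration (parse_cat_table is a name I made up) and only works when no cell is empty or contains spaces, which holds for the node listing above. The cat APIs also accept format=json, which is usually the more robust choice.

```python
def parse_cat_table(text):
    """Turn whitespace-separated _cat output (requested with ?v) into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

# The node listing from this post, pasted as plain text.
sample = """\
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.47.210 64 91 2 0.05 0.10 0.13 dilmrt * es3
192.168.47.210 58 91 2 0.05 0.10 0.13 dilmrt - es1
192.168.47.210 70 91 2 0.05 0.10 0.13 dilmrt - es2
"""
nodes = parse_cat_table(sample)
# The master column holds "*" for the elected master and "-" for the others.
master = next(node["name"] for node in nodes if node["master"] == "*")
print(master)  # es3
```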
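2.1.2 Alias information
GET /_cat/aliases?v&pretty lists all aliases:
alias                   index                          filter routing.index routing.search is_write_index
.kibana-event-log-7.9.3 .kibana-event-log-7.9.3-000001 -      -             -              true
ilm-history-2           ilm-history-2-000001           -      -             -              true
.kibana                 .kibana_1                      -      -             -              -
.security               .security-7                    -      -             -              -
.kibana_task_manager    .kibana_task_manager_1         -      -             -              -
alias: the alias name
index: the index the alias points to
routing.index: routing value used when indexing through the alias
routing.search: routing value used when searching through the alias
filter: filter applied through the alias
Aliases are a very important ES feature: they are used a lot in production and also come up frequently in interviews.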
2.1.3 Index information
GET /_cat/indices?v&pretty
health status index                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .security-7                    eY2dGhuKRpKcrnW_1dcAHg   1   1          7            0       51kb         25.5kb
green  open   .kibana-event-log-7.9.3-000001 FR7mcABbTZOWS_rLSnmHPQ   1   1          2            0     21.8kb         10.9kb
green  open   .apm-custom-link               Xwzboc0UTWaAxXIRBFm0jw   1   1          0            0       416b           208b
green  open   .kibana_task_manager_1         6rW9n3yJRuSqZGR44QjlGw   1   1          6         3741        1mb          588kb
green  open   .apm-agent-configuration       aNXRDK5FR0q-ZNbwVx54XA   1   1          0            0       416b           208b
green  open   .async-search                  1Q_H4j1URoucJXkurKelZA   1   1          0            1      6.6kb          3.3kb
green  open   .kibana_1                      EjDRMq5iSEOK6QitQqO4QQ   1   1         74           14     20.9mb         10.4mb
green  open   account                        LbMvFKxOR3mjfT9rflN7JA   1   1       1999            0      743kb        371.5kb
health: health status of the index
index: index name
pri: number of primary shards
rep: number of replicas per primary shard
store.size: total storage used by the primary and replica shards together
pri.store.size: storage used by the primary shards only, not counting replicas
The remaining usages are not listed one by one; for more detail, see the official documentation.
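The difference between store.size and pri.store.size is the space taken by replicas. As a small sketch (parse_size and replica_bytes are hypothetical helper names; the unit table assumes the 1024-based units ES prints), the replica share can be computed from the two columns:

```python
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}

def parse_size(text):
    """Convert a _cat size such as '25.5kb' into a number of bytes."""
    text = text.strip().lower()
    # Try the longer suffixes first so that 'kb' is not mistaken for 'b'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized size: {text!r}")

def replica_bytes(store_size, pri_store_size):
    """Storage used by replica shards only: total minus primaries."""
    return parse_size(store_size) - parse_size(pri_store_size)

# Values of the .security-7 row above: store.size 51kb, pri.store.size 25.5kb.
print(replica_bytes("51kb", "25.5kb"))
```

With one replica per primary (rep = 1), the result is expected to be close to pri.store.size itself, which matches the rows in the listing.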
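2.2 Using _cluster
_cluster is used much like _cat, but the information it returns is far more detailed. _cat is generally for viewing cluster information, while _cluster can also operate on the cluster.
The main commands are:
GET _cluster/health #cluster health
GET _cluster/state #cluster state
GET _cluster/stats #cluster statistics
GET _cluster/pending_tasks #pending cluster tasks
POST _cluster/reroute #reroute shards (typically used after shard allocation has failed, to retry the allocation)
PUT _cluster/settings #update cluster settings
GET _nodes/stats #node statistics
GET _nodes #node information
GET _nodes/hot_threads #hot threads on each node
The node shutdown API (nodes/_master/_shutdown) seen in older tutorials was removed in Elasticsearch 2.0 and is not available in 7.9.3; a node is stopped by stopping its process or service instead.
A few of them in detail:
2.2.1 Cluster health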
GET _cluster/health
{
  "cluster_name" : "elasticsearch-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 9,
  "active_shards" : 18,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
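status: cluster status
green: all primary and replica shards are allocated; the cluster is 100% available.
yellow: all primary shards are allocated, but at least one replica shard is missing. No data is lost, so search results are still complete.
However, high availability is weakened to some degree. If more shards disappear, data will be lost.
red: at least one primary shard (and all of its replicas) is missing. Searches return only partial data, and write requests routed to that shard return an exception.
timed_out: whether the request timed out
number_of_nodes: number of nodes
number_of_data_nodes: number of data nodes
active_primary_shards: number of active primary shards across all indices
active_shards: number of active shards (primaries and replicas) across all indices
relocating_shards: shards currently moving between nodes. Normally 0, but when a node joins or leaves, the cluster detects that shards are unevenly distributed and starts relocating them.
initializing_shards: number of shards being initialized. A shard briefly passes through the initializing state right after it is created or after a node restarts.
unassigned_shards: number of shards that exist in the cluster state but are not allocated to any node. The common case is unassigned replicas: with a single node and an index of 5 primary shards and 1 replica, ES never places a replica on the same node as its primary, so the 5 replica shards stay unassigned.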
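The single-node example can be turned into a small calculation. This is a deliberately simplified model under my own assumptions (it only applies the rule that copies of the same shard must sit on different nodes, and ignores disk watermarks, allocation filters and other deciders); the function names are illustrative.

```python
def unassigned_copies(data_nodes, primaries, replicas):
    """Shard copies that cannot be placed when copies of one shard need distinct nodes."""
    copies_per_shard = 1 + replicas
    placeable = min(copies_per_shard, data_nodes)
    return primaries * (copies_per_shard - placeable)

def expected_status(data_nodes, primaries, replicas):
    """Status this simplified model predicts for a single index."""
    if data_nodes == 0:
        return "red"      # no node can hold any primary
    if unassigned_copies(data_nodes, primaries, replicas) > 0:
        return "yellow"   # primaries are placed, some replicas are not
    return "green"

# One node, 5 primaries, 1 replica: the case described above.
print(unassigned_copies(1, 5, 1), expected_status(1, 5, 1))
# Three nodes, as in the cluster used in this post.
print(unassigned_copies(3, 5, 1), expected_status(3, 5, 1))
```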
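2.2.2 Cluster state
GET _cluster/state
{
  "cluster_name" : "elasticsearch-cluster",
  "cluster_uuid" : "uY2HW2E1S4m_Y2N5YSBeJQ",
  "version" : 492,
  "state_uuid" : "sksdPfrxSDKBZTMWJBTTRg",
  "master_node" : "jW8PbSdhTOOpESX13DRBJQ",
  "blocks" : { },
  "nodes" : { *** },
  "metadata" : { *** },
  "routing_table" : { *** },
  "routing_nodes" : { *** }
}
The response is large, so only the most important parts are listed here (*** marks omitted content):
nodes: the nodes of the cluster
metadata: metadata, including cluster-level metadata and index metadata
routing_table: the routing table
routing_nodes: the routing information grouped by node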
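3. Working with indices
The previous sections were mostly about querying the cluster; next we look at creating, deleting, modifying and querying indices.
3.1 Deleting an index
DELETE bank deletes the bank index.
Indices can also be deleted in bulk:
DELETE bank* deletes every index whose name starts with bank.
3.2 Creating an index
The request below creates a new index named banks.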
PUT /banks
{
  "aliases" : { },
  "mappings" : {
    "properties" : {
      "account_number" : {
        "type" : "long"
      },
      "address" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "age" : {
        "type" : "long"
      },
      "balance" : {
        "type" : "long"
      },
      "city" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "email" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "employer" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "firstname" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "gender" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "lastname" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "state" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : "1",
      "number_of_replicas" : "1"
    }
  }
}
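3.3 Modifying an index
In a search engine, changing an index itself (as opposed to its documents) is usually done by deleting the original index and then rebuilding it.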
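Since rebuilding means recreating the index with its full mapping, it can help to generate the request body instead of retyping it. The following is a minimal sketch in Python, assuming the field list of the banks index from section 3.2; the helper names are my own and nothing here talks to a cluster.

```python
import json

def keyword_text_field():
    """A text field with a keyword sub-field, as used for address, city and so on."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }

def build_index_body(long_fields, text_fields, shards=1, replicas=1):
    """Assemble the body of a create-index request from two lists of field names."""
    properties = {name: {"type": "long"} for name in long_fields}
    properties.update({name: keyword_text_field() for name in text_fields})
    return {
        "aliases": {},
        "mappings": {"properties": dict(sorted(properties.items()))},
        "settings": {"index": {"number_of_shards": str(shards),
                               "number_of_replicas": str(replicas)}},
    }

body = build_index_body(
    long_fields=["account_number", "age", "balance"],
    text_fields=["address", "city", "email", "employer",
                 "firstname", "gender", "lastname", "state"],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after PUT /banks in the Kibana console once the old index has been deleted.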
3.4 Querying an index
GET bank
returns information about the index.
Response:
{
  "bank" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "account_number" : {
          "type" : "long"
        },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1679827093469",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "FbFyX5TdRGC88ZlbvnemmQ",
        "version" : {
          "created" : "7090399"
        },
        "provided_name" : "bank"
      }
    }
  }
}
4. Working with documents
Get a document of an index by id
GET bank/_doc/1
Response:
{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "account_number" : 1,
    "balance" : 39225,
    "firstname" : "Amber",
    "lastname" : "Duke",
    "age" : 32,
    "gender" : "M",
    "address" : "880 Holmes Lane",
    "employer" : "Pyrami",
    "email" : "amberduke@pyrami.com",
    "city" : "Brogan",
    "state" : "IL"
  }
}
Delete a document by id
DELETE bank/_doc/1
Response:
{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1000,
  "_primary_term" : 1
}
Create a document with a custom id
PUT bank/_doc/1
{
  "account_number" : 1,
  "balance" : 39225,
  "firstname" : "Amber",
  "lastname" : "Duke",
  "age" : 32,
  "gender" : "M",
  "address" : "880 Holmes Lane",
  "employer" : "Pyrami",
  "email" : "amberduke@pyrami.com",
  "city" : "Brogan",
  "state" : "IL"
}
Response:
{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1001,
  "_primary_term" : 1
}
Update a document by id
PUT bank/_doc/1
{
  "account_number" : 1,
  "balance" : 39225,
  "firstname" : "Amber_xzz",
  "lastname" : "Duke",
  "age" : 32,
  "gender" : "M",
  "address" : "880 Holmes Lane",
  "employer" : "Pyrami",
  "email" : "amberduke@pyrami.com",
  "city" : "Brogan",
  "state" : "IL"
}
Here firstname=Amber is changed to firstname=Amber_xzz and the request is submitted. Because the id already exists, the PUT replaces the whole document.
Response:
{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1002,
  "_primary_term" : 1
}
These are only some simple usages; many more query features will be covered in later posts. Stay tuned.
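The document requests of this section can also be issued from a script instead of the Kibana console. The sketch below only builds the HTTP requests with Python's standard library and does not send them; BASE_URL reuses the host from the curl example, and doc_request is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://192.168.47.210:9200"  # host from the curl example; adjust as needed

def doc_request(method, index, doc_id, body=None):
    """Build (but do not send) an HTTP request for /<index>/_doc/<id>."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_doc/{doc_id}", data=data, headers=headers, method=method
    )

get_req = doc_request("GET", "bank", 1)
# Shortened body for illustration; a real PUT should carry the full document,
# because it replaces the stored document entirely.
put_req = doc_request("PUT", "bank", 1, {"account_number": 1, "firstname": "Amber_xzz"})
del_req = doc_request("DELETE", "bank", 1)
for req in (get_req, put_req, del_req):
    print(req.get_method(), req.full_url)
# To actually send one: urllib.request.urlopen(put_req).read()
```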
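Using the basic ES query APIs
Basic query types
term query
{ "query": { "term": { "title": "crime" } } }
With a boost:
{ "query": { "term": { "title": { "value":"crime", "boost":10.0 } } } }
terms query: match documents whose tags contain novel or book
{ "query": { "terms": { "tags": ["novel","book"] } } }
Common terms query
Put simply, this keeps stop words from carrying too much weight: the query terms are split into a high-frequency group and a low-frequency group that are searched separately. Stop words fall into the high-frequency group; cutoff_frequency is the threshold, and terms whose frequency is below it go into the low-frequency group. Note that the common query is deprecated as of Elasticsearch 7.3; it still runs on 7.9.3, but a plain match query is the suggested replacement.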
{ "query": { "common": { "title":{ "query":"crime and punishment", "cutoff_frequency":0.001 } } } }
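The grouping step behind cutoff_frequency can be illustrated without a cluster. This is a sketch under my own assumptions: the document frequencies are invented, the cutoff is treated as a relative frequency between 0 and 1, and split_by_frequency is a made-up helper, not an ES API.

```python
def split_by_frequency(doc_freq, total_docs, cutoff_frequency):
    """Split terms into (low, high) frequency groups by relative document frequency."""
    low, high = [], []
    for term, df in doc_freq.items():
        (high if df / total_docs > cutoff_frequency else low).append(term)
    return low, high

# Hypothetical document frequencies in an index of 10,000 documents.
doc_freq = {"crime": 5, "and": 9000, "punishment": 3}
low, high = split_by_frequency(doc_freq, 10_000, 0.001)
print(low, high)  # ['crime', 'punishment'] ['and']
```

The low-frequency terms are the ones that drive matching, while the high-frequency group mainly adjusts the score of documents that already matched.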
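match query (does not support Lucene query syntax)
Matches documents whose title contains crime, and, or punishment
{ "query": { "match": { "title": "crime and punishment" } } }
operator
Controls whether the analyzed terms of the query text are combined with and or with or (the default is or)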
{ "query": { "match": { "title": { "query":"crime and punishment", "operator":"and" } } } }
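The effect of operator can be mimicked on plain strings. The sketch below is only an approximation under my own assumptions: tokens() lowercases and splits on whitespace as a crude stand-in for a real analyzer, and match() is an illustrative function, not the ES implementation.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_text, query_text, operator="or"):
    """True if any (or) or all (and) query terms occur in the field text."""
    field_terms = set(tokens(field_text))
    hits = [term in field_terms for term in tokens(query_text)]
    return all(hits) if operator == "and" else any(hits)

print(match("Crime Story", "crime and punishment"))                           # True
print(match("Crime Story", "crime and punishment", operator="and"))           # False
print(match("Crime and Punishment", "crime and punishment", operator="and"))  # True
```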
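Phrase query
{ "query": { "match_phrase": { "title": { "query":"crime punishment", "slop":1 } } } }
Phrase prefix query
Prefix-matches on the last term of the query text
{ "query": { "match_phrase_prefix": { "title": { "query":"crime punish", "slop":1, "max_expansions":20 } } } }
multi_match (query across multiple fields)
{ "query": { "multi_match": { "query":"crime heller", "fields":["title","author"] } } }
query_string query (supports Lucene query syntax)
Matches crime in title with a boost of 10, requires punishment in title, requires that otitle does not contain cat, and requires that author contains both Fyodor and dostoevsky.
{ "query": { "query_string": { "query":"title:crime^10 +title:punishment -otitle:cat +author:(+Fyodor +dostoevsky)", "default_field":"title" } } }
Querying multiple fields
use_dis_max enables disjunction-max scoring: for a given keyword, only the highest-scoring field contributes to the document's final score, instead of the sum of the scores of all fields that contain the term. This parameter was deprecated in 6.x in favor of tie_breaker, so the request below may need adapting on 7.9.3.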
{ "query": { "query_string": { "query":"crime heller", "fields":["title","author"], "use_dis_max":true } } }
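The difference between taking the maximum and taking the sum is easy to see with numbers. A small sketch, with invented field scores and an illustrative function name; tie_breaker is the knob that moves the result from pure max (0.0) towards the plain sum (1.0):

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times the sum of the other field scores."""
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)

scores = [2.0, 0.5]  # hypothetical scores of the same keyword in title and author
print(dis_max_score(scores))                   # 2.0, only the best field counts
print(dis_max_score(scores, tie_breaker=1.0))  # 2.5, same as summing all fields
```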
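simple_query_string query
Does not throw an exception on parse errors; the invalid parts of the query are discarded
{ "query": { "simple_query_string": { "query":"title:crime^10 +title:punishment -otitle:cat +author:(+Fyodor +dostoevsky)", "default_operator":"or" } } }
ids query
Looks up documents by their unique ids (the type parameter is deprecated in 7.x and is left out)
{ "query": { "ids": { "values":["1","2","3"] } } }
prefix query
Matches terms that start with the given prefix
{ "query": { "prefix": { "title":"cri" } } }
With a boost:
{ "query": { "prefix": { "title":{ "value":"cri", "boost":3.0 } } } }
fuzzy query
A fuzzy query based on edit distance. It is relatively expensive to compute, but useful when users misspell words
{ "query": { "fuzzy": { "title":"crme" } } }
With an explicit maximum edit distance (fuzziness):
{ "query": { "fuzzy": { "title":{ "value":"crme", "fuzziness":1 } } } }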
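To make "edit distance" concrete, here is a minimal sketch of the classic Levenshtein distance. It is an illustration only: as far as I know ES by default also counts a swap of two adjacent characters as a single edit, which this plain version does not, and the function names are my own.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute ca with cb
        previous = current
    return previous[-1]

def fuzzy_match(term, candidate, max_edits=1):
    """True if candidate is within max_edits edits of term."""
    return edit_distance(term, candidate) <= max_edits

print(edit_distance("crme", "crime"))   # 1, one inserted character
print(fuzzy_match("crme", "criminal"))  # False
```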
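wildcard query
Supports the wildcards * (any sequence of characters) and ? (any single character)
{ "query": { "wildcard": { "title": "cr?me" } } }
range query
Applies to a single field, which can be numeric or string-based.
{ "query": { "range": { "year": { "gte" :1890, "lte":1900 } } } }
regexp query
Query performance depends on the regular expression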
{ "query": { "regexp": { "title": { "value" :"cr.m[ae]", "boost":10.0 } } } }
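Both the wildcard and the regexp query match against whole terms, not substrings. The sketch below imitates that with Python's standard library; it is only an approximation, since fnmatch also gives special meaning to square brackets and Python's regex dialect differs from the Lucene one in details. The two function names are hypothetical.

```python
import fnmatch
import re

def wildcard_match(term, pattern):
    """* matches any run of characters, ? exactly one; the whole term must match."""
    return fnmatch.fnmatchcase(term, pattern)

def regexp_match(term, pattern):
    """The pattern must match the entire term, not just a part of it."""
    return re.fullmatch(pattern, term) is not None

print(wildcard_match("crime", "cr?me"), wildcard_match("crimes", "cr?me"))
print(regexp_match("crime", "cr.m[ae]"), regexp_match("crimes", "cr.m[ae]"))
```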
bool query (compound query)
{ "query": { "bool": { "must": { "term": { "title": "crime" } }, "should": { "range": { "year": { "gte": 1900, "lte": 2000 } } }, "must_not": { "term": { "otitle": "nothing" } } } } }
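Nested JSON like this is easy to mistype, so when queries are sent from code it can be convenient to compose them from small functions. A minimal sketch with made-up helper names, writing the range with gte/lte as in the range query example earlier:

```python
import json

def term(field, value):
    """A term clause for one field."""
    return {"term": {field: value}}

def range_clause(field, gte, lte):
    """An inclusive range clause for one field."""
    return {"range": {field: {"gte": gte, "lte": lte}}}

def bool_query(must=None, should=None, must_not=None):
    """Assemble a bool query, leaving out clauses that were not given."""
    clauses = {"must": must, "should": should, "must_not": must_not}
    return {"query": {"bool": {k: v for k, v in clauses.items() if v is not None}}}

query = bool_query(
    must=term("title", "crime"),
    should=range_clause("year", 1900, 2000),
    must_not=term("otitle", "nothing"),
)
print(json.dumps(query))
```

The printed JSON can be used as the body of a _search request.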