Notes on the basic usage of es

Posted by 泛舟五湖之间

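In the previous post we finished installing es, and kibana as well. In this post we use kibana to work with es hands-on.
The versions used are:
es 7.9.3
kibana 7.9.3
Before using es we need to load a batch of data, which the later examples rely on.
This post is organized around the basic concepts of es, and the usage also goes from cluster --> index --> document.
More complex usage is left for later posts.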

1. Preparing the es sample data

First download the sample data. I keep it in the helloes project on gitee:
https://gitee.com/xiezuozhen/hello-world/tree/master/HelloEs/src/test/resources/exampledata
Feel free to download it.

I use accounts.json here. Part of the sample data looks like this:

{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}
{"index":{"_id":"18"}}
{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD"}
{"index":{"_id":"20"}}
{"account_number":20,"balance":16418,"firstname":"Elinor","lastname":"Ratliff","age":36,"gender":"M","address":"282 Kings Place","employer":"Scentric","email":"elinorratliff@scentric.com","city":"Ribera","state":"WA"}
{"index":{"_id":"25"}}
{"account_number":25,"balance":40540,"firstname":"Virginia","lastname":"Ayala","age":39,"gender":"F","address":"171 Putnam Avenue","employer":"Filodyne","email":"virginiaayala@filodyne.com","city":"Nicholson","state":"PA"}
{"index":{"_id":"32"}}
{"account_number":32,"balance":48086,"firstname":"Dillard","lastname":"Mcpherson","age":34,"gender":"F","address":"702 Quentin Street","employer":"Quailcom","email":"dillardmcpherson@quailcom.com","city":"Veguita","state":"IN"}
{"index":{"_id":"37"}}
{"account_number":37,"balance":18612,"firstname":"Mcgee","lastname":"Mooney","age":39,"gender":"M","address":"826 Fillmore Place","employer":"Reversus","email":"mcgeemooney@reversus.com","city":"Tooleville","state":"OK"}
{"index":{"_id":"44"}}
{"account_number":44,"balance":34487,"firstname":"Aurelia","lastname":"Harding","age":37,"gender":"M","address":"502 Baycliff Terrace","employer":"Orbalix","email":"aureliaharding@orbalix.com","city":"Yardville","state":"DE"}
{"index":{"_id":"49"}}
{"account_number":49,"balance":29104,"firstname":"Fulton","lastname":"Holt","age":23,"gender":"F","address":"451 Humboldt Street","employer":"Anocha","email":"fultonholt@anocha.com","city":"Sunriver","state":"RI"}
{"index":{"_id":"51"}}
{"account_number":51,"balance":14097,"firstname":"Burton","lastname":"Meyers","age":31,"gender":"F","address":"334 River Street","employer":"Bezal","email":"burtonmeyers@bezal.com","city":"Jacksonburg","state":"MO"}
{"index":{"_id":"56"}}
{"account_number":56,"balance":14992,"firstname":"Josie","lastname":"Nelson","age":32,"gender":"M","address":"857 Tabor Court","employer":"Emtrac","email":"josienelson@emtrac.com","city":"Sunnyside","state":"UT"}

Once we have the sample data, we can log in to the host machine and import it:

curl -H 'Content-Type:application/x-ndjson' -XPOST '192.168.47.210:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"

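accounts.json must end with a newline (an empty last line), otherwise the import fails.
The account segment in the URL is a mapping type. Types are deprecated in es 7.x, and the typeless form of the endpoint is bank/_bulk.
The data can also be imported with kibana's visual tool, but the data I imported that way had some problems, so I do not show that here.
The data is now ready, and we can start using it.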
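
To make the file format concrete, here is a minimal Python sketch that builds a _bulk body of the same shape as accounts.json. The helper name build_bulk_body and the choice of account_number as the _id are my own assumptions for illustration; only the line layout (action line, source line, final newline) comes from the format above.

```python
import json

def build_bulk_body(docs, id_field="account_number"):
    """Return an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # Action line: index the document under an explicit _id.
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    {"account_number": 1, "balance": 39225, "firstname": "Amber"},
    {"account_number": 6, "balance": 5686, "firstname": "Hattie"},
]
body = build_bulk_body(docs)
print(body, end="")
```

Writing body to a file and passing it to curl with --data-binary keeps the newlines intact, which is why the command above uses that flag.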

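2. Viewing information about the es cluster

There are two main groups of APIs: _cat and _cluster.

1. Using _cat

I only list some commonly used ones here. For the other cat APIs, see the official documentation: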
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html
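GET /_cat/nodes?v&pretty #list information about all nodes
GET /_cat/shards?v&pretty #show details of each shard
GET /_cat/master?v&pretty #show the master node
GET /_cat/indices?v&pretty #show details of every index in the cluster
GET /_cat/segments?v&pretty  #show segment details per index: segment name, owning shard, memory (disk) usage, whether it has been flushed to disk
GET /_cat/count?v&pretty  #show the number of docs in the cluster
GET /_cat/recovery?v&pretty #show the recovery progress of each shard in the cluster (for example after the replica count is changed)
GET /_cat/health?v&pretty #show the current cluster status: red, yellow, green
GET /_cat/pending_tasks?v&pretty #show the cluster's pending tasks
GET /_cat/aliases?v&pretty #show all aliases in the cluster, including routing configuration
GET /_cat/thread_pool?v&pretty #show statistics for the different thread pools on each node
GET /_cat/plugins?v&pretty #show the plugins installed on each node
GET /_cat/fielddata?v&pretty #show fielddata memory usage on each node
GET /_cat/nodeattrs?v&pretty #show the custom attributes of each node
GET /_cat/repositories?v&pretty #list the snapshot repositories registered in the cluster
GET /_cat/templates?v&pretty #list the existing index templates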

I pick a few that are used most in production and explain them:

1. Querying node information

GET /_cat/nodes: list information about all nodes

ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.47.210           64          91   2    0.05    0.10     0.13 dilmrt    *      es3
192.168.47.210           58          91   2    0.05    0.10     0.13 dilmrt    -      es1
192.168.47.210           70          91   2    0.05    0.10     0.13 dilmrt    -      es2

heap.percent: percentage of heap memory in use
ram.percent: percentage of RAM in use
cpu: CPU usage percentage
master: * marks the node that is the cluster's master
name: node name
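
When this output needs to be checked from a script, the text table can be split on whitespace. The sketch below is only an illustration: parse_cat_table is a name I made up, and it assumes that no cell is empty and no value contains a space, which holds for the output above.

```python
def parse_cat_table(text):
    """Parse whitespace-aligned _cat output (requested with ?v) into dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    # Assumes no cell is empty and no value contains spaces.
    return [dict(zip(header, line.split())) for line in lines[1:]]

cat_nodes = """
ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.47.210           64          91   2    0.05    0.10     0.13 dilmrt    *      es3
192.168.47.210           58          91   2    0.05    0.10     0.13 dilmrt    -      es1
192.168.47.210           70          91   2    0.05    0.10     0.13 dilmrt    -      es2
"""

nodes = parse_cat_table(cat_nodes)
# The master column holds * for the elected master and - for the others.
master = next(n["name"] for n in nodes if n["master"] == "*")
# Node with the highest heap usage.
busiest = max(nodes, key=lambda n: int(n["heap.percent"]))["name"]
print(master, busiest)
```

For real scripts the _cat APIs also accept format=json, which avoids parsing the table by hand.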

2. Querying alias information

GET /_cat/aliases?v&pretty lists all aliases

alias                   index                          filter routing.index routing.search is_write_index
.kibana-event-log-7.9.3 .kibana-event-log-7.9.3-000001 -      -             -              true
ilm-history-2           ilm-history-2-000001           -      -             -              true
.kibana                 .kibana_1                      -      -             -              -
.security               .security-7                    -      -             -              -
.kibana_task_manager    .kibana_task_manager_1         -      -             -              -

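alias: the alias name
index: the index the alias points to
routing.index: routing value used for indexing
routing.search: routing value used for searching
filter: the filter attached to the alias

Aliases are a very important feature of es. They are used a lot in production, and they also come up often in interviews.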

3. Querying index information

GET /_cat/indices?v&pretty

health status index                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .security-7                    eY2dGhuKRpKcrnW_1dcAHg   1   1          7            0       51kb         25.5kb
green  open   .kibana-event-log-7.9.3-000001 FR7mcABbTZOWS_rLSnmHPQ   1   1          2            0     21.8kb         10.9kb
green  open   .apm-custom-link               Xwzboc0UTWaAxXIRBFm0jw   1   1          0            0       416b           208b
green  open   .kibana_task_manager_1         6rW9n3yJRuSqZGR44QjlGw   1   1          6         3741        1mb          588kb
green  open   .apm-agent-configuration       aNXRDK5FR0q-ZNbwVx54XA   1   1          0            0       416b           208b
green  open   .async-search                  1Q_H4j1URoucJXkurKelZA   1   1          0            1      6.6kb          3.3kb
green  open   .kibana_1                      EjDRMq5iSEOK6QitQqO4QQ   1   1         74           14     20.9mb         10.4mb
green  open   account                        LbMvFKxOR3mjfT9rflN7JA   1   1       1999            0      743kb        371.5kb

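health: health status of the index
index: index name
pri: number of primary shards
rep: number of replicas of each primary shard
store.size: total storage used by the primary and replica shards
pri.store.size: storage used by the primary shards only, not counting replicas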
There are other usages that I will not list one by one. For more detail, consult the official documentation.

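2. Using _cluster

_cluster is used in much the same way as _cat, but it returns far more detail. _cat is generally for viewing cluster information, while _cluster can both view the cluster and operate on it.
The main commands are: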

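GET _cluster/health 	#cluster health
GET _cluster/state	#cluster state
GET _cluster/stats	#cluster statistics
GET _cluster/pending_tasks	#pending cluster tasks
POST _cluster/reroute	#reroute shards (generally used after shard allocation has failed, to retry the allocation)
PUT _cluster/settings	#update cluster settings
GET _nodes/stats	#node statistics
GET _nodes	#node information
GET _nodes/hot_threads	#hot threads on each node
GET nodes/_master/_shutdown	#shut down a node (the node shutdown API was removed in es 2.0, so this does not work on 7.9.3)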

Only a few of these are described in detail here:

1. Cluster health

GET _cluster/health

{
  "cluster_name" : "elasticsearch-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 9,
  "active_shards" : 18,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

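status: the state of the es cluster

green: all primary and replica shards are allocated; the cluster is 100% available.
yellow: all primary shards are allocated, but at least one replica shard is missing. No data is lost, so search results are still complete.
High availability is weakened to some degree, however: if more shards disappear, data will be lost.
red: at least one primary shard (and all of its replicas) is missing. Searches return only partial data, and write requests routed to that shard return an exception.

timed_out: whether the request timed out
number_of_nodes: number of nodes
number_of_data_nodes: number of data nodes
active_primary_shards: number of active primary shards across all indices
active_shards: number of active shards (primaries and replicas) across all indices
relocating_shards: shards currently moving between nodes. Normally 0, but when a node joins or leaves, the cluster detects that shards are unevenly distributed and starts relocating them.
initializing_shards: shards being initialized. A shard briefly passes through the initializing state when it has just been created or when a node restarts.
unassigned_shards: shards that exist in the cluster state but are not allocated to any node. This is common with unallocated replicas. For example, with 1 node and one index with 5 primary shards and 1 replica, a replica cannot be stored on the same node as its primary, so 5 replica shards stay unassigned.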
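
The unassigned replica example can be worked through with a small model. This is a deliberately simplified sketch under my own assumptions: it only applies the rule that a node holds at most one copy of a given shard, and it ignores allocation filters, disk watermarks and everything else the real allocator considers. The function names are hypothetical.

```python
def unassigned_replicas(nodes, primaries, replicas):
    """Replica copies that cannot be placed when each node holds at most
    one copy of any given shard (simplified model)."""
    # Each primary occupies one node, leaving nodes - 1 for its replicas.
    placeable = max(0, min(replicas, nodes - 1))
    return primaries * (replicas - placeable)

def expected_status(nodes, primaries, replicas):
    """Health colour, assuming every primary itself is assigned."""
    if nodes < 1:
        return "red"
    return "yellow" if unassigned_replicas(nodes, primaries, replicas) else "green"

# 1 node, 5 primary shards, 1 replica each: the case described above.
print(unassigned_replicas(1, 5, 1), expected_status(1, 5, 1))
# 3 nodes, 9 primary shards, 1 replica each: the health output above.
print(unassigned_replicas(3, 9, 1), expected_status(3, 9, 1))
```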

2. Cluster state

GET _cluster/state

{
  "cluster_name" : "elasticsearch-cluster",
  "cluster_uuid" : "uY2HW2E1S4m_Y2N5YSBeJQ",
  "version" : 492,
  "state_uuid" : "sksdPfrxSDKBZTMWJBTTRg",
  "master_node" : "jW8PbSdhTOOpESX13DRBJQ",
  "blocks" : { },
  "nodes" : { *** },
  "metadata" : { *** },
  "routing_table" : { *** },
  "routing_nodes" : { *** }
}

Because the response is large, only the important parts are shown here (*** marks omitted content):
nodes: the nodes
metadata: mainly metadata, including cluster-level and index-level metadata
routing_table: the routing table
routing_nodes: the routing nodes

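3. Working with indices

So far we have mostly queried the cluster. Next we look at creating, deleting, modifying and querying indices.

1. Deleting an index

DELETE bank deletes the bank index.
Indices can also be deleted in bulk:
DELETE bank* deletes every index whose name starts with bank.

2. Creating an index

As the code below shows, we create a new banks index.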

PUT /banks
{
  "aliases" : { },
  "mappings" : {
    "properties" : {
      "account_number" : {
        "type" : "long"
      },
      "address" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "age" : {
        "type" : "long"
      },
      "balance" : {
        "type" : "long"
      },
      "city" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "email" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "employer" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "firstname" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "gender" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "lastname" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "state" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : "1",
      "number_of_replicas" : "1"
    }
  }
}

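3. Modifying an index

In a search engine, a change to the index itself (such as changing the type of an existing field) is generally made by deleting the original index and then rebuilding it.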
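
Since a rebuild means sending the whole index definition again, it can help to generate the body from a short list of fields instead of editing the long JSON by hand. The sketch below is one possible way to do that. The function names are my own, and the field lists simply mirror the banks mapping shown earlier.

```python
import json

def text_with_keyword(ignore_above=256):
    """A text field with a keyword sub-field, as used in the banks mapping."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": ignore_above}},
    }

def build_index_body(long_fields, text_fields, shards=1, replicas=1):
    """Assemble the JSON body for PUT /<index> from lists of field names."""
    properties = {name: {"type": "long"} for name in long_fields}
    properties.update({name: text_with_keyword() for name in text_fields})
    return {
        "mappings": {"properties": properties},
        "settings": {
            "index": {"number_of_shards": shards, "number_of_replicas": replicas}
        },
    }

body = build_index_body(
    long_fields=["account_number", "age", "balance"],
    text_fields=["address", "city", "email", "employer",
                 "firstname", "gender", "lastname", "state"],
)
print(json.dumps(body, indent=2))
```

To change a field type, edit the lists and send the generated body with PUT after deleting the old index.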

4. Querying an index

GET bank
Returns information about the index.
Response:

{
  "bank" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "account_number" : {
          "type" : "long"
        },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1679827093469",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "FbFyX5TdRGC88ZlbvnemmQ",
        "version" : {
          "created" : "7090399"
        },
        "provided_name" : "bank"
      }
    }
  }
}

4. Working with documents

Fetch a single document from an index by id

GET bank/_doc/1
Response:

{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "account_number" : 1,
    "balance" : 39225,
    "firstname" : "Amber",
    "lastname" : "Duke",
    "age" : 32,
    "gender" : "M",
    "address" : "880 Holmes Lane",
    "employer" : "Pyrami",
    "email" : "amberduke@pyrami.com",
    "city" : "Brogan",
    "state" : "IL"
  }
}

Delete a document by id

DELETE bank/_doc/1
Response:

{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1000,
  "_primary_term" : 1
}

Create a document with a given (custom) id

PUT bank/_doc/1
{
  "account_number" : 1,
  "balance" : 39225,
  "firstname" : "Amber",
  "lastname" : "Duke",
  "age" : 32,
  "gender" : "M",
  "address" : "880 Holmes Lane",
  "employer" : "Pyrami",
  "email" : "amberduke@pyrami.com",
  "city" : "Brogan",
  "state" : "IL"
}

Response:

{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1001,
  "_primary_term" : 1
}

Modify a document by id

PUT bank/_doc/1
{
  "account_number" : 1,
  "balance" : 39225,
  "firstname" : "Amber_xzz",
  "lastname" : "Duke",
  "age" : 32,
  "gender" : "M",
  "address" : "880 Holmes Lane",
  "employer" : "Pyrami",
  "email" : "amberduke@pyrami.com",
  "city" : "Brogan",
  "state" : "IL"
}

We change firstname=Amber to firstname=Amber_xzz and submit the whole document again.
Response:

{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1002,
  "_primary_term" : 1
}

These are only some simple usages. Many more query features will follow in later posts. Stay tuned…

Using the basic query APIs of es

Basic query types

term query

{ 
    "query": {
        "term": {
            "title": "crime"
        }
    }
}
  • Specifying a boost

{ 
    "query": {
        "term": {
            "title": {
                "value":"crime",
                "boost":10.0
             }
        }
    }
}
  • Multi-term query: find documents whose tags contain novel or book

{ 
    "query": {
        "terms": {
            "tags": ["novel","book"]
        }
    }
}

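Common terms query

Put simply, it takes away the undue weight of stop words by splitting the query terms into a high-frequency group and a low-frequency group. Stop words fall into the high-frequency group. cutoff_frequency is the threshold: terms whose frequency is below it go into the low-frequency group. Note that the common query is deprecated since es 7.3.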

{ 
    "query": {
        "common": {
             "title":{
                 "query":"crime and punishment",
                 "cutoff_frequency":0.001
             }
        }
    }
}

match query (does not support Lucene query syntax)

Finds documents whose title contains any of the terms crime, and, punishment

{ 
    "query": {
        "match": {
            "title": "crime and punishment"
        }
    }
}

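The operator parameter

Controls whether the analyzed terms of the query text are combined with and or with or (or is the default)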

{ 
    "query": {
        "match": {
            "title": {
                 "query":"crime and punishment",
                 "operator":"and"
            }
        }
    }
}
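
The difference between the two operators can be illustrated with a toy model. In this sketch the analyze function is a very rough stand-in of my own for a real analyzer (it only lowercases and splits on whitespace), and match is a hypothetical helper, not an es API.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_value, query, operator="or"):
    """Return True if the field matches the query under the given operator."""
    field_terms = set(analyze(field_value))
    query_terms = analyze(query)
    hits = [term in field_terms for term in query_terms]
    # "and" needs every query term, "or" needs at least one.
    return all(hits) if operator == "and" else any(hits)

titles = ["Crime and Punishment", "True Crime Stories"]
or_hits = [match(t, "crime and punishment") for t in titles]
and_hits = [match(t, "crime and punishment", operator="and") for t in titles]
print(or_hits, and_hits)
```

With or, both titles match because each contains crime. With and, only the first title contains all three terms.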

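Phrase query

slop is the number of positions by which the terms may be moved while still counting as a match for the phrase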

{ 
    "query": {
        "match_phrase": {
            "title": {
                 "query":"crime  punishment",
                 "slop":1
            }
        }
    }
}

Phrase prefix query

The last term of the query text is matched as a prefix

{ 
    "query": {
        "match_phrase_prefix": {
            "title": {
                 "query":"crime  punish",
                 "slop":1,
                 "max_expansions":20
            }
        }
    }
}

multi_match (query against multiple fields)

{ 
    "query": {
        "multi_match": {
             "query":"crime  heller",
             "fields":["title","author"]
        }
    }
}

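query_string query (supports Lucene query syntax)

Boosts a match of crime in title by 10, requires title to contain punishment, excludes documents whose otitle contains cat, and requires author to contain both Fyodor and dostoevsky.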

{ 
    "query": {
        "query_string": {
             "query":"title:crime^10 +title:punishment -otitle:cat +author:(+Fyodor +dostoevsky)",
             "default_field":"title"
        }
    }
}

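Querying multiple fields

With dis-max scoring, for the given keywords only the highest-scoring field counts toward the document's final score, rather than the sum of the scores of all fields that contain the term. Older versions switched this on with use_dis_max. As far as I know that option is no longer accepted on 7.x, where type set to best_fields (the default) gives this behavior.

{ 
    "query": {
        "query_string": {
             "query":"crime heller",
             "fields":["title","author"],
              "type":"best_fields"
        }
    }
}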

simple_query_string query

Does not throw an exception on a parse error; the invalid parts of the query are discarded

{ 
    "query": {
        "simple_query_string": {
             "query":"title:crime^10 +title:punishment -otitle:cat +author:(+Fyodor +dostoevsky)",
             "default_operator":"or"
        }
    }
}

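Identifier query

Looks documents up by their unique identifier (_id). The type parameter seen in older examples is left out here, because mapping types are deprecated in 7.x.

{ 
    "query": {
        "ids": {
             "values":["1","2","3"]
        }
    }
}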

Prefix query

Matches terms that start with the given prefix

{ 
    "query": {
        "prefix": {
             "title":"cri"
        }
    }
}
  • Specifying a boost

{ 
    "query": {
        "prefix": {
             "title":{
                 "value":"cri",
                 "boost":3.0
             }
        }
    }
}

fuzzy query

A fuzzy query based on edit distance. It is relatively expensive to compute, but useful when users misspell words

{ 
    "query": {
        "fuzzy": {
             "title":"crme"
        }
    }
}
  • Specifying the maximum edit distance (fuzziness, which replaces the older min_similarity parameter)

{ 
    "query": {
        "fuzzy": {
             "title":{
                 "value":"crme",
                 "fuzziness":1
              }
        }
    }
}
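
To show what edit distance means for crme and crime, here is a plain Levenshtein implementation. It is only an approximation of what es does: as far as I know, the fuzzy query also counts a swap of two adjacent characters as a single edit by default, which this sketch does not. The names edit_distance, fuzzy_match and max_edits are my own.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute ca with cb
        previous = current
    return previous[-1]

def fuzzy_match(term, query, max_edits=1):
    """True if the indexed term is within max_edits of the query text."""
    return edit_distance(term, query) <= max_edits

print(edit_distance("crme", "crime"))
print(fuzzy_match("crime", "crme"), fuzzy_match("chrome", "crme"))
```

crme becomes crime with one insertion, so it matches with a maximum of one edit, while chrome needs two deletions and does not.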

Wildcard query

Supports the wildcards * (any sequence of characters) and ? (any single character)

{ 
    "query": {
        "wildcard": {
             "title": "cr?me"
        }
    }
}
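
The pattern cr?me can be tried out locally with Python's fnmatch module, which uses the same meaning for * and ?. This is only an analogy: fnmatch also understands [seq] character sets, which I do not assume the wildcard query supports, and wildcard_match is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def wildcard_match(term, pattern):
    """Match a single term against a pattern that uses * and ? wildcards."""
    return fnmatchcase(term, pattern)

terms = ["crime", "crome", "crme", "creme"]
matched = [t for t in terms if wildcard_match(t, "cr?me")]
print(matched)
```

crme is not matched, because ? stands for exactly one character.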

Range query

Works on a single field only. The field can be numeric or string-based.

{ 
    "query": {
        "range": {
             "year": {
                  "gte" :1890,
                  "lte":1900
              }
        }
    }
}

Regular expression query

Query performance depends on the regular expression

{ 
    "query": {
        "regexp": {
             "title": {
                  "value" :"cr.m[ae]",
                  "boost":10.0
              }
        }
    }
}
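
The pattern cr.m[ae] can be checked with Python's re module. Lucene regular expressions must match the whole term, so the sketch uses re.fullmatch to imitate that. Python's regex syntax is not identical to Lucene's, but this particular pattern means the same in both. regexp_match is a name I made up.

```python
import re

def regexp_match(term, pattern):
    """Anchored match: the pattern has to cover the whole term."""
    return re.fullmatch(pattern, term) is not None

terms = ["crime", "cryma", "crimes", "crm"]
matched = [t for t in terms if regexp_match(t, "cr.m[ae]")]
print(matched)
```

crimes is rejected because of the anchoring: the pattern describes exactly five characters.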

bool query (compound query)

{
    "query": {
        "bool": {
            "must": {
                "term": {
                    "title": "crime"
                }
            }, 
            "should": {
                "range": {
                    "year": {
                        "from": 1900, 
                        "to": 2000
                    }
                }
            }, 
            "must_not": {
                "term": {
                    "otitle": "nothing"
                }
            }
        }
    }
}
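
The roles of must, should and must_not can be mimicked over a plain list of dicts. This is a toy model under my own assumptions: bool_query is a hypothetical function, and the score is just the number of matching should clauses, whereas es computes a real relevance score.

```python
def bool_query(docs, must=(), should=(), must_not=()):
    """Filter docs with must and must_not, then rank by matching should clauses."""
    results = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # every must clause has to match
        if any(clause(doc) for clause in must_not):
            continue  # no must_not clause may match
        # should clauses are optional here and only raise the score.
        score = sum(1 for clause in should if clause(doc))
        results.append((score, doc))
    # Higher score first; the sort is stable, so ties keep their input order.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in results]

books = [
    {"title": "crime", "otitle": "something", "year": 1866},
    {"title": "crime", "otitle": "nothing", "year": 1950},
    {"title": "crime", "otitle": "other", "year": 1950},
    {"title": "peace", "otitle": "other", "year": 1950},
]
hits = bool_query(
    books,
    must=[lambda d: d["title"] == "crime"],
    should=[lambda d: 1900 <= d["year"] <= 2000],
    must_not=[lambda d: d["otitle"] == "nothing"],
)
print([(d["otitle"], d["year"]) for d in hits])
```

The 1866 book still appears, because should does not exclude anything when a must clause is present; it only ranks the 1950 book higher.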

 
