Elasticsearch Operations

Posted 骑台风走


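1. Introduction to the inverted index

An inverted index is built by tokenizing each article and creating an index entry for every term. Indexing every term directly against full article content would make the index explode in size, so each term is linked to article titles (identifiers), and each title is in turn linked to its article, as follows:

| Term (index key) | Postings: (article, <positions>, count)                    |
| today            | (Article 1, <2,10>, 2) (Article 3, <8>, 1)                 |
| Sunday           | (Article 2, <12,25,100>, 3)                                |
| go out           | (Article 5, <11,24,89>, 3) (Article 1, <8,19>, 2)          |

The entry for "today", for example, records which articles the term appears in, the positions at which it appears, and the number of occurrences.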
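
The postings structure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tokenization is a plain whitespace split, positions are zero-based, and the names `build_inverted_index` and `postings` as well as the sample articles are made up for this example. It is not how Elasticsearch stores its index internally.

```python
from collections import defaultdict


def build_inverted_index(articles):
    """Map each term to {article_id: [positions]}.

    Tokenization is a plain whitespace split, which is an assumption
    made for illustration; a real analyzer does much more.
    """
    index = defaultdict(dict)
    for article_id, text in articles.items():
        for position, term in enumerate(text.split()):
            index[term].setdefault(article_id, []).append(position)
    return index


def postings(index, term):
    """Return (article_id, positions, count) tuples for a term."""
    found = index.get(term, {})
    return [(doc, pos, len(pos)) for doc, pos in sorted(found.items())]


# Hypothetical sample articles, keyed by article id.
articles = {
    1: "today we go out today",
    2: "sunday is quiet",
    3: "rain today",
}
index = build_inverted_index(articles)
print(postings(index, "today"))  # [(1, [0, 4], 2), (3, [1], 1)]
```

Each tuple has the same shape as a table entry: the article, the positions of the term, and the occurrence count.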

2. Index operations (an index is analogous to a database)

2.1 Creating an index

PUT ymq
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}

2.2 Viewing an index

# View a single index
GET ymq/_settings
# View all indices
GET _all/_settings
# View specific indices
GET ymq,ymq2/_settings
# View all indices (same result as _all)
GET _settings

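2.3 Modifying an index (rarely needed; mainly used to change the number of replicas)

# Change the number of replicas to 2. The number of shards must be fixed when the index is created.
# The replica count can be changed afterwards (the request may fail)
PUT ymq/_settings
{
  "number_of_replicas": 2
}

# Clear the read_only_allow_delete block on all indices
PUT _all/_settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": false
    }
  }
}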

2.4 Deleting an index

DELETE ymq

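3. Mapping management (a mapping type is analogous to a table)

3.1 Introduction

Indices created in Elasticsearch 6.0.0 or later contain only a single mapping type.

Indices created in 5.x with multiple mapping types continue to work as before in Elasticsearch 6.x. Mapping types were originally announced for complete removal in Elasticsearch 7.0.0; in practice the 7.x APIs deprecate them and 8.0 removes them.

## If an index has not been created, inserting a document creates it automatically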

3.2 Creating a mapping (type, table)

PUT books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "price": {
        "type": "integer"
      },
      "addr": {
        "type": "keyword"
      },
      "company": {
        "properties": {
          "name": {"type": "text"},
          "company_addr": {"type": "text"},
          "employee_count": {"type": "integer"}
        }
      },
      "publish_date": {"type": "date", "format": "yyyy-MM-dd"}
    }
  }
}

3.3 Viewing mappings

GET books/_mapping
GET _all/_mapping

3.4 Special note: a document can be inserted even when neither the index nor the mapping exists

PUT ymq2/_doc/1
{
  "title": "白雪公主和十个小矮人",
  "price": "99",
  "addr": "黑暗森里",
  "publish_date": "2018-05-19",
  "name": "ymq"
}

4. Basic document CRUD (a document is analogous to a row of data)

4.1 Inserting documents

PUT books/_doc/1
{
  "title": "大头儿子小偷爸爸",
  "price": 100,
  "addr": "北京天安门",
  "company": {
    "name": "我爱北京天安门",
    "company_addr": "我的家在东北松花江傻姑娘",
    "employee_count": 10
  },
  "publish_date": "2019-08-19"
}

PUT books/_doc/2
{
  "title": "白雪公主和十个小矮人",
  "price": "99",
  "addr": "黑暗森里",
  "publish_date": "2018-05-19"
}

PUT books/_doc/3
{
  "title": "白雪公主和十个小矮人",
  "price": "99",
  "addr": "黑暗森里",
  "publish_date": "2018-05-19",
  "name": "lqz"
}


4.2 Viewing a document

# Format: index name/default type name/id
GET books/_doc/1

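4.3 Two ways to modify a document

4.3.1 Method one: full replacement (not recommended; the whole document is overwritten)

PUT lqz/_doc/1
{
  "name": "顾老二",
  "age": 30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

4.3.2 Method two: partial update

POST lqz/_doc/1/_update
{
  "doc": {
    "desc": "皮肤很safasdfsda黄,武器很长,性格很直",
    "tags": ["很黄", "很长", "很直"]
  }
}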
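
The difference between the two methods can be modelled with plain dictionaries. The sketch below is an illustration, not Elasticsearch code: `full_replace` and `partial_update` are hypothetical helper names, the sample document is made up, and the merge rule (nested objects merged recursively, other values including arrays replaced) is my understanding of how a partial `doc` is applied.

```python
import copy


def full_replace(stored, new_doc):
    """PUT-style write: the new body replaces the stored document entirely."""
    return copy.deepcopy(new_doc)


def partial_update(stored, partial):
    """Update-style write: merge `partial` into a copy of the stored document.

    Nested objects are merged recursively; other values, including
    arrays, are replaced.
    """
    merged = copy.deepcopy(stored)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stored document.
stored = {"name": "gu", "age": 30, "tags": ["a", "b"]}
print(full_replace(stored, {"tags": ["c"]}))    # {'tags': ['c']}
print(partial_update(stored, {"tags": ["c"]}))  # {'name': 'gu', 'age': 30, 'tags': ['c']}
```

With full replacement, fields missing from the new body are lost, which is why the first method is not recommended for changing one or two fields.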
  

4.4 Deleting a document

DELETE lqz/_doc/4

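5. Querying documents

5.1 Difference between term and match

5.1.1 Introduction

term: an exact match, i.e. a precise query. The search term is not analyzed before searching, so it must be exactly one of the tokens produced when the document was analyzed.

match: the search text is analyzed into tokens first, and each resulting token is then matched in turn. Compared with the exact term query, match is therefore an analyzed, full-text query.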
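
A toy model makes the distinction concrete. In this sketch the analyzer is assumed to lowercase the text and split it on whitespace; `analyze`, `index_docs`, `term_query` and `match_query` are hypothetical names for illustration, and the three sample texts mirror the `desc` values used later in this post.

```python
def analyze(text):
    """Stand-in analyzer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def index_docs(docs):
    """Store the set of analyzed tokens for each document id."""
    return {doc_id: set(analyze(text)) for doc_id, text in docs.items()}


def term_query(indexed, value):
    """term: the value is NOT analyzed; it must equal a stored token."""
    return sorted(doc_id for doc_id, tokens in indexed.items() if value in tokens)


def match_query(indexed, text):
    """match: the text IS analyzed; any resulting token may match."""
    wanted = set(analyze(text))
    return sorted(doc_id for doc_id, tokens in indexed.items() if tokens & wanted)


docs = {1: "beautiful cat", 2: "beautiful dog", 3: "dog"}
indexed = index_docs(docs)
print(term_query(indexed, "beautiful"))       # [1, 2]
print(term_query(indexed, "beautiful dog"))   # []
print(match_query(indexed, "beautiful dog"))  # [1, 2, 3]
```

The second call returns nothing because no single stored token equals the whole string "beautiful dog", while the match version splits the text and finds every document containing either token.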

5.1.2 Create the index and mapping (without the ik analyzer) and insert data

# Create the index and mapping
PUT lqz
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "desc": {
        "type": "text"
      },
      "price": {
        "type": "integer"
      },
      "addr": {
        "type": "keyword"
      },
      "company": {
        "properties": {
          "name": {"type": "text"},
          "company_addr": {"type": "text"},
          "employee_count": {"type": "integer"}
        }
      },
      "publish_date": {"type": "date", "format": "yyyy-MM-dd"}
    }
  }
}


# Insert data

PUT lqz/_doc/1
{
  "title": "so beautiful zero",
  "price": 100,
  "addr": "北京天安门",
  "desc": "beautiful cat",
  "company": {
    "name": "我爱北京天安门",
    "company_addr": "我的家在东北松花江傻姑娘",
    "employee_count": 10
  },
  "publish_date": "2019-08-19"
}

PUT lqz/_doc/2
{
  "title": "so beautiful one",
  "price": 200,
  "addr": "北京天安门",
  "desc": "beautiful dog",
  "company": {
    "name": "我爱北京天安门",
    "company_addr": "我的家在东北松花江傻姑娘",
    "employee_count": 10
  },
  "publish_date": "2019-08-19"
}

PUT lqz/_doc/3
{
  "title": "so beautiful tow",
  "price": 698,
  "addr": "北京天安门",
  "desc": "dog",
  "company": {
    "name": "我爱北京天安门",
    "company_addr": "我的家在东北松花江傻姑娘",
    "employee_count": 10
  },
  "publish_date": "2019-08-19"
}

5.2 term

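5.2.1 term and terms

term: the search term is not analyzed; the query looks up the given term as-is

terms: allows several terms to be looked up in one query

# term: the search term is not analyzed
GET lqz/_doc/_search
{
  "query": {
    "term": {
      "desc": "beautiful"
    }
  }
}
# terms: because the search term is not analyzed, use terms to look up several terms at once
GET lqz/_doc/_search
{
  "query": {
    "terms": {
      "title": ["beautiful", "so"]
    }
  }
}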

5.3 match

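5.3.1 match and match_all

match: a loose full-text match; a document only needs to contain some of the keywords (by default, any one of the analyzed terms is enough)

match_all: matches every document in the index.

match_phrase: phrase query; every term must match exactly, in the same order as in the given phrase

# Text searched with match is analyzed into terms
GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  }
}
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "title": "beautiful tow"
    }
  }
}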

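5.4 Sorted queries

Not every field can be sorted on: numeric, date and keyword fields can, but analyzed text fields cannot by default.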

# Sorted queries
# 1. Plain query
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "addr": "北京天安门"
    }
  }
}

# 2. Descending order
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "addr": "北京天安门"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

# 3. Ascending order
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "addr": "北京天安门"
    }
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}

# 4. match_all + ascending order
GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}

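5.5 Paginated queries

All clauses are pluggable: each one is optional, and clauses are separated by commas.

# Pagination
# Skip the first 2 hits (from is a zero-based offset) and return at most 2

GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ],
  "from": 2,
  "size": 2
}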





### Note: in `elasticsearch`, all clauses are pluggable and are separated by commas
GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  },
  "from": 2,
  "size": 2
}
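
The effect of `from` and `size` is easiest to see as a list slice. The sketch below is only a model of the paging arithmetic: `paginate` is a hypothetical helper, the parameter is spelled `from_` because `from` is a Python keyword, and the list of hits reuses the three sample prices sorted in descending order.

```python
def paginate(hits, from_=0, size=10):
    """Return the page of hits selected by a zero-based offset and a size."""
    return hits[from_:from_ + size]


# The three sample prices, sorted descending as in the sorted query.
prices_desc = sorted([100, 200, 698], reverse=True)
print(paginate(prices_desc, from_=2, size=2))  # [100]
print(paginate(prices_desc, from_=0, size=2))  # [698, 200]
```

With only three documents, `"from": 2, "size": 2` skips the first two hits and returns the single remaining one.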

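5.6 Boolean queries

  • must: AND relation, equivalent to and in a relational database

  • should: OR relation, equivalent to or in a relational database

  • must_not: NOT relation, equivalent to not in a relational database

  • filter: filter condition; clauses must match but do not contribute to the relevance score.

  • range: restricts results to a range of values.

  • gt: greater than, equivalent to > in a relational database

  • gte: greater than or equal to, equivalent to >= in a relational database

  • lt: less than, equivalent to < in a relational database

  • lte: less than or equal to, equivalent to <= in a relational database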
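
How the clauses in the list above combine can be sketched as a filter over in-memory documents. This is a simplified model under stated assumptions: scoring is ignored, each clause is a plain Python predicate, `should` is treated as required only when there is no `must` or `filter` clause, and `in_range`, `bool_query` and the sample documents are hypothetical names and data for this example.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Evaluate range bounds; a bound left as None is ignored."""
    if value is None:
        return False
    return ((gt is None or value > gt) and (gte is None or value >= gte)
            and (lt is None or value < lt) and (lte is None or value <= lte))


def bool_query(docs, must=(), should=(), must_not=(), filter_=()):
    """Return ids of docs that pass every must/filter clause, no must_not
    clause, and (when there is no must or filter) at least one should clause."""
    results = []
    for doc in docs:
        if not all(clause(doc) for clause in (*must, *filter_)):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        if should and not (must or filter_) and not any(c(doc) for c in should):
            continue
        results.append(doc["id"])
    return results


docs = [
    {"id": 1, "desc": "beautiful cat", "price": 100},
    {"id": 2, "desc": "beautiful dog", "price": 200},
    {"id": 3, "desc": "dog", "price": 698},
]
has_beautiful = lambda d: "beautiful" in d["desc"].split()
cheap = lambda d: in_range(d["price"], lt=200)

print(bool_query(docs, must=[has_beautiful], filter_=[cheap]))  # [1]
print(bool_query(docs, must_not=[has_beautiful]))               # [3]
print(bool_query(docs, should=[has_beautiful, lambda d: d["price"] == 698]))  # [1, 2, 3]
```

The first call combines an AND condition with a range filter, the second excludes matching documents, and the third accepts a document when either condition holds.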

## Boolean query with should (OR condition)
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "addr": "北京天安门"
          }
        },
        {
          "match": {
            "desc": "beautiful"
          }
        }
      ]
    }
  }
}






### must_not condition: none of the clauses may match
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "addr": "北京天安门"
          }
        },
        {
          "match": {
            "desc": "beautiful"
          }
        },
        {
          "match": {
            "price": 698
          }
        }
      ]
    }
  }
}





### filter with greater-than / less-than conditions: gt, lt, gte, lte
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "addr": "北京天安门"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "lt": 200
          }
        }
      }
    }
  }
}



### Range query
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "addr": "北京天安门"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 100,
            "lte": 150
          }
        }
      }
    }
  }
}

5.7 Filtering the fields returned by a query


### Basic usage
GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["name", "age"]
}



#### _source sits at the same level as query

GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {"from": "gu"}
      },
      "filter": {
        "range": {
          "age": {
            "lte": 25
          }
        }
      }
    }
  },
  "_source": ["name", "age"]
}






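5.8 Highlighted queries (highlighting did not take effect)

# By default only fields matched by the query are highlighted; the query here is on price
# while the highlighted field is from, which is a likely reason nothing is highlighted
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "price": "698"
    }
  },
  "highlight": {
    "pre_tags": "<b class='key' style='color:red'>",
    "post_tags": "</b>",
    "fields": {
      "from": {}
    }
  }
}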

5.9 Aggregations


# sum, avg, max, min

# select avg(age) as my_avg from table where from = 'gu';
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}


# Maximum age
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_max": {
      "max": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}


# Minimum age
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_min": {
      "min": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}


# Sum of ages
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_sum": {
      "sum": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}




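# Grouping


# Query the age ranges of all people, grouped into `15~20, 20~25, 25~30`, and compute the average age of each group.
# Each range includes its from value and excludes its to value.
GET lqz/_doc/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 20,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      },
      "aggs": {
        "my_avg": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}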
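
The bucketing performed by a range aggregation can be reproduced in a few lines. This is an illustrative model only: `range_aggregation` is a hypothetical helper, the ages are invented sample data, and the bucket keys are simplified. The one behavior it relies on is that each range includes its `from` value and excludes its `to` value.

```python
def range_aggregation(values, ranges):
    """Bucket values into [from, to) ranges and average each bucket."""
    buckets = []
    for low, high in ranges:
        members = [v for v in values if low <= v < high]
        avg = sum(members) / len(members) if members else None
        buckets.append({"key": f"{low}-{high}", "doc_count": len(members), "avg": avg})
    return buckets


# Hypothetical ages; 30 falls outside every bucket because `to` is exclusive.
ages = [18, 22, 25, 29, 30]
result = range_aggregation(ages, [(15, 20), (20, 25), (25, 30)])
for bucket in result:
    print(bucket)
```

An age of exactly 25 lands in the 25-30 bucket rather than 20-25, which matters when the boundaries coincide with real values.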
      
    
  
