hadoop生态--ElasticSearch--ES操作

Posted 2021-12-04 jing-wang

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了hadoop生态--ElasticSearch--ES操作相关的知识，希望对你有一定的参考价值。

抛开细节不提，记住ES就是一个数据库（只是整个数据库有些特殊，话说回来，哪个数据库没点自己的特点呢:) ），所以很多ES的中的概念我们可以类比普通的数据库来帮助理解和记忆，为了学习这个数据库呢，我们需要先了解几个概念

一、ES中几个重要概念：

索引：Index　　注意不是luence中的索引的概念，相当于数据库中的DataBase ，在ES中建一个索引，可以类比为在mysql中创建了一个database

类型：Type　　相当于数据库中的table

主键：id　　相当于数据库中的主键

所以，向ES中存储数据，就是往ES中的Index下的Type中存储数据，数据格式为JSON。

那么如何对这个数据库进行操作呢？建索引、存数据，查数据…… ES提供了多种方式：如RESTFul风格api、java api等。

二、RESTFul风格API

这种方式就是通过http形式发送请求，对ES进行操作。

接口URL格式：

http://host_IP:9200/<Index>/<Type>/[<id>]

index和type必须提供，id是可选的，不提供es会自动生成。index、type将信息进行分层，利于管理。index可以理解为数据库；type理解为数据表；id相当于数据库表中记录的主键，是唯一的。

查询的请求方式　　GET

#在linux中通过curl的方式查询
curl -XGET ‘http://192.168.10.18:9200/store/books/1‘

# 通过_source获取指定的字段
curl -XGET ‘http://192.168.10.16:9200/store/books/1?_source=title‘
curl -XGET ‘http://192.168.10.16:9200/store/books/1?_source=title,price‘
curl -XGET ‘http://192.168.10.16:9200/store/books/1?_source‘

删除　　DELETE

#删除一个文档
curl -XDELETE ‘http://192.168.10.16:9200/store/books/1‘

添加　　PUT/POST

#向store索引中添加一些书籍，如果数据已经存在，会通过覆盖的方式对数据进行更新
curl -XPUT ‘http://192.168.10.16:9200/store/books/1‘ -d ‘{
  "title": "Elasticsearch: The Definitive Guide",
  "name" : {
    "first" : "Zachary",
    "last" : "Tong"
  },
  "publish_date":"2015-02-06",
  "price":"49.99"
}‘

修改　　PUT/POST　　

#可以通过覆盖的方式更新
curl -H "Content-Type: application/json" -XPUT ‘http://192.168.10.16:9200/store/books/1‘ -d ‘{

  "title": "Elasticsearch: The Definitive Guide",
  "name" : {
    "first" : "Zachary",
    "last" : "Tong"
  },
  "publish_date":"2016-02-06",
  "price":"99.99"
}‘

# 或者通过 _update  API的方式单独更新你想要更新的
curl -XPOST ‘http://192.168.10.16:9200/store/books/1/_update‘ -d ‘{
  "doc": {
     "price" : 88.88
  }
}‘

创建索引

#创建索引名字叫news
curl -XPUT http://192.168.100.211:9200/news

为索引创建mapping

创建mapping（相当于数据中的schema信息，约束type表名和字段名以及字段的类型）

curl -XPOST http://192.168.100.211:9200/news/fulltext/_mapping -d‘
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word"
            }
        }
}‘

curl -XPUT ‘https://192.168.100.211:9200/iktest?pretty‘ -d ‘{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "ik" : {
                    "tokenizer" : "ik_max_word"
                }
            }
        }
    },
    "mappings" : {
        "article" : {
            "dynamic" : true,
            "properties" : {
                "subject" : {
                    "type" : "string",
                    "analyzer" : "ik_max_word"
                }
            }
        }
    }
}‘

添加文本内容

curl -H "Content-Type: application/json" -XPOST http://192.168.1.237:9200/news/fulltext/1 -d‘
{"content":"美国留给伊拉克的是个烂摊子吗"}‘

curl -H "Content-Type: application/json" -XPOST http://192.168.1.237:9200/news/fulltext/2 -d‘
{"content":"公安部：各地校车将享最高路权"}‘

curl -H "Content-Type: application/json" -XPOST http://192.168.1.237:9200/news/fulltext/3 -d‘
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}‘

curl -H "Content-Type: application/json" -XPOST http://192.168.1.237:9200/news/fulltext/4 -d‘
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}‘

curl -H "Content-Type: application/json" -XPOST http://192.168.1.2371:9200/news/fulltext/_search  -d‘
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<font color=‘red‘>", "<tag2>"],
        "post_tags" : ["</font>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}‘

发起分词请求

指定最大分词或智能分词

curl -XGET ‘http://192.168.100.211:9200/_analyze?pretty&analyzer=ik_max_word‘ -d ‘联想是全球最大的笔记本厂商‘

curl -XGET ‘https://192.168.100.211:9200/_analyze?pretty&analyzer=ik_smart‘ -d ‘联想是全球最大的笔记本厂商‘

三、ES查询

Hats off to the shares

作者：2zju
来源：CSDN
原文：https://blog.csdn.net/ifenggege/article/details/86103918

---------------------
作者：梧桐和风
来源：CSDN
原文：https://blog.csdn.net/wthfeng/article/details/53001218

https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html

-------------------------------------------------------------------------------------

准确来说，ES中的查询操作分为2种：查询（query）和过滤（filter）。查询即是之前提到的query查询，它（查询）默认会计算每个返回文档的得分，然后根据得分排序。而过滤（filter）只会筛选出符合的文档，并不计算得分，且它可以缓存文档。所以，单从性能考虑，过滤比查询更快。

换句话说，过滤适合在大范围筛选数据，而查询则适合精确匹配数据。一般应用时，应先使用过滤操作过滤数据，然后使用查询匹配数据。

3-1 term query 和 terms query

查询的字段只有一个值的时候，应该使用term而不是terms，在查询字段包含多个的时候才使用terms，使用terms语法，json中必须包含数组。

match在匹配时会对所查找的关键词进行分词，然后按分词匹配查找，而term会直接对关键词进行查找。一般模糊查找的时候，多用match，而精确查找时可以使用term
3-2 match query
1、match

2、match_all

3、multi_match 多字段查询

如果想要再一个搜索中查询多个字段，比如使用一个查询关键字同时在title和summery中查询，可以使用multi_match查询。

#name和author都必须包含Guide，并且价钱等于33.99或者188.99
curl -XGET ‘http://192.168.10.16:9200/store/books/_search‘ -d ‘{
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "operator": "and",
                    "fields": [
                        "name",
                        "author"
                    ],
                    "query": "Guide"
                }
            },
            "filter": {
                "terms": {
                    "price": [
                        35.99,
                        188.99
                    ]
                }
            }
        }
    }
}‘

3-3 bool query

bool查询包含must，should，must_not，用于连接多个查询条件

格式如下：
{
　　"bool" : {
　　　　"must" : [],
　　　　"should" : [],
　　　　"must_not" : [],
　　}
}

must: 条件必须满足，相当于 and
should: 条件可以满足也可以不满足，相当于 or
must_not: 条件不需要满足，相当于 not

SELECT * FROM books WHERE (price = 35.99 OR price = 99.99) AND publish_date != "2016-02-06"

curl -XGET ‘http://192.168.10.16:9200/store/books/_search‘ -d ‘{
  "query" : {
    "bool" : {
      "should" : [
        { "term" : {"price" : 35.99}},
        { "term" : {"price" : 99.99}}
      ],
      "must_not" : {
        "term" : {"publish_date" : "2016-02-06"}
      }
    }
  }
}‘

技术图片

3-1 term query 和 terms query

查询的字段只有一个值得时候，应该使用term而不是terms，在查询字段包含多个的时候才使用terms，使用terms语法，json中必须包含数组。

match在匹配时会对所查找的关键词进行分词，然后按分词匹配查找，而term会直接对关键词进行查找。一般模糊查找的时候，多用match，而精确查找时可以使用term

3-2 match query  
1、match

2、match_all

3、multi_match 多条件查询

如果想要再一个搜索中查询多个字段，比如使用一个查询关键字同时在title和summery中查询，可以使用multi_match查询。

3-3 bool query

bool查询包含四个子句，must，filter，should，must_not



 

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "_type": {
                            "value": "age"
                        }
                    }
                },
                {
                    "term": {
                        "account_grade": {
                            "value": "23"
                        }
                    }
                }
            ]
        }
    }
}
 

{
    "bool":{
            "must":{
                "term":{"user":"lucy"}
            },
            "filter":{
                "term":{"tag":"teach"}    
            },
            "should":[
                  {"term":{"tag":"wow"}},
                {"term":{"tag":"elasticsearch"}}
            ],
               "mininum_should_match":1,
               "boost":1.0                      
        }
}

boolquery

技术图片

{
    "bool":{
            "must":{
                "term":{"user":"lucy"}
            },
            "filter":{
                "term":{"tag":"teach"}    
            },
            "should":[
                  {"term":{"tag":"wow"}},
                {"term":{"tag":"elasticsearch"}}
            ],
               "mininum_should_match":1,
               "boost":1.0                      
        }
}

View Code

3-4 Filter

query和filter的区别：query查询的时候，会先比较查询条件，然后计算分值，最后返回文档结果；而filter是先判断是否满足查询条件，如果不满足会缓存查询结果（记录该文档不满足结果），满足的话，就直接缓存结果。

SELECT * FROM books WHERE price = 35.99
filtered 查询价格是35.99的

# 返回的得分是1.0
curl -XGET ‘http://192.168.10.16:9200/store/books/_search‘ -d ‘{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "price": 35.99
        }
      }
    }
  }
}‘

# 返回的得分是1.0
curl -XGET ‘http://192.168.10.16:9200/store/books/_search‘ -d ‘{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "price": 35.99
        }
      }
    }
  }
}‘

# 返回的得分是0.0
curl -XGET ‘http://192.168.10.16:9200/store/books/_search‘ -d ‘{
    "query": {
        "bool": {
           "filter" : {
                "term" : {
                  "price" : 35.99
                }
            }
        }
    }
}‘

指定多个值，or的关系

#指定多个值
curl -XGET ‘http://192.168.10.16:9200/store/books/_search‘ -d ‘{
    "query" : {
        "bool" : {
            "filter" : {
                "terms" : {
                    "price" : [35.99, 99.99]
                  }
              }
        }
    }
}‘

curl -XGET ‘http://192.168.10.16:9200/store/books/_search‘ -d ‘{
    "query" : {
        "bool" : {
            "must": {
                "match_all": {}
            },
            "filter" : {
                "terms" : {
                    "price" : [35.99, 99.99]
                  }
              }
        }
    }
}‘

3-5 嵌套查询

# SELECT * FROM books WHERE price = 35.99 OR ( publish_date = "2016-02-06" AND price = 99.99 )

curl -XGET ‘http://192.168.10.16:9200/store/books/_search‘ -d ‘{
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "price": 35.99
                    }
                },
                {
                    "bool": {
                        "must": [
                            {
                                "term": {
                                    "publish_date": "2016-02-06"
                                }
                            },
                            {
                                "term": {
                                    "price": 99.99
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}‘

3-6 range范围过滤

range范围过滤
gt : > 大于
lt : < 小于
gte : >= 大于等于
lte : <= 小于等于

SELECT * FROM books WHERE price >= 10 AND price < 99

curl -XGET ‘http://192.168.10.16:9200/store/books/_search‘ -d ‘{
    "query": {
        "range" : {
            "price" : {
                "gte" : 10,
                "lt" : 99
            }
        }
    }
}

以上是关于hadoop生态--ElasticSearch--ES操作的主要内容，如果未能解决你的问题，请参考以下文章