Common Basic Queries in Elasticsearch

Posted 2020-09-07 笨小孩


Foreword: this article was compiled by the editors of 小常识网 (cha138.com). It covers common basic queries in Elasticsearch, and we hope you find it a useful reference.

Installation and startup are simple; follow the steps on the official site: https://www.elastic.co/downloads/elasticsearch

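To demonstrate the different query types in Elasticsearch, we first create a few employee documents in the megacorp index (type employee). Each document has the fields first_name, last_name, age, about and interests.

The query_string and simple_query_string examples later on still refer to fields from a book dataset (title, authors, summary). They will not match these documents and are kept only to show the syntax.

The commands are written for Elasticsearch 5.x and earlier: they use mapping types such as employee in the URL and the _all field, and both were removed in later versions. From 6.0 on, every request with a body must also send -H 'Content-Type: application/json'.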

curl -XPUT 'localhost:9200/megacorp/employee/3' -d '{ "first_name" : "Douglas", "last_name" : "Fir", "age" : 35, "about" : "I like to build cabinets", "interests": "forestry" }'
curl -XPUT 'localhost:9200/megacorp/employee/2' -d '{ "first_name" : "Jane", "last_name" : "Smith", "age" : 32, "about" : "I like to collect rock albums", "interests": "music" }'
curl -XPUT 'localhost:9200/megacorp/employee/1' -d '{ "first_name" : "John", "last_name" : "Smith", "age" : 25, "about" : "I love to go rock climbing", "interests": [ "sports", "music" ] }'
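Many of the results below depend on how these documents are analyzed when they are indexed. The Python sketch below is a rough local model of that step; no cluster is involved. It mirrors the three documents above and builds an inverted index per field.

The analyze function is an assumption: a simplified stand-in for the standard analyzer that lowercases text and splits it on non-alphanumeric characters. The real analyzer follows Unicode word-boundary rules. Two things to notice:

- Text is stored as lowercase terms.
- climb is not a term, because the standard analyzer does no stemming.

```python
import re
from collections import defaultdict

# The three sample employee documents from the curl commands above, keyed by _id.
DOCS = {
    "1": {"first_name": "John", "last_name": "Smith", "age": 25,
          "about": "I love to go rock climbing", "interests": ["sports", "music"]},
    "2": {"first_name": "Jane", "last_name": "Smith", "age": 32,
          "about": "I like to collect rock albums", "interests": "music"},
    "3": {"first_name": "Douglas", "last_name": "Fir", "age": 35,
          "about": "I like to build cabinets", "interests": "forestry"},
}


def analyze(text):
    """Simplified stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs, field):
    """Map every term of `field` to the sorted list of _ids whose value contains it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        values = doc[field] if isinstance(doc[field], list) else [doc[field]]
        for value in values:
            for term in analyze(str(value)):
                index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


about_index = build_inverted_index(DOCS, "about")
print(about_index["rock"])     # ['1', '2']
print("climb" in about_index)  # False: only "climbing" was indexed
```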

1. Basic Match Query

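A basic match query comes in two main forms:
(1) the Search Lite API, which passes all search parameters in the URL;
(2) the Elasticsearch Query DSL, which sends a JSON request body.

The following searches all fields for "John":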

curl -XGET 'localhost:9200/megacorp/employee/_search?q=John'

The same search written with the Query DSL looks like this:

curl -XGET 'localhost:9200/megacorp/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "John",
            "fields" : ["_all"]
        }
    }
}'

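Its output is the same as that of /_search?q=John above. multi_match is commonly used as shorthand for running a match query against several fields at once. The fields property lists the fields to search; to search every field, use the _all field as above. Both approaches let us restrict which fields are searched. For example, to find employees whose interests include music: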

curl -XGET 'localhost:9200/megacorp/employee/_search?q=interests:music'

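The DSL, however, offers more flexibility: it can build more complex queries (as we will see later) and can control exactly what comes back. The next example specifies:
- the number of results to return;
- the starting offset, which is very useful for pagination;
- which document fields to return;
- highlighting of the matched terms.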

curl -XGET 'localhost:9200/megacorp/employee/_search?pretty' -d '{"query": { "match" : { "interests" : "music" }},"size": 2,"from": 0,"_source": [ "first_name", "last_name", "interests" ],"highlight": {"fields" : { "interests" : { } } } }'
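from and size behave like an offset and a limit over the ranked list of hits. The sketch below shows the usual page-number arithmetic. page_params and apply_page are hypothetical helpers for illustration, not part of any Elasticsearch client. By default, from + size may not exceed the index.max_result_window setting (10,000).

```python
def page_params(page, page_size):
    """Return the "from"/"size" values for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def apply_page(ranked_hits, params):
    """Mimic how from/size slice the ranked hit list."""
    start = params["from"]
    return ranked_hits[start:start + params["size"]]


hits = ["2", "1", "3"]                      # hypothetical ranked _ids
print(page_params(2, 2))                    # {'from': 2, 'size': 2}
print(apply_page(hits, page_params(2, 2)))  # ['3']
```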

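Note that when the query contains several terms, the match query lets you use the and operator instead of the default or. You can also set the minimum_should_match parameter to tweak the relevance of the returned results.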
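To make the effect of operator and minimum_should_match concrete, here is a local Python simulation. It shows which sample documents a match query on the about field would return.

Some assumptions apply:
- It models matching only, not scoring.
- It uses a simplified analyzer that lowercases and splits on non-alphanumerics.
- It handles only integer and positive-percentage minimum_should_match values; Elasticsearch accepts more forms. Percentages are rounded down, as in Elasticsearch.

In the DSL, the and variant would be written as {"query": {"match": {"about": {"query": "rock albums", "operator": "and"}}}}.

```python
import math
import re

# about field of the sample documents, keyed by _id
ABOUT = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}


def analyze(text):
    """Simplified analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def required_matches(num_terms, operator="or", minimum_should_match=None):
    """How many distinct query terms a document must contain to match."""
    if operator == "and":
        return num_terms
    if minimum_should_match is None:
        need = 1
    elif isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        # Percentages of the number of terms are rounded down.
        need = math.floor(num_terms * int(minimum_should_match[:-1]) / 100)
    else:
        need = int(minimum_should_match)
    # A query made only of optional terms still needs at least one of them.
    return max(need, 1)


def match(query, operator="or", minimum_should_match=None):
    """Return the _ids whose about field satisfies the match query (no scoring)."""
    terms = set(analyze(query))
    need = required_matches(len(terms), operator, minimum_should_match)
    return sorted(doc_id for doc_id, text in ABOUT.items()
                  if len(terms & set(analyze(text))) >= need)


print(match("rock albums"))                                     # ['1', '2']
print(match("rock albums", operator="and"))                     # ['2']
print(match("love rock cabinets", minimum_should_match="67%"))  # ['1']
```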

2. Multi-field Search

As we saw earlier, you can search several document fields in one request (for example, the same terms in both about and interests) with the multi_match query:

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "rock",
            "fields": ["about", "interests"]
        }
    }
}'

3. Boosting
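The request above searched several fields with the same terms, but you may want to give one field more weight. In the example below we set the boost of interests to 3. A match in interests then contributes more to the relevance score than a match in about: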

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "rock",
            "fields": ["about", "interests^3"]
        }
    }
}'

Boosting does not simply multiply the calculated score by the boost factor. The final boost value also goes through normalization and some other internal optimizations.
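For intuition only, here is a toy Python model of how a multi_match query of the default best_fields type combines per-field scores:
- each field's score is multiplied by its boost;
- the best of those boosted scores is kept;
- an optional tie_breaker adds a fraction of the other fields' scores.

The per-field scores are made-up numbers, not real BM25 or TF-IDF values. As noted above, real boosts also go through normalization, which this sketch ignores. The helper names are hypothetical.

```python
def parse_field(spec):
    """Split a multi_match field spec like "interests^3" into (field, boost)."""
    name, _, boost = spec.partition("^")
    return name, float(boost) if boost else 1.0


def best_fields_score(field_scores, field_specs, tie_breaker=0.0):
    """Toy best_fields combination: best boosted field plus tie_breaker * the others."""
    boosted = []
    for spec in field_specs:
        name, boost = parse_field(spec)
        boosted.append(field_scores.get(name, 0.0) * boost)
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)


# Hypothetical per-field relevance scores for one document.
scores = {"about": 0.8, "interests": 0.5}
print(best_fields_score(scores, ["about", "interests"]))    # 0.8
print(best_fields_score(scores, ["about", "interests^3"]))  # 1.5
```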

4. Bool Query
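We can combine conditions with AND/OR/NOT logic using the bool query. It accepts three parameters:
- must, equivalent to AND;
- must_not, equivalent to NOT;
- should, equivalent to OR.

For example, suppose we want employees whose about field mentions music or climb, whose first name is John, and whose last name is not Smith. The query looks like this: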

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "bool": {
            "must": [
                {
                    "bool" : {
                        "should": [
                            { "match": { "about": "music" }},
                            { "match": { "about": "climb" }}
                        ]
                    }
                },
                { "match": { "first_name": "John" }}
            ],
            "must_not": {
                "match": { "last_name": "Smith" }
            }
        }
    }
}'
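The sketch below evaluates the same boolean logic locally against the sample documents, modelling matching only (no scoring). It assumes a simplified analyzer that lowercases and splits on non-alphanumerics. The query returns no hits on the sample data, for two reasons:
- The only John is John Smith, who is excluded by must_not.
- His about text contains climbing rather than climb, and a non-stemming analyzer treats those as different terms.

bool_query, match and the query functions are hypothetical helpers, not an Elasticsearch client API.

```python
import re

DOCS = {
    "1": {"first_name": "John", "last_name": "Smith", "about": "I love to go rock climbing"},
    "2": {"first_name": "Jane", "last_name": "Smith", "about": "I like to collect rock albums"},
    "3": {"first_name": "Douglas", "last_name": "Fir", "about": "I like to build cabinets"},
}


def analyze(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def match(doc, field, query):
    """match clause with the default or operator: any analyzed query term is present."""
    return bool(set(analyze(query)) & set(analyze(doc[field])))


def bool_query(doc, must=(), must_not=(), should=()):
    """Boolean logic only; with no must clauses, at least one should clause must match."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if should and not must:
        return any(clause(doc) for clause in should)
    return True


def about_music_or_climb(doc):
    return bool_query(doc, should=[lambda d: match(d, "about", "music"),
                                   lambda d: match(d, "about", "climb")])


def article_query(doc):
    """(music OR climb in about) AND first_name John AND NOT last_name Smith."""
    return bool_query(doc,
                      must=[about_music_or_climb, lambda d: match(d, "first_name", "John")],
                      must_not=[lambda d: match(d, "last_name", "Smith")])


def variant_query(doc):
    """Same shape, using the indexed term "climbing" and dropping must_not."""
    return bool_query(doc, must=[lambda d: match(d, "about", "climbing"),
                                 lambda d: match(d, "first_name", "John")])


print([d for d, doc in DOCS.items() if article_query(doc)])  # []
print([d for d, doc in DOCS.items() if variant_query(doc)])  # ['1']
```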

5. Fuzzy Queries

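Fuzziness can be used in match and multi_match queries to tolerate spelling mistakes. The degree of fuzziness is based on the Levenshtein distance from the original term. For example: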

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "rock climb",
            "fields": ["about", "interests"],
            "fuzziness": "AUTO"
        }
    },
    "_source": ["about", "interests", "first_name"],
    "size": 1
}'

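Above we set fuzziness to AUTO, which allows an edit distance of 2 when the term is longer than 5 characters. However, about 80% of human misspellings have an edit distance of 1. Setting fuzziness to 1 may therefore improve search performance.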
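A minimal sketch of the two ingredients: a plain Levenshtein distance, and the edit budget that fuzziness AUTO grants by term length. That budget is:
- 0 edits for terms of 1 to 2 characters;
- 1 edit for 3 to 5 characters;
- 2 edits for longer terms.

One caveat: by default Elasticsearch also counts a swap of two adjacent characters as a single edit (the fuzzy_transpositions option). The plain Levenshtein distance below does not, so rokc is 2 edits from rock here but 1 edit in Elasticsearch.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Maximum edits allowed by fuzziness AUTO, based on term length."""
    if len(term) < 3:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term, fuzziness="AUTO"):
    max_edits = auto_fuzziness(query_term) if fuzziness == "AUTO" else int(fuzziness)
    return levenshtein(query_term, indexed_term) <= max_edits


print(fuzzy_match("rack", "rock"))          # True: 1 substitution, budget 1
print(fuzzy_match("climb", "climbing"))     # False: 3 insertions, budget 1
print(fuzzy_match("climbng", "climbing"))   # True: 1 insertion, budget 2
```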

6. Wildcard Query

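A wildcard query matches a pattern instead of a complete term: ? matches any single character, and * matches zero or more characters. For example, to find all records whose first name starts with the letter J: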

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "wildcard" : {
            "first_name" : "j*"
        }
    },
    "_source": ["first_name", "last_name"],
    "highlight": {
        "fields" : {
            "first_name" : {}
        }
    }
}'
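Wildcard patterns are matched against the indexed terms and are not analyzed themselves. Because first_name is lowercased at index time, the pattern has to be lowercase (j*, not J*).

The sketch below translates a pattern into an anchored Python regular expression and applies it to the lowercased first_name terms of the sample documents. wildcard_to_regex and wildcard_terms are hypothetical helpers. The Elasticsearch documentation advises against starting a pattern with * or ?, since that forces many more terms to be examined.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern (* and ?) into a regex; everything else is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_terms(pattern, terms):
    """Indexed terms matched by the whole pattern (the pattern itself is not lowercased)."""
    regex = wildcard_to_regex(pattern)
    return sorted(t for t in terms if regex.fullmatch(t))


first_name_terms = ["john", "jane", "douglas"]  # lowercased at index time
print(wildcard_terms("j*", first_name_terms))    # ['jane', 'john']
print(wildcard_terms("J*", first_name_terms))    # []
print(wildcard_terms("j?n?", first_name_terms))  # ['jane']
```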

7. Regexp Query
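Elasticsearch also supports regexp queries, which allow more complex patterns than wildcard queries. For example, to find records whose first name starts with J, continues with any number of a-z characters, and ends with n: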

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "regexp" : {
            "first_name" : "j[a-z]*n"
        }
    },
    "_source": ["first_name", "age"],
    "highlight": {
        "fields" : {
            "first_name" : {}
        }
    }
}'
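Like a wildcard, a regexp is applied to the lowercased indexed terms, and it must match the whole term rather than just part of it. A Python equivalent should therefore use re.fullmatch rather than re.search.

Here is a minimal sketch over the lowercased first_name terms of the sample documents. Lucene's regular-expression syntax is not identical to Python's, so this only uses the shared subset of character classes and *. regexp_terms is a hypothetical helper.

```python
import re


def regexp_terms(pattern, terms):
    """Terms the pattern matches in full, mimicking the anchored Lucene regexp query."""
    regex = re.compile(pattern)
    return sorted(t for t in terms if regex.fullmatch(t))


first_name_terms = ["john", "jane", "douglas"]
print(regexp_terms("j[a-z]*n", first_name_terms))  # ['john']
print(regexp_terms("j[a-z]*", first_name_terms))   # ['jane', 'john']
print(regexp_terms("g", first_name_terms))         # [] (a substring is not enough)
```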

8. Match Phrase Query
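A match phrase query requires all the terms in the query string to appear in the document, in the same order as in the input. By default the terms must also be adjacent, or the document will not match. The slop parameter relaxes this by setting how far apart (in positions) the terms may be and still match: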

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match": {
            "query": "love climbing",
            "fields": [
                "about",
                "interests"
            ],
            "type": "phrase",
            "slop": 3
        }
    },
    "_source": [
        "first_name",
        "about",
        "interests"
    ]
}'

With the sample data, document 1 (about: "I love to go rock climbing") is returned even though love and climbing are not adjacent. Three words (to go rock) separate them, and "slop": 3 allows exactly that gap. If we remove "slop": 3, document 1 is no longer found.
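Here is a simplified model of how slop works:
1. Shift each matched term's position by its offset in the phrase.
2. The spread of the shifted positions is the slop required.
3. The document matches when that value is at most slop.

This follows the idea behind Lucene's sloppy phrase matching but ignores its handling of repeated terms and scoring. phrase_slop_needed is a hypothetical helper, and the analyzer is a simplified lowercase, split-on-non-alphanumerics function. The last example shows that reversing two terms costs a slop of 2.

```python
import itertools
import re


def analyze(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_slop_needed(text, phrase):
    """Smallest slop at which `phrase` matches `text`, or None if a term is missing."""
    tokens = analyze(text)
    terms = analyze(phrase)
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if not all(positions):
        return None
    best = None
    for combo in itertools.product(*positions):
        if len(set(combo)) < len(combo):
            continue  # one token position cannot satisfy two query terms
        # Shift each position by the term's offset in the phrase; 0 spread = exact phrase.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        best = spread if best is None else min(best, spread)
    return best


about = "I love to go rock climbing"
print(phrase_slop_needed(about, "rock climbing"))  # 0: matches without slop
print(phrase_slop_needed(about, "love climbing"))  # 3: needs "slop": 3
print(phrase_slop_needed(about, "climbing rock"))  # 2: reversed order
print(phrase_slop_needed(about, "climb rock"))     # None: "climb" is not indexed
```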

9. Match Phrase Prefix Query
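The match phrase prefix query treats the last term of the query as a prefix, so only the beginning of that word is needed to find it. Like match phrase, it accepts slop. It also supports max_expansions, which limits how many terms the prefix may expand to and so reduces resource usage: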

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "match_phrase_prefix": {
            "about": {
                "query": "rock cli",
                "slop": 3,
                "max_expansions": 10
            }
        }
    },
    "_source": [
        "about",
        "interests",
        "first_name"
    ]
}'
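A sketch of what the phrase prefix query does with the sample about field:
- The last query term is expanded into the indexed terms that start with it. The number of expansions is capped by max_expansions (default 50); taking them in sorted term order is an assumption about how the cap is applied.
- The rest of the phrase must match exactly.

For brevity the sketch handles only slop 0, and its helpers are hypothetical. It also shows why a query such as cli ro finds nothing: cli is not the last term, so it is not expanded.

```python
import re


def analyze(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_prefix(prefix, term_dictionary, max_expansions=50):
    """Indexed terms starting with `prefix`, in sorted order, capped at max_expansions."""
    return [t for t in sorted(term_dictionary) if t.startswith(prefix)][:max_expansions]


def phrase_prefix_match(text, query, term_dictionary, max_expansions=50):
    """Adjacent phrase match (slop 0) where only the LAST query term is a prefix."""
    *head, last = analyze(query)
    candidates = set(expand_prefix(last, term_dictionary, max_expansions))
    tokens = analyze(text)
    n = len(head)
    for start in range(len(tokens) - n):
        if tokens[start:start + n] == head and tokens[start + n] in candidates:
            return True
    return False


ABOUT = ["I love to go rock climbing", "I like to collect rock albums", "I like to build cabinets"]
dictionary = {t for text in ABOUT for t in analyze(text)}
print(expand_prefix("c", dictionary))                    # ['cabinets', 'climbing', 'collect']
print(expand_prefix("c", dictionary, max_expansions=2))  # ['cabinets', 'climbing']
print([phrase_prefix_match(t, "rock cli", dictionary) for t in ABOUT])  # [True, False, False]
print([phrase_prefix_match(t, "cli ro", dictionary) for t in ABOUT])    # [False, False, False]
```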

10. Query String
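The query_string query offers a concise syntax for combining multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp and range queries. The example below runs a fuzzy search for the terms search algorithm in documents whose authors include grant ingersoll or tom morton. Two details:
- saerch is deliberately misspelled, and ~1 allows one edit.
- All fields are searched, with the summary field boosted by 2.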

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "query_string" : {
            "query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
            "fields": ["_all", "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}'

11. Simple Query String
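simple_query_string is another version of query_string that is better suited to a single search box exposed to users. It uses +, | and - in place of AND, OR and NOT. If the user enters an invalid query, the invalid parts are silently ignored instead of raising an exception: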

curl -XPOST 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "simple_query_string" : {
            "query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
            "fields": ["_all", "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}'

12. Term/Terms Query
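The previous examples covered full-text search. Sometimes, though, we are more interested in structured search, where values must match exactly; for that we can use the term and terms queries. In the example below we search for everyone whose interests include music: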

curl -XPOST 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "term" : {
            "interests": "music"
        }
    },
    "_source" : ["first_name","last_name","interests"]
}'

We can also use terms to match any of several terms:

{
    "query": {
        "terms" : {
            "interests": ["music", "forestry"]
        }
    }
}
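The difference between term and match comes down to analysis. A term query looks up its value exactly as given, while match first runs the query text through the field's analyzer.

The sketch below models this for the interests field. It assumes interests was dynamically mapped as an analyzed text field; for exact structured values, a keyword field is the usual choice. It uses a simplified lowercase, split-on-non-alphanumerics analyzer, and the helper names are hypothetical.

```python
import re

INTERESTS = {"1": ["sports", "music"], "2": ["music"], "3": ["forestry"]}


def analyze(text):
    return re.findall(r"[a-z0-9]+", text.lower())


# Terms stored per document: text values are analyzed at index time.
INDEXED = {doc_id: {t for v in values for t in analyze(v)}
           for doc_id, values in INTERESTS.items()}


def term_query(value):
    """term: the value is looked up exactly as given, without analysis."""
    return sorted(d for d, terms in INDEXED.items() if value in terms)


def terms_query(values):
    """terms: matches if any of the exact values is present."""
    return sorted(d for d, terms in INDEXED.items() if terms & set(values))


def match_query(text):
    """match: the query text goes through the same analyzer first."""
    wanted = set(analyze(text))
    return sorted(d for d, terms in INDEXED.items() if wanted & terms)


print(term_query("music"))                  # ['1', '2']
print(term_query("Music"))                  # [] (only "music" is indexed)
print(match_query("Music"))                 # ['1', '2']
print(terms_query(["music", "forestry"]))   # ['1', '2', '3']
```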

13. Term Query - Sorted

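Like any other query results, term query results can easily be sorted, and on several levels. Here the hits are sorted by age in descending order, with the relevance score as the tie-breaker: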

curl -XPOST 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "term" : {
            "interests": "music"
        }
    },
    "_source" : ["interests","first_name","about","age"],
    "sort": [
        { "age": { "order": "desc" }},
        "_score"
    ]
}'
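A sort array is applied key by key: later keys only break ties left by earlier ones. The sketch below reproduces that ordering in Python with repeated stable sorts over the sample employees. multi_sort is a hypothetical helper, and the field names come from the documents indexed at the start.

```python
EMPLOYEES = [
    {"_id": "1", "first_name": "John", "last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"_id": "2", "first_name": "Jane", "last_name": "Smith", "age": 32, "interests": ["music"]},
    {"_id": "3", "first_name": "Douglas", "last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def multi_sort(hits, sort_spec):
    """Sort by several (field, order) keys; earlier keys take precedence.

    Python's sort is stable, so sorting by the last key first and the first key
    last yields the combined ordering.
    """
    result = list(hits)
    for field, order in reversed(sort_spec):
        result.sort(key=lambda h: h[field], reverse=(order == "desc"))
    return result


music_hits = [e for e in EMPLOYEES if "music" in e["interests"]]
print([h["first_name"] for h in multi_sort(music_hits, [("age", "desc")])])
# ['Jane', 'John']
print([h["first_name"] for h in multi_sort(EMPLOYEES, [("last_name", "asc"), ("age", "desc")])])
# ['Douglas', 'Jane', 'John']
```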

14. Range Query
Another kind of structured query is the range query. In the example below we search for employees aged between 30 and 40, inclusive:

curl -XPOST 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query": {
        "range" : {
            "age": {
                "gte": 30,
                "lte": 40
            }
        }
    },
    "_source" : ["first_name","last_name","age"]
}'

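Range queries can be applied to date, numeric and string fields. For a date field the bounds are written as dates, for example "gte": "2017-02-01", "lte": "2017-05-01".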
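A range query keeps a document only if its value satisfies every bound given (gte, gt, lte, lt). The small Python function below mirrors that check for numbers and dates. in_range is a hypothetical helper; the ages come from the sample documents, and the dates are made-up values.

```python
from datetime import date


def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """True if value satisfies every bound that is given, as a range query requires."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True


AGES = {"John": 25, "Jane": 32, "Douglas": 35}
print(sorted(n for n, a in AGES.items() if in_range(a, gte=30, lte=40)))  # ['Douglas', 'Jane']
print(in_range(date.fromisoformat("2017-03-15"),
               gte=date.fromisoformat("2017-02-01"),
               lte=date.fromisoformat("2017-05-01")))                      # True
print(in_range(40, gte=30, lt=40))                                         # False
```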
