Elasticsearch Search API

Posted 2021-01-27 狂乱的贵公子


When a search is executed, the request is broadcast to all shards of the index. You can control which shards are searched by supplying a routing parameter. For example, when searching the twitter index, the routing value can be the user name:

curl -X POST "localhost:9200/twitter/_search?routing=kimchy" -H 'Content-Type: application/json' -d'
{
    "query": {
        "bool" : {
            "must" : {
                "query_string" : {
                    "query" : "some query string here"
                }
            },
            "filter" : {
                "term" : { "user" : "kimchy" }
            }
        }
    }
}
'

1. Search

A search can be expressed either as a simple query string passed in the URI parameters or as a request body.

1.1. URI Search

This form is rarely used, so a single example will do:

curl -X GET "localhost:9200/product/_search?q=category:honor&sort=price:asc"
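
The q and sort parameters in the URL above contain reserved characters such as the colon, so it can help to build the URL programmatically. The following is a minimal sketch using only the Python standard library; the helper name build_uri_search and the http://localhost:9200 base URL are assumptions made for illustration.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_uri_search(base_url, index, query, sort=None):
    """Build a URI-search URL (hypothetical helper, not part of any client)."""
    params = {"q": query}
    if sort is not None:
        params["sort"] = sort
    # urlencode percent-encodes reserved characters such as ':' and spaces.
    return f"{base_url}/{index}/_search?{urlencode(params)}"


url = build_uri_search("http://localhost:9200", "product",
                       "category:honor", sort="price:asc")
print(url)

# Decoding the query string gives back the original parameter values.
print(parse_qs(urlsplit(url).query))
```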

1.2. Request Body Search

Again, an example:

curl -X GET "localhost:9200/twitter/_search" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
'
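
The same request can be prepared from Python. This is a sketch with the standard library only, assuming a cluster at http://localhost:9200; build_search_request is a hypothetical helper name. It uses POST, which the _search endpoint accepts just like GET with a body (the routing example at the top of this article also uses POST).

```python
import json
import urllib.request


def build_search_request(base_url, index, body):
    """Prepare (but do not send) a request-body search. Hypothetical helper."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:9200",  # assumed address of a local cluster
    "twitter",
    {"query": {"term": {"user": "kimchy"}}},
)
print(request.get_method(), request.full_url)

# To actually send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```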

1.2.1. Query

The query element of the request body defines a query using the Query DSL:

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
'

1.2.2. From / Size

The from and size parameters paginate the results. from is the offset of the first hit to return, and size is the maximum number of hits to return. from defaults to 0 and size defaults to 10.

curl -X GET "localhost:9200/product/_search" -H 'Content-Type: application/json' -d'
{
    "from" : 0,
    "size" : 10,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
'
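
Applications usually think in page numbers rather than offsets. The sketch below shows the arithmetic for turning a 1-based page number into from and size; the function names are hypothetical and chosen for illustration.

```python
def page_to_from_size(page, page_size=10):
    """Convert a 1-based page number into from/size values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def paged_query(query, page, page_size=10):
    """Return a request body containing the query plus pagination."""
    body = page_to_from_size(page, page_size)
    body["query"] = query
    return body


# Page 3 with 20 hits per page skips the first 40 hits.
print(paged_query({"term": {"user": "kimchy"}}, page=3, page_size=20))
```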

1.2.3. Sort

Results can be sorted by one or more fields.

There are two special sort fields: _score sorts by relevance score, and _doc sorts by index order.

Suppose we have the following index:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "_doc": {
            "properties": {
                "post_date": { "type": "date" },
                "user": {
                    "type": "keyword"
                },
                "name": {
                    "type": "keyword"
                },
                "age": { "type": "integer" }
            }
        }
    }
}
'

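We can query this index as follows:

curl -X GET "localhost:9200/my_index/_search" -H 'Content-Type: application/json' -d'
{
    "sort" : [
        { "post_date" : {"order" : "asc"}},
        "user",
        { "name" : "desc" },
        { "age" : "desc" },
        "_score"
    ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
'

This example sorts by post_date ascending, then user ascending, name descending, age descending, and finally _score descending. When no order is given, the default is desc for _score and asc for every other field.

(PS: _doc is the most efficient sort order. Use it if you do not care about the order in which documents are returned.)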

Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which value of the array is used to sort the document. The possible values are:

min : the smallest value
max : the largest value
sum : the sum of all values (numeric arrays only)
avg : the average of all values (numeric arrays only)
median : the median of all values (numeric arrays only)

For example:

curl -X PUT "localhost:9200/my_index/_doc/1?refresh" -H 'Content-Type: application/json' -d'
{
   "product": "chocolate",
   "price": [20, 4]
}
'

curl -X POST "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
   "query" : {
      "term" : { "product" : "chocolate" }
   },
   "sort" : [
      {"price" : {"order" : "asc", "mode" : "avg"}}
   ]
}
'

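What does this mean? When a field holds an array (or several values), sorting by that field requires a single value per document to sort on. That value is the field's sort value.

The mode option determines the sort value: the smallest element of the array, the largest, the average, and so on.

In the example above, price is an array with two elements and the query sets mode to avg, so the sort value of price is (20+4)/2=12. The results are then sorted in ascending order by that average.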
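
The effect of each mode on a multi-valued field can be reproduced locally. This is only an illustration of the arithmetic, not how Elasticsearch implements it; sort_value is a hypothetical helper name.

```python
from statistics import mean, median

# Each mode reduces the list of field values to one sort value.
MODES = {"min": min, "max": max, "sum": sum, "avg": mean, "median": median}


def sort_value(values, mode):
    """Return the value a document would be sorted by under the given mode."""
    return MODES[mode](values)


prices = [20, 4]  # the price array from the chocolate document
for mode in MODES:
    print(mode, sort_value(prices, mode))
```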

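Missing

The missing parameter specifies how documents that lack the sort field are treated. It can be set to _last, _first, or a custom value that is then used as the sort value. The default is _last.

This is similar to a relational database placing rows whose column is NULL at the end.

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "sort" : [
        { "price" : {"missing" : "_last"} }
    ],
    "query" : {
        "term" : { "product" : "chocolate" }
    }
}
'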
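
The placement of documents without the sort field can be sketched in plain Python. The function below is a local illustration that handles only _last and _first (a custom missing value is not covered); the sample documents are made up.

```python
def sort_with_missing(docs, field, order="asc", missing="_last"):
    """Sort docs by field, placing docs that lack the field first or last."""
    present = [d for d in docs if field in d]
    absent = [d for d in docs if field not in d]
    present.sort(key=lambda d: d[field], reverse=(order == "desc"))
    return absent + present if missing == "_first" else present + absent


# Hypothetical documents: document "b" has no price field.
docs = [{"id": "a", "price": 30}, {"id": "b"}, {"id": "c", "price": 10}]

print([d["id"] for d in sort_with_missing(docs, "price")])
print([d["id"] for d in sort_with_missing(docs, "price", missing="_first")])
```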

1.2.4. Source filtering

This controls how the _source field is returned with each hit.

By default the contents of _source are returned. You can also turn this off, for example:

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "_source": false,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
'

Normally, a hit looks like this:

{
    "_index" : "product",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 1.0,
    "_source" : {
        "productName" : "Honor Note10",
        "category" : "Honor",
        "price" : 2499
    }
}

With _source disabled, it looks like this:

{
    "_index" : "product",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 1.0
}

Wildcard patterns give finer control over which fields of _source are returned:

curl -X GET "localhost:9200/product/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "_source": "product*",
    "query" : {
        "match_all" : {}
    }
}
'

or

curl -X GET "localhost:9200/product/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "_source": ["product*", "abc*"],
    "query" : {
        "match_all" : {}
    }
}
'
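
To see which fields such patterns select, the filtering can be approximated locally with fnmatch from the Python standard library. This sketch only looks at top-level field names and is not Elasticsearch's own matching logic; filter_source is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def filter_source(source, patterns):
    """Keep only the top-level fields whose names match any wildcard pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return {
        name: value
        for name, value in source.items()
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    }


# The _source of the sample hit shown earlier in this section.
source = {"productName": "Honor Note10", "category": "Honor", "price": 2499}

print(filter_source(source, "product*"))
print(filter_source(source, ["product*", "abc*"]))
```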

1.2.5. Highlighting

curl -X GET "localhost:9200/product/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match" : { "category" : "MI" }
    },
    "highlight" : {
        "fields" : {
            "productName": {}
        }
    }
}
'

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html

1.2.6. Explain

With explain enabled, each hit includes a breakdown of how its score was computed:

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "explain": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
'

1.3. Count

curl -X GET "localhost:9200/product/_doc/_count?pretty&q=category:honor"

curl -X GET "localhost:9200/product/_doc/_count?pretty" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "term" : { "category" : "honor" }
    }
}
'

{
  "count" : 3,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  }
}

2. Aggregations

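Aggregations are the counterpart of aggregate functions in a relational database.

Aggregations can be nested! Keep this in mind, it matters.

There are four main families of aggregations:

Bucketing
Metric
Matrix
Pipeline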

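The basic structure of an aggregation is as follows.

aggregations is a JSON object that represents the aggregations to compute. (PS: the keyword aggs can be used instead.)

Each aggregation is associated with a logical name (for example, an aggregation that computes the average price could be named "avg_price").
In the response, these logical names uniquely identify each aggregation.
Each aggregation has a specific type (for example sum, avg, max, min, and so on).
Each aggregation type defines its own body.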
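
The name / type / body layering described above, including nesting, can be sketched by building the request body as a Python dict. The aggregation helper is hypothetical, and the terms bucket aggregation and the by_category name are used here only as an assumed example of a parent aggregation.

```python
import json


def aggregation(name, agg_type, body, sub_aggs=None):
    """Build {name: {type: body}}, optionally with nested sub-aggregations."""
    agg = {agg_type: body}
    if sub_aggs:
        agg["aggs"] = sub_aggs
    return {name: agg}


# An avg aggregation named "avg_price" ...
avg_price = aggregation("avg_price", "avg", {"field": "price"})

# ... nested inside a bucket aggregation (assumed example: terms on category).
by_category = aggregation("by_category", "terms", {"field": "category"},
                          sub_aggs=avg_price)

request_body = {"size": 0, "aggs": by_category}
print(json.dumps(request_body, indent=2))
```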

2.1. Metrics Aggregations

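Aggregations of this type compute metrics from values extracted, in one way or another, from the documents being aggregated. The values usually come from document fields, but they can also be generated by a script.

Numeric metrics aggregations are a special kind of metrics aggregation that output numeric values. Depending on how many values they output, they are either single-value numeric metrics aggregations (for example avg) or multi-value numeric metrics aggregations (for example stats).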

2.1.1. Avg

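Computes the average of numeric values extracted from the documents.

Suppose our documents hold student exam grades (0 to 100). We can compute the average grade:

curl -X POST "localhost:9200/exams/_search?size=0" -H 'Content-Type: application/json' -d'
{
    "aggs":{
        "avg_grade":{
            "avg":{
                "field":"grade"
            }
        }
    }
}
'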

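The aggregation above computes the average grade over all students. The aggregation type is avg, and field specifies which field is used in the calculation.

Another example:

Request:
curl -X POST "localhost:9200/product/_search?size=0" -H 'Content-Type: application/json' -d'
{
    "aggs":{
        "avg_price":{
            "avg":{
                "field":"price"
            }
        }
    }
}
'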

Response:
{
    "took":13,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":7,
        "max_score":0,
        "hits":[

        ]
    },
    "aggregations":{
        "avg_price":{
            "value":2341.5714285714284
        }
    }
}

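By default, documents that lack the field are ignored (PS: just as a relational database ignores NULL values when computing an average). The missing parameter assigns such documents a default value, for example:

curl -X POST "localhost:9200/exams/_search?size=0" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "grade_avg" : {
            "avg" : {
                "field" : "grade",
                "missing": 10
            }
        }
    }
}
'

Documents without a grade field are treated as if their grade were 10.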
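
The difference the missing parameter makes can be checked with a few lines of Python. This is a local sketch of the arithmetic only; avg_agg and the sample documents are hypothetical.

```python
def avg_agg(docs, field, missing=None):
    """Average a field; docs lacking it are skipped unless missing is given."""
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)  # substitute the default value
    # None stands in for "no value" when nothing was collected.
    return sum(values) / len(values) if values else None


# Hypothetical exam documents: the third one has no grade field.
exams = [{"grade": 50}, {"grade": 100}, {"student": "no grade yet"}]

print(avg_agg(exams, "grade"))              # third document ignored
print(avg_agg(exams, "grade", missing=10))  # third document counted as 10
```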

2.1.2. Sum

Computes the sum of numeric values extracted from the documents.

Request:
curl -X POST "localhost:9200/product/_search?size=0" -H 'Content-Type: application/json' -d'
{
    "query":{
        "constant_score":{
            "filter":{
                "match":{
                    "category":"vivo"
                }
            }
        }
    },
    "aggs":{
        "vivo_prices":{
            "sum":{
                "field":"price"
            }
        }
    }
}
'

Response:
{
    "took":3,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":2,
        "max_score":0,
        "hits":[

        ]
    },
    "aggregations":{
        "vivo_prices":{
            "value":3796
        }
    }
}

This computes the total price of the products whose category field matches vivo.

It is roughly equivalent to: select sum(price) from product where category like '%vivo%'
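
The filter-then-sum behaviour can be mimicked on an in-memory list. The product data below is invented (the two vivo prices were chosen so that they add up to the 3796 shown in the response), and the case-insensitive equality check is only a stand-in for the match query.

```python
def sum_where(docs, field, predicate):
    """Sum a field over the docs accepted by predicate; report the hit count."""
    matched = [doc for doc in docs if predicate(doc)]
    return {"total": len(matched), "value": sum(doc[field] for doc in matched)}


# Hypothetical product documents, not the author's actual data.
products = [
    {"category": "vivo", "price": 1798},
    {"category": "vivo", "price": 1998},
    {"category": "honor", "price": 2499},
]

result = sum_where(products, "price",
                   lambda doc: doc["category"].lower() == "vivo")
print(result)
```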

2.1.3. Max

Computes the maximum of numeric values extracted from the documents.

curl -X POST "localhost:9200/sales/_search?size=0" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "max_price" : {
            "max" : {
                "field" : "price"
            }
        }
    }
}
'

2.1.4. Stats

This is a multi-value aggregation. It returns min, max, sum, count and avg together.

curl -X POST "localhost:9200/exams/_search?size=0" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "grades_stats" : {
            "stats" : {
                "field" : "grade"
            }
        }
    }
}
'

Its response might look like this:

{
    ...

    "aggregations": {
        "grades_stats": {
            "count": 2,
            "min": 50.0,
            "max": 100.0,
            "avg": 75.0,
            "sum": 150.0
        }
    }
}
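
The five values in the stats result follow directly from the underlying grades. A minimal sketch, assuming the two grades behind the response above are 50 and 100; stats_agg is a hypothetical helper that only mirrors the arithmetic.

```python
def stats_agg(values):
    """Return count, min, max, avg and sum for a list of numbers."""
    count = len(values)
    if count == 0:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": count,
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / count,
        "sum": total,
    }


print(stats_agg([50, 100]))
```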

Another example:

Request:
curl -X POST "localhost:9200/product/_search?size=0" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "product_stats" : {
            "stats" : {
                "field" : "price"
            }
        }
    }
}
'

Response:
{
    "took":4,
    "timed_out":false,
    

   