Advanced Usage of the Distributed Search Engine ElasticSearch

Posted by mirson


Preface: This article was compiled by the editors of 小常识网 (cha138.com) and covers advanced usage of the distributed search engine ElasticSearch. We hope you find it useful.

I. Filtered Queries (Pagination, Fuzzy Matching, filter)

1. Search for documents that match a condition:

Create the data:

PUT account/_doc/1
{ "account": 10001, "balance": 10000, "name": "test1"} 

PUT account/_doc/2
{ "account": 10002, "balance": 20000, "name": "test2"} 

PUT account/_doc/3
{ "account": 10003, "balance": 30000, "name": "张三"} 

PUT account/_doc/4
{ "account": 10004, "balance": 30000, "name": "王五"} 

Look up a document by account number:

GET /account/_search 
{
  "query": {
    "match": {
      "account": "10001"
    }
  }
}

Response:

{
  ...
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "account",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "account" : 10001,
          "balance" : 10000,
          "name" : "test1"
        }
      }
    ]
  }
  ...
}

The match succeeds and the requested document is returned.

2. Pagination:

GET /account/_search 
{
  "query": { 
    "match_all": {}
  },
  "from": 0,
  "size": 2
}

Two hits are returned (from is the offset of the first hit, size the number of hits per page):

"hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "account",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "account" : 10001,
          "balance" : 10000,
          "name" : "test1"
        }
      },
      {
        "_index" : "account",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "account" : 10002,
          "balance" : 20000,
          "name" : "test2"
        }
      }
    ]
  }
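Because from is an offset into the hit list rather than a page number, page n with s hits per page starts at from = (n - 1) * s. The sketch below builds such a request body in Python; page_query and MAX_RESULT_WINDOW are hypothetical names, and the limit mirrors the default index.max_result_window of 10000, beyond which Elasticsearch rejects a request whose from + size is too large.

```python
import json

# Mirrors the default index.max_result_window setting (10000):
# Elasticsearch rejects requests where from + size exceeds it.
MAX_RESULT_WINDOW = 10_000


def page_query(page: int, page_size: int) -> dict:
    """Hypothetical helper: match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size  # "from" = number of hits to skip
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {
        "query": {"match_all": {}},
        "from": offset,
        "size": page_size,
    }


if __name__ == "__main__":
    # Second page of two hits each: skips the first two documents.
    print(json.dumps(page_query(2, 2)))
```

The request in the article corresponds to page_query(1, 2); deep pagination past the window needs a different mechanism, so the helper fails early instead of sending a request that would be rejected.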

3. Fuzzy (full-text) matching:

Numeric fields are not suited to fuzzy matching, so this test uses a text field:

GET /account/_search 
{
  "query": { 
    "match": {
      "name": "三四"
    }
  }
}

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "account",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "account" : 10003,
          "balance" : 30000,
          "name" : "张三"
        }
      }
    ]
  }
}

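Note: by default the standard analyzer tokenizes Chinese text one character at a time, so the query string "三四" is split into "三" and "四". The match query's default operator is or, so any document containing either character matches; here only "张三" does.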
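To make the tokenization concrete, here is a small in-memory simulation of the match query above. It is a simplified stand-in, not the real standard analyzer (which follows Unicode word-segmentation rules): the hypothetical analyze function emits one token per CJK ideograph and one lowercased token per run of ASCII letters and digits, and match_or treats a document as a hit if it shares at least one token with the query.

```python
import re

# Simplified stand-in for the standard analyzer on this article's data
# (assumption: not the real Unicode segmentation rules). Each CJK ideograph
# becomes its own token; runs of ASCII letters/digits form one token.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def analyze(text: str) -> list:
    return [token.lower() for token in TOKEN_RE.findall(text)]


def match_or(query: str, field_value: str) -> bool:
    """match query with the default operator "or": one shared token is enough."""
    return bool(set(analyze(query)) & set(analyze(field_value)))


# The name field of the four documents indexed above, keyed by _id.
names = {"1": "test1", "2": "test2", "3": "张三", "4": "王五"}

if __name__ == "__main__":
    print(analyze("三四"))
    print([doc_id for doc_id, name in names.items() if match_or("三四", name)])
```

Only document 3 shares a token ("三") with the query, which is why a single hit comes back.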

4. filter queries:

GET /account/_search 
{
  "query": { 
    "bool": {
      "filter": [
        {
          "term": {
            "name": "张三"
          }
        }
      ]
    }
  }
}

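term is an exact-match query: its input is not analyzed, so it must equal an indexed term exactly. Because it runs inside filter, no relevance score is computed; a filter only decides whether a document matches.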

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

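No documents match. The term query looks for the whole term "张三", but at index time the default analyzer split "张三" into the single-character tokens "张" and "三", so the term "张三" does not exist in the index.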
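The difference between an analyzed text field and an unanalyzed keyword field can be modeled in a few lines. This is a toy sketch under a stated assumption: the hypothetical text_field_terms treats every character as a token, which only holds for purely Chinese values like these. If the account index was created by dynamic mapping (the article indexes documents without defining a mapping), Elasticsearch 7.x would map name as text with a name.keyword sub-field, so a term filter on "name.keyword" would be expected to find 张三.

```python
# Toy model of the empty result above (assumption: CJK-only values,
# one token per character, as the standard analyzer produces for them).


def text_field_terms(value: str) -> set:
    """Terms stored for a text field: one token per Chinese character."""
    return set(value)


def keyword_field_terms(value: str) -> set:
    """Terms stored for a keyword field: the whole value, unanalyzed."""
    return {value}


def term_query(query_value: str, indexed_terms: set) -> bool:
    """term does not analyze its input; it must equal one indexed term."""
    return query_value in indexed_terms


if __name__ == "__main__":
    print(term_query("张三", text_field_terms("张三")))     # text field
    print(term_query("张三", keyword_field_terms("张三")))  # keyword sub-field
```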

II. bool Queries (should, must, must_not)

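  1. should query: at least one clause must match. (This holds when the bool query has no must or filter clauses; otherwise should clauses are optional and only raise the score of documents that match them.)

    GET /movies/_search
    {
      "query": {
        "bool": {
          "should": [
            {"match": {"title": "good hearts sea"}},
            {"match": {"overview": "good hearts sea"}}
          ]
        }
      }
    }
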
  2. must query: every clause must match, and each matching clause contributes to the relevance score.

    GET /movies/_search
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"title": "good hearts sea"}},
            {"match": {"overview": "good hearts sea"}}
          ]
        }
      }
    }

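  3. must_not query: none of the clauses may match; a document that matches any must_not clause is excluded. These clauses run in filter context and do not affect the score.

    GET /movies/_search
    {
      "query": {
        "bool": {
          "must_not": [
            {"match": {"title": "good hearts sea"}},
            {"match": {"overview": "good hearts sea"}}
          ]
        }
      }
    }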
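The matching rules of the three clause types can be sketched as an in-memory predicate. Everything here is hypothetical: the article does not show the contents of the movies index, so the three sample movies are invented, match_words is a crude stand-in for a match query (lowercased, whitespace-separated words), and scoring is ignored entirely.

```python
from typing import Callable, Dict, Sequence

Doc = Dict[str, str]
Clause = Callable[[Doc], bool]


def match_words(field: str, text: str) -> Clause:
    """Hypothetical match stand-in: true if the field shares any word with text."""
    words = set(text.lower().split())
    return lambda doc: bool(words & set(doc.get(field, "").lower().split()))


def bool_matches(doc: Doc,
                 must: Sequence[Clause] = (),
                 should: Sequence[Clause] = (),
                 must_not: Sequence[Clause] = ()) -> bool:
    """Decide whether doc matches a bool query (relevance scoring is ignored)."""
    if not all(clause(doc) for clause in must):
        return False  # must: every clause has to match
    if any(clause(doc) for clause in must_not):
        return False  # must_not: one matching clause excludes the document
    if should and not must:
        # No must/filter clauses: at least one should clause has to match.
        return any(clause(doc) for clause in should)
    return True  # with must present, should clauses would only affect scoring


# Hypothetical sample data.
movies = {
    "a": {"title": "good hearts", "overview": "a story by the sea"},
    "b": {"title": "the sea", "overview": "a quiet town"},
    "c": {"title": "heat", "overview": "a city crime drama"},
}
title_q = match_words("title", "good hearts sea")
overview_q = match_words("overview", "good hearts sea")

if __name__ == "__main__":
    for label, kwargs in [("should", {"should": [title_q, overview_q]}),
                          ("must", {"must": [title_q, overview_q]}),
                          ("must_not", {"must_not": [title_q, overview_q]})]:
        ids = [movie_id for movie_id, doc in movies.items()
               if bool_matches(doc, **kwargs)]
        print(label, ids)
```

With this data, should returns a and b (each matches at least one clause), must returns only a (both clauses match), and must_not returns only c (it matches neither clause).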

III. Aggregation Queries (aggs)

  1. Group and count accounts by balance:

    GET /account/_search 
    {
      "query": { 
        "bool": {
          "filter": [
            {
              "range": {
                "account": {
                  "gte": 10001
                }
              }
            }
          ]
        }
      },
      "sort": [
        {
          "balance": {
            "order": "desc"
          }
        }
      ],
      "aggs":{
        "group_by_balance": {
          "terms": {
            "field": "balance"
          }
        }
      }
    }

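    This finds the accounts whose account number is >= 10001, sorts them by balance in descending order, and uses a terms aggregation (aggs) to count the documents per distinct balance. The aggregation part of the response: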

    "aggregations" : {
        "group_by_balance" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : 30000,
              "doc_count" : 2
            },
            {
              "key" : 10000,
              "doc_count" : 1
            },
            {
              "key" : 20000,
              "doc_count" : 1
            }
          ]
        }
      }

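    The response ends with one bucket per distinct balance and its document count: two accounts hold 30000, and one account each holds 10000 and 20000.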
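The filter, sort, and terms aggregation of this request can be mimicked in memory on the four documents indexed at the start of the article. The search function is a hypothetical sketch, not an Elasticsearch client call. It assumes buckets are ordered by doc_count descending with ties broken by ascending key, which is consistent with the response above. The aggregation counts all matching documents, independently of the sort order and of how many hits are returned.

```python
from collections import Counter

# The four documents indexed at the start of the article.
accounts = [
    {"account": 10001, "balance": 10000, "name": "test1"},
    {"account": 10002, "balance": 20000, "name": "test2"},
    {"account": 10003, "balance": 30000, "name": "张三"},
    {"account": 10004, "balance": 30000, "name": "王五"},
]


def search(docs, min_account):
    """Hypothetical in-memory version of the filter + sort + aggs request."""
    # filter: range {"account": {"gte": min_account}}, no scoring
    hits = [d for d in docs if d["account"] >= min_account]
    # sort: balance desc (ties keep input order here; Elasticsearch
    # does not guarantee an order for equal sort values)
    hits.sort(key=lambda d: d["balance"], reverse=True)
    # terms aggregation on balance: doc_count desc, then key asc
    counts = Counter(d["balance"] for d in hits)
    buckets = [{"key": key, "doc_count": n}
               for key, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    return hits, buckets


if __name__ == "__main__":
    hits, buckets = search(accounts, 10001)
    print([h["balance"] for h in hits])
    print(buckets)
```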


This article was written and shared by mirson. For further discussion, join QQ group 19310171 or visit www.softart.cn.
