Notes on the minimum_should_match query parameter in Elasticsearch (ES)

Posted 2023-02-05 往日不在

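For the definition of the minimum_should_match parameter, see the official Elasticsearch documentation.
Straight to the conclusions, using "minimum_should_match": "2" as the example:
1. When the analyzed query is matched against the analyzed document, a term that occurs several times in the document counts only once. In other words, the document must match at least 2 different terms from the analyzed query.
2. The analyzer directly determines the tokens. You can configure different analyzers for searching and for indexing (or use fields to index the same text with several analyzers and pick the sub-field at search time), as follows: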

// Use different analyzers for searching and for indexing
PUT xxxxxxx
{
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart",
                "fields": {
                    "keyword": {
                        "ignore_above": 256,
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

// Use fields to index the same text with several analyzers
PUT xxxxxxx
{
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "ignore_above": 256,
                        "type": "keyword"
                    },
                    "spy": {
                        "type": "text",
                        "analyzer": "ik_smart"
                    },
                    "standard": {
                        "type": "text",
                        "analyzer": "standard"
                    },
                    "fpy": {
                        "type": "text",
                        "analyzer": "ik_max_word"
                    }
                }
            }
        }
    }
}

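To make tip 2 concrete, here is a minimal Python sketch that builds a match request body against one of the sub-fields, so that the analyzer is effectively chosen at search time by picking the field. The helper name build_match_body is hypothetical, and the sub-field names (title.spy, title.standard, title.fpy) assume the multi-field mapping from this post. The sketch only builds the JSON body; it does not contact a cluster.

```python
import json


def build_match_body(field: str, text: str, minimum_should_match: str) -> dict:
    """Build the body of a _search request that uses a match query.

    `field` may be a sub-field such as "title.spy". The query text is then
    analyzed with that sub-field's analyzer, unless a search_analyzer in the
    mapping or an explicit "analyzer" in the query overrides it.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "minimum_should_match": minimum_should_match,
                }
            }
        }
    }


# Hypothetical usage: search the ik_smart-analyzed sub-field.
body = build_match_body("title.spy", "香蕉", "2")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed body can be pasted into the console after GET xxxxxxx/_search. Switching to "title.fpy" or "title.standard" changes which token set the query is matched against.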
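3. Use the _analyze API to inspect the tokens, then check whether the tokens of the search text and the tokens of the document share at least 2 different terms.

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "the search text or the document text"
}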

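The comparison in tip 3 can be done by eye, but a small script makes it repeatable. The Python sketch below takes two _analyze responses that have already been parsed into dicts and returns the distinct terms they share. The function names are hypothetical, and the sample responses are trimmed stand-ins that keep only the "token" key; real responses also carry offsets, type, and position.

```python
def distinct_terms(analyze_response: dict) -> set:
    """Collect the distinct token strings from a parsed _analyze response."""
    return {t["token"] for t in analyze_response["tokens"]}


def shared_terms(query_response: dict, doc_response: dict) -> set:
    """Return the distinct terms that appear in both token lists."""
    return distinct_terms(query_response) & distinct_terms(doc_response)


# Trimmed, illustrative stand-ins for two _analyze responses.
query_response = {"tokens": [{"token": "a"}, {"token": "b"}, {"token": "c"}]}
doc_response = {"tokens": [{"token": "b"}, {"token": "b"}, {"token": "d"}]}

shared = shared_terms(query_response, doc_response)
# "b" occurs twice in the document tokens but is counted once.
print(sorted(shared), len(shared) >= 2)
```

When the field has a separate search_analyzer, analyze the search text with the search analyzer and the document text with the index analyzer, otherwise the comparison does not reflect what the query actually does.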
The rest of this post verifies conclusion 1. Conclusions 2 and 3 are simply tips from practical use.

1. Create the index mapping
PUT test_search
{
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "title": {
        "analyzer": "ik_max_word",
        "type": "text",
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          }
        }
      }
    }
  }
}

2. Add the documents
POST /test_search/_doc/4
{
  "id": 4,
  "title": "蕉叶的蕉是一种叶子"
}

POST /test_search/_doc/5
{
  "id": 5,
  "title": "蕉叶的香是一种叶子"
}

3. Search
GET test_search/_search
{
  "explain": false,
  "query": {
    "match": {
      "title": {
        "query": "香蕉",
        "minimum_should_match": "2"
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

Search result
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6312536,
    "hits" : [
      {
        "_index" : "test_search",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.6312536,
        "_source" : {
          "id" : 5,
          "title" : "蕉叶的香是一种叶子"
        },
        "highlight" : {
          "title" : [
            "<em>蕉</em>叶的<em>香</em>是一种叶子"
          ]
        }
      }
    ]
  }
}

4. Check how the tokens match
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "香蕉"
}

Analysis result
{
  "tokens" : [
    { "token" : "香蕉", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
    { "token" : "香", "start_offset" : 0, "end_offset" : 1, "type" : "CN_WORD", "position" : 1 },
    { "token" : "蕉", "start_offset" : 1, "end_offset" : 2, "type" : "CN_WORD", "position" : 2 }
  ]
}

Analyzing the documents the same way gives the following.
Tokens of "蕉叶的香是一种叶子", which match one "蕉" and one "香" from the query:
{
  "tokens" : [
    { "token" : "蕉叶", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
    { "token" : "蕉", "start_offset" : 0, "end_offset" : 1, "type" : "CN_WORD", "position" : 1 },
    { "token" : "叶的", "start_offset" : 1, "end_offset" : 3, "type" : "CN_WORD", "position" : 2 },
    { "token" : "叶", "start_offset" : 1, "end_offset" : 2, "type" : "CN_WORD", "position" : 3 },
    { "token" : "的", "start_offset" : 2, "end_offset" : 3, "type" : "CN_WORD", "position" : 4 },
    { "token" : "香", "start_offset" : 3, "end_offset" : 4, "type" : "CN_WORD", "position" : 5 },
    { "token" : "是", "start_offset" : 4, "end_offset" : 5, "type" : "CN_WORD", "position" : 6 },
    { "token" : "一种", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 7 },
    { "token" : "一", "start_offset" : 5, "end_offset" : 6, "type" : "CN_WORD", "position" : 8 },
    { "token" : "种", "start_offset" : 6, "end_offset" : 7, "type" : "CN_WORD", "position" : 9 },
    { "token" : "叶子", "start_offset" : 7, "end_offset" : 9, "type" : "CN_WORD", "position" : 10 },
    { "token" : "叶", "start_offset" : 7, "end_offset" : 8, "type" : "CN_WORD", "position" : 11 },
    { "token" : "子", "start_offset" : 8, "end_offset" : 9, "type" : "CN_WORD", "position" : 12 }
  ]
}

Tokens of "蕉叶的蕉是一种叶子", which match "蕉" twice:
{
  "tokens" : [
    { "token" : "蕉叶", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
    { "token" : "蕉", "start_offset" : 0, "end_offset" : 1, "type" : "CN_WORD", "position" : 1 },
    { "token" : "叶的", "start_offset" : 1, "end_offset" : 3, "type" : "CN_WORD", "position" : 2 },
    { "token" : "叶", "start_offset" : 1, "end_offset" : 2, "type" : "CN_WORD", "position" : 3 },
    { "token" : "的", "start_offset" : 2, "end_offset" : 3, "type" : "CN_WORD", "position" : 4 },
    { "token" : "蕉", "start_offset" : 3, "end_offset" : 4, "type" : "CN_WORD", "position" : 5 },
    { "token" : "是", "start_offset" : 4, "end_offset" : 5, "type" : "CN_WORD", "position" : 6 },
    { "token" : "一种", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 7 },
    { "token" : "一", "start_offset" : 5, "end_offset" : 6, "type" : "CN_WORD", "position" : 8 },
    { "token" : "种", "start_offset" : 6, "end_offset" : 7, "type" : "CN_WORD", "position" : 9 },
    { "token" : "叶子", "start_offset" : 7, "end_offset" : 9, "type" : "CN_WORD", "position" : 10 },
    { "token" : "叶", "start_offset" : 7, "end_offset" : 8, "type" : "CN_WORD", "position" : 11 },
    { "token" : "子", "start_offset" : 8, "end_offset" : 9, "type" : "CN_WORD", "position" : 12 }
  ]
}

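To summarize: "蕉叶的香是一种叶子" matches one "蕉" and one "香", while "蕉叶的蕉是一种叶子" matches "蕉" twice.
The search returns only "蕉叶的香是一种叶子", so "minimum_should_match": "2" refers to the number of different query terms that match, not to how many times a matching term occurs in the document.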
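One way to see why this happens: with the default or operator, a match query behaves like a bool query with one should clause per analyzed query term, and minimum_should_match sets how many of those clauses must match. A clause either matches a document or it does not, regardless of how often the term occurs. The Python sketch below models that rule with the token lists from this post. It is a simplified illustration (hypothetical function names, no scoring, no call to Elasticsearch), not a description of how Lucene is implemented.

```python
def count_matching_clauses(query_terms, doc_terms):
    """Count query terms (should clauses) that occur at least once in the document."""
    doc_term_set = set(doc_terms)
    return sum(1 for term in query_terms if term in doc_term_set)


def is_hit(query_terms, doc_terms, minimum_should_match):
    """Apply the minimum_should_match threshold to the matched-clause count."""
    return count_matching_clauses(query_terms, doc_terms) >= minimum_should_match


# ik_max_word tokens copied from the _analyze output in this post.
query_terms = ["香蕉", "香", "蕉"]
doc_4 = ["蕉叶", "蕉", "叶的", "叶", "的", "蕉", "是", "一种", "一", "种", "叶子", "叶", "子"]
doc_5 = ["蕉叶", "蕉", "叶的", "叶", "的", "香", "是", "一种", "一", "种", "叶子", "叶", "子"]

for doc_id, terms in (("4", doc_4), ("5", doc_5)):
    matched = count_matching_clauses(query_terms, terms)
    print(doc_id, matched, is_hit(query_terms, terms, 2))
```

Under this model document 4 matches a single clause ("蕉") and document 5 matches two ("香" and "蕉"), which agrees with the search result where only document 5 is returned.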
