Multi-field, multi-word, match without query_string

Posted 2023-02-23

【Title】: Multi-field, multi-word, match without query_string 【Posted】: 2013-03-03 14:44:36 【Question】:

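I want to be able to match a multi-word search against multiple fields, where every word in the search is contained in any of the fields, in any combination. The catch is that I would like to avoid using query_string.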

curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"firstname":"john","middlename":"clark","lastname":"smith"}'
curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"firstname":"john","middlename":"paladini","lastname":"miranda"}'

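I want a search for "John Smith" to match only document 1. The following query does what I need, but I would rather avoid query_string in case the user passes "OR", "AND", or any of the other advanced parameters.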

curl -X GET 'http://localhost:9200/index/_search?per_page=10&pretty' -d '{
  "query": {
    "query_string": {
      "query": "john smith",
      "default_operator": "AND",
      "fields": [
        "firstname",
        "lastname",
        "middlename"
      ]
    }
  }
}'

【Comments】:

I keep coming back to this question over and over. Great evergreen question!

【Answer 1】:

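What you are looking for is the multi-match query, but it doesn't behave in quite the way you would like.

Compare the output of validate for multi_match and for query_string.

multi_match (with operator and) requires that at least one single field contains ALL of the terms: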

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "multi_match" : {
      "operator" : "and",
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "((+lastname:john +lastname:smith) | (+firstname:john +firstname:smith))",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }

While query_string (with default_operator AND) checks that EACH term exists in at least one field:

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "query_string" : {
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith",
      "default_operator" : "AND"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "+(firstname:john | lastname:john) +(firstname:smith | lastname:smith)",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }
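The difference between the two explanations can be reproduced without a cluster. The following is a minimal Python sketch, not Elasticsearch code: it assumes a crude analyzer (lowercase, split on whitespace) and uses the two documents from the question. The helper names `field_centric`, `term_centric` and `search` are made up for illustration.

```python
# Sample documents from the question.
DOCS = {
    1: {"firstname": "john", "middlename": "clark", "lastname": "smith"},
    2: {"firstname": "john", "middlename": "paladini", "lastname": "miranda"},
}
FIELDS = ["firstname", "middlename", "lastname"]


def field_terms(doc, field):
    """Crude stand-in for analysis (an assumption): lowercase, split on whitespace."""
    return set(doc.get(field, "").lower().split())


def field_centric(doc, terms, fields):
    """Shape of the multi_match explanation: one single field holds every term."""
    return any(all(t in field_terms(doc, f) for t in terms) for f in fields)


def term_centric(doc, terms, fields):
    """Shape of the query_string explanation: every term is found in some field."""
    return all(any(t in field_terms(doc, f) for f in fields) for t in terms)


def search(matcher, text, fields):
    """Return the ids of the sample documents accepted by the given matcher."""
    terms = text.lower().split()
    return [doc_id for doc_id, doc in DOCS.items() if matcher(doc, terms, fields)]


print(search(field_centric, "john smith", FIELDS))  # []
print(search(term_centric, "john smith", FIELDS))   # [1]
```

With the field-centric rule no document matches, because no single field contains both "john" and "smith"; with the term-centric rule only document 1 matches, which is the behavior the question asks for.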

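So you have a few choices to achieve what you are after:

Preparse the search terms to remove things like wildcards before passing them to query_string

Preparse the search terms to extract each word, then generate one multi_match query per word

Use index_name in the mapping of the name fields to index their data into a single field, which you can then use for search (like your own custom all field):

As follows: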

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "index_name" : "name",
               "type" : "string"
            },
            "lastname" : {
               "index_name" : "name",
               "type" : "string"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "VJFU_RWbRNaeHF9wNM8fRA",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 33
# }

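Note, however, that firstname and lastname are no longer searchable independently. The data for both fields has been indexed into name.

You could use multi-fields with the path parameter to make them searchable both independently and together, as follows: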

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "fields" : {
                  "firstname" : {
                     "type" : "string"
                  },
                  "any_name" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            },
            "lastname" : {
               "fields" : {
                  "any_name" : {
                     "type" : "string"
                  },
                  "lastname" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

Searching the any_name field works:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "any_name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 11
# }

Searching the firstname field for john AND smith returns nothing:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [],
#       "max_score" : null,
#       "total" : 0
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 2
# }

But searching the firstname field for just john works correctly:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.30685282,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.30685282,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 3
# }
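The second option listed in this answer (extract each word, then generate one multi_match query per word) could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `per_term_multi_match` is made up, words are split on whitespace only, and the per-word clauses are combined with a bool query's must list so that every word has to match in at least one field.

```python
import json


def per_term_multi_match(text, fields):
    """Build a bool query with one multi_match clause per whitespace-separated word.

    Each word only has to match in one of the fields, but every word must match.
    Splitting on whitespace is an assumption; real input may need more cleanup.
    """
    terms = text.split()
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": term, "fields": list(fields)}}
                    for term in terms
                ]
            }
        }
    }


body = per_term_multi_match("john smith", ["firstname", "middlename", "lastname"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Because the user's text never goes through the query_string parser, words such as "AND" or "OR" are treated as ordinary search terms.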

【Comments】:

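I'm confused about why the first multi_match query would return anything. I assume that when you say "all terms" you mean "john" and "smith". The first_name field does not contain both "john" and "smith". The last_name field does not contain both "john" and "smith".

【Answer 2】: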

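I would rather avoid query_string in case the user passes "OR", "AND", or any of the other advanced parameters.

In my experience, escaping the special characters with a backslash is a simple and effective solution. The list of characters can be found in the documentation at http://lucene.apache.org/core/4_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description, to which you should add AND/OR/NOT/TO.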
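A minimal sketch of such an escaping step is shown below. The function name `escape_query_string` is hypothetical, and the character list is the one from the Lucene 4.5 classic query parser documentation linked above. Lowercasing AND/OR/NOT/TO is an assumption of this sketch rather than something the answer prescribes; it relies on the parser only recognizing the uppercase forms as operators.

```python
import re

# Special characters from the Lucene 4.5 classic query parser docs:
# + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

# Uppercase keywords that the parser treats as operators.
_OPERATORS = re.compile(r'\b(AND|OR|NOT|TO)\b')


def escape_query_string(text):
    """Backslash-escape special characters and lowercase operator keywords."""
    escaped = _SPECIAL.sub(
        lambda m: "".join("\\" + ch for ch in m.group(0)), text
    )
    # Assumption: a lowercased keyword is parsed as an ordinary term.
    return _OPERATORS.sub(lambda m: m.group(0).lower(), escaped)


print(escape_query_string("john AND (smith)"))  # john and \(smith\)
```

The escaped string can then be passed as the "query" value of query_string. Note that a lowercased keyword such as "and" becomes a regular search term, so it may need to be dropped instead if it should not take part in matching.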

【Comments】:

【Answer 3】:

You can now use the cross_fields type of multi_match:

GET /_validate/query?explain
{
    "query": {
        "multi_match": {
            "query":       "peter smith",
            "type":        "cross_fields",
            "operator":    "and",
            "fields":      [ "firstname", "lastname", "middlename" ]
        }
    }
}

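cross_fields takes a term-centric approach. It treats all of the fields as one big field and looks for each term in any field.

One thing to note, though, is that if you want it to work optimally, all of the fields being searched should have the same analyzer (standard, english, etc.):

For the cross_fields query type to work optimally, all fields should have the same analyzer. Fields that share an analyzer are grouped together as blended fields.

If you include fields with a different analysis chain, they will be added to the query in the same way as for best_fields. For example, if we added the title field to the preceding query (assuming it uses a different analyzer), the explanation would be as follows: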

(+title:peter +title:smith) ( +blended("peter", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name]) )
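The grouping rule quoted above (fields that share an analyzer are blended together, other fields are handled separately) can be pictured with a small sketch. This is not how Elasticsearch implements it; the function `group_fields_by_analyzer` and the field-to-analyzer table are hypothetical and only mirror the example explanation, where first_name and last_name share an analyzer and title does not.

```python
from collections import defaultdict


def group_fields_by_analyzer(field_analyzers):
    """Group field names by analyzer name, keeping the input order."""
    groups = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)
    return dict(groups)


# Hypothetical analyzers, chosen to match the example explanation above.
analyzers = {
    "first_name": "standard",
    "last_name": "standard",
    "title": "english",
}

groups = group_fields_by_analyzer(analyzers)
print(groups)
```

A check like this before building a cross_fields query shows which fields would end up in the same blended group and which would fall back to being searched on their own.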

【Comments】:

【Answer 4】:

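I think the "match" query is what you are looking for:

"The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does)"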

http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html

【Comments】:
