Elasticsearch autocomplete or autosuggest by token

Posted: 2014-08-15 00:12:38

Question:

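I'd like suggestions on how to complete a term based on tokens, similar to Google-style autocomplete, but using only a single token or word.

I want to search through filenames, which will be tokenized. For example, "BRAND_Connect_A1233.jpg" gets tokenized into "brand", "connect", "a1233" and "jpg".

Now I'd like to ask for suggestions for, e.g., "con". The suggestion should deliver the complete matching tokens, not the full filename:

Connect, Contour, Concept, ...

The suggestion for "A12" should be "A1234", "A1233", ...

Example

Working with queries, facets and filters works fine.

First I created a mapping, including a tokenizer and filters: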

curl -XPUT 'localhost:9200/files/?pretty=1'  -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "search_analyzer" : "filename_search",
               "index_analyzer" : "filename_index"
            }
         }
      }
   }
}
'
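
To make the effect of the "edge_ngram" filter settings above easier to picture, here is a small plain-Python sketch. It is not Elasticsearch code; the function name edge_ngrams is hypothetical, and it only assumes the usual front-anchored behaviour: after lowercasing, each token is expanded into its prefixes from min_gram up to max_gram characters.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the front-anchored n-grams (prefixes) of a lowercased token.

    Illustration only: mimics lowercase + front edge n-grams with the
    min_gram/max_gram values from the settings above.
    """
    token = token.lower()
    longest = min(len(token), max_gram)
    # Tokens shorter than min_gram yield no grams at all.
    return [token[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(edge_ngrams("Connect"))
```

With these settings a query for "con" can match documents containing "Connect", because "con" is one of the indexed prefixes. That explains why searching works, but matching a document is not the same as getting the full token back as a suggestion.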

Both analyzers work fine:

curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_search'
curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_index'

Now I added some sample data:

curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg" }'
curl -X POST "localhost:9200/files/_refresh"

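None of the approaches I tried to get the desired suggestions delivered the expected results. I have tried naming the analyzers and tried various combinations of analyzers and wildcards.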

curl -XGET 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "text" : "con",
    "simple_phrase" : {
      "phrase" : {
        "field" : "filename",
        "size" : 15,
        "real_word_error_likelihood" : 0.75,
        "max_errors" : 0.1,
        "gram_size" : 3
      }
    }
}'
curl -XGET 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "my-suggestion" : {
    "text" : "con",
    "term" : {
        "field" : "filename",
        "analyzer": "filename_index"
        }
    }
}'


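Answer 1:

You need to add a special mapping to use the completion suggester, as documented in the official ElasticSearch docs. I've modified your example to show how it works.

First create the index. Note the filename_suggest mapping.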

curl -XPUT 'localhost:9200/files/?pretty=1'  -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "analyzer": "filename_index",
               "search_analyzer" : "filename_search"
            },
            "filename_suggest": {
              "type": "completion",
              "analyzer": "simple",
              "search_analyzer": "simple",
              "payloads": true
            }
         }
      }
   }
}
'

Add some data. Note how filename_suggest has an input field, which contains the keywords to match on.

curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg", "filename_suggest": { "input": ["BRAND", "ConnectBlue", "A1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg", "filename_suggest": { "input": ["BRAND", "Connect", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg", "filename_suggest": { "input": ["BRAND", "ConceptSpace", "A1244", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg", "filename_suggest": { "input": ["COMPANY", "Connect", "A1222", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg", "filename_suggest": { "input": ["COMPANY", "Concept", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg", "filename_suggest": { "input": ["DEALER", "Connect", "B1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg", "filename_suggest": { "input": ["DEALER", "Contour21", "B1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg", "filename_suggest": { "input": ["DEALER", "ConceptCube", "B2233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/_refresh"
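
The input lists in the documents above were written out by hand. If you have many files, you could probably generate them; the following is a minimal Python sketch of that idea. The helper name build_document and the splitting rule (runs of "_" and ".") are assumptions made for illustration, so adjust the rule to whatever your filenames actually look like.

```python
import json
import re


def build_document(filename):
    """Build a document body with a filename_suggest input list.

    Assumption: tokens are separated by runs of "_" and "."; this is a
    simplification, not the pattern tokenizer from the index settings.
    """
    tokens = [t for t in re.split(r"[_.]+", filename) if t]
    return {
        "filename": filename,
        "filename_suggest": {"input": tokens, "payload": {}},
    }


if __name__ == "__main__":
    # The printed JSON can be used as the body of the POST requests above.
    print(json.dumps(build_document("BRAND_Connect_A1233.jpg")))
```

Splitting on runs of separators means a name like "DEALER_Connect_B1234_.jpg" does not produce an empty token between "_" and ".".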

Now perform the query:

curl -XPOST 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "filename_suggest" : {
        "text" : "con",
        "completion": {
            "field": "filename_suggest", "size": 10
        }
    }
}'

Result:

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "filename_suggest" : [ {
    "text" : "con",
    "offset" : 0,
    "length" : 3,
    "options" : [ {
      "text" : "Connect",
      "score" : 2.0,
      "payload":{}
    }, {
      "text" : "Concept",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "ConceptSpace",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "ConnectBlue",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "Contour21",
      "score" : 1.0,
      "payload":{}
    } ]
  } ]
}
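
On the client side you usually only need the suggested tokens themselves. As a rough sketch, assuming the response has the shape shown in the result above, the option texts could be pulled out in Python like this (the function name suggestion_texts and the sample dictionary are illustrative, not part of any Elasticsearch client):

```python
def suggestion_texts(response, name):
    """Collect the option texts for one named suggestion, in response order."""
    return [
        option["text"]
        for entry in response.get(name, [])
        for option in entry.get("options", [])
    ]


# Sample response mirroring the result above (payloads omitted for brevity).
sample_response = {
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "filename_suggest": [
        {
            "text": "con",
            "offset": 0,
            "length": 3,
            "options": [
                {"text": "Connect", "score": 2.0},
                {"text": "Concept", "score": 1.0},
                {"text": "ConceptSpace", "score": 1.0},
                {"text": "ConnectBlue", "score": 1.0},
                {"text": "Contour21", "score": 1.0},
            ],
        }
    ],
}

if __name__ == "__main__":
    print(suggestion_texts(sample_response, "filename_suggest"))
```

Note that the suggestions come back with the casing of the stored input values ("Connect", not "connect"), even though the query text was lowercase.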
