Elasticsearch autocomplete or autosuggest by token
[Posted]: 2014-08-15 00:12:38
[Question]: I want to create suggestions for completing a term based on tokens, similar to Google-style autocomplete, but using only a single token or word.
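I want to search across filenames, which are tokenized. For example, "BRAND_Connect_A1233.jpg" is tokenized into "brand", "connect", "a1233" and "jpg".
Now I would like to request suggestions for, e.g., "Con". The suggestion should return the complete matching tokens, not the full filename:
Connect
Contour
Concept
...
The suggestions for "A12" should be "A1234", "A1233", "A1233" ...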
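The behaviour being asked for can be sketched offline in Python. This is only an illustration of the desired result, not of how Elasticsearch computes it: the `tokenize` and `suggest` helpers are hypothetical, and the separator set (`_`, `.`, `/`, `;`) is a simplifying assumption rather than the exact tokenizer pattern used in the mapping.

```python
import re

# A few of the sample filenames from the question.
FILENAMES = [
    "BRAND_ConnectBlue_A1234.jpg",
    "BRAND_Connect_A1233.jpg",
    "COMPANY_Concept_A1233.jpg",
    "DEALER_Contour21_B1233.jpg",
]


def tokenize(filename):
    """Split on '_', '.', '/' and ';' and lowercase (assumed simplification)."""
    return [t.lower() for t in re.split(r"[_./;]+", filename) if t]


def suggest(prefix, filenames):
    """Return the distinct tokens starting with the prefix, sorted alphabetically."""
    prefix = prefix.lower()
    tokens = {t for name in filenames for t in tokenize(name)}
    return sorted(t for t in tokens if t.startswith(prefix))


print(suggest("con", FILENAMES))  # ['concept', 'connect', 'connectblue', 'contour21']
print(suggest("A12", FILENAMES))  # ['a1233', 'a1234']
```

The point of the sketch is that suggestions are whole tokens, deduplicated across files, rather than whole filenames.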
Example
Working with queries, facets and filters works fine.
First I created a mapping that includes a tokenizer and filters:
curl -XPUT 'localhost:9200/files/?pretty=1' -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "search_analyzer" : "filename_search",
               "index_analyzer" : "filename_index"
            }
         }
      }
   }
}'
Both analyzers work well:
curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_search'
curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_index'
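To make the difference between the two analyzers concrete, here is a rough Python approximation of what each one emits. It is a sketch under assumptions: the helper names mirror the analyzer names but are plain functions, the tokenizer is simplified to splitting on `_`, `.`, `/` and `;`, and the real `_analyze` output may differ in detail.

```python
import re


def filename_search(text):
    """Approximate the search analyzer: split on separators, then lowercase."""
    return [t.lower() for t in re.split(r"[_./;]+", text) if t]


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Front edge n-grams of one token, from min_gram up to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def filename_index(text):
    """Approximate the index analyzer: search tokens expanded into edge n-grams."""
    return [gram for token in filename_search(text) for gram in edge_ngrams(token)]


print(filename_search("BRAND_Connect_A1233.jpg"))
# ['brand', 'connect', 'a1233', 'jpg']
print(filename_index("BRAND_Connect_A1233.jpg"))
# ['br', 'bra', 'bran', 'brand', 'co', 'con', 'conn', 'conne', 'connec',
#  'connect', 'a1', 'a12', 'a123', 'a1233', 'jp', 'jpg']
```

With this setup the query text "con" stays a single token at search time and matches the stored gram "con", which is why searching by prefix finds documents. It does not by itself return the full token "connect" as a suggestion.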
Now I added some sample data:
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg" }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg" }'
curl -X POST "localhost:9200/files/_refresh"
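The various approaches I tried for getting the desired suggestions did not deliver the expected results. I tried naming the analyzer explicitly and tried various combinations of analyzers and wildcards.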
curl -XGET 'localhost:9200/files/_suggest?pretty=true' -d '
{
   "text" : "con",
   "simple_phrase" : {
      "phrase" : {
         "field" : "filename",
         "size" : 15,
         "real_word_error_likelihood" : 0.75,
         "max_errors" : 0.1,
         "gram_size" : 3
      }
   }
}'
curl -XGET 'localhost:9200/files/_suggest?pretty=true' -d '
{
   "my-suggestion" : {
      "text" : "con",
      "term" : {
         "field" : "filename",
         "analyzer": "filename_index"
      }
   }
}'
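[Answer 1]: You need to add a special mapping to use the completion suggester, as described in the official Elasticsearch docs. I have modified your example to show how it works.
First create the index. Note the filename_suggest mapping.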
curl -XPUT 'localhost:9200/files/?pretty=1' -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "analyzer": "filename_index",
               "search_analyzer" : "filename_search"
            },
            "filename_suggest": {
               "type": "completion",
               "analyzer": "simple",
               "search_analyzer": "simple",
               "payloads": true
            }
         }
      }
   }
}'
Add some data. Note how filename_suggest has an input field, which contains the keywords to match on.
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg", "filename_suggest": { "input": ["BRAND", "ConnectBlue", "A1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg", "filename_suggest": { "input": ["BRAND", "Connect", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg", "filename_suggest": { "input": ["BRAND", "ConceptSpace", "A1244", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg", "filename_suggest": { "input": ["COMPANY", "Connect", "A1222", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg", "filename_suggest": { "input": ["COMPANY", "Concept", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg", "filename_suggest": { "input": ["DEALER", "Connect", "B1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg", "filename_suggest": { "input": ["DEALER", "Contour21", "B1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg", "filename_suggest": { "input": ["DEALER", "ConceptCube", "B2233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/_refresh"
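Writing the input list by hand for every file gets tedious, so the document bodies could be generated instead. The following Python sketch assumes the keywords are simply the filename split on `_`, `.`, `/` and `;`; the `build_document` helper is hypothetical and only produces the JSON body, it does not talk to Elasticsearch.

```python
import json
import re


def build_document(filename):
    """Build an index body with a completion input per filename token."""
    # Assumed separators; runs such as "_." collapse into a single split point.
    tokens = [t for t in re.split(r"[_./;]+", filename) if t]
    return {
        "filename": filename,
        "filename_suggest": {"input": tokens, "payload": {}},
    }


doc = build_document("DEALER_Connect_B1234_.jpg")
print(json.dumps(doc))
```

The printed JSON can be passed as the `-d` argument of the curl commands shown for adding data.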
Now perform the query:
curl -XPOST 'localhost:9200/files/_suggest?pretty=true' -d '
{
   "filename_suggest" : {
      "text" : "con",
      "completion": {
         "field": "filename_suggest", "size": 10
      }
   }
}'
Result:
{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "filename_suggest" : [ {
    "text" : "con",
    "offset" : 0,
    "length" : 3,
    "options" : [ {
      "text" : "Connect",
      "score" : 2.0,
      "payload" : {}
    }, {
      "text" : "Concept",
      "score" : 1.0,
      "payload" : {}
    }, {
      "text" : "ConceptSpace",
      "score" : 1.0,
      "payload" : {}
    }, {
      "text" : "ConnectBlue",
      "score" : 1.0,
      "payload" : {}
    }, {
      "text" : "Contour21",
      "score" : 1.0,
      "payload" : {}
    } ]
  } ]
}
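On the client side, the suggested tokens are the `text` values of each entry under `options`. A minimal Python sketch of pulling them out follows; the `RESPONSE` dictionary is an abbreviated, hand-copied version of the result above, and `option_texts` is a hypothetical helper.

```python
# Abbreviated copy of the suggest response (only two options kept).
RESPONSE = {
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "filename_suggest": [
        {
            "text": "con",
            "offset": 0,
            "length": 3,
            "options": [
                {"text": "Connect", "score": 2.0, "payload": {}},
                {"text": "Concept", "score": 1.0, "payload": {}},
            ],
        }
    ],
}


def option_texts(response, suggestion_name):
    """Collect the suggested tokens for one named suggestion, in response order."""
    return [
        option["text"]
        for entry in response.get(suggestion_name, [])
        for option in entry["options"]
    ]


print(option_texts(RESPONSE, "filename_suggest"))  # ['Connect', 'Concept']
```

The key `filename_suggest` here is the suggestion name chosen in the request body, which happens to match the field name.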