Elasticsearch: Completion suggester - implementing Search-As-You-Type

Posted 2021-05-21, Elastic China Community official blog (Elastic 中国社区官方博客)

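The completion suggester provides auto-complete / search-as-you-type functionality. It is a navigational feature: it guides users to relevant results while they are still typing, which improves search precision. It is not meant for spell correction, nor for did-you-mean style suggestions of the kind the term and phrase suggesters provide. For more on suggesters in general, see my earlier articles.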

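Ideally, auto-complete should be as fast as the user types, giving instant feedback that is relevant to what has been typed so far. The completion suggester is therefore optimized for speed: it uses a data structure that allows fast lookups but is expensive to build and is held in memory. It works on prefixes only.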
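To make the "built once, held in memory, prefix-only lookup" trade-off concrete, here is a minimal Python sketch. It is not how Elasticsearch implements the suggester: a sorted list searched with `bisect` stands in for the real in-memory structure, lower-casing stands in for the field's analyzer, and the class name `PrefixIndex` is hypothetical.

```python
import bisect


class PrefixIndex:
    """Hypothetical stand-in for the suggester's in-memory structure."""

    def __init__(self, inputs):
        # Building is the expensive step: sort all (lower-cased, original) pairs once.
        self._keys = sorted((s.lower(), s) for s in inputs)
        self._lowered = [lowered for lowered, _ in self._keys]

    def lookup(self, prefix):
        """Return every stored input that starts with `prefix` (case-insensitive)."""
        p = prefix.lower()
        # Binary search jumps straight to the first key >= prefix.
        start = bisect.bisect_left(self._lowered, p)
        results = []
        for lowered, original in self._keys[start:]:
            if not lowered.startswith(p):
                # Keys sharing a prefix are contiguous in sorted order, so stop here.
                break
            results.append(original)
        return results


index = PrefixIndex(["Pitch", "Fork", "Spading", "Fork", "Fountain"])
print(index.lookup("fo"))   # ['Fork', 'Fork', 'Fountain']
print(index.lookup("fon"))  # []
```

Queries only ever match from the start of an input, which is why a search for "ork" would find nothing here even though "Fork" is stored.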

In today's exercise, I will walk through an example of how to use the completion suggester.

Example

Create the index mapping

We start by creating an index called test:

PUT test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "suggest": {
        "type": "completion"
      }
    }
  }
}

The test index created above has two fields, name and suggest. Both names are arbitrary; what matters is that suggest has the type completion.

Import sample data

Run the following commands to import some sample documents:

POST test/_doc
{
  "name": "Pitch Fork",
  "suggest": ["Pitch", "Fork"]
}

POST test/_doc
{
  "name": "Spading Fork",
  "suggest": ["Spading", "Fork"]
}

POST test/_doc
{
  "name": "Fountain",
  "suggest": ["Fountain"]
}

This creates three documents. We can list them with:

GET test/_search

Next, we use suggest to search:

GET test/_search
{
  "suggest": {
    "completer": {
      "prefix": "fo",
      "completion": {
        "field": "suggest"
      }
    }
  }
}

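Here, completer is a name of our choosing. We ask for suggestions whose prefix is fo, completing against the suggest field. Because a completion field uses the simple analyzer by default, which lower-cases its input, the lowercase prefix fo also matches Fork and Fountain. The command returns: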

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "completer" : [
      {
        "text" : "fo",
        "offset" : 0,
        "length" : 2,
        "options" : [
          {
            "text" : "Fork",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "6WQfgnkBGKQL9OLXKyKE",
            "_score" : 1.0,
            "_source" : {
              "name" : "Pitch Fork",
              "suggest" : [
                "Pitch",
                "Fork"
              ]
            }
          },
          {
            "text" : "Fork",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "6mQfgnkBGKQL9OLX-SJO",
            "_score" : 1.0,
            "_source" : {
              "name" : "Spading Fork",
              "suggest" : [
                "Spading",
                "Fork"
              ]
            }
          },
          {
            "text" : "Fountain",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "62QggnkBGKQL9OLXsiIU",
            "_score" : 1.0,
            "_source" : {
              "name" : "Fountain",
              "suggest" : [
                "Fountain"
              ]
            }
          }
        ]
      }
    ]
  }
}

In other words, all three documents are matched. If, on the other hand, we search for:

GET test/_search
{
  "suggest": {
    "completer": {
      "prefix": "fon",
      "completion": {
        "field": "suggest"
      }
    }
  }
}

no documents are returned, because no document contains a term starting with fon.
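A rough way to picture what the suggester does with the three sample documents is a plain-Python simulation. This is an illustrative sketch, not Elasticsearch's implementation: `DOCS` and `complete` are hypothetical names, lower-casing stands in for the default `simple` analyzer, and each document is assumed to contribute at most one option. Options come back in document order here, whereas Elasticsearch sorts them by score.

```python
# Hypothetical in-memory copies of the three sample documents.
DOCS = [
    {"name": "Pitch Fork", "suggest": ["Pitch", "Fork"]},
    {"name": "Spading Fork", "suggest": ["Spading", "Fork"]},
    {"name": "Fountain", "suggest": ["Fountain"]},
]


def complete(docs, prefix):
    """Return (suggestion text, document name) options for inputs starting with `prefix`."""
    p = prefix.lower()  # mimic the lower-casing done by the simple analyzer
    options = []
    for doc in docs:
        for text in doc["suggest"]:
            if text.lower().startswith(p):
                options.append((text, doc["name"]))
                break  # assumption: at most one option per document
    return options


print(complete(DOCS, "fo"))   # [('Fork', 'Pitch Fork'), ('Fork', 'Spading Fork'), ('Fountain', 'Fountain')]
print(complete(DOCS, "fon"))  # []
```

The duplicated "Fork" text comes from two different documents that both list it as an input, which mirrors the three options in the fo response.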

In the response to the fo query, two documents have the suggestion Fork, and both are returned. If we want only one of them, we can search like this:

GET test/_search
{
  "suggest": {
    "completer": {
      "prefix": "fo",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}

Here we added "skip_duplicates": true. Running the command returns:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "completer" : [
      {
        "text" : "fo",
        "offset" : 0,
        "length" : 2,
        "options" : [
          {
            "text" : "Fork",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "6WQfgnkBGKQL9OLXKyKE",
            "_score" : 1.0,
            "_source" : {
              "name" : "Pitch Fork",
              "suggest" : [
                "Pitch",
                "Fork"
              ]
            }
          },
          {
            "text" : "Fountain",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "62QggnkBGKQL9OLXsiIU",
            "_score" : 1.0,
            "_source" : {
              "name" : "Fountain",
              "suggest" : [
                "Fountain"
              ]
            }
          }
        ]
      }
    ]
  }
}

This time there are only two results, and only one of the Fork documents is included. If you do not need the _source field in the response, you can do this:

GET test/_search
{
  "_source": false, 
  "suggest": {
    "completer": {
      "prefix": "fo",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}

The response is then:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "completer" : [
      {
        "text" : "fo",
        "offset" : 0,
        "length" : 2,
        "options" : [
          {
            "text" : "Fork",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "6WQfgnkBGKQL9OLXKyKE",
            "_score" : 1.0
          },
          {
            "text" : "Fountain",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "62QggnkBGKQL9OLXsiIU",
            "_score" : 1.0
          }
        ]
      }
    ]
  }
}

The options now contain just the two suggestion texts, Fork and Fountain, without _source.
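The effect of skip_duplicates on the options can be sketched in a few lines of Python. This is an assumption-laden illustration: `skip_duplicates` here is a hypothetical helper that compares suggestion texts exactly and keeps the first occurrence of each; which of the duplicate documents Elasticsearch itself keeps is not something this example establishes.

```python
def skip_duplicates(options):
    """Keep only the first (text, source) option for each distinct suggestion text."""
    seen = set()
    unique = []
    for text, source in options:
        if text not in seen:  # exact string comparison of the suggestion text
            seen.add(text)
            unique.append((text, source))
    return unique


options = [("Fork", "Pitch Fork"), ("Fork", "Spading Fork"), ("Fountain", "Fountain")]
print(skip_duplicates(options))  # [('Fork', 'Pitch Fork'), ('Fountain', 'Fountain')]
```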

In practice, users sometimes mistype. What can we do then? For example:

GET test/_search
{
  "_source": false, 
  "suggest": {
    "completer": {
      "prefix": "foe",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}

Here we search for "foe" and get no results, because no term starts with foe. For this situation, we can use the fuzzy option:

GET test/_search
{
  "_source": false, 
  "suggest": {
    "completer": {
      "prefix": "foe",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true,
        "fuzzy": {
          "fuzziness": "auto"
        }
      }
    }
  }
}

Here I used the fuzzy option with fuzziness set to auto. If you are not familiar with auto, see my earlier article "Fuzzy search". Running the command above returns:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "completer" : [
      {
        "text" : "foe",
        "offset" : 0,
        "length" : 3,
        "options" : [
          {
            "text" : "Fork",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "6WQfgnkBGKQL9OLXKyKE",
            "_score" : 2.0
          },
          {
            "text" : "Fountain",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "62QggnkBGKQL9OLXsiIU",
            "_score" : 2.0
          }
        ]
      }
    ]
  }
}

This time we see two suggestions again: even with one wrong character in the input, we still find the documents we want.
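The fuzzy prefix matching can be approximated with a small Python sketch. Assumptions: fuzziness "auto" is taken as the usual AUTO rule (0 edits for 1-2 characters, 1 edit for 3-5, 2 edits for longer terms), computed from the length of the typed prefix; plain Levenshtein distance stands in for Elasticsearch's automaton-based matching (which by default also counts a transposition as a single edit); and the first character must match exactly, mirroring what I understand to be the default prefix_length of 1. All function names are hypothetical.

```python
def auto_fuzziness(term):
    """Edits allowed under the AUTO rule with the default thresholds 3 and 6."""
    n = len(term)
    if n < 3:
        return 0
    if n < 6:
        return 1
    return 2


def levenshtein(a, b):
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_prefix_match(query, candidate, prefix_length=1):
    """True if some prefix of `candidate` is within AUTO edits of `query`."""
    q, c = query.lower(), candidate.lower()
    if q[:prefix_length] != c[:prefix_length]:
        return False  # assumed: leading characters must match exactly
    k = auto_fuzziness(q)
    # Only candidate prefixes whose length differs from the query by <= k can qualify.
    for length in range(max(0, len(q) - k), min(len(c), len(q) + k) + 1):
        if levenshtein(q, c[:length]) <= k:
            return True
    return False


words = ["Pitch", "Fork", "Spading", "Fountain"]
print([w for w in words if fuzzy_prefix_match("foe", w)])  # ['Fork', 'Fountain']
```

With a three-character prefix, AUTO allows one edit, so "foe" reaches "fo" (one deletion), which is a prefix of both Fork and Fountain.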