Elasticsearch: Counting the tokens produced by analysis

Posted 2023-02-01 by Elastic 中国社区官方博客 (the official Elastic China community blog)

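When a text field is analyzed, the analyzer breaks its value into individual tokens. If analyzers are still unfamiliar to you, see my earlier article “Elasticsearch: analyzer”. Elasticsearch also has a field type called token_count. It works as a token counter: it analyzes the string and indexes the number of tokens produced as an integer, so we can find out, and query on, how many tokens a string generated at index time.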

Let's walk through a simple example. We will index a few book titles and then filter for the books whose title contains exactly 2 tokens.

PUT book_token_count_test
{
  "mappings": {
    "properties": {
      "book_name": {
        "type": "text",
        "fields": {
          "size": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

We index a few documents with the following command:

POST book_token_count_test/_bulk
{ "index": {} }
{ "book_name": "Ulysses" }
{ "index": {} }
{ "book_name": "Don Quixote" }
{ "index": {} }
{ "book_name": "One Hundred Years of Solitude" }

We search for documents whose token count is 2 with the following command:

GET book_token_count_test/_search
{
  "query": {
    "term": {
      "book_name.size": {
        "value": "2"
      }
    }
  }
}

The search returns:

{
  "took": 273,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "book_token_count_test",
        "_id": "cxczBoYB6OPboMnB7TQu",
        "_score": 1,
        "_source": {
          "book_name": "Don Quixote"
        }
      }
    ]
  }
}
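
To see why only “Don Quixote” matches, it can help to estimate the token count of each title locally. The sketch below is only an approximation and rests on an assumption: it counts runs of word characters with a regular expression, which is not what the standard analyzer actually does (it follows Unicode text segmentation rules), and `approx_token_count` is a hypothetical helper rather than anything Elasticsearch provides. For plain, space-separated English titles like these three, the two approaches should give the same numbers.

```python
import re

# Hypothetical helper: a rough local stand-in for the number of tokens the
# standard analyzer emits. It only counts runs of word characters, so it is
# an approximation that holds for simple English titles, not a replacement
# for the real analyzer.
def approx_token_count(text: str) -> int:
    return len(re.findall(r"\w+", text))

titles = ["Ulysses", "Don Quixote", "One Hundred Years of Solitude"]
counts = {title: approx_token_count(title) for title in titles}

for title, n in counts.items():
    print(f"{n}  {title}")
```

With counts of 1, 2 and 5, the term query on book_name.size with value 2 can only match “Don Quixote”.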

We can also use a range query to retrieve documents whose book_name contains 3 or more tokens. Here it returns only the document titled “One Hundred Years of Solitude”.

GET book_token_count_test/_search
{
  "query": {
    "range": {
      "book_name.size": {
        "gte": 3
      }
    }
  }
}

The search returns:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "book_token_count_test",
        "_id": "dBczBoYB6OPboMnB7TQu",
        "_score": 1,
        "_source": {
          "book_name": "One Hundred Years of Solitude"
        }
      }
    ]
  }
}
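
The lower bound of a range query is inclusive with gte and exclusive with gt, which matters when a title sits exactly on the boundary. The following is a minimal local sketch of that comparison, not Elasticsearch code: `TOKEN_COUNTS` hard-codes the counts assumed for the three titles (1, 2 and 5, from a plain word split), and `range_filter` is a hypothetical function that only mimics the two lower-bound options.

```python
from typing import List, Optional

# Assumed token counts for the three indexed titles (plain word split).
TOKEN_COUNTS = {
    "Ulysses": 1,
    "Don Quixote": 2,
    "One Hundred Years of Solitude": 5,
}

def range_filter(gte: Optional[int] = None, gt: Optional[int] = None) -> List[str]:
    """Local mimic of a range query on book_name.size (lower bounds only)."""
    matches = []
    for title, count in TOKEN_COUNTS.items():
        if gte is not None and count < gte:
            continue  # below an inclusive lower bound
        if gt is not None and count <= gt:
            continue  # not strictly above an exclusive lower bound
        matches.append(title)
    return matches

print(range_filter(gte=3))  # ['One Hundred Years of Solitude']
print(range_filter(gte=2))  # ['Don Quixote', 'One Hundred Years of Solitude']
print(range_filter(gt=2))   # ['One Hundred Years of Solitude']
```

Moving the bound to 2 shows the difference: gte keeps the two-token title, while gt drops it.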

I hope this small tip helps you in your work!
